Building a Yocto test automation infrastructure: what we learned

As part of our software industrialization strategy, we’ve built an automated test infrastructure tailored to embedded systems based on Yocto. This initiative, led by our development team, was designed to improve delivery reliability, accelerate embedded software development cycles, and ensure the quality of the system images we deliver.

Project context

At The Embedded Kit, we develop a custom Linux distribution called Welma, built on Yocto, for industrial embedded systems. Welma supports a wide range of configurations and includes advanced cybersecurity features.

This context creates a dual validation need:

  • Ensuring the technical stability of the Yocto images generated by our distribution,
  • Verifying functional compliance with the specific business use cases of our customers.

We needed a test automation infrastructure that could integrate seamlessly into our CI/CD pipeline, control physical hardware targets remotely, detect regressions quickly, and provide fast, reliable feedback to developers.

The goal was clear: automate validation as much as possible while maintaining deep diagnostic capabilities in case of failure.

Why a test bench? Meeting developer needs without slowing them down

One of the key challenges we identified early on was addressing the specific needs of engineering teams. Developers must be able to test the entire system, meaning the software running on real hardware. This involves building the Yocto image, installing it on the target board, and running a full test suite.

In a multi-machine, multi-variant, multi-branch environment, this process quickly becomes time-consuming and error-prone.

Depending on the configuration, a single Yocto build might take anywhere from a few minutes to several hours. Similarly, test campaigns can range from 20 minutes to a few hours depending on the project. During this time, the engineer’s machine is heavily taxed: Yocto builds are CPU and RAM intensive, and running tests often monopolizes resources like the serial port. This can significantly slow down the workstation or render it unusable for the duration of the test.

This is precisely where the test bench proves its value. By automating the entire process, from build to test execution, and offloading it to dedicated hardware, we free software developers from repetitive, time-consuming tasks. They can focus on functional embedded development while receiving fast, reliable feedback on the quality of their deliverables.

This separation between development and validation in our test automation infrastructure has been a major driver of efficiency and peace of mind across our engineering team.

Test automation infrastructure layers

Now, let’s break down the four key layers of our test automation infrastructure:

  1. The Device Under Test (DUT)
  2. Test tools and scenarios
  3. The physical test bench
  4. The remote access infrastructure

1. Device Under Test (DUT): a system equipped for deep testing

We begin with the DUT (Device Under Test), the centerpiece of the whole setup.

Test all the dimensions of the device

The goal is to have a device that can be thoroughly tested in all dimensions. This means not only interacting with the system as an end user (black-box testing) but also inspecting and manipulating its internal behavior (white-box testing).

Read more: Black box vs white box testing

In white-box mode, we validate critical internal components. For example, we check whether the watchdog is properly configured, whether log rotation is functioning and not filling up storage, and whether the system is resilient to time jumps. These checks require minimal instrumentation of the system to observe what’s happening beyond the normal usage scope.
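
For example, checks of this kind boil down to a handful of commands run on the target (a sketch: device names and paths are illustrative, not our exact test code):

    # Is the hardware watchdog configured and armed?
    wdctl /dev/watchdog0

    # Is log rotation keeping /var/log bounded?
    logrotate --debug /etc/logrotate.conf
    du -sh /var/log

    # Does the system survive a large time jump?
    date -s "2035-01-01 00:00:00"
    systemctl is-system-running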

Build two Yocto images

To support this, we build two distinct Yocto images:

  • The production image is hardened and secure, intended for final deployment.
  • The development/test image is lightly instrumented to enable specific access for validation campaigns.

This image includes superuser access via SSH, enabled through Yocto image features such as allow-empty-password and allow-root-login, or through predefined keys in /root/.ssh/authorized_keys.
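
In practice, these features can be turned on from the build configuration, for example in local.conf (a minimal sketch; they must never leak into the production image):

    # local.conf of the test build (sketch) - keep out of production builds
    EXTRA_IMAGE_FEATURES += "ssh-server-openssh allow-empty-password allow-root-login"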

Read more: Development vs production images

Reinstall the system before each test campaign

Another essential requirement is the ability to fully reinstall the system before each test campaign, including the bootloader, kernel, and all filesystems. We use U-Boot command-line instructions to fetch images over TFTP (tftp) and flash them with gzwrite. For boards with serial flash memory, we use sf update. This ensures every test starts from a clean, reproducible environment. Again, this level of instrumentation is absent from the production image, reinforcing the separation between test and deployment environments.
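
A reinstall session at the U-Boot prompt looks roughly like this (a sketch: server address, load addresses, file names, and device indices are board-specific placeholders):

    => setenv serverip 192.168.0.10           # TFTP server hosting the images
    => tftp ${loadaddr} welma-image.ext4.gz   # fetch the compressed image
    => gzwrite mmc 0 ${loadaddr} ${filesize}  # decompress and flash to eMMC
    => sf probe                               # boards with serial flash instead:
    => sf update ${loadaddr} 0x0 ${filesize}  # rewrite only the changed blocks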

2. Test tools and scenarios: orchestrating system interaction

Once the product is ready for test automation, we focus on the peripheral tools and test scenarios. These components must interact with the DUT through various hardware interfaces: network, serial port, power supply, and sometimes graphical output.

Remote power control

One of the first requirements is remote power control. This allows us to simulate power outages or recover a frozen system, especially to re-enter U-Boot. For instance, our team uses remotely controlled power units like the IP Power 9258, or USB hubs with switchable ports for USB-powered boards.
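
Power-cycling can then be scripted from the runner (a sketch: the HTTP endpoint, credentials, and port numbers below are placeholders and vary by firmware revision):

    # Power-cycle outlet 1 of an IP Power 9258 over its HTTP interface
    curl -u admin:secret "http://10.0.0.50/set.cmd?cmd=setpower+p61=0"
    sleep 5
    curl -u admin:secret "http://10.0.0.50/set.cmd?cmd=setpower+p61=1"

    # Or cycle a switchable USB hub port with uhubctl
    uhubctl -l 1-1 -p 2 -a cycle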

Testing framework

Test execution is handled by an internally developed test tool called Pluma. It orchestrates actions on the target, performs checks, and collects results. While Pluma is our primary choice, the infrastructure remains compatible with other open-source or proprietary tools. Tests mainly run on Raspberry Pi units, which offer a good balance of compactness, cost, and flexibility.

Displays

Some applications require a display to start properly. To avoid connecting physical monitors to each board, our developers use headless HDMI emulators that simulate the presence of a display. This allows graphical applications to launch normally. For systems using Weston as the graphical compositor, we’ve added instrumentation to capture screenshots. These are currently reviewed manually, but we plan to automate this using perceptual image comparison tools like ImageMagick, especially for dynamic or animated interfaces, where robustness remains a challenge.
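
For example, ImageMagick’s compare gives a numeric distance between a capture and a reference (a sketch: file names are placeholders, and thresholds would need tuning per interface):

    # Pixel-wise RMSE; compare prints the metric to stderr and exits
    # non-zero when the images differ
    compare -metric RMSE reference.png capture.png diff.png

    # Tolerate small rendering differences with a fuzz factor
    compare -fuzz 5% -metric AE reference.png capture.png diff.png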

3. Wiring and organization of the test bench

Once the software components were in place, our engineers designed a physical test bench capable of supporting multiple DUTs in parallel, with reliable and reproducible interconnections. Each DUT is paired with a runner (typically a Raspberry Pi) that executes test scenarios and interacts directly with the target.

The runner communicates with the DUT through several channels:

  • a serial link to capture boot logs and for low-level control (see the sketch after this list),
  • a network connection for SSH actions,
  • and a power interface to reboot the board remotely.
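
For the serial channel, for instance, the runner needs nothing more than standard tools to record a boot (a sketch: the device path and baud rate depend on the adapter and board):

    # Configure the serial port and record 60 seconds of boot output
    stty -F /dev/ttyUSB0 115200 raw -echo
    timeout 60 cat /dev/ttyUSB0 | tee boot.log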

Our current setup includes ten DUTs connected to ten runners, each positioned opposite its target for easier wiring and maintenance.

Initially, the setup resembled a spaghetti mess of cables. Over time, we organized the layout, cleaned up the connections, and structured the wiring to create a clean, stable, and scalable bench. This infrastructure now allows us to run test campaigns in parallel across multiple targets, while maintaining physical and logical isolation between environments.

Read more: Embedded testing guide

4. Remote access infrastructure: orchestrating tests from anywhere

Our network architecture is built around a distributed orchestration chain, with GitLab at its center. GitLab manages Git repositories, drives CI/CD pipelines, and distributes build and test tasks across available resources.

Yocto builds are delegated to runners located both on-premises and in the cloud (AWS). Once an image is built, it’s transferred to GitLab, then routed to the runner associated with the correct target board. That runner installs the image on the DUT and executes the test campaign.
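
In GitLab CI terms, the split looks roughly like this (a sketch: job names, tags, the recipe, and the script are illustrative, not our actual pipeline):

    # .gitlab-ci.yml (sketch)
    build-image:
      tags: [yocto-builder]         # picked up by an on-premises or AWS build runner
      script:
        - bitbake welma-image-test  # hypothetical image recipe name
      artifacts:
        paths:
          - build/tmp/deploy/images/

    test-on-bench:
      tags: [bench-runner-01]       # routes the job to the runner wired to the DUT
      needs: [build-image]
      script:
        - ./scripts/flash-and-test.sh  # hypothetical wrapper: flash via U-Boot, run the tests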

Developers can connect remotely to runners to manage machines or investigate failures. If the DUT is accessible via SSH, they can connect directly to it as well. To simplify remote management, we’ve assigned fixed IP addresses and implemented stable DHCP identification, independent of the machine-id, which is regenerated with each new image.
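
One way to get such stable identification, with systemd-networkd for example, is to key DHCP on the MAC address rather than on a DUID derived from the machine-id (a sketch; the interface name is a placeholder):

    # /etc/systemd/network/20-wired.network (sketch)
    [Match]
    Name=eth0

    [DHCPv4]
    ClientIdentifier=mac   # stable across reflashes, unlike the default DUID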

Test automation orchestration chain with GitLab

Lessons from the field after two years of operation

Two years into operating our test automation infrastructure, the results are overwhelmingly positive.

Weekly test campaigns, along with those triggered before branch merges or software releases, have helped us catch functional regressions, misconfigured Yocto tasks, and kernel crashes on x86 and ARM platforms (sometimes caused by rare conditions such as non-deterministic MMC interface enumeration order).

Beyond technical benefits, the infrastructure has brought real comfort to our development team. It has reduced software engineers’ cognitive load by providing an automatic safety net. Once a board is integrated into the bench, it becomes remotely testable and accessible: no need for each developer to have physical hardware on their desk!

Test automation infrastructure limitations

Despite the success, some limitations remain:

  • We’ve encountered intermittent network errors, sometimes stemming from GitLab, the cloud, or our own IT infrastructure, which can be difficult to diagnose.

  • On the hardware side, some of our Raspberry Pi runners still use 32-bit architecture, which limits compatibility with certain Yocto SDKs that require ARM64. A gradual migration to more powerful machines is underway.

  • Finally, the DUT flashing phase remains a sensitive point. A power outage during a gzwrite operation in U-Boot can corrupt the bootloader, requiring manual intervention, such as reconnecting a serial cable or configuring a DIP switch. These incidents are rare but highlight that even in a highly automated system, some manual steps are occasionally unavoidable.

In conclusion, implementing a test automation infrastructure for Yocto-based systems is a strategic investment. It improves delivery reliability, speeds up development cycles, and strengthens collaboration between software developers and testers.

Choosing the right tools, like the Pluma testing tool, mastering the physical test bench, and integrating everything into CI/CD are critical success factors. In return, the gains in quality, responsiveness, and peace of mind are substantial.
