Chaos engineering at a sports giant

During the Innovation Days at a leading sports A-brand, we worked with Gluo on the company’s first chaos engineering experiments, a real first in Belgium. Chaos engineering extends non-functional testing: we deliberately and safely disrupt the normal behavior of a system to build confidence in its resilience and recovery, while keeping the impact on end users as small as possible.

With Black Friday approaching, the organization wanted to leave nothing to chance and be fully prepared to handle a surge in online traffic so that everyone could purchase their wish list without issues.  

Our customer’s focus was clear: uncover system pain points, define and implement improvements, and train the team to perform rapid analysis and pinpoint problems. Through an intensive four-day workshop, we introduced chaos engineering into the logistics ticket-printing system.

Brainstorming

After an introduction to the team, we started immediately. We received a brief overview of the system, which consists of cloud components, third-party software, and on-premises subsystems.  

Pairing the right chaos experiment with the right system component is crucial. Because these were the first steps into chaos engineering for this sports brand, we applied the principle “start small, end big.” Which experiments on which components would yield the most insight? With that mindset, we identified the most critical components of the system and designed experiments such as hard shutdowns and increased CPU usage. We targeted standalone components so that the impact of each experiment could be observed in isolation.
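To make the "increased CPU usage" experiment concrete, here is a minimal sketch of a CPU stressor. It is an illustration only, not the tooling used during the workshop; the function names `burn_cpu` and `stress_cpu` and all parameters are hypothetical.

```python
import multiprocessing as mp
import time


def burn_cpu(seconds: float, stop=None) -> int:
    """Busy-loop for `seconds` or until `stop` is set; return the loop iterations."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        if stop is not None and stop.is_set():
            break  # abort signal received: end the experiment early
        iterations += 1
    return iterations


def stress_cpu(workers: int, seconds: float) -> None:
    """Run `workers` busy-loop processes in parallel and wait for them to finish."""
    stop = mp.Event()
    procs = [mp.Process(target=burn_cpu, args=(seconds, stop)) for _ in range(workers)]
    for proc in procs:
        proc.start()
    try:
        for proc in procs:
            proc.join()
    except KeyboardInterrupt:
        stop.set()  # Ctrl+C acts as the manual abort for the experiment
        for proc in procs:
            proc.join()
```

Calling `stress_cpu(workers=2, seconds=60)` from a script's `if __name__ == "__main__":` block would keep two cores busy for a minute. The time limit and the stop event reflect the idea that every experiment needs a bounded duration and a way to abort.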

For every experiment–component combination, we analyzed normal behavior, the expected impact of the experiment, the expected recovery process, any fallback mechanism to abort the test, and the monitoring needed to study impact and results.  
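The five analysis points above can be captured in a small record per experiment, so that no experiment runs before its plan is complete. The sketch below is one possible shape; the class name, field names, and example values are assumptions, not the template used in the workshop.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ExperimentPlan:
    """One experiment-component combination and its five analysis points."""
    component: str
    experiment: str
    normal_behavior: str = ""
    expected_impact: str = ""
    expected_recovery: str = ""
    abort_fallback: str = ""
    monitoring: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of the fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready_to_run(self) -> bool:
        """A plan is ready only when every analysis point is filled in."""
        return not self.missing_fields()


# Hypothetical example: the abort fallback has not been defined yet.
plan = ExperimentPlan(
    component="label-renderer",  # hypothetical component name
    experiment="hard shutdown",
    normal_behavior="print jobs complete within the agreed time",
    expected_impact="queued jobs are delayed, none are lost",
    expected_recovery="service restarts and drains the queue",
    monitoring=["queue depth", "job latency"],
)
```

Here `plan.missing_fields()` returns `["abort_fallback"]`, which signals that the experiment should not start until a way to abort it has been agreed on.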

To avoid meaningless results and to truly build confidence in the system, the load and infrastructure had to mirror Black Friday conditions as closely as possible. This preparation included a performance script to generate representative load and an extension of the end component: virtual printers.
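A performance script of this kind typically ramps the load up to a peak and then holds it there. The following sketch shows the idea with the standard library only. The `send_job` callable is a hypothetical stand-in for whatever submits a print job to the system under test, and the numbers are illustrative rather than actual Black Friday figures.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_profile(baseline: int, peak: int, ramp_steps: int, hold_steps: int) -> list[int]:
    """Jobs per step: a linear ramp from baseline towards peak, then a hold at peak."""
    ramp = [baseline + (peak - baseline) * i // ramp_steps for i in range(ramp_steps)]
    return ramp + [peak] * hold_steps


def run_load(profile, send_job, step_seconds: float = 1.0, max_workers: int = 32) -> int:
    """Submit profile[i] jobs in step i; return how many jobs reported success.

    `send_job(index)` must return a truthy value on success. An exception
    raised by `send_job` propagates and stops the run.
    """
    succeeded = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for jobs in profile:
            started = time.monotonic()
            results = list(pool.map(send_job, range(jobs)))
            succeeded += sum(1 for ok in results if ok)
            # Sleep out the rest of the step so the rate stays close to the profile.
            time.sleep(max(0.0, step_seconds - (time.monotonic() - started)))
    return succeeded
```

For example, `load_profile(10, 50, 4, 2)` yields `[10, 20, 30, 40, 50, 50]`. Passing a fake `send_job` makes it possible to dry-run the script before pointing it at the real system.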

Implementing

The day had finally arrived: time to run the first chaos experiments. Our analyses were projected on large screens and the initial tests were started.  

Because this was the organization’s first encounter with chaos engineering, we did not yet automate the process and deliberately chose a manual approach. We needed to build confidence in the system, and time was limited. The focus therefore lay on the content, execution, monitoring, and impact of each test on the system.  

Looking ahead, we recommend automating the successful experiments and running them at random intervals on the system to obtain fast feedback during the development process.
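As a rough illustration of that recommendation, the sketch below picks a random experiment after a random wait and records its outcome. It is a minimal sketch under assumptions: `experiments` maps a name to a hypothetical callable that runs the experiment and returns its result, and the random generator and sleep function are injectable so the scheduler can be tested without waiting.

```python
import random
import time


def run_randomly(experiments, rounds, min_wait, max_wait, rng=None, sleep=time.sleep):
    """Run `rounds` randomly chosen experiments, waiting a random interval before each.

    Returns a list of (experiment name, result) tuples in execution order.
    """
    rng = rng or random.Random()
    log = []
    for _ in range(rounds):
        sleep(rng.uniform(min_wait, max_wait))  # random interval in seconds
        name, action = rng.choice(list(experiments.items()))
        log.append((name, action()))
    return log
```

In a real setup the log would feed into the team's monitoring, and only experiments that already passed a supervised manual run would be added to `experiments`.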

Lessons learned

The Chaos Game Days delivered clear value. Several technical action points and improvements were identified around infrastructure and monitoring. The team gained a much better understanding of weak spots in the system and how they manifest, which leaves them better prepared for high-traffic days like Black Friday.  

Key lessons learned: solid preparation and a robust set of non-functional performance scripts are invaluable.  

The next Game Days are already on our calendar.
