Run performance tests on a Real-time system without leaving the office

Here at Greeneye, our system is composed of many processes and jobs responsible for the entire pipeline, from taking pictures in real time to deciding whether we want to spray the pictured spot.

The pipeline itself consists of multiple hardware components and physical constraints - from the tractor itself, which carries the entire system, through the speed sensor that samples the tractor’s speed, to the cameras, nozzles, and more.

Step 1: Get rid of hardware dependencies.

Running performance tests for a hardware-dependent system requires setting up a complicated, expensive, and non-scalable environment.
Our answer was to design the system to be hardware-injected. What does that mean?
Imagine your environment includes a speed sensor. Whenever the speed sensor indicates a positive speed, the system starts working.

Mocking the system’s speed grants us two essential benefits:

  1. Starting the system without actually having to drive a vehicle.

  2. Testing and benchmarking the system at dynamic speeds.

The hardware injection is implemented as described in figure 2.
A configurable variable determines whether we use the real or the mock hardware, and the mock hardware’s return value is configurable as well.
Both the real and the mock hardware write their output to the exact same shared object, so the hardware’s consumers are unaffected by whether the source is real or mocked.
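
A minimal sketch of the idea (all names and configuration keys here are illustrative, not our production code): a configuration flag selects the implementation, and both implementations write to the same shared object.

class SharedSpeed:
    """Stand-in for the shared object both sensor implementations write to."""
    def __init__(self):
        self.kmh = 0.0

def read_from_speed_sensor():
    # Placeholder for the real hardware driver.
    raise NotImplementedError("real driver goes here")

class RealSpeedSensor:
    def __init__(self, shared):
        self.shared = shared

    def poll(self):
        self.shared.kmh = read_from_speed_sensor()

class MockSpeedSensor:
    def __init__(self, shared, configured_kmh):
        self.shared = shared
        self.configured_kmh = configured_kmh  # the mocked return value is configurable too

    def poll(self):
        self.shared.kmh = self.configured_kmh

def make_sensor(config, shared):
    # Consumers only ever read the shared object, so they are unaffected
    # by which implementation is selected here.
    if config["use_mock_hardware"]:
        return MockSpeedSensor(shared, config["mock_speed_kmh"])
    return RealSpeedSensor(shared)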

“It does not matter how intelligent you are, if you guess and that guess cannot be backed up by experimental evidence–then it is still a guess.”

-Richard Feynman

One of the most important principles in developing a performant system is the ability to measure and monitor your performance. Metrics, benchmarks, and profiling data are the best indicators for guiding your development efforts.

In order to make our system testable, we set a few goals:

  1. Test the influence of a new detection/classification model on the prediction accuracy.

  2. Benchmark every new model.

  3. Test every model with a large set of parameters and thresholds, in order to find the set that maximizes our prediction accuracy, i.e., hyperparameter tuning.

  4. Profile our system’s bottlenecks.

  5. Be able to run the tests on a custom independent cloud agent.

Before diving into implementation, we encountered two fundamental issues:

  1. The system must be configurable, so that we can inject different and multiple sets of configurations without changing the code itself.

  2. Our system depends on many hardware components. We must get rid of these dependencies to be able to run performance tests at scale.

Step 2: Make your system configurable

This step consisted of converting all configurable variables so they are injected from a configuration file.

As described in figure 1, to give us the ability to define a different, meaningful set of parameters for every test, we use a configuration file and a runtime process that handles edit requests for that configuration file.
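
A minimal sketch of the idea, assuming a JSON configuration file and a simple merge-style edit request (the file name and keys are illustrative):

import json

CONFIG_PATH = "config.json"  # illustrative path

def load_config():
    with open(CONFIG_PATH) as f:
        return json.load(f)

def apply_edit_request(overrides):
    """Merge a dict of overrides into the configuration file, so a test
    can change parameters without touching the code."""
    config = load_config()
    config.update(overrides)
    with open(CONFIG_PATH, "w") as f:
        json.dump(config, f, indent=2)

# For example, a test could request a different threshold before its run:
# apply_edit_request({"detection_threshold": 0.7, "mock_speed_kmh": 12.0})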

Figure 1.

Figure 2.

Step 3: Simulating the system

With the ability to set configurable variables at runtime and to mock the hardware dependencies, the last piece of the puzzle is orchestrating the sets of configurations and testing their outputs.
The simulation process is described in figure 3.
The simulator loads the desired test configurations. For every test permutation, it simulates the system and saves the relevant output and configuration to a unique location.
After simulating the system with all the different sets of configurations, the only thing left is to post-process and test each output against its attached configuration.
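
A minimal sketch of that orchestration, with an illustrative parameter grid and a placeholder where the real pipeline run would go:

import itertools
import json
import pathlib

param_grid = {  # assumed example parameters
    "detection_threshold": [0.5, 0.6, 0.7],
    "mock_speed_kmh": [8.0, 12.0, 16.0],
}

def run_simulation(config):
    """Placeholder for launching the real pipeline with this configuration."""
    return {"dropped_frames": 0}  # stand-in output

keys = list(param_grid)
for i, values in enumerate(itertools.product(*param_grid.values())):
    config = dict(zip(keys, values))
    out_dir = pathlib.Path("runs") / f"permutation_{i:03d}"  # unique location
    out_dir.mkdir(parents=True, exist_ok=True)
    output = run_simulation(config)
    (out_dir / "config.json").write_text(json.dumps(config, indent=2))
    (out_dir / "output.json").write_text(json.dumps(output, indent=2))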

Figure 3.

Step 4: Tests and Visualization

The last part of our process is handling the system’s output and opening a window into our system’s performance. Having multiple sets of results, including prediction/benchmark/log data together with the specific configuration that produced them, lets us quickly find the best configuration for our system.

Here at Greeneye, we use the ClearML platform to analyze, research, and visualize our tests and performance. A cool feature in ClearML is the ability to compare different runs. Figure 4 shows a comparison between multiple sets of configuration outputs, helping us choose the best detection models for our system. Using the configuration attached to each experiment, we can quickly reproduce the best parameters and use them in our production environment. Another bonus in ClearML is that each run saves not only the output but also the input, i.e., the configuration file that produced those results.
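
As a rough sketch of how a single run can be reported to ClearML (the project and task names are made up; see the ClearML docs for the full API):

from clearml import Task

config = {"detection_threshold": 0.7, "mock_speed_kmh": 12.0}  # example input
output = {"dropped_frames": 3}                                 # example result

task = Task.init(project_name="simulator", task_name="permutation_003")
task.connect(config)  # attaches the configuration that produced this run
task.upload_artifact("output", artifact_object=output)  # attaches the results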

Figure 4.

In addition, we used the simulator to efficiently run a benchmarking test. The simulator sends different speed values to our system and analyzes the performance at every single speed. That helped us discover our system’s weaknesses, points of failure, and limits. In figure 5, you can see a graph of dropped frames as a function of the tractor’s speed. Thanks to the hardware mocking in step 1, the tractor stays parked in the garage while the tests run in the cloud.
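
A minimal sketch of that benchmark, reusing the run_simulation placeholder from the simulator sketch above (the speed range is illustrative):

results = {}
for speed_kmh in range(2, 32, 2):
    output = run_simulation({"mock_speed_kmh": float(speed_kmh)})
    results[speed_kmh] = output["dropped_frames"]
# Plotting results gives a dropped-frames-vs-speed curve like figure 5.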

Figure 5.

Summary

Developing a real-time system will always require testing its performance. To do so quickly, elegantly, at scale, and without requiring many resources, we must design our code to be configurable and capable of mocking hardware dependencies. Keeping those in mind will give you one of the best gifts a complex-system engineer can ask for: the ability to run your system independently, in a lightweight mode, configured exactly the way you want it.

Livne Rosenblum,
Tractor Team Lead @ Greeneye Technology

Kubernetes Liveness and Readiness Probes in a Real-Time System

Expectation vs. Reality

Anyone who has gotten to work with Kubernetes, especially as a deployment system, has probably noticed the option of adding `liveness` checks to their system, gotten very excited about the opportunity to make the system even more robust, and then realized it is not always amazing. Liveness and readiness failures can be confusing and frustrating while you try to understand what causes your system to keep crashing constantly.

The probes Kubernetes offers are necessary, but they need to be an exact match for your system. It takes time and effort to implement the right probe for your needs and to keep it suitable while the program changes. As discussed in many blogs, you have to be familiar with the differences between the probes and how they work before getting started. One blog post by Colin Breck helped me understand how to keep working until I reached my goal.

Let’s talk Real-Time

The blog I mentioned before, and many others I could find, introduce health-check implementations for web applications. At Greeneye Technology, we are developing a real-time system that, by definition, is completely different from a web application, with obvious differences in properties and limitations. Most importantly, in a real-time application, we can’t afford to have any of our containers down or stuck for a long time.

Kubernetes has three types of probes: an HTTP GET request against the container’s IP, a TCP connection to the specified container, or an “exec” probe that runs a command inside the container. For a system that has to be efficient with memory and resources, running an internal server just for probing is undesirable. Moreover, the real-time constraints leave no room for that kind of overhead, so the exec probe was the natural fit for us.

From the description in the Kubernetes docs, the exec command works as follows:

Command is the command line to execute inside the container, the working directory for the command is root ('/') in the container's filesystem. The command is simply exec'd, it is not run inside a shell, so traditional shell instructions ('|', etc) won't work. To use a shell, you need to explicitly call out to that shell. Exit status of 0 is treated as live/healthy and non-zero is unhealthy.

Now all that is left is to decide what command will run inside the container. This command decides whether the container is healthy. The most common example you can find is the one Kubernetes suggests: the container writes (touches) a file periodically, and the command Kubernetes runs checks whether that file exists.

spec:
  containers:
  - name: container-1
    ...
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy

This can be a neat solution, but is it a solution for a real-time system? Definitely not. We can’t afford an additional I/O operation from the container side, especially one that needs to run every few seconds.

So our command for the liveness check has to be:

  1. Fast, with no additional I/O operations.

  2. Executed every second or even more often.

  3. Reliable, meaning it won’t fail when the container is actually healthy.

Our solution: Shared memory to the rescue!

Our multi-process system communicates through a shared-memory object: memory saved at a specific place in the system’s memory, where every process can read and write. Accessing an object in shared memory costs about as much as accessing a variable in the global scope.

How can shared memory help us with liveness and readiness checks? Exactly as with writing to a file, but instead, each process writes to a shared-memory object dedicated to this purpose. Implementing a short program that reads from that memory gives us our command to execute inside the container.
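
A minimal sketch of the reporting side, assuming Python and the standard library's multiprocessing.shared_memory module (our production mechanism differs in its details):

import struct
import time
from multiprocessing import shared_memory

SHM_NAME = "liveness"  # illustrative name; one slot per monitored process

def report_alive(shm):
    """Write the current timestamp into the shared-memory slot."""
    shm.buf[:8] = struct.pack("d", time.time())

def main_loop():
    shm = shared_memory.SharedMemory(name=SHM_NAME, create=True, size=8)
    while True:
        time.sleep(0.1)  # placeholder for one cycle of real work
        report_alive(shm)  # report liveness every cycle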

The full flow:

Every process (aka container in our program) is responsible for reporting liveness to the shared-memory object every period of time, for example, every cycle in the main loop. Kubernetes runs a short program that does the following:

  1. It checks the current state in the shared memory.

  2. Based on that, the probe command decides to exit with code 0 (healthy) or non-zero (not healthy).
    The decision can be as simple as comparing the timestamp of the last live report to the current time.
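
A minimal sketch of the probe command itself, under the same assumptions (the threshold is illustrative):

import struct
import sys
import time
from multiprocessing import shared_memory

MAX_SILENCE_SECONDS = 3.0  # assumed threshold; tune per container

try:
    shm = shared_memory.SharedMemory(name="liveness")  # attach, don't create
except FileNotFoundError:
    sys.exit(1)  # no report yet: treat as unhealthy

last_report = struct.unpack("d", bytes(shm.buf[:8]))[0]
shm.close()
sys.exit(0 if time.time() - last_report < MAX_SILENCE_SECONDS else 1)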

Conclusion

With the exec command type Kubernetes has, there are almost no limits to what we can run as a liveness check. You can use any script or command, as long as it can be executed inside the container, and that’s it!

We can use this necessary feature together with shared memory to make the program more robust, without major runtime costs, while staying precise about the container’s status.

Make sure you understand the limits of your system and how these checks can be beneficial instead of harmful.

Some tips:

  • First, define what it means for a process to be alive or ready, and once you understand that, add a liveness report in each relevant place in the program.

  • Add logs to the command! They will help you quickly understand where it fails and on which container. You can see whether a health check fails by checking the pod’s events with one of these commands:

- kubectl get events
- kubectl describe pod

  • Get to know where these checks can fail and whether there’s a difference between the containers. Try to find loops or places where the process can be stuck for a long time, since the process won’t report liveness there, and Kubernetes will restart your container.

  • Use all the parameters Kubernetes has for health probes, and define them carefully: initialDelaySeconds, periodSeconds, successThreshold, and more. A full description is available in the Kubernetes documentation.

Good Luck!

Shelly Bekhor,
Realtime Developer @ Greeneye Technology