Ericsson Using Virtual Platforms for Dynamic Analysis

13 Jun 2019

At CDNLive EMEA last month, Ola Dahl of Ericsson presented Dynamic Software Analysis in Virtual Platforms. He described his job as "I work in the 5G network business doing basestations and radios."

They have a virtual platform for one of their basestations that contains models of the ASICs and board components, and runs the software load for the baseband and the radio. The basic architecture is known as EMCA, for Ericsson Many-Core Architecture. The platforms end up containing a lot of components:

Processors (communication processor, DSP)
Baseband accelerators
Radio processing
Various peripheral I/Os such as I2C
Ethernet, CPRI (Common Public Radio Interface)
Interconnect and board components

The software is connected up to:

Debugger for the target software
Test environments
Logging and tracing

By running the software in the platform, they can catch bugs early on. In fact, "platforms tend to stick around since the software people like them even when hardware is available." But when they get bug reports, they have to decide whether it is a bug in the virtual platform or in the target software. A lot of these bugs are caused by a couple of issues: uninitialized memory, and concurrency problems.

As Ola put it:

"For host applications, we have tools like Valgrind memcheck and different sanitizers (e.g., MemorySanitizer, AddressSanitizer)."

You can't just take Valgrind and run it on a real-time platform, for all sorts of reasons: it slows down the software so much that it wouldn't work, and it completely changes the memory usage. So they thought about instrumenting the virtual platform with similar functionality. Since the checking would be in the platform, not part of the target code, it would not slow down simulated time. It would, of course, slow down the simulation itself, but that would be invisible to the target software, since the platform controls time as seen by the code.
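The difference between host time and simulated time can be illustrated with a toy model. This is a hypothetical Python sketch, not Ericsson's platform: `ToyPlatform`, the one-cycle-per-instruction cost model, and the `slow_checker` hook are all assumptions made for illustration.

```python
import time


class ToyPlatform:
    """Hypothetical virtual platform that owns the target's clock."""

    CYCLES_PER_INSTRUCTION = 1  # assumed cost model, for illustration only

    def __init__(self, checker=None):
        self.sim_cycles = 0      # time as seen by the target software
        self.checker = checker   # optional host-side instrumentation hook

    def run(self, program):
        for instruction in program:
            if self.checker is not None:
                self.checker(instruction)  # costs host time, not simulated time
            self.sim_cycles += self.CYCLES_PER_INSTRUCTION
        return self.sim_cycles


def slow_checker(instruction):
    # Stand-in for memcheck-style bookkeeping: it only burns host CPU.
    sum(range(500))


def timed_run(platform, program):
    start = time.perf_counter()
    cycles = platform.run(program)
    return cycles, time.perf_counter() - start


program = ["load", "add", "store"] * 1000

plain_cycles, plain_host = timed_run(ToyPlatform(), program)
checked_cycles, checked_host = timed_run(ToyPlatform(checker=slow_checker), program)

print(f"simulated cycles: plain={plain_cycles} checked={checked_cycles}")
print(f"host seconds:     plain={plain_host:.4f} checked={checked_host:.4f}")
```

Both runs report the same simulated cycle count. Only the host-side wall-clock time differs, which is the property that makes platform-side checking non-intrusive.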

Similarly, for host applications they have tools like ThreadSanitizer, and so they wondered if they could take this same idea and instrument the virtual platform by adding ThreadSanitizer-like functionality. Then they would be able to detect concurrency problems in a non-intrusive way.

The way Valgrind's memcheck works is that it shadows memory with V-bits (V for validity), which record whether data has been initialized, and A-bits (A for accessibility), which record whether each byte may be accessed. As the program runs (Valgrind actually executes the code on a synthetic CPU rather than natively), these bits are updated. This needs to be done in an intelligent manner, or there will be a lot of false positives when a region of memory is copied. An uninitialized value only becomes problematic when the code tries to do something like make a decision based on its value. So the checker needs to wait until the program does something with the uninitialized value that has externally visible effects. One particular problem is that structs often contain uninitialized padding (for example, to round them up to a multiple of 64 bits), which gets copied when the struct is copied, but is normally never accessed directly.
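The shadow-memory idea can be sketched in a few lines. This is a simplified Python model, not Valgrind's or Ericsson's implementation: the `ShadowMemory` class, its method names, and the 8-byte struct layout are hypothetical, and validity is tracked per byte here purely to keep the example short.

```python
class UninitializedUse(Exception):
    """Raised when a decision depends on a byte that was never written."""


class ShadowMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        self.valid = [False] * size       # stand-in for V-bits (per byte here)
        self.accessible = [False] * size  # stand-in for A-bits

    def _check_accessible(self, addr):
        if not self.accessible[addr]:
            raise IndexError(f"address {addr} is not accessible")

    def alloc(self, addr, length):
        for i in range(addr, addr + length):
            self.accessible[i] = True
            self.valid[i] = False

    def store(self, addr, value):
        self._check_accessible(addr)
        self.data[addr] = value
        self.valid[addr] = True

    def copy(self, dst, src, length):
        # Copying is allowed: validity travels with the data, no report yet.
        for i in range(length):
            self._check_accessible(src + i)
            self._check_accessible(dst + i)
            self.data[dst + i] = self.data[src + i]
            self.valid[dst + i] = self.valid[src + i]

    def branch_on(self, addr):
        # A decision has visible effects, so this is where validity is checked.
        self._check_accessible(addr)
        if not self.valid[addr]:
            raise UninitializedUse(f"branch depends on uninitialized byte {addr}")
        return self.data[addr] != 0


# Hypothetical 8-byte struct: 1-byte flag, 3 bytes of padding, 4-byte counter.
FLAG, PADDING, COUNTER, SIZE = 0, 1, 4, 8

mem = ShadowMemory(16)
mem.alloc(0, SIZE)          # original struct
mem.alloc(8, SIZE)          # destination of the copy
mem.store(FLAG, 1)
for offset in range(4):
    mem.store(COUNTER + offset, 0)

mem.copy(8, 0, SIZE)        # padding is copied too, and nothing is reported

flag_is_set = mem.branch_on(8 + FLAG)   # fine: the flag was initialized

try:
    mem.branch_on(8 + PADDING)          # padding was never written
    padding_report = None
except UninitializedUse as error:
    padding_report = str(error)
```

Deferring the check from `copy` to `branch_on` is what avoids the false positives on struct padding: the copy carries the "uninitialized" marking along silently, and a report is only produced if something later depends on those bytes.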

For concurrency analysis, they implemented detection and reporting mechanisms, taking their inspiration from ThreadSanitizer.

Here is the type of problem that gets caught. Two DSPs are booted. One allocates a buffer and passes it to the other. The second DSP does something and then frees the buffer, but the first DSP then tries to look at the buffer again. This type of error is very hard to detect, since often the buffer will be unchanged and the code will work fine...until it doesn't. That is, until the day that memory is tight, the buffer has already been reused, and the old values have been overwritten.
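The scenario can be modeled with a heap that remembers which buffers are still live. This is a hypothetical Python sketch of the idea only: the `TrackedHeap` class and the core names `dsp0` and `dsp1` are assumptions, not taken from the talk.

```python
class UseAfterFree(Exception):
    """Raised when a core touches a buffer that is no longer allocated."""


class TrackedHeap:
    def __init__(self):
        self.next_addr = 0x1000
        self.live = {}          # base address -> buffer contents
        self.allocated_by = {}  # base address -> core that allocated it
        self.freed_by = {}      # base address -> core that freed it

    def alloc(self, core, size):
        base = self.next_addr
        self.next_addr += size
        self.live[base] = bytearray(size)
        self.allocated_by[base] = core
        return base

    def free(self, core, base):
        del self.live[base]
        self.freed_by[base] = core

    def read(self, core, base, offset):
        if base not in self.live:
            culprit = self.freed_by.get(base, "unknown")
            raise UseAfterFree(
                f"{core} read buffer {base:#x} after {culprit} freed it"
            )
        return self.live[base][offset]


heap = TrackedHeap()
buffer = heap.alloc("dsp0", 64)      # DSP 0 allocates the buffer
# ... dsp0 passes the address to dsp1, for example in a message ...
heap.free("dsp1", buffer)            # DSP 1 finishes and frees it

try:
    heap.read("dsp0", buffer, 0)     # DSP 0 looks at the buffer again
    report = None
except UseAfterFree as error:
    report = str(error)

print(report)
```

Because the check depends on allocation state rather than on the buffer's contents, the access is reported even on runs where the old values happen to be intact.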

Here's another example, perhaps the most common error in all of programming: an "off-by-1" error. The "<=" in the first line should be "<". Otherwise, if the value is not in the array, the code will access the value just off the end of the allocated array (which is considered okay, since it is just copying) and then test whether that value matches in the if-statement (which is not okay, and so produces an error).
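The listing from the talk is not reproduced here. As a stand-in, the following hypothetical Python model mimics the same mistake: `find`, its `buggy` switch, and the explicit `valid` list (playing the role of the validity bits) are assumptions made for illustration.

```python
class UninitializedUse(Exception):
    """Raised when a comparison depends on a value that was never written."""


def find(values, valid, count, wanted, buggy):
    # buggy=True models `i <= count`; buggy=False models the fix `i < count`.
    limit = count + 1 if buggy else count
    for i in range(limit):
        element, element_valid = values[i], valid[i]   # plain copy: no report
        if not element_valid:                          # checked at the decision
            raise UninitializedUse(f"comparison uses uninitialized element {i}")
        if element == wanted:
            return i
    return -1


# Three initialized elements, followed by one slot of unwritten memory.
values = [3, 5, 7, 0]
valid = [True, True, True, False]

found_index = find(values, valid, 3, 5, buggy=True)    # bug stays hidden

try:
    find(values, valid, 3, 9, buggy=True)              # runs off the end
    report = None
except UninitializedUse as error:
    report = str(error)

fixed_result = find(values, valid, 3, 9, buggy=False)  # correct bound
```

When the value is present, the buggy loop returns before reaching the extra slot, so the defect stays hidden. It is only reported when the search misses and the comparison touches the element past the end.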

It is harder to give a simple example of concurrency problems. The example from the talk is pretty contrived: one process sends a message to another. The message itself is protected by a semaphore (mutex), but the first process then writes to the message without getting the lock.
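One way such an unprotected write could be detected is a lockset check: for each shared location, keep the set of locks that were held on every access so far, and report when that set becomes empty. This Python sketch is a deliberately simplified illustration of that technique. It is not ThreadSanitizer's actual algorithm, nor necessarily what Ericsson implemented, and the `RaceDetector` class and the names `sender`, `receiver`, and `msg_lock` are hypothetical.

```python
class RaceDetector:
    def __init__(self):
        self.held = {}        # thread -> locks currently held
        self.candidates = {}  # location -> locks held on every access so far
        self.accessors = {}   # location -> threads that touched it
        self.written = set()  # locations written at least once
        self.reports = []

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held[thread].discard(lock)

    def access(self, thread, location, is_write):
        locks = self.held.get(thread, set())
        if location in self.candidates:
            self.candidates[location] &= locks   # keep only common locks
        else:
            self.candidates[location] = set(locks)
        self.accessors.setdefault(location, set()).add(thread)
        if is_write:
            self.written.add(location)
        shared = len(self.accessors[location]) > 1
        if shared and location in self.written and not self.candidates[location]:
            self.reports.append((thread, location))


detector = RaceDetector()

# The sender fills in the message while holding the lock.
detector.acquire("sender", "msg_lock")
detector.access("sender", "message", is_write=True)
detector.release("sender", "msg_lock")

# The receiver reads it, also under the lock.
detector.acquire("receiver", "msg_lock")
detector.access("receiver", "message", is_write=False)
detector.release("receiver", "msg_lock")

# The sender then writes to the message again without taking the lock.
detector.access("sender", "message", is_write=True)

print(detector.reports)
```

The report is produced from the pattern of accesses and locks alone, so it does not depend on the two processes actually colliding during the run.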

There is a significant penalty for all this checking, with run times increased by 2-4X for uninitialized checking and 5X for concurrency detection. It is worth emphasizing that, since this is a virtual platform and the checking is in the platform, the simulated time of all events is unaffected.

Ola pointed out that one problem with dynamic analysis like this is that only the code that actually gets run gets checked. So you still need static analysis tools; the two complement each other. Static tools can detect many issues even if the test data doesn't happen to exercise that branch of the code.
