Code size optimizations

During the last two years I have had the opportunity to write embedded software for the SPARC ERC32 architecture. The first piece of software we wrote was an application that had to fit in a 64-Kbyte PROM: a minimal running system that could execute periodic tasks, send packets over a network, load another application stored in an EEPROM that could be remotely patched, and many other things. Before writing it from scratch we tried to fit RTEMS (a real-time operating system) in the 64 Kbytes. We got RTEMS running in 55 Kbytes, but that left only about 10 Kbytes for our code (note that modern versions of RTEMS are smaller).

The following tips are the ones I have found useful for decreasing my application's size (they might not apply to all architectures). Note that all the tips are C oriented and assume gcc.

I would really appreciate it if you shared your own tricks, and I will be glad to include them in the list.

Compress application code

As the application is stored in a PROM, you will normally have a small amount of startup code that performs some initializations and checks (e.g. RAM tests), plus the code of the application itself. The main application runs from RAM, so you can compress it with a simple compression algorithm such as LZSS, decompress it into RAM at boot, and run it from there. The decompression code itself must obviously remain uncompressed.
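To make the idea concrete, here is a minimal LZSS-style codec sketch. The bitstream format is one I made up for this example (one flag byte controlling 8 items, 12-bit offsets, 4-bit lengths), not the exact scheme used by any particular tool:

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal LZSS sketch (illustrative format): each flag byte controls 8
 * items; a set bit means a literal byte, a clear bit means a 2-byte
 * (12-bit offset, 4-bit length) back-reference into the output so far. */
#define MIN_MATCH 3
#define MAX_MATCH 18            /* 4-bit length field + MIN_MATCH */
#define WINDOW    4095          /* largest offset a 12-bit field can hold */

size_t lzss_compress(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t ip = 0, op = 0;
    while (ip < n) {
        size_t flag_pos = op++;          /* reserve the flag byte */
        uint8_t flags = 0;
        for (int bit = 0; bit < 8 && ip < n; bit++) {
            /* greedy search for the longest match in the window */
            size_t best_len = 0, best_off = 0;
            size_t start = ip > WINDOW ? ip - WINDOW : 0;
            for (size_t j = start; j < ip; j++) {
                size_t len = 0;
                while (len < MAX_MATCH && ip + len < n &&
                       in[j + len] == in[ip + len])
                    len++;
                if (len > best_len) { best_len = len; best_off = ip - j; }
            }
            if (best_len >= MIN_MATCH) {
                out[op++] = (uint8_t)(best_off >> 4);
                out[op++] = (uint8_t)((best_off << 4) | (best_len - MIN_MATCH));
                ip += best_len;
            } else {
                flags |= (uint8_t)(1u << bit);   /* literal byte */
                out[op++] = in[ip++];
            }
        }
        out[flag_pos] = flags;
    }
    return op;                           /* compressed size */
}

size_t lzss_decompress(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t ip = 0, op = 0;
    while (ip < n) {
        uint8_t flags = in[ip++];
        for (int bit = 0; bit < 8 && ip < n; bit++) {
            if (flags & (1u << bit)) {
                out[op++] = in[ip++];            /* literal byte */
            } else {
                size_t off = ((size_t)in[ip] << 4) | (in[ip + 1] >> 4);
                size_t len = (size_t)(in[ip + 1] & 0x0F) + MIN_MATCH;
                ip += 2;
                while (len--) {                  /* overlapping copy is fine */
                    out[op] = out[op - off];
                    op++;
                }
            }
        }
    }
    return op;                           /* decompressed size */
}
```

Note that the decompressor is tiny and needs no extra memory beyond the output buffer; that matters, because it is the part that has to stay uncompressed in the PROM.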

If you are working with an ERC32 processor check the mkprom-erc32 tool.

Use inline functions judiciously

As is well known, premature optimization should be avoided during development. Along those lines, inlining a lot of functions rarely brings a great performance gain, but it can carry a big size penalty if you don't use it judiciously: every call site gets its own copy of the function body.
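As a rough illustration (with hypothetical functions), a tiny accessor is a good inlining candidate, while a larger routine called from many places usually is not:

```c
#include <stddef.h>
#include <stdint.h>

/* Good candidate: the inlined body is about the size of a call anyway. */
static inline uint8_t high_byte(uint16_t v)
{
    return (uint8_t)(v >> 8);
}

/* Poor candidate: if this were inlined at, say, 20 call sites, the
 * binary would carry 20 copies of the loop instead of one. */
uint32_t sum_bytes(const uint8_t *buf, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}
```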

Compiler optimization flags

If performance is not critical in your application (i.e. you have no hard real-time requirements), you can use the size optimization options provided by your compiler. For example, in gcc you can use the -Os option to decrease the executable size. In our case it saved about 5 Kbytes.

Note that this disables the automatic inlining of functions, but you can still tell gcc to perform it by also passing the -finline-functions option.
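For reference, a typical way to combine the two in a build (a sketch; substitute your cross-compiler, e.g. sparc-elf-gcc, and measure the actual savings on your target):

```make
# Optimize for size, but still let gcc inline where it decides it is worth it.
CC     = gcc
CFLAGS = -Os -finline-functions
```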

Split functions into separate files

If you are developing several applications that share most of their code, you may consider writing a static library. The catch is that a reusable library can increase the executable size, because not all of the compiled functions will be used by every application. When gcc compiles a source file it produces a single object file, and the linker pulls that object file in as a whole: all the functions in it end up in the final binary even if your code uses only one of them. (Of course, if none of the functions in an object file is used, that file is not linked in at all.) Splitting functions into separate files, one function per file, solves the problem.

There is also a way to avoid this (I haven't tried it) with gcc using the -ffunction-sections option, which places every function in its own section. The linker can then be told, via the --gc-sections option, to discard unused sections.
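A sketch of how the two options fit together in a build (gcc also offers the analogous -fdata-sections for global variables; as above, verify the effect on your own binary):

```make
# Give each function (and data object) its own section, then let the
# linker garbage-collect the sections nothing references.
CFLAGS  += -ffunction-sections -fdata-sections
LDFLAGS += -Wl,--gc-sections
```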

Separate debug and release code

While writing embedded software you may need both debug and release code. A clear example is printing or logging functions that are not necessary in production code. If this is the case, separate the debug and release code. In C, you could have a serial-port printing function in the debug build and an empty macro in the release build. This can be achieved through the build system, by keeping the definitions in separate files (which I recommend), or by using macros.
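A minimal sketch of the macro approach (stderr stands in for the serial-port routine you would have on the real target):

```c
#include <stdio.h>

/* Compile with -DDEBUG_BUILD to get the debug variant. */
#ifdef DEBUG_BUILD
#define LOG(...) fprintf(stderr, __VA_ARGS__)
#else
#define LOG(...) ((void)0)   /* expands to nothing: no code generated */
#endif
```

In the release build the format strings disappear from the binary along with the calls, which on a 64-Kbyte budget is often the bigger saving.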

Do not use packed structures extensively

By definition, packed structures are not aligned. This means that, depending on the architecture, the compiler has to generate extra code to access structure members. So, using packed structures extensively may increase your application size as well as decrease its performance.
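A quick way to see the trade-off, using gcc's `__attribute__((packed))` (exact sizes depend on the ABI; on SPARC, which traps on unaligned loads, the compiler accesses packed members byte by byte):

```c
#include <stdint.h>

struct natural {        /* padded so that .value is naturally aligned */
    uint8_t  tag;
    uint32_t value;
};

struct packed_s {       /* no padding, but .value is misaligned */
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));
```

You save RAM or protocol bytes per instance, but pay in code size at every access to a misaligned member.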

Another reason for not using packed structures is portability.

Enumerations vs. constant objects

As described in this article, constant objects, such as:

unsigned int const ITEM_ID = 0x01200020;

may increase your application size. So, if you really only have a few bytes left, you might be interested in replacing the definition with an enumeration:

enum { ITEM_ID = 0x01200020 };
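A small sketch of the difference: the enum constant needs no storage and is usable anywhere C requires a compile-time constant, which a const object is not:

```c
/* The const object is a real object: it occupies storage and has an
 * address, so reading it may cost an extra load instruction. */
static unsigned int const ITEM_ID_OBJ = 0x01200020;

/* The enumerator is a pure compile-time value: no storage at all. */
enum { ITEM_ID = 0x01200020, BUF_LEN = 64 };

/* In C, only the enum form works here: a const object is not a
 * constant expression for a file-scope array size. */
static unsigned char buf[BUF_LEN];
```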


5 Responses to “Code size optimizations”

  1. bi Says:

Re splitting into separate files: dev86's libc uses this weird trick where a .c file contains several functions, but each function is surrounded by an #ifdef…#endif, and several separate .o files are compiled from this single .c file by selectively turning on each of the macros referenced in it.

  2. aleix Says:

    Wow! This might be something really scary to look at. 😉


  3. Lluís Says:

Another good practice for reducing size is to find the best compromise between procedural abstraction and code motion. Strict procedural abstraction carries a size penalty in the form of unnecessary procedure calls. This can be addressed with a different algorithm, by inlining functions (judiciously, of course 😉), or by applying code motion techniques, for example.

    On the other hand, in embedded systems where, as you explain, size is critical, the code parameterization technique can be useful. That is, if we have these two code fragments:

    // … //
    for (int i = 0; i < 10; ++i)
    v[i] = 0;

    for (int j = 0; j < 200; ++j)
    a[j] = 0xFF;

    then we can replace them with a new procedure whose parameters capture the variants:

    void loop(int *vector, int size, int value)
    {
        for (int i = 0; i < size; ++i)
            vector[i] = value;
    }

    This is a trivial example, but the strategy is basically to use code motion to produce code blocks with the same semantics and then apply parameterization.

  4. abhishekiitk Says:

    Can you please help me understand the trick mentioned in the “Enumerations vs. constant objects” section? How does using an enum help save memory?

    • aleix Says:

      Constant objects have an address, so you may get extra instructions to access that address and load the value into a register. Enum constants are not variables, so they can be substituted directly with the corresponding value at compile time. At least, this is how I understood it, and the tests I did at the time were consistent with these assumptions.
