Archive for the ‘C’ Category

Integer promotion: comparisons

September 16, 2007

This may be something basic, but I lost a bit of time last week trying to find a bug at work, so I thought it was worth mentioning it.

When comparing signed and unsigned expressions of the same size, the compiler may produce unexpected results. Suppose you have this code:

#include <stdio.h>

int
main (void)
{
  unsigned short int a = -12;
  signed short int b = -12;

  printf ("%s\n", (a == b) ? "OK" : "failed");

  return 0;
}

Here, you would probably expect an “OK” output, but surprisingly you get “failed”. Why is this? If you already know about integer promotion, you may want to skip to the end rather than follow the whole process.

Fortunately, I received a copy of K&R last week, so I started digging into the issue to try to understand why this was happening.

About equality operators, K&R section A7.10 reads

The equality operators follow the same rules as the relational operators…

Section A7.9, about relational operators, reads

The usual arithmetic conversions are performed on arithmetic operands…

So, how do these arithmetic conversions work?

This is also clearly explained in K&R section A6.5 (the same rules apply to the C99 standard, section 6.3.1.8, Usual arithmetic conversions). The part that interests us is after having evaluated all the real type conversions, when it says

Otherwise, the integer promotions are performed on both operands…

So, before performing the comparison of our operands, both undergo integer promotion. Integer promotion (K&R, section A6.2) says that

If an int can represent all the values of the original type, then the value is converted to int; otherwise the value is converted to unsigned int.

An int can represent all the values of our operands (assuming a 16-bit short and a 32-bit int), so after the integer promotion is performed, both operands have int type. Looking back at our operands, the variable a holds the value 0xFFF4 (65524, because -12 wraps around when stored in an unsigned short), and integer promotion preserves that value, giving 0x0000FFF4. Variable b holds -12, and integer promotion preserves that value too, giving 0xFFFFFFF4. Clearly, the two values are different and the check fails.

At the end of K&R section A6.5 this is explicitly explained

The new rules are slightly more complicated, but reduce somewhat the surprises that may occur when an unsigned quantity meets signed. Unexpected results may still occur when an unsigned expression is compared to a signed expression of the same size.

Basically, what’s going on here is that both variables undergo integer promotion, which preserves their values. a is unsigned, so it is zero-extended; b is signed, so it is sign-extended. The promoted values therefore differ.

Note that this issue does not occur with 32-bit values (unsigned int and int): the signed operand is converted to unsigned int, so both operands would be 0xFFFFFFF4.

So, be careful when comparing unsigned and signed types.

Helper macros for Check

August 6, 2007

During my vacation I had the opportunity to add unit testing to SCEW (Simple C Expat Wrapper). I looked at various C unit testing frameworks and decided to use Check. Most of them follow the xUnit approach, but I chose Check because its tests run in a separate address space from the test runner.

I found that writing test cases was a bit hard using Check’s syntax. For example, following the manual you can write this integer check:

fail_unless (money_amount (m) == 5,
             "Amount not set correctly on creation");

This is fine if you are reading the code, but if the check fails the output doesn’t show you what the actual or expected values are, so the manual suggests changing it to:

fail_unless(money_amount (m) == 5,
            "Amount was %d, instead of 5", money_amount (m));

which is much better than the first one, but too painful if you have to write it for every check. So, why not write a helper macro that checks for integers, prints the actual and expected values and also shows you the check that is being done?

#define CHECK_U_INT(A, B, MSG, ...)                                     \
  do                                                                    \
    {                                                                   \
      enum { MAX_BUFFER = 250 };                                        \
      static char buffer[MAX_BUFFER];                                   \
      snprintf (buffer, MAX_BUFFER, MSG, ##__VA_ARGS__);                \
      unsigned int v_a = (A);                                           \
      unsigned int v_b = (B);                                           \
      fail_unless (v_a == v_b,                                          \
                   "(%s) == (%s) \n  Actual: %u \n  Expected: %u \n  %s", \
                   #A, #B, v_a, v_b, buffer);                           \
    }                                                                   \
  while (0)

With this macro you can now write code like this:

CHECK_U_INT (money_amount (m), 5, "Money amount mismatch");

which is really easy to read and in the test’s output you can see the actual and expected values, the performed test and the user message clarifying the intention of the check.

check.c:97:F:Core:test_amount:0: (money_amount (m)) == (5)
  Actual: 2
  Expected: 5
  Money amount mismatch

The same happens with strings, so instead of writing this:

fail_if (strcmp (money_currency (m), "USD") != 0,
         "Currency not set correctly on creation");

or this:

if (strcmp (money_currency (m), "USD") != 0)
  {
    fail ("Currency not set correctly on creation");
  }

we can write a string checking macro that shows the actual and expected strings and the check being done:

#define CHECK_STR(A, B, MSG, ...)                                       \
  do                                                                    \
    {                                                                   \
      char const *str_a = (A);                                          \
      char const *str_b = (B);                                          \
      if (strcmp (str_a, str_b) != 0)                                   \
        {                                                               \
          enum { CHECK_MAX_BUFFER = 250 };                              \
          static char buffer[CHECK_MAX_BUFFER];                         \
          snprintf (buffer, CHECK_MAX_BUFFER, MSG, ##__VA_ARGS__);      \
          fail ("(%s) == (%s) \n  Actual: %s \n  Expected: %s \n  %s",  \
                #A, #B, str_a, str_b, buffer);                          \
        }                                                               \
    }                                                                   \
  while (0)

As before, this would be the new output:

check.c:205:F:Core:test_currency:0: (money_currency (m)) == (USD)
  Actual: EUR
  Expected: USD
  Currency not set correctly on creation

Well, this is not a big deal, but I have found it quite useful. Below is the list of macros I am using right now:

#define CHECK_U_INT(A, B, MSG, ...)                                     \
  do                                                                    \
    {                                                                   \
      enum { MAX_BUFFER = 250 };                                        \
      static char buffer[MAX_BUFFER];                                   \
      snprintf (buffer, MAX_BUFFER, MSG, ##__VA_ARGS__);                \
      unsigned int v_a = (A);                                           \
      unsigned int v_b = (B);                                           \
      fail_unless (v_a == v_b,                                          \
                   "(%s) == (%s) \n  Actual: %u \n  Expected: %u \n  %s", \
                   #A, #B, v_a, v_b, buffer);                           \
    }                                                                   \
  while (0)

#define CHECK_S_INT(A, B, MSG, ...)                                     \
  do                                                                    \
    {                                                                   \
      enum { MAX_BUFFER = 250 };                                        \
      static char buffer[MAX_BUFFER];                                   \
      snprintf (buffer, MAX_BUFFER, MSG, ##__VA_ARGS__);                \
      int v_a = (A);                                                    \
      int v_b = (B);                                                    \
      fail_unless (v_a == v_b,                                          \
                   "(%s) == (%s) \n  Actual: %d \n  Expected: %d \n  %s", \
                   #A, #B, v_a, v_b, buffer);                           \
    }                                                                   \
  while (0)

#define CHECK_BOOL(A, B, MSG, ...) CHECK_U_INT (A, B, MSG, ##__VA_ARGS__)

#define CHECK_STR(A, B, MSG, ...)                                       \
  do                                                                    \
    {                                                                   \
      char const *str_a = (A);                                          \
      char const *str_b = (B);                                          \
      if (strcmp (str_a, str_b) != 0)                                   \
        {                                                               \
          enum { CHECK_MAX_BUFFER = 250 };                              \
          static char buffer[CHECK_MAX_BUFFER];                         \
          snprintf (buffer, CHECK_MAX_BUFFER, MSG, ##__VA_ARGS__);      \
          fail ("(%s) == (%s) \n  Actual: %s \n  Expected: %s \n  %s",  \
                #A, #B, str_a, str_b, buffer);                          \
        }                                                               \
    }                                                                   \
  while (0)

#define CHECK_PTR(A, MSG, ...)                                          \
  do                                                                    \
    {                                                                   \
      enum { MAX_BUFFER = 250 };                                        \
      static char buffer[MAX_BUFFER];                                   \
      snprintf (buffer, MAX_BUFFER, MSG, ##__VA_ARGS__);                \
      fail_unless ((A) != NULL, "(%s) != NULL \n  %s", #A, buffer);     \
    }                                                                   \
  while (0)

#define CHECK_NULL_PTR(A, MSG, ...)                                     \
  do                                                                    \
    {                                                                   \
      enum { MAX_BUFFER = 250 };                                        \
      static char buffer[MAX_BUFFER];                                   \
      snprintf (buffer, MAX_BUFFER, MSG, ##__VA_ARGS__);                \
      fail_unless ((A) == NULL, "(%s) == NULL \n  %s", #A, buffer);     \
    }                                                                   \
  while (0)

Update 2007/10/05: Fix: using macro parameters more than once might cause multiple unnecessary function calls.

Update 2007/08/11: I have updated the macros so that a variable number of arguments is allowed (see variadic macros).

Be careful with packed structures!

July 24, 2007

If you use the C language, you have probably wanted to pack your structures so that the compiler inserts no alignment padding. This can be useful, for example, to build network packets. You can find the basics of packed structures using GCC here or here.

If everything seems clear, why am I writing this? Today, a coworker found a bug related to packed structures. The issue was with internal structures (i.e. substructures). A year or so ago, I knew that a substructure is not packed even if its enclosing structure is, but it seems I forgot about it, so I decided it was worth writing down here so that I do not forget it again (I’m sure I will).

I will take the example found in GCC documentation (only since version 3.4.0). Suppose you have the following code:

struct my_unpacked_struct
{
  char c;
  int i;
};

struct my_packed_struct
{
  char c;
  int  i;
  struct my_unpacked_struct s;
} __attribute__ ((__packed__));

struct my_packed_struct my = {
  .c = 10,
  .i = 20,
  .s.c = 30,
  .s.i = 40
};

If we generate the assembly for this (I have omitted some things not needed for the example), we will get:

        .globl _my
        .data
_my:
        .byte   10  <--- c
        .long   20  <--- i
        .byte   30  <--- s.c
        .space 3    <--- 3 bytes of alignment
        .long   40  <--- s.i

As you can see, the compiler has packed the enclosing structure but not the internal one, which still gets its 3 bytes of alignment padding. So, what you need to do if you want the internal structure packed as well is to pack my_unpacked_struct:

struct my_unpacked_struct
{
  char c;
  int i;
} __attribute__ ((__packed__));

Now, we get what we initially expected:

        .globl _my
        .data
_my:
        .byte   10  <--- c
        .long   20  <--- i
        .byte   30  <--- s.c
        .long   40  <--- s.i

Packing the whole structure my_unpacked_struct is fine if you do not use it anywhere else, but it would be great to use variable attributes instead (we have used type attributes so far), so that only the internal substructure member is packed, like this (it doesn’t work):

struct my_packed_struct
{
  char c;
  int  i;
  struct my_unpacked_struct s __attribute__ ((__packed__));
} __attribute__ ((__packed__));

Update 2007/07/25: read the first comment to understand why the variable attribute is not working in this case.

By the way, in the example I have initialized the structure my using designated initializers.

And remember, be careful with packed structures!

Designated initializers

February 19, 2007

Last year, I discovered, thanks to the book “C, A Reference Manual”, a great C99 feature: designated initializers. Designated initializers allow you to initialize components of an aggregate (structure, union or array) by specifying their names within an initializer list.

Array initialization

What most people normally use to initialize an array is the following idiom:

int v[4] = { 4, 2, 1, -5 };

in which you need to initialize each component of the array sequentially. Designated initializers allow you to specify which component of the array you want to initialize. Thus, we could write the line above as:

int v[4] = { [1] = 2, [2] = 1, [0] = 4, [3] = -5 };

Note that we have specified the component indexes, which has allowed us to initialize the array in whatever order we like. If we do not initialize all the components, those left out are set to 0. We can also mix both methods, so the line below is also correct:

int v[4] = { [1] = 2, 1, [3] = -5 };

in which a component without a designator goes right after the previously designated one, so the value 1 initializes v[2] and v[0] is left as 0.

A possible use of this kind of initialization is a mapping between a list of identifiers and a list of strings.

// The public interface

typedef enum {
  id_one,
  id_two,
  id_three
} id_t;

extern char const* string_by_id (id_t id);

// The private implementation

static char const* strings[] =
{
  [id_one] = "identifier one",
  [id_two] = "identifier two",
  [id_three] = "identifier three"
};

char const*
string_by_id (id_t id)
{
  return strings[id];
}

Structure and union initialization

Designated initializers are also useful to initialize components of structures and unions by their name. In this case, the component to be initialized takes the form .c, where c is the name of the component. So, suppose we have the following structure:

struct point { float x; float y; float z; };

we could initialize each component of a struct point variable like this:

struct point my_point =
{
  .x = 0.34,
  .y = 0.98,
  .z = 1.56
};

With unions, we will use the same method, so having the following union:

union integer
{
  unsigned char int_8;
  unsigned short int int_16;
  unsigned long int_32;
};

we can initialize it by any of its components:

union integer value = { .int_16 = 24000 };

Finally, we can merge both cases, so we can have arrays of structures or unions that can be initialized using designated initializers:

struct point pointvector[3] =
{
  [0].x = 0.34, [0].y = 1.78, [0].z = 3.18,
  [1] = { .x = 3.5, .y = 6.89 },
  [2] = { .y = 2.8, 1.23 }
};