this is a modified version of http://www.andromeda.com/people/ddyer/topten.html. the document is about some nasty things that can happen in C. i am interested in C programming and have tried to respond to each issue with a suggestion for mitigating unexpected/wrong behavior. my responses are in orange. this is a work in progress.

The Top 10 Ways to get screwed by the "C" programming language

Last modified Dec 1,  2003.

To get on this list, a bug has to be able to cause at least half a day of futile head scratching, and has to be aggravated by the poor design of the "C" language.  In the interests of equal time, and to see how the world has progressed in the 20-odd years since "C" escaped from its spawning ground, see my Top 10 Ways to be Screwed by the Java programming language, and for more general ways to waste a lot of time due to bad software, try my Adventures in Hell page.

A better language would allow fallible programmers to be more productive. Infallible programmers, of the type that the designers of Unix and "C" anticipated, need read no further.  In fairness, I have to admit that the writers of compilers have improved on the situation in recent years, by detecting and warning about potentially bad code in many cases.

  1. Non-terminated comment, "accidentally" terminated by some subsequent comment, with the code in between swallowed.
  2.         a=b; /* this is a bug
            c=d; /* c=d will never happen */
    any decent editor with syntax highlighting (vim, visual studio, more?) will make this visible, and gcc -Wall warns when "/*" appears inside a comment
  3. Accidental assignment/Accidental Booleans
  4.         if(a=b) c;      /* a always equals b, but c will be executed if b!=0 */
    Depending on your viewpoint, the bug in the language is that the assignment operator is too easy to confuse with the equality operator; or maybe the bug is that C doesn't much care what constitutes a boolean expression: (a=b) is not a boolean expression! (but C doesn't care).

    Closely related to this lack of rigor in booleans, consider this construction:

            if( 0 < a < 5) c;      /* this "boolean" is always true! */
    Always true because (0<a) generates either 0 or 1 depending on whether 0<a holds, and that result is then compared to 5, which is always true, of course.  C doesn't really have boolean expressions, it only pretends to.

    Or consider this:

            if( a =! b) c;      /* this is compiled as (a = !b), an assignment, rather than (a != b) or (a == !b) */
    yup, this one sucks. if either a or b happens to be a constant value then you can protect yourself by always using it on the left side:
    if (5 == a){ ... }
    
    because if you ever do
    if (5 = a){ ... }
    
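    the compiler will catch it. but if you're comparing 2 variables then you need to be careful; gcc -Wall helps here, since it warns about an assignment used as a truth value unless you wrap it in an extra pair of parentheses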
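
A minimal sketch of the comparisons above, written so that they say what was probably meant. The helper names in_open_range and differ are made up for illustration and are not part of the original examples.

```c
#include <stdbool.h>

/* Hypothetical helper: true when 0 < a < 5 in the mathematical sense.
 * Each comparison is spelled out and the two are joined with &&. */
bool in_open_range(int a)
{
    return 0 < a && a < 5;
}

/* Hypothetical helper: the comparison that "a =! b" was probably
 * meant to be. */
bool differ(int a, int b)
{
    return a != b;
}
```

With these, in_open_range(7) is false, whereas the chained form 0 < 7 < 5 evaluates to true.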

  5. Unhygienic macros
  6.         #define assign(a,b) a=(char)b
            assign(x,y>>8)
    becomes
                   x=(char)y>>8    /* probably not what you want */
    yup, because macros are just text manipulation and because different operators have differing precedence you need to lovingly slather macro tokens with a healthy dose of ():
    #define assign(a,b) ((a) = (char)(b))
    
    as an aside, it is good practice to uppercase all macros and lowercase all functions so no one makes the mistake of calling ASSIGN(a, a++)
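
When a macro only wraps an expression, a small function is often the safer choice, because its arguments are evaluated exactly once and call-site precedence cannot leak in. The sketch below shows a fully parenthesized macro next to a function equivalent; assign_char is a hypothetical name, and static inline assumes C99 or later.

```c
/* Parenthesize every parameter and the whole expansion. */
#define ASSIGN(a, b) ((a) = (char)(b))

/* Hypothetical function equivalent: value is evaluated once, and
 * an argument such as y >> 8 arrives already computed. */
static inline void assign_char(char *dst, int value)
{
    *dst = (char)value;
}
```

Both ASSIGN(x, y >> 8) and assign_char(&x, y >> 8) shift first and convert second.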
     
  7. Mismatched header files

  8. Suppose foo.h contains:
            struct foo { BOOL a; };

      file F1.c  contains
            #define BOOL char
            #include "foo.h"

      file F2.c contains 
            #define BOOL int
            #include "foo.h"
    Now, F1.c and F2.c disagree about the fundamental attributes of structure "foo". If they talk to each other, You Lose!
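
One possible mitigation is for the header to pin down its own types instead of relying on whatever BOOL means in the including file. A sketch, assuming a C99 compiler so that <stdbool.h> is available; the FOO_H guard name is an illustrative choice.

```c
/* foo.h (sketch): self-contained, so every file that includes it
 * sees the same layout for struct foo. */
#ifndef FOO_H
#define FOO_H

#include <stdbool.h>   /* C99 */

struct foo {
    bool a;
};

#endif /* FOO_H */
```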
     
  9. Phantom returned values

  10. Suppose you write this
        int foo (a)
        { if (a) return(1); } /* buggy, because sometimes no value is returned  */
    Generally speaking, C compilers, and C runtimes either can't or don't tell you there is anything wrong. What actually happens depends on the particular C compiler and what trash happened to be left lying around wherever the caller is going to look for the returned value. Depending on how unlucky you are, the program may even appear to work for a while.

    Now, imagine the havoc that can ensue if "foo" was thought to return a pointer!
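
A sketch of the same function with a prototype-style definition and a return on every path. Returning 0 for the a == 0 case is an assumption about what the caller wants. With gcc, -Wall includes -Wreturn-type, which reports when control can reach the end of a non-void function.

```c
/* Every path through the function returns a value. */
int foo(int a)
{
    if (a)
        return 1;
    return 0;   /* previously missing: the a == 0 path */
}
```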

     

  11. Unpredictable struct construction

  12. Consider this bit packing struct:
        struct eeh_type
        {
                uint16 size:          10;   /* 10 bits */
                uint16 code:           6;   /* 6 bits */
        };
    Depending on which C compiler, and which "endian" flavor of machine you are on, this might actually be implemented as
            <10-bits><6-bits>
    or as
            <6-bits><10-bits>
    So what matters? If you are trying to match bits in a real world file, everything!
    Given the likelihood that code will run on a machine other than the one it's being written for, one has to take endianness issues into account. In this case, you simply cannot assume that a binary file will map directly to a struct; the struct needs to be built piece by piece. Endian-assumptions like this can be discovered by a basic unit test that can be included in the program's build process.
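
Building the struct piece by piece could look like the sketch below. It ASSUMES the file stores the 16-bit word big-endian, with size in the top 10 bits and code in the low 6 bits; the real layout has to come from the file format's documentation. The names eeh_fields and eeh_unpack are hypothetical.

```c
#include <stdint.h>

/* Hypothetical in-memory form; no bit-fields, so no layout surprises. */
struct eeh_fields {
    uint16_t size;   /* 10 bits in the file */
    uint16_t code;   /*  6 bits in the file */
};

/* ASSUMPTION: big-endian word, size in the high 10 bits, code in the
 * low 6 bits. Shifts and masks behave the same on every host. */
struct eeh_fields eeh_unpack(const unsigned char bytes[2])
{
    uint16_t word = (uint16_t)(((unsigned)bytes[0] << 8) | bytes[1]);
    struct eeh_fields f;

    f.size = (uint16_t)(word >> 6);
    f.code = (uint16_t)(word & 0x3Fu);
    return f;
}
```

Because the bytes are combined arithmetically, the result does not depend on the host's endianness or on how the compiler orders bit-fields.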

  13. Indefinite order of evaluation (contributed by xavier@triple-i.com)
  14.         foo(pointer->member, pointer = &buffer[0]);
    Works with gcc (and other compilers I used until I tried acc) and does not with acc. The reason is that gcc evaluates function arguments from left to right, while acc evaluates arguments from right to left.

    K&R and ANSI/ISO C specifications do not define the order of evaluation for function arguments. It can be left-to-right, right-to-left or anything else and is "unspecified". Thus any code which relies on this order of evaluation is doomed to be non portable, even across compilers on the same platform.

    This isn't an entirely non controversial point of view. Read the supplementary dialog on the subject.
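
One way to remove the dependence on argument order is to give each side effect its own statement before the call. The sketch assumes the intent was the left-to-right reading (read the member through the old pointer, then repoint); struct item, use, and call_in_order are hypothetical stand-ins for the original's types and foo().

```c
struct item { int member; };

/* Hypothetical callee standing in for foo() above. */
int use(int member, struct item *p)
{
    return member + p->member;
}

int call_in_order(struct item *pointer, struct item *buffer)
{
    int member = pointer->member;   /* read through the old pointer first */

    pointer = &buffer[0];           /* then repoint, in its own statement */
    return use(member, pointer);    /* the arguments are now independent */
}
```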
     

  15. Easily changed block scope (Suggested by Marcel van der Peijl <bigmac@digicash.com>)
  16.     if( ... ) 
            foo(); 
        else 
            bar();
    which, when adding debugging statements, becomes
        if( ... ) 
            foo();          /* the importance of this semicolon can't be overstated */
        else 
            printf( "Calling bar()" );      /* oops! the else stops here */
            bar();                          /* oops! bar is always executed */
    There is a large class of similar errors, involving misplaced semicolons and brackets.
    i've never personally had trouble with this one, but you can overcome it with explicit braces. also it's good practice to wrap one's macros in do { } while (0)
    #define BAR() do { printf("Calling bar()"); bar(); } while (0)
    
    if we didn't then:
    if (0 == x)
    	BAR();
    
    would always execute the second statement, bar(), whether or not x is 0
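
A small sketch of the do { } while (0) idiom inside an if/else. The bar_calls counter and the maybe_bar wrapper are hypothetical, added only so the effect can be observed.

```c
#include <stdio.h>

static int bar_calls = 0;                  /* hypothetical counter for the demo */
static void bar(void) { bar_calls++; }

/* Two statements wrapped so that "BAR();" behaves as one statement. */
#define BAR() do { printf("Calling bar()\n"); bar(); } while (0)

void maybe_bar(int x)
{
    if (0 == x)
        BAR();                      /* both statements are guarded by the if */
    else
        puts("skipping bar()");     /* the else still pairs with the if */
}
```

Without the wrapper, the expansion would end the if after printf, and the else would no longer compile.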
     
  17. Permissive compilation (suggested by James M. Stern <jstern@world.nad.northrop.com>)

  18. I once modified some code that called a function via a macro:
            CALLIT(functionName,(arg1,arg2,arg3));
    CALLIT did more than just call the function. I didn't want to do the extra stuff so I removed the macro invocation, yielding:
            functionName,(arg1,arg2,arg3);
    Oops. This does not call the function. It's a comma expression that:
    1. Evaluates and then discards the address of functionName
    2. Evaluates the parenthesized comma expression (arg1,arg2,arg3)
    C's motto: who cares what it means? I just compile it! My own favorite in this vein is this:
            switch (a) {
            int var = 1;    /* This initialization typically does not happen. */
                            /* The compiler doesn't complain, but it sure screws things up! */
            case A: ...
            case B: ...
            }
    Still not convinced? Try this one (suggested by Mark Scarbrough <mes@triple-i.com>):
    #define DEVICE_COUNT 4 
    uint8 *szDevNames[DEVICE_COUNT] = {
            "SelectSet 5000",
            "SelectSet 7000"}; /* the last two entries are silently left as null pointers */
    i always avoid hard-coding array bounds, because they often change. the above code could be rewritten:
    uint8 *szDevNames[] = {
    	"SelectSet 5000",
    	"SelectSet 7000"
    };
    #define DEVICE_COUNT (sizeof szDevNames / sizeof szDevNames[0])
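
If other code depends on a fixed DEVICE_COUNT, a compile-time check can turn a mismatch into a build error. This sketch assumes a C11 compiler (static_assert from <assert.h>) and uses const char * rather than the original uint8 *; the count of 2 is illustrative.

```c
#include <assert.h>   /* static_assert (C11) */

#define DEVICE_COUNT 2   /* hypothetical: the count the rest of the code expects */

static const char *const szDevNames[] = {
    "SelectSet 5000",
    "SelectSet 7000"
};

/* Compilation fails if the table and the expected count drift apart. */
static_assert(sizeof szDevNames / sizeof szDevNames[0] == DEVICE_COUNT,
              "szDevNames must have exactly DEVICE_COUNT entries");
```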
    
  19. Unsafe returned values (suggested by Bill Davis <wdavis@dw3f.ess.harris.com>) 
  20. char *f() {
       char result[80];
       sprintf(result,"anything will do");
       return(result);    /* Oops! result is allocated on the stack. */
     }

    int g()
    {
       char *p;
       p = f();
       printf("f() returns: %s\n",p);
    }
    The "wonderful" thing about this bug is that it sometimes seems to be a correct program, as long as nothing has reused the particular piece of stack occupied by result.

    gcc 3.3.5 warns "function returns address of local variable"
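
A common fix is to let the caller supply the storage, so the buffer outlives the call. A minimal sketch; the extra parameters on f are an assumption about how the interface could change, and snprintf requires C99.

```c
#include <stddef.h>
#include <stdio.h>

/* The caller owns result, so the returned pointer stays valid. */
char *f(char *result, size_t size)
{
    snprintf(result, size, "anything will do");   /* never writes past size */
    return result;
}

void g(void)
{
    char buf[80];

    printf("f() returns: %s\n", f(buf, sizeof buf));
}
```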
     

  21. Undefined order of side effects. (suggested by michaelg@owl.WPI.EDU and others) 
  22. Even within a single expression, even with only strictly manifest side effects, C doesn't define the order of the side effects. Therefore, depending on your compiler, I/++I might be either 0 or 1. Try this:

    #include <stdio.h>

    int foo(int n) {printf("Foo got %d\n", n); return(0);}

    int bar(int n) {printf("Bar got %d\n", n); return(0);}

    int main(int argc, char *argv[]) 
    {
      int m = 0;
      int (*(fun_array[3]))();

      int i = 1;
      int ii = i/++i;

      printf("\ni/++i = %d, ",ii);

      fun_array[1] = foo; fun_array[2] = bar;

      (fun_array[++m])(++m);        
    }

    Prints either i/++i = 1 or i/++i=0;
    Prints either "Foo got 2", or "Bar got 2"
    Undefined result yes, but the fact that it's undefined is defined. The best practice is to read about Sequence Points in ISO 9899 Annex C
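
In practice the cure is to give each side effect its own statement, since the end of a full expression is a sequence point. The sketch assumes the intended meaning of i/++i was "old value divided by incremented value"; divide_old_by_new is a hypothetical name.

```c
/* Hypothetical rewrite of i/++i with the order made explicit. */
int divide_old_by_new(int i)
{
    int old = i;      /* read the old value */

    ++i;              /* the increment completes at the end of this statement */
    return old / i;   /* no side effects left in this expression */
}
```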
     
  23. Uninitialized local variables 
  24. Actually, this bug is so well known, it didn't even make the list! That doesn't make it less deadly when it strikes. Consider the simplest case:

    void foo(a)
    { int b;
      if(b) {/* bug! b is not initialized! */ }
    }
    and in truth, modern compilers will usually flag an error as blatant as the above. However, you just have to be a little more clever to outsmart the compiler. Consider:
    void foo(int a) 
    { BYTE *B;
       if(a) B=Malloc(a);
       if(B) { /* BUG! B may or may not be initialized */ *B=a; }
    }
    GNU code I've read mitigates this by initializing pointers to NULL upon declaration
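
A sketch of the initialize-to-NULL habit applied to the example above. It substitutes unsigned char and the standard malloc for the original's BYTE and Malloc, and make_buffer is a hypothetical name.

```c
#include <stdlib.h>

/* Returns a buffer of a bytes with its first byte set, or NULL when
 * a <= 0 or the allocation fails. The caller frees the result. */
unsigned char *make_buffer(int a)
{
    unsigned char *b = NULL;          /* defined value on every path */

    if (a > 0)
        b = malloc((size_t)a);
    if (b != NULL)
        b[0] = (unsigned char)a;      /* only reached with valid storage */
    return b;
}
```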
  25. Cluttered compile time environment 
  26. The compile-time environment of a typical compilation is cluttered with hundreds (or thousands!) of things that you typically have little or no awareness of.  These things sometimes have dangerously common names, leading to accidents that can be virtually impossible to spot.

    #include <stdio.h>
    #define BUFFSIZE 2048
    long foo[BUFSIZ];                //note spelling of BUFSIZ != BUFFSIZE

    This compiles without error, but will fail in predictably awful and mysterious ways, because BUFSIZ is a symbol defined by stdio.h.  A typo/braino like this can be virtually impossible to find if the distance between the #define and the error is greater than in this trivial example.

    Yup, this one has bitten me. It can be mitigated by using non-generic names, like MYAPP_BUFSIZE
     

  27. Underconstrained fundamental types

  28. I've been seriously burned because different compilers, or even different options of the same compiler, define the fundamental type int as either 16 or 32 bits.  In the same vein, name any other language in which boolean might be defined or undefined, or might be defined by a compiler option, a runtime pragma (yes! we have booleans!), or just about any way the user decided would work ok.
    C99 finally gave us the uintN_t family (in <stdint.h>) and bool (in <stdbool.h>).
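
A short sketch of what the C99 headers provide. The struct sample record is hypothetical; the exact-width types themselves are optional in the standard but present on mainstream platforms.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record whose field widths do not depend on compiler options. */
struct sample {
    uint16_t id;        /* exactly 16 bits */
    int32_t  value;     /* exactly 32 bits */
    bool     valid;     /* the standard boolean type */
};
```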
     
  29. Utterly unsafe arrays 
  30. This is so obvious it didn't even make the list for the first 5 years, but C's arrays and associated memory management are completely, utterly unsafe, and even obvious cases of error are not detected.

     int thisIsNuts[4]; int i;
      for ( i = 0; i < 10; ++i )
      {
        thisIsNuts[ i ] = 0;     /* Isn't it great ?  I can use elements 0-9 of a 4 element array, and no one cares */
      }

    Of course, there are infinitely many ways to do things like this in C.

    It's best to always calculate array size via (sizeof thisIsNuts / sizeof thisIsNuts[0])
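
The sizeof idiom can be wrapped in a macro and used as the loop bound. A sketch; ARRAY_LEN and clear_all are hypothetical names, and the macro only works on a true array, not on a pointer or an array passed as a function parameter.

```c
#include <stddef.h>

/* Element count of a true array (not a pointer). */
#define ARRAY_LEN(arr) (sizeof (arr) / sizeof (arr)[0])

int thisIsNuts[4];

void clear_all(void)
{
    size_t i;

    for (i = 0; i < ARRAY_LEN(thisIsNuts); ++i)
        thisIsNuts[i] = 0;    /* i stays within 0..3 */
}
```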
     

  31. Octal numbers (suggested by Paul C. Anagnostopoulos) 
  32. In C, numbers beginning with a zero are evaluated in base 8.  If there are no 8's or 9's in the numbers, then there will be no complaints from the compiler, only screams from the programmer when he finally discovers the nature of the problem.

     int numbers[] = { 001,      // line up numbers for typographical clarity, lose big time
                       010,      // 8 not 10
                       014 };    // 12, not 14

    Always use hex! ;)
  33. Signed Characters/Unsigned bytes.
    C was forced into a consistency trap by including unsigned as a modifier for all integer types.  On one hand, the fact that types char and byte are signed causes all kinds of problems -  It is never intuitive that 128 is a negative number, and so very easy to forget.  On the other hand,  any arithmetic using low precision integers must be done very carefully, and C makes it much too easy to ignore this.

    char s = 127;
    unsigned char u = 127;
    s++;      /* the result is a negative number!  Effectively overflow occurs, but no trap */
    if (s<u) { /* true!*/ }
    if(s>127) { /* this can never be true */  }
    if(u<0) {  /* this can never be true*/  }

    gcc 3.3.5 will complain about "unsigned/signed comparison" and "comparison is always false due to limited range of data type"
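
When bytes are stored in a char buffer, converting through unsigned char before widening gives a value in 0..255 whether plain char is signed or unsigned on the platform. A sketch; byte_value is a hypothetical helper.

```c
/* Widen one byte of a char buffer to an int in the range 0..255. */
int byte_value(const char *buf, int index)
{
    return (unsigned char)buf[index];
}
```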

  34. Reserved for future expansion. Send email to ddyer@real-me.net