DEFENSIVE C PROGRAMMING TIPS

We all know there are plenty of great C and C++ textbooks out there. One thing they usually don't cover is how to write bulletproof C code. Error handling and debugging techniques are often glossed over to keep example programs readable. At best, if you're lucky, you'll get a chapter describing the debugging facilities in C; at worst, practically nothing.


If you're coming from a higher-level language (especially one that borrows C's syntax, such as Java), the first thing you need to realize is that C is a low-level language and far less forgiving than higher-level languages. For example, an array overrun that Java would catch will simply execute in C, often subtly breaking your code, and can be hard to track down if it goes unnoticed. The best case is that it crashes your program. The worst case is that your program keeps running but strange problems appear, because you have overwritten data or read data from the wrong place.


The first thing you can do is get a good bounds checker. They're expensive but can be worth their weight in gold, so get one if you can afford it. They often catch errors you didn't even know existed.


Another good thing to do is find out exactly which errors your compiler will catch in DEBUG mode. This gives you a good idea of what it can and can't catch, so you're not wasting time chasing bugs your compiler would have flagged if they existed. But don't let it lull you into a false sense of security. Some C compilers have practically no debug facilities built in and simply compile your programs. When you port a program, the debugging tools on the new machine may be better or worse than the ones on your old one, so keep this in mind when you code. The best practice, in my opinion, is to take advantage of the debugging tools you have on hand but write your code as if you had no such tools at your disposal.


Another common mistake is not checking the return values and error reporting of library functions. This goes for the standard library and other libraries as well. When writing your debug code, be sure to respond to all of them, if only with an assert. As an example, always at least assert that malloc did not return NULL. But do this for any error a library can report. Library errors often point to a problem that would not have been obvious otherwise. This saves time in the long run, and you will always know when a library complains. I will describe later when to "HARD CODE" error handlers, so when you first write a program, just be sure you always at least assert on library errors.
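As a rough sketch of that first-pass habit, here is a small string-duplicating helper. The name duplicate_string is made up for illustration and is not from any library. The assert makes a debug build stop right at the failing malloc, and the plain if keeps a release build from writing through a NULL pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if malloc fails.
   The caller owns the result and must free() it. */
char *duplicate_string(const char *src)
{
    size_t size;
    char *copy;

    assert(src != NULL);        /* a NULL source is a caller bug */

    size = strlen(src) + 1;     /* +1 for the terminating '\0' */
    copy = malloc(size);
    assert(copy != NULL);       /* debug build: notice the failure right here */
    if (copy == NULL) {
        return NULL;            /* release build: hand the failure to the caller */
    }
    memcpy(copy, src, size);    /* copies the '\0' as well */
    return copy;
}
```

Whether the if belongs here or higher up in the caller is exactly the "where do I hard code the handler" question discussed later.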


Get a memory leak detection program: buy one, write one yourself, or steal mine, which is posted on my site. Leaks are easy errors to miss, because your program will still run despite the fact that it leaks. Use a tool like this from the moment you start a project until the very end, and use it religiously.
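If you want to write one yourself, the simplest possible starting point is a pair of wrappers that count live allocations. This is only a sketch and not the tool from my site. The names tracked_malloc, tracked_free and tracked_live_blocks are invented for this example, and a real leak checker would also record where each block was allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Blocks handed out by tracked_malloc and not yet given back. */
static long live_blocks = 0;

/* Hypothetical wrapper: same contract as malloc, but counts successes. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        live_blocks++;
    }
    return p;
}

/* Hypothetical wrapper: same contract as free, but updates the count. */
void tracked_free(void *p)
{
    if (p != NULL) {
        assert(live_blocks > 0);   /* more frees than allocations is a bug */
        live_blocks--;
        free(p);
    }
}

/* Call at shutdown; anything other than 0 means something leaked. */
long tracked_live_blocks(void)
{
    return live_blocks;
}
```

At the end of the program, assert(tracked_live_blocks() == 0) tells you whether anything is still outstanding.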


Okay, all of the above helps, but it would be nice to have C programs run as safely as programs in higher-level languages, and as I said earlier, you may not have some of the extras described above. There is a way. It's called assert, and most coders don't use it enough. Any program I ship has plenty of functions that contain more asserts than they do code. So when you start to get totally sick of writing assert 50 million times, you're well on your way. Why assert? Because it is a standard C way of stating when something is wrong by your rules, not merely when your compiler thinks it's wrong. Even if you have the best debugger in the world, assert will still catch tons of logical errors that even the best tools will miss.


Consider the following simple program. It copies a string into an object. The code looks simple enough, but it is a time bomb waiting to go off. Functions written like this are the reason so many C programs crash users' computers.



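#include <assert.h>   /* assert, used by the later versions */
#include <string.h>   /* strcpy, strlen */

struct object{
    char * my_string;   /* buffer the object copies strings into */
    int string_len;     /* how long that string can be */
};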



/* No checks at all: every precondition is left to the caller. */
void Copy_String_To_object(struct object * object, char * string){
    strcpy(object->my_string, string);
}


This function will work fine as long as its user passes a valid string and a valid object. The string passed must be small enough to fit in the memory allocated for my_string, and my_string must point to valid memory. Also, char * string must have a null terminator at the end, or the function will not work as intended.


So the innocent little function above is in fact a wolf in sheep's clothing. We must consider the C ethic here: the programmer knows exactly what he is doing and has stated his intentions exactly. As long as we are sure that all the conditions in the paragraph above have been met, this function will work just fine.


The problem is that this is a lot of conditions to remember for such a small function, and of course larger functions get more complicated (A GOOD REASON TO KEEP FUNCTIONS SMALL).


Let's rewrite the function without changing its one line of working code, so that when it is compiled in RELEASE mode it has exactly the same effect as the old one.



void Copy_String_To_object(struct object * object, char * string){
    assert(object != NULL);              /* caller passed an object */
    assert(object->my_string != NULL);   /* the object has a buffer */
    assert(string != NULL);              /* caller passed a string */
    strcpy(object->my_string, string);
}


OKAY, WE NOW HAVE 3 ASSERTS FOR ONE LINE OF CODE! And we're just getting started! Some compilers fill uninitialized pointers with a recognizable pattern in debug mode, but C does not promise that they will be NULL, so initialize your pointers to NULL yourself. Then if I pass the function an object whose member was never set up, it will barf. By the way, it's a good idea to set a pointer to NULL when you free its memory. Then if you call this function with a pointer to that memory, it will barf.
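The free-then-NULL habit is easy to wrap in a small helper so you never forget the second half. A minimal sketch, assuming the object owns its buffer and that the buffer came from malloc (the name object_release_string is mine, made up for this example):

```c
#include <assert.h>
#include <stdlib.h>

struct object {
    char *my_string;
    int string_len;
};

/* Hypothetical helper: free the buffer and mark it as gone, so a later
   assert(object->my_string != NULL) catches any use after free. */
void object_release_string(struct object *obj)
{
    assert(obj != NULL);
    free(obj->my_string);     /* free(NULL) does nothing, so calling twice is safe */
    obj->my_string = NULL;
    obj->string_len = 0;
}
```

A matching init function that sets my_string to NULL before anything else touches the object gives the asserts something reliable to test against.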


So it's now a little safer, and we've added no run time penalty to the release build, because the assert calls are stripped out when NDEBUG is defined.



struct object{
    char * my_string;
    int string_len;
};


Remember from the struct definition that the object happens to know how long the string is, so let's take advantage of this fact without incurring any run time penalty.



void Copy_String_To_object(struct object * object, char * string){
    assert(object != NULL);
    assert(object->my_string != NULL);
    assert(string != NULL);
    /* Assumes string_len is the most characters my_string can hold,
       not counting the '\0'. */
    assert(strlen(string) <= (size_t)object->string_len);
    strcpy(object->my_string, string);
}


Now we know, whenever we call this function, that the pointers are at least not NULL. If you did what I said and set every freed pointer to NULL, any non-NULL pointer that reaches this function should point to valid memory.



The last assert, assert(strlen(string) <= (size_t)object->string_len), makes sure that the string being copied is small enough to fit in object->my_string.


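Even sweeter, if char * string has no null terminator at the end, the assert above will still fire in some cases, because strlen runs off the end of the buffer and tends to come back with a length that is too large. You can't count on that, since reading past the end is undefined behavior, but it's better than nothing. Better still would be to use assert in another function to ensure that char * string always holds a null-terminated string.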
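One way to do that check, sketched under the assumption that the code filling the buffer knows its size: look for the '\0' with memchr, which never reads past the limit you give it. The helper name is_terminated is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf contains a '\0' somewhere in its
   first capacity bytes, 0 otherwise. Reads at most capacity bytes and
   changes nothing, so it is safe to call inside assert. */
int is_terminated(const char *buf, size_t capacity)
{
    return buf != NULL && memchr(buf, '\0', capacity) != NULL;
}
```

In the function that fills a buffer such as char line[80], you would then write assert(is_terminated(line, sizeof line)) before handing line to anything that expects a string.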


You must agree that the function is at least far safer than it was before. Do the same thing for all the functions in your program. Use assert whenever you know some condition must be true. For example, if you know an integer will only contain values between one and five, assert this anytime you use the variable.
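The one-to-five case might look like the sketch below. The priority table and the function name are made up for illustration; the point is that the assert sits right in front of the array access it protects.

```c
#include <assert.h>

/* Hypothetical table: priorities are documented to run from 1 to 5. */
static const char *const priority_names[5] = {
    "lowest", "low", "normal", "high", "highest"
};

/* Returns the name for a priority in the range 1..5. */
const char *priority_name(int priority)
{
    assert(priority >= 1 && priority <= 5);   /* out of range is a caller bug */
    return priority_names[priority - 1];      /* array is 0-based, priorities are 1-based */
}
```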


assert(strlen(string) <= (size_t)object->string_len);


Now I could have changed the function to handle this error at run time, but I didn't. You're using C, so you may have performance issues where you simply cannot afford a bounds check in this routine. Remember, despite what the high-level language advocates say, algorithms only get you so far. If you can't find a faster algorithm, you must trim cycles, and C and C++ are great at trimming cycles.


Now, just because you're trimming cycles doesn't mean you can ship a flaky program. You still want it to function perfectly to avoid pissed-off clients. What happens, and I'm not sure how to explain this, is that after you begin to use this programming style you begin to see the safety of the asserts bubble up through the nest of functions that make up a section of a program. You can often follow them up to find the perfect location for the run time error handler and avoid extra checks in the functions below it. If you botch it, most of the time you just get an assert thrown for your trouble. This is subjective and takes some practice, but I can often get programs I'm damn sure are correct with hardly any run time penalty.
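Here is a rough sketch of what that bubbling up can look like. The low-level copy keeps its asserts and nothing else, and a single entry point above it does the run time checking once. The function object_set_from_input and its return codes are invented for this example, and I'm assuming string_len is the most characters my_string can hold, not counting the '\0'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct object {
    char *my_string;
    int string_len;   /* assumed: max characters, excluding the '\0' */
};

/* Low-level routine: asserts only, so it costs nothing in a release build. */
static void Copy_String_To_object(struct object *object, const char *string)
{
    assert(object != NULL);
    assert(object->my_string != NULL);
    assert(string != NULL);
    assert(object->string_len >= 0);
    assert(strlen(string) <= (size_t)object->string_len);
    strcpy(object->my_string, string);
}

/* Hypothetical entry point for untrusted input: the one place that checks
   at run time. Returns 0 on success, -1 if the input is rejected. */
int object_set_from_input(struct object *object, const char *input)
{
    if (object == NULL || object->my_string == NULL || input == NULL) {
        return -1;
    }
    if (object->string_len < 0 || strlen(input) > (size_t)object->string_len) {
        return -1;    /* too long: refuse instead of overrunning the buffer */
    }
    Copy_String_To_object(object, input);   /* asserts below can no longer fire */
    return 0;
}
```

Everything that reaches Copy_String_To_object through this entry point has already been checked, so the asserts there only fire if some other caller skips the handler.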


Use the techniques above. They are invaluable and will make your software far more stable and more secure as well (think buffer overruns!). Learning where to place run time error handlers takes some practice but becomes second nature in no time. These techniques really pay off in large, complicated projects and help a lot towards verifying the correctness of programs. It's the next best thing to formal proofs, which no one seems to use in practice anyway.



GOTCHAS




Yep, the techniques described above have a few problems of their own. I find these issues pop up from time to time in my programs.


1: An assert is thrown on a correct program.


It happens. Sometimes you get carried away and your correct program begins to throw asserts, confusing you to no end. The best solution to this one is to make sure you know exactly what you want your functions to do. A well-defined modular structure and well-defined requirements for the project also help. Just be aware that this will happen to you at some point.


2: Calling program functions inside an assert.


What's the problem? Well, when you create the release build the asserts are stripped out, which means that your function calls inside asserts are stripped out too. The implication is that your release build will behave differently from your debug build.


There are two solutions. The first is simple: don't do it.


But this sucks, because there are times it would be damn nice, and you may have some debugging functions you'd like to call during an assert to check complicated objects. The second solution, and what I do, is to only call functions with no side effects. That is, I don't call functions that change data inside assert blocks; I only call functions that read data. This prevents the release build problems, but be SURE that the functions have no side effects before you use them in an assert block. Double and triple check, because this is a hard bug to find if you get hit with it.


I would go with option 2, as I feel I catch more errors than I create, but I would wait until you get used to programming in this style before you start.
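The second gotcha is easier to see in code. In this sketch (the counter type and all three function names are made up for illustration), the read-only check goes straight into the assert, while the call that changes data is made outside it and only its result is asserted.

```c
#include <assert.h>
#include <stddef.h>

struct counter {
    int value;
    int limit;
};

/* Read-only check: changes nothing, so it is safe inside assert. */
static int counter_is_valid(const struct counter *c)
{
    return c != NULL && c->limit > 0 && c->value >= 0 && c->value <= c->limit;
}

/* Changes data: must never be called inside assert.
   Returns 1 if the counter was incremented, 0 if it was already at its limit. */
static int counter_increment(struct counter *c)
{
    if (c->value >= c->limit) {
        return 0;
    }
    c->value++;
    return 1;
}

/* Precondition: the counter is valid and below its limit. */
void counter_bump(struct counter *c)
{
    int ok;

    assert(counter_is_valid(c));   /* fine: only reads */

    /* WRONG: assert(counter_increment(c));
       In a release build the increment would disappear with the assert. */
    ok = counter_increment(c);     /* the call runs in every build */
    assert(ok);                    /* only this check is stripped */
    (void)ok;                      /* avoids an unused-variable warning under NDEBUG */
}
```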

PS I WROTE THIS TUTORIAL FOR ALL OF YOU AND MYSELF, IF YOU KNOW ANY GREAT TECHNIQUES I MISSED PLEASE LEAVE ME A COMMENT AND I WILL ADD IT TO THE TUTORIAL.
