Thursday, May 24, 2007

The Case of the Horrible Include Dependencies

One of my favorite programming books is Working Effectively with Legacy Code written by Michael Feathers. The book provides an arsenal of techniques to refactor legacy code and get it into a test harness.

One specific situation the author addresses is testing a C++ class that has a large number of include dependencies. Our product has many classes that suffer from this problem. Our main library consists of over 600 source (.cpp) files, and a change to a single header file often forces a large number of them to be recompiled. A full rebuild takes over thirty minutes and is a productivity and motivation killer. I've decided to attack this problem at its root and eliminate as many of the dependencies as possible. At the very least, I will reduce the number of times a full rebuild is necessary, and I hope to see a noticeable reduction in compilation time in the future.

I started off by writing three Ruby scripts to collect data about the include dependencies for our product. The first script traverses our source tree, inserting the following line into each header file after the header guard:

#pragma message("including " __FILE__)
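A minimal sketch of that first step, written in Python rather than Ruby and not the actual script. It assumes every header uses a classic `#ifndef NAME` / `#define NAME` guard on consecutive lines and has a `.h` extension; the names `insert_marker` and `instrument_tree` are made up for illustration. The sketch keeps `__FILE__` outside the quotes, because the preprocessor does not expand macros inside a string literal.

```python
import re
from pathlib import Path

# Marker line to insert. __FILE__ sits outside the quotes so that the
# preprocessor expands it and concatenates it with the string literal.
MARKER = '#pragma message("including " __FILE__)'

# Assumed guard shape: "#ifndef NAME" followed directly by "#define NAME".
GUARD = re.compile(
    r'^[ \t]*#[ \t]*ifndef[ \t]+(\w+)[^\n]*\n'
    r'[ \t]*#[ \t]*define[ \t]+\1\b[^\n]*\n',
    re.MULTILINE,
)

def insert_marker(text: str) -> str:
    """Return text with MARKER inserted right after the header guard."""
    if MARKER in text:
        return text  # already instrumented
    match = GUARD.search(text)
    if match is None:
        return text  # no recognizable guard; leave the file untouched
    return text[:match.end()] + MARKER + "\n" + text[match.end():]

def instrument_tree(root: str) -> int:
    """Instrument every .h file under root; return how many files changed."""
    changed = 0
    for path in Path(root).rglob("*.h"):
        original = path.read_text()
        updated = insert_marker(original)
        if updated != original:
            path.write_text(updated)
            changed += 1
    return changed
```

Placing the marker inside the guard means the message is printed at most once per header for each .cpp file being compiled, which keeps the later counts meaningful.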

The second script parses the compilation output, tabulating the number of times each header file is included and how many header files each .cpp file includes.
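The tabulation could look roughly like the Python sketch below. The log format is an assumption, not something taken from the real build output: it assumes the compiler prints the name of each .cpp file on its own line as it starts compiling it, and that each marker appears as a line beginning with "including ". The function name `tabulate` is hypothetical.

```python
from collections import Counter

PREFIX = "including "

def tabulate(lines):
    """Return (inclusions per header, headers included per .cpp file)."""
    header_counts = Counter()
    per_source = Counter()
    current = None  # the .cpp file currently being compiled
    for raw in lines:
        line = raw.strip()
        if line.startswith(PREFIX):
            header = line[len(PREFIX):]
            header_counts[header] += 1
            if current is not None:
                per_source[current] += 1
        elif line.lower().endswith(".cpp"):
            # Assumed: the compiler echoes each source file name by itself.
            current = line
            per_source[current] += 0  # also record sources with no headers
    return header_counts, per_source
```

With a saved build log, `tabulate(open("build.log"))` returns both tables, and `header_counts.most_common(10)` lists the ten most frequently included headers; `build.log` is a placeholder file name.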

The final script is the inverse of the first and removes all of the #pragma directives.
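The cleanup step is the simplest of the three. A Python sketch, again with hypothetical names and assuming headers end in `.h`, drops every line that begins with the inserted `#pragma message("including` text and leaves everything else alone:

```python
from pathlib import Path

# Any line starting with this text is treated as an inserted marker.
MARKER_START = '#pragma message("including'

def strip_marker(text: str) -> str:
    """Return text with all inserted marker lines removed."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.lstrip().startswith(MARKER_START)
    ]
    return "".join(kept)

def clean_tree(root: str) -> int:
    """Remove markers from every .h file under root; return files changed."""
    changed = 0
    for path in Path(root).rglob("*.h"):
        original = path.read_text()
        updated = strip_marker(original)
        if updated != original:
            path.write_text(updated)
            changed += 1
    return changed
```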

The scripts were easy to write and they output the exact information I want, so I don't think a code analyzer tool would have saved me much time. The main downside to my approach is that I have to do a full rebuild anytime I want to get all of the current data. But since this project is something I'm doing on the side, rapid feedback isn't that important. I can just kick off the build and come back to it when I have some down time.

Since beginning this project two weeks ago, I have been able to significantly reduce the include dependencies on about half a dozen header files. For example, one header file that was being included 455 times is now included only 18 times. That will save a lot of recompilation the next time someone changes that header file.

It's very motivating to see the include dependencies shrink after each small improvement I make. In his book, Michael Feathers talks about situations where programmers feel the code they work on is beyond repair. He points out that little changes here and there begin to add up, and you slowly realize the situation isn't as bad as it seems. I think the small victories I've had so far point to continued success and will result in a much better code base and a much more productive team.