Paul Hsieh's Programming Resources

Programming Bits

© Copyright 1996-2004 by Paul Hsieh




C language


My programming philosophy

My personal opinions about programming.

Anybody who learns a structured programming language and some data types can program nearly anything. There is nothing special about being able to write a program that performs some deterministic, calculable task. What makes a program good is what you do beyond this. This may include things such as performance optimization, size optimization, flexibility, or portability.

In my programming experience I have spent a considerable amount of time considering performance optimization. I have encountered many ideas that have shaped my general approach to speeding up my code.

Programming optimization methods

What is the best programming language to start with? This is a recurring thread in the USENET programming newsgroups. While I've never done a formal study of this, I can give you my observations from my own experience.

Beginner programmer recommendations.

Is Goto good or bad? It can produce spaghetti code, but sometimes it is just so irresistible.

My thoughts on the goto issue.
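
As a taste of the "irresistible" side, here is a minimal sketch (my own illustration, not taken from the article above) of the one goto idiom most C programmers eventually forgive: centralized error cleanup through a single exit path:

    #include <stdio.h>
    #include <stdlib.h>

    /* Each failure jumps forward to unwind exactly the resources
       acquired so far; success falls through all the cleanup. */
    int process(const char *path) {
        int err = -1;
        FILE *f = fopen(path, "rb");
        if (!f) goto done;

        char *buf = malloc(4096);
        if (!buf) goto close_file;

        if (fread(buf, 1, 4096, f) == 0) goto free_buf;
        err = 0; /* success */

    free_buf:
        free(buf);
    close_file:
        fclose(f);
    done:
        return err;
    }

The nested if/else alternative expresses the same unwinding far more clumsily, which is exactly why this form keeps getting written.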

Curious case studies

Fill byte, Fill word problem

Unrolling a loop in C

Swapping variables (see the sketch following this list)

Scanning a list for a match

Self-replicating code

Sorting with less than all the facts

Assembly examples

Bitmap intersection

Dependency on makefile

How to deal with buffer overflow

Creating biased bitstreams

The Bug of the Month (hosted externally by Gimpel Software)

An interesting 2's complement problem

Obtaining User Input in C

In-place array rotation

Numerical flaws with C

Sorting revisited

Poisoned Numbers, Poisoned Branching

Hash functions

Misconceptions about rand()
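For a taste of the flavor of these puzzles, here is a minimal sketch (my own illustration; it assumes nothing about the write-up linked above) of the classic trick for swapping two variables without a temporary:

    #include <stdio.h>

    /* Swap two integers without a temporary, using XOR. The catch
       that makes this a curious case: if a and b point at the SAME
       object, the XOR dance zeroes it out, so aliasing is guarded. */
    static void swap_xor(int *a, int *b) {
        if (a != b) {
            *a ^= *b;
            *b ^= *a;
            *a ^= *b;
        }
    }

    int main(void) {
        int x = 3, y = 5;
        swap_xor(&x, &y);
        printf("x=%d y=%d\n", x, y); /* prints x=5 y=3 */
        return 0;
    }

On a modern compiler the plain temporary is at least as fast; the interest here is the aliasing trap, not the cycle count.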

Beefs du jour

The following is a list of things that have caused me difficulty for one reason or another and that I feel are worth commenting on:

  • Large, complicated, undocumented build schemes

    Often, with large projects worked on by multiple teams with interdependencies, the way to build all the pieces is a complicated, winding series of scripts and manually ordered sub-builds, and the procedure is subject to change and tracked only by word of mouth.

    While there are many compromises in software development, this should not be one of them. With something like this, every place where you could get bit, you will get bit. A robust, well defined, clean, and well documented build procedure will end up saving you real development time if it is instituted and evangelized.

    <Rant>

    Personally, I have a mechanism that I prefer for dealing with this: I simply mandate the following:


    Rule 1:

    The project is split into sub-projects, each with an associated containing directory. To build ANY sub-project, execute a make in its containing directory.

    Rule 2:

    A make in any given directory will build any contained projects, including projects contained in its subdirectories.

    Rule 3:

    There are no other rules.

    Rule 2 is a nicety that gives a sound kind of structure to any project, but it's Rule 1 that I am most concerned with. That's what gives a project build robustness.

    Now the typical objection is that if the sub-projects themselves have interdependencies then a simple make in that directory will be insufficient because it requires another sub-project to have been made first. To that I answer: use recursion => if a sub-project requires another sub-project to have been made first, then it should recursively issue all dependent makes as part of its own make process (it can, of course, just issue core makes, instead of pervasively recursive makes in any non-contained directories.)
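
    To make the rules concrete, here is a minimal sketch in GNU make syntax (the names libfoo and app are hypothetical, and recipe lines must begin with a tab):

        # Top-level Makefile (Rule 2): a make here builds every
        # contained sub-project by recursing into its directory.
        SUBDIRS = libfoo app

        all: $(SUBDIRS)

        $(SUBDIRS):
                $(MAKE) -C $@

        .PHONY: all $(SUBDIRS)

        # app/Makefile (Rule 1): a make in app/ is sufficient by
        # itself, because it recursively makes what it depends on.
        app: main.c ../libfoo/libfoo.a
                cc -o app main.c ../libfoo/libfoo.a

        # Always recurse; the sub-make itself decides whether any
        # real work is needed (an up-to-date make does nothing).
        ../libfoo/libfoo.a: force
                $(MAKE) -C ../libfoo

        .PHONY: force
        force:

    Note that no global ordering knowledge lives anywhere: each makefile knows only its own immediate dependencies, and, cycles aside, the recursion sorts out the rest.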

    It is incredibly amazing to me that I can find so many people who for some reason find this "too complicated" or somehow distasteful. I was forced, under extreme duress, to remove a recursive make call at a previous place of employment; the result had absolutely no benefit and had the obvious detracting property of breaking "my rules". The main objection from the "builders" at this previous place of employment was that it was hard to follow what was happening in the build in the case that the build failed. In hindsight, I should have offered the suggestion that recursive makes be surrounded by a standard echoed message indicating the make they came from and are returning to. But I guess I was so blinded by disgust at being forced to be a party to this monstrosity of a build process that I could not think constructively.

    There followed an undocumented, but instituted, build process that was only well defined for building the whole project, and whose sub-projects could only be built if, by hand, other sub-projects were made before them. That made things incredibly difficult for me, since on several occasions I had to build sub-projects that I was not involved in and had no familiarity with. The build would fail, and I would not have the foggiest clue how to proceed.

    Update (11/29/01): It seems some people think differently. Let me just address their flawed statement of the problem:

    • It is very hard to get the order of the recursion into the sub-directories correct. This order is very unstable and frequently needs to be manually "tweaked." Increasing the number of directories, or increasing the depth in the directory tree, cause this order to be increasingly unstable.

      Utter nonsense. The sub-directories for any programming project are determined by how the developers decide to break down the problem. Considerations for the intricacies of how the make sequence should work should not be the driving factor for this kind of structural breakdown. The recursive make strategy does not precipitate the creation of extraneous directories.

      As to getting the order correct, I have never found this to be anything other than trivial -- a particular sub-component is dependent on another sub-component, so a dependency on the output of that other sub-component, and a rule for making in that directory, is added to your makefile (exactly as in the sketch above). You only need to watch out for cycles (however, if you have circular dependencies, then something is flawed in the component breakdown.)

    • It is often necessary to do more than one pass over the sub-directories to build the whole system. This, naturally, leads to extended build times.

      This guy has missed the whole point of make. If a make is performed on an up-to-date component then it does nothing, and doing nothing typically takes no time. The check itself may take time, but this only becomes an issue if you have a flawed dependency checking mechanism.

    • Because the builds take so long, some dependency information is omitted, otherwise development builds take unreasonable lengths of time, and the developers are unproductive. This usually leads to things not being updated when they need to be, requiring frequent "clean" builds from scratch, to ensure everything has actually been built.

      This is all premised on increased build times, which simply aren't there. "Clean builds from scratch" are only required because of a flawed mechanism for working out dependencies. I.e., if developers have a component dependency that they maintain by hand, and they screw up the order, they may end up with files in an undefined state. If the dependencies are correctly mapped, there is no issue.

    • Because inter-directory dependencies are either omitted or too hard to express, the Makefiles are often written to build too much to ensure that nothing is left out.

      Actually, in practice, this is not true. To avoid circular dependencies, often very finely grained build rules (i.e., you don't build entire sub-components, but only parts of them) are required to correctly build any collection of inter-dependent sub-components.


      This is completely the wrong way around, as my experience assures me. When I was using recursive make, while nobody knew what was going on in the build process, it worked very well, and there were no issues as to how to get the right final output. It was also fairly speedy -- my sub-component took maybe 15 minutes to build from scratch. When I was told to take the recursion out, developers from other groups could no longer build my components without significant assistance from my group or the "build masters"; the alternative was to rebuild the entire project, which required 100+ MBs of disk activity and 8 hours (if it compiled correctly; otherwise it could take up to 36 hours to ferret out the problems.)

    </Rant>

  • A poor split of high level <=> low level interfaces

    Splitting interfaces into high level and low level halves is actually a common technique that can be very advantageous for the purposes of code optimization. But it must be used with great caution and reluctance: your code's flexibility will turn to zero if you do this. Things like this have a hidden cost associated with them, in that they are harmful to all future development on a given code base which relies on them. Worse yet, if this sort of problem exists for no good reason, you end up paying the penalty and reaping no reward! (See the sketch following this list.)

  • Branched development of similar projects

    Often, when developing a code base, there may be reasons why the code must support multiple environments or behaviors. The logical temptation is to copy the code into a separate branch and to do independent development along each branch. There are times when this sort of thing is required; however, if there is no schedule or plan for re-merging to a common polymorphic build (i.e., a solution where the computers deal with the differences), then you end up with twice as much stuff and collateral maintenance costs (i.e., a solution where the humans deal with the differences.)
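
Regarding the high level <=> low level split above, here is a minimal sketch (my own illustration; the Surface type and function names are hypothetical) of how exposing a low level interface buys speed at the price of flexibility:

    /* High level interface: callers are insulated from the pixel
       format, so the format can change without breaking them. */
    typedef struct {
        unsigned short *pixels;  /* say, 16 bits per pixel */
        int pitch;               /* pixels per scan line */
    } Surface;

    void setPixel(Surface *s, int x, int y, unsigned short c) {
        s->pixels[y * s->pitch + x] = c;
    }

    /* Low level "fast path": handing out the raw buffer lets
       callers optimize their inner loops, but it also welds every
       such caller to the 16-bit format and the pitch convention.
       Change either one, and all of that code quietly breaks. */
    unsigned short *lockSurface(Surface *s) {
        return s->pixels;
    }

The split earns its keep when profiling shows the high level calls dominating; adopted by reflex, it is pure hidden cost.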


Books

There are a few good books that I have encountered about programming. Strangely enough, most are from Microsoft Press:

Writing Solid Code

A good, but very basic, book. Teaches common pitfalls, but is really meant for beginners.

Code Complete

An excellent book meant for the frustrated seasoned programmer.


Debugging the Development Process

Although I have not seen it myself, it comes highly recommended by a friend of mine. It takes a look at code production from a management level.

The Art of Computer Programming, Volume 2: Seminumerical Algorithms
by Donald Knuth

A treatment of random numbers, primes, multiplication, encryption and other fun stuff for those with a deeply mathematical bent. This book has a website.

The Zen of Code Optimization
by Michael Abrash

The first book on x86 code optimization worth a damn. This book takes you from optimizing the 8088 all the way up to the Pentium, always emphasizing sensible methodology.

Hacker's Delight
by Henry S. Warren Jr.

A collection of programming tricks with an emphasis on numerical computation. This book has a website.

Other books and info

A fellow by the name of Sunir Shah maintains a list of computer books. It includes ISBN numbers, publishers, and authors, as well as other information. The X2FTP site contains game programming book reviews.

Online books

Online Magazines


Other Programming Links


Miscellaneous Stuff

Mathematics and theoretical computer science

Specifications

Components with Source Code

Grab bag

Humor/Editorials


Computer Nostalgia


Mail me
