Floating point performance

Simplify. If you have a mission-critical application that takes hours to run, you may be looking at it from too high a level. Here are some suggestions.

Note: Don't depend on floating point and integer benchmark tests. The integer tests are generally larger, and they exercise memory performance. Floating point tests are relatively small, so they usually fit within the processor cache. If, for instance, you compare a processor running at 400MHz on a 66MHz bus to the same microprocessor running at 400MHz on a 100MHz bus, the integer performance will appear to improve with the faster bus, but the floating point result will be little different. For the best guess of overall performance, especially if you will be using large floating point arrays, look at integer performance as an indication of memory I/O speed.

I/O Bound

Many small programs read lots of data; they are as dependent on file reading and writing (I/O) speed as on processor performance. I've rewritten several 16-bit DOS Basic utilities as 32-bit command line programs in C, and found significant performance gains, even though I did nothing special in the C programs, while their Basic predecessors went to extremes to improve performance.

One simple Basic program reads a file of any size, and writes an Assembly language source code file containing DB statements for every byte. The Basic program was terribly slow, so I replaced the file I/O functions with a custom Assembly language library; performance improved about 270%.
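To make the task concrete, here is a minimal sketch of such a converter in C. It is not the original program: the function name `write_db_lines`, the decimal notation, and the limit of 16 values per DB line are all assumptions chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical layout: 16 decimal values per DB statement. */
#define BYTES_PER_LINE 16

/* Copy every byte of `in` to `out` as assembler DB statements.
   Returns the number of bytes converted, or -1 on an I/O error. */
long write_db_lines(FILE *in, FILE *out)
{
    long count = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (count % BYTES_PER_LINE == 0)
            fputs(count ? "\nDB " : "DB ", out);  /* start a new statement */
        else
            putc(',', out);                       /* separate values on a line */
        fprintf(out, "%d", c);                    /* decimal text, no padding */
        count++;
    }
    if (count > 0)
        putc('\n', out);                          /* terminate the last line */
    return (ferror(in) || ferror(out)) ? -1 : count;
}
```

Called with two streams opened in binary mode, for example `write_db_lines(in, out)`, a three-byte input of 0, 7, 255 should produce the single line `DB 0,7,255`.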

A new test, running on a fast hard drive and a slow CPU, converts a 5.5MB DLL in 37 seconds using the optimized 16-bit Basic program, while the simple 32-bit C version completes the task in under four seconds. A brute-force method (using a look-up table instead of division) saved more than 300 million CPU clock cycles for processing (0.6 seconds on a 500MHz CPU!), but it increased the output file from 19.5MB to about 27MB (because the look-up table had fixed-length fields). Run time increased from 3.7 seconds to over five seconds, because of the time required to write the extra 7.5MB, and of course the next program that uses this data will also take longer to read a larger file. My final solution was multiple, variable-length look-up tables instead of division, which further improved performance by 20 percent, without increasing the file size.
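One way to read "variable-length look-up tables" is sketched below, purely as an assumption about the idea and not as the code that was actually used: one table holds the decimal text for each of the 256 byte values, and a second holds each entry's length, so nothing is divided in the main loop and nothing is padded in the output. The names `digits_tab`, `len_tab`, `init_tables` and `put_byte` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

static char digits_tab[256][4];     /* decimal text per byte value, NUL-terminated */
static unsigned char len_tab[256];  /* characters in each entry: 1, 2 or 3 */

/* Build both tables once. Any division happens here, 256 times in total,
   instead of once per input byte. */
void init_tables(void)
{
    for (int v = 0; v < 256; v++)
        len_tab[v] = (unsigned char)snprintf(digits_tab[v],
                                             sizeof digits_tab[v], "%d", v);
}

/* Append the text for byte `v` at `dst` and return the new end pointer.
   Only len_tab[v] characters are copied, so no padding reaches the output. */
char *put_byte(char *dst, unsigned char v)
{
    memcpy(dst, digits_tab[v], len_tab[v]);
    return dst + len_tab[v];
}
```

A fixed-length version would copy three characters for every byte, which is the padding that made the output file grow; here the value 5 costs one character and 255 costs three.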

Another important optimization is buffering, if the input and output files are on the same hard drive. Reading 64KB chunks is significantly faster than reading 8KB, while reading more than 64KB may not improve performance. The goal is to minimize disk head movement (seeking), which occurs if you constantly read a small amount of data from one file, and write processed data to another. An optimized programming library and operating system will cache some reads and writes, but a large amount of data will overwhelm the buffers if we do not minimize hard disk seeking.
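The chunked approach can be sketched as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the name `copy_chunked` is hypothetical, the processing step is left as a comment, and the C library and operating system may still add buffering of their own on top of it.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (64 * 1024)   /* 64KB, the chunk size suggested in the text */

/* Copy `in` to `out` in 64KB chunks. The aim is for the drive to alternate
   between long reads and long writes rather than seek after every few bytes.
   Returns the total bytes copied, or -1 on error. */
long copy_chunked(FILE *in, FILE *out)
{
    unsigned char *buf = malloc(CHUNK);
    long total = 0;
    size_t n;

    if (buf == NULL)
        return -1;
    while ((n = fread(buf, 1, CHUNK, in)) > 0) {
        /* A real converter would transform buf[0..n-1] here. */
        if (fwrite(buf, 1, n, out) != n) {
            free(buf);
            return -1;
        }
        total += (long)n;
    }
    free(buf);
    return ferror(in) ? -1 : total;
}
```

Changing `CHUNK` to 8KB or 256KB and timing the result on a large file is a simple way to check whether 64KB is also the sweet spot on a given drive.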

This table shows averaged results for a 22MB source file. Output files are the same length, avoiding extra spaces, since we have discovered that extra spaces cost time.

Time (seconds)   Program
424              Unoptimized 16-bit DOS Basic
157              Optimized 16-bit DOS Basic with Assembly
18               Unoptimized 32-bit C Console application
15               Optimized 32-bit C Console application with tables instead of division
~13.5            Optimized 32-bit C Console with optimized Assembly

Most of the performance improvement is from switching to 32-bit file I/O, not improved code. If your program reads a massive amount of data with simple calculations, consider a 32-bit Console program.

© Copyright 2007 R. E. Harvey, All rights reserved.