EveKnows.com is 100% Linux powered. The free (as in speech!) system has proved to be perfect for our needs: it’s fast, stable, and customizable, which is exactly what you look for in a platform for running fresh, cutting-edge applications such as EveKnows.
One of the harder tasks I’ve had is tuning disk access. The search engine currently runs on a Debian 4.0 system with SATA hard drives. The UNIX utility top reports 10-20% IO wait (the share of CPU time spent idle while waiting for IO to complete, which is a good indicator of disk load) almost all the time. When I turn on the Caroline search spider, that figure spikes to 50%. At the moment this isn’t a big deal, but as the site’s popularity continues to grow, disk access will eventually become a bottleneck and severely limit performance.
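The IO figure top shows is its iowait (`wa`) percentage, which is derived from the kernel's CPU counters in /proc/stat. As a rough illustration, here is a minimal Perl sketch (not part of the script further down) that reads the aggregate `cpu` line of /proc/stat twice, one second apart, and reports the share of elapsed CPU time spent in iowait. It assumes a 2.6-series kernel, where iowait is the fifth counter on that line; the helper names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the aggregate "cpu" line of /proc/stat into its jiffy counters:
# user nice system idle iowait irq softirq steal ... (iowait is field 5).
sub parse_cpu_line {
    my ($line) = @_;
    my ($label, @fields) = split ' ', $line;
    die "not an aggregate cpu line: $line\n"
        unless defined $label && $label eq 'cpu';
    return @fields;
}

# Percentage of elapsed CPU time spent in iowait between two samples.
# Only the first eight counters (user .. steal) are summed, because the
# guest counters on later kernels are already included in user and nice.
sub iowait_percent {
    my ($before, $after) = @_;
    my @old = parse_cpu_line($before);
    my @new = parse_cpu_line($after);
    my $n = @old < 8 ? scalar @old : 8;
    my $total = 0;
    $total += $new[$_] - $old[$_] for 0 .. $n - 1;
    return 0 if $total <= 0;
    return 100 * ($new[4] - $old[4]) / $total;
}

# Return the first line of /proc/stat, which is the aggregate "cpu" line.
sub read_cpu_line {
    open my $fh, '<', '/proc/stat' or die "cannot open /proc/stat: $!\n";
    my $line = <$fh>;
    close $fh;
    return $line;
}

# When run directly on a Linux box, sample twice, one second apart.
if (!caller() && -r '/proc/stat') {
    my $first = read_cpu_line();
    sleep 1;
    my $second = read_cpu_line();
    printf "iowait: %.1f%%\n", iowait_percent($first, $second);
}
```

Note that iowait only says the CPU was waiting on IO; it does not say which process caused that IO.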
So I’ve been trying to learn how to profile disk access on Linux systems. Maybe I’ve just been looking in the wrong places, but I haven’t been able to find any tool that shows which applications are causing the heavy IO load. Some digging revealed that when /proc/sys/vm/block_dump is set to 1, the kernel logs individual block reads and writes, along with the name and PID of the process responsible, and dmesg will display them. That raw log is essentially useless on its own, so I wrote a small Perl script which totals the IO statistics and displays a readable table of results. If anyone is interested in using it themselves, the code is below.
Update: HTML tends to mangle Perl code, so copying and pasting the code below probably won’t work; if you just want to download the script for your own use, you can find it here.
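For readers who want to see the core idea in isolation, here is a minimal sketch of totaling block_dump output per process. It is not the script below, just an illustration. It assumes block_dump lines look like `kjournald(1234): WRITE block 5678 on sda1` or `bash(7): dirtied inode 99 (.bash_history) on sda1`, as 2.6-era kernels print them, optionally preceded by a `[  12.345678]` timestamp. The function names and the file name `tally_block_dump.pl` are hypothetical. Logging is switched on as root with `echo 1 > /proc/sys/vm/block_dump` and off again with `echo 0`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count block_dump events per process name.
# ASSUMED line format: "name(pid): READ|WRITE block N on dev" or
# "name(pid): dirtied inode N (file) on dev", with an optional timestamp.
sub tally_block_dump {
    my @lines = @_;
    my %counts;    # process name => { READ => n, WRITE => n, dirtied => n }
    for my $line (@lines) {
        next unless $line =~
            /^(?:\[\s*[\d.]+\]\s*)?(\S+)\((\d+)\): (READ|WRITE|dirtied)\b/;
        $counts{$1}{$3}++;
    }
    return \%counts;
}

# Print one row per process, busiest process first.
sub print_table {
    my ($counts) = @_;
    my %total;
    for my $name (keys %$counts) {
        $total{$name} += $_ for values %{ $counts->{$name} };
    }
    printf "%-20s %8s %8s %8s\n", 'PROCESS', 'READ', 'WRITE', 'DIRTIED';
    for my $name (sort { $total{$b} <=> $total{$a} || $a cmp $b } keys %total) {
        my $c = $counts->{$name};
        printf "%-20s %8d %8d %8d\n", $name,
            $c->{READ} || 0, $c->{WRITE} || 0, $c->{dirtied} || 0;
    }
}

# Read log lines from files named on the command line, or from a pipe.
if (!caller() && (@ARGV || !-t STDIN)) {
    print_table(tally_block_dump(<>));
}
```

Run it as `dmesg | perl tally_block_dump.pl`. Process names containing spaces won’t match the `\S+` pattern, and if syslog writes these kernel messages to disk, the logger itself may show up in the counts.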
#!/usr/bin/perl
#
# Copyright 2007 Aidan Trent
# Released under the terms of the GNU GPL
#
# Usage: SCRIPT_NAME