Anatomy of a Search Engine | Development of the EveKnows.com adult search engine


EveKnows.com is 100% Linux powered. The free (as in speech!) system has proved to be absolutely perfect for our needs. It’s fast, stable, and customizable: exactly what you look for in a platform for running fresh, cutting-edge applications such as EveKnows.

One of the harder tasks I’ve had is tuning disk access. The search engine is currently running on a Debian 4.0 system with SATA hard drives. The UNIX utility top reports 10-20% I/O wait (the share of CPU time spent idle while waiting on I/O, which is a good indicator of disk access) almost all the time. When I turn on the Caroline search spider, that figure spikes to 50%. At the moment this isn’t really a big deal, but as the site’s popularity continues to grow, it will eventually become a bottleneck and severely limit performance.
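For anyone who wants that number without running top interactively, it can be approximated from the aggregate "cpu" line in /proc/stat. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the field order user, nice, system, idle, iowait, irq, softirq, steal, which may not match every kernel.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the aggregate "cpu" line of /proc/stat into a list of counters.
# Returns nothing for per-core lines (cpu0, cpu1, ...) or unrelated lines.
sub parse_cpu_line {
    my ($line) = @_;
    my ($label, @fields) = split ' ', $line;
    return unless defined $label && $label eq 'cpu';
    return [@fields];
}

# Percentage of CPU time spent in iowait between two samples.
# Assumption: field 4 (zero-based) is iowait, and only the first eight
# fields are summed so that guest time is not counted twice.
sub iowait_percent {
    my ($before, $after) = @_;
    my $last = $#$after < 7 ? $#$after : 7;
    my $total = 0;
    $total += $after->[$_] - $before->[$_] for 0 .. $last;
    return 0 if $total <= 0;
    return 100 * ($after->[4] - $before->[4]) / $total;
}

# Read one sample from the live system (Linux only).
sub read_cpu_sample {
    open my $fh, '<', '/proc/stat' or die "cannot open /proc/stat: $!";
    my $line = <$fh>;
    close $fh;
    my $sample = parse_cpu_line($line)
        or die "unexpected first line in /proc/stat";
    return $sample;
}

# Runs only when an interval in seconds is given on the command line.
if (!caller && @ARGV) {
    my $interval = $ARGV[0];
    my $before   = read_cpu_sample();
    sleep $interval;
    my $after    = read_cpu_sample();
    printf "iowait: %.1f%%\n", iowait_percent($before, $after);
}
```

Saved under a hypothetical name such as iowait.pl, it would be run as `perl iowait.pl 5` to average over five seconds. Like top, it says how much time is lost to I/O, not which process is responsible.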

So I’ve been trying to learn about profiling disk access on Linux systems. Maybe I’ve just been looking in the wrong places, but I haven’t been able to find any tools that can show me which applications are causing the heavy IO load. Some digging revealed that the kernel logs individual IO calls, readable with dmesg, when /proc/sys/vm/block_dump is set to 1, but that raw information is essentially useless on its own. To make sense of it, I wrote a small Perl script which totals all of the IO statistics and displays a pretty table of results. If anyone is interested in using it themselves, the code is below.

Update: HTML tends to screw up Perl code, so copying and pasting the code below probably won’t work; if you just want to download the script for your own use, you can find it here.

#!/usr/bin/perl
#
# Copyright 2007 Aidan Trent
# Released under the terms of the GNU GPL

# Usage: SCRIPT_NAME 
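The listing above stops at its header comments on this page. As a rough illustration of the approach described (totalling block_dump messages per process), here is a minimal sketch. It is not the original script: the function names are hypothetical, and the log line format (`name(pid): READ block N on dev`, plus `dirtied inode` lines) is an assumption that may differ between kernel versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count block_dump events per process name.
# Assumed line formats (an optional dmesg timestamp may precede them):
#   kjournald(1234): WRITE block 100 on sda1
#   pdflush(9): dirtied inode 55 (foo) on sda1
sub tally_block_dump {
    my (@lines) = @_;
    my %stats;
    for my $line (@lines) {
        if ($line =~ /(\S+)\(\d+\): (READ|WRITE) block \d+ on \S+/) {
            $stats{$1}{$2}++;
        }
        elsif ($line =~ /(\S+)\(\d+\): dirtied inode/) {
            $stats{$1}{DIRTY}++;
        }
    }
    return \%stats;
}

# Render the totals as a plain-text table, busiest process first.
sub format_report {
    my ($stats) = @_;
    my %total;
    for my $name (keys %$stats) {
        my $s = $stats->{$name};
        $total{$name} = ($s->{READ} || 0) + ($s->{WRITE} || 0) + ($s->{DIRTY} || 0);
    }
    my $out = sprintf("%-20s %8s %8s %8s\n", 'PROCESS', 'READ', 'WRITE', 'DIRTY');
    for my $name (sort { $total{$b} <=> $total{$a} || $a cmp $b } keys %total) {
        my $s = $stats->{$name};
        $out .= sprintf("%-20s %8d %8d %8d\n",
            $name, $s->{READ} || 0, $s->{WRITE} || 0, $s->{DIRTY} || 0);
    }
    return $out;
}

# Runs only when input is named on the command line ("-" means STDIN).
if (!caller && @ARGV) {
    my @lines = <>;
    print format_report(tally_block_dump(@lines));
}
```

With block_dump enabled (`echo 1 > /proc/sys/vm/block_dump` as root), and the sketch saved under a hypothetical name such as iototals.pl, it could be fed with `dmesg | perl iototals.pl -`. Note that it counts logged events, not bytes, so it shows only which processes issue the most requests.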


Jul/07

9

Hardware

One of the most frequent questions I’m asked is, “What sort of servers are required to run a porn search engine?” Everyone seems a bit surprised by my answer: thus far, nothing special. From EveKnows.com’s testing launch in March until early July, in fact, the site ran on a single Athlon XP 2200 machine with 1GB of RAM and a standard IDE hard disk. Nothing special indeed. That configuration hasn’t had any trouble handling 60,000 page-views daily, which averages out to less than one page-view per second. Of course, I’m running EveKnows on a highly tuned Linux server with custom-written software; everything has been optimized to make the most of the available resources, but the point still stands: it doesn’t take much hardware to handle the current traffic load.

Last week I finally upgraded to a dual-core Athlon 64 X2 4200 with a SATA hard drive. The new site design will go live sometime this week, marking the release of EveKnows.com 1.0. With luck, I’ll get a little bit of publicity and increase the daily traffic. The extra power is meant to insure against any possible spikes; I’d hate to get an influx of new visitors and watch the server melt under the load. At the moment, though, the load average is sitting between 0.03 and 0.08. At least I can recompile kernels faster than ever before ;)

Also, I’ve added a Hardware category to this blog. I’ll keep everyone up to date on how the new server performs and how things scale as the site increases in popularity.

