<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Anatomy of a Search Engine &#187; Hardware</title>
	<atom:link href="http://blog.eveknows.com/category/hardware/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.eveknows.com</link>
	<description>Development of the EveKnows.com adult search engine</description>
	<lastBuildDate>Thu, 15 Oct 2009 00:03:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Profiling and Debugging Linux Disk Access</title>
		<link>http://blog.eveknows.com/2007/09/27/profiling-and-debugging-linux-disk-access/</link>
		<comments>http://blog.eveknows.com/2007/09/27/profiling-and-debugging-linux-disk-access/#comments</comments>
		<pubDate>Thu, 27 Sep 2007 06:12:58 +0000</pubDate>
		<dc:creator>aidan</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://blog.eveknows.com/2007/09/27/profiling-and-debugging-linux-disk-access/</guid>
		<description><![CDATA[EveKnows.com is 100% Linux powered.  The free (as in speech!) system has proved to be absolutely perfect for our needs.  It&#8217;s fast, stable, and customizable&#8211;exactly what you look for in a platform for running fresh, cutting-edge applications such as EveKnows.
One of the harder tasks I&#8217;ve had is tuning disk access.  The search [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://eveknows.com">EveKnows.com</a> is 100% Linux powered.  The free (as in speech!) system has proved to be absolutely perfect for our needs.  It&#8217;s fast, stable, and customizable&#8211;exactly what you look for in a platform for running fresh, cutting-edge applications such as EveKnows.</p>
<p>One of the harder tasks I&#8217;ve had is tuning disk access.  The search engine is currently running on a Debian 4.0 system with SATA hard drives.  The UNIX utility <em>top</em> reports 10-20% IO wait (the <em>wa</em> figure in its CPU summary, which is a good indicator of how much time is spent waiting on the disk) almost all the time.  When I turn on the Caroline search spider, that figure spikes to 50%.  At the moment this isn&#8217;t really a big deal, but as the site&#8217;s popularity continues to grow, disk access will eventually become a bottleneck and severely limit performance.</p>
<p>Thus, I&#8217;ve been trying to learn how to profile disk access on Linux systems.  Maybe I&#8217;ve just been looking in the wrong places, but I haven&#8217;t been able to find any tools that show which applications are causing the heavy IO load.  Some digging revealed that when <em>/proc/sys/vm/block_dump</em> is set to <em>1</em>, the kernel logs every disk read and write, and every file it marks as dirty, along with the name and PID of the process that issued it; <em>dmesg</em> then displays those messages.  That raw log is essentially useless on its own, since it is just one line per event.  So I wrote a small Perl script which tallies the events for each process and displays a pretty table of results.  It has to run as root, because only root can write to <em>block_dump</em>.  If anyone is interested in using it themselves, the code is below.</p>
<p><strong>Update:</strong> HTML tends to screw up Perl code, so copying/pasting the below code probably won&#8217;t work; if you just want to download the script for your own use, you can <a href="/files/io_stats.pl.gz">find it here</a>.</p>
<pre>
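#!/usr/bin/perl
#
# Copyright 2007 Aidan Trent &lt;aidan@eveknows.com&gt;
# Released under the terms of the GNU GPL

# Usage: SCRIPT_NAME &lt;time&gt;
# The optional &lt;time&gt; parameter tells the script how many
# minutes it should spend gathering IO statistics. The
# default is 5.
#
# The script must run as root, since only root can write to
# /proc/sys/vm/block_dump. If klogd is writing kernel debug
# messages to disk, consider stopping it first; otherwise the
# block_dump output is itself logged, causing extra disk activity.
# The kernel ring buffer is also of limited size, so on a busy
# system the oldest entries may be overwritten before dmesg runs.

use strict;
use warnings;

my $block_dump = '/proc/sys/vm/block_dump';

# Collection time in minutes (default 5).
my $minutes = 5;
if (@ARGV) {
    die "Usage: $0 &lt;time&gt;\n" unless $ARGV[0] =~ /^\d+$/ and $ARGV[0] &gt; 0;
    $minutes = $ARGV[0];
}
die "This script must be run as root\n" unless $&gt; == 0;

# Write 1 to make the kernel log every block read/write and file
# dirtying, or 0 to switch the logging off again.
sub set_block_dump {
    my ($value) = @_;
    open (my $fh, '&gt;', $block_dump) or die "Cannot open $block_dump: $!\n";
    print $fh "$value\n";
    close ($fh) or die "Cannot write $block_dump: $!\n";
}

# Make sure logging is switched off if the script is interrupted.
$SIG{INT} = $SIG{TERM} = sub { set_block_dump (0); exit 1; };

set_block_dump (1);
sleep (60 * $minutes);
set_block_dump (0);

# Read dmesg through a pipe rather than a fixed-name temp file.
open (my $dmesg, '-|', 'dmesg') or die "Cannot run dmesg: $!\n";
my (%total, %read, %write, %dirtied);
while (my $line = &lt;$dmesg&gt;) {
    # Matches "name(pid): READ block ...", "name(pid): WRITE block ..."
    # and "name(pid): dirtied inode ...", optionally preceded by a
    # "[seconds.microseconds]" printk timestamp.
    next unless $line =~ /^(?:\[\s*[\d.]+\]\s*)?(.+?)\(\d+\):\s+(dirtied|READ|WRITE)\b/i;
    my ($name, $type) = ($1, lc $2);

    # Start the per-type counters at zero so the table never prints undef.
    $read{$name}    ||= 0;
    $write{$name}   ||= 0;
    $dirtied{$name} ||= 0;

    $total{$name}++;
    if    ($type eq 'read')  { $read{$name}++ }
    elsif ($type eq 'write') { $write{$name}++ }
    else                     { $dirtied{$name}++ }
}
close ($dmesg);

# One row per process, busiest first.
printf "%-16s%8s%8s%8s%8s\n", 'Name', 'Total', 'Read', 'Write', 'Dirtied';
foreach my $key (sort { $total{$b} &lt;=&gt; $total{$a} } keys %total) {
    printf "%-16s%8d%8d%8d%8d\n", "$key:", $total{$key},
        $read{$key}, $write{$key}, $dirtied{$key};
}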
</pre>
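<p>For anyone who hasn&#8217;t seen block_dump output, here is a small sketch of roughly what the messages look like and how a pattern like the one in the script sorts them into read, write and dirtied events.  The sample lines are invented: they follow approximately the <em>name(pid): READ block N on device</em> and <em>name(pid): dirtied inode N (file) on device</em> layout printed by the 2.6 kernels of this era, but the process names, PIDs and numbers are made up, and <em>parse_block_dump_line</em> is just a hypothetical helper name.  Exact message formats vary between kernel versions, so treat this as an illustration rather than a reference.</p>
<pre>
#!/usr/bin/perl
# Illustrative sketch only: the sample lines below are made up, and
# parse_block_dump_line is a hypothetical helper, not a kernel API.
use strict;
use warnings;

# Return (process name, event type) for a block_dump line, or an
# empty list if the line is not a block_dump message. An optional
# "[seconds.microseconds]" printk timestamp is skipped.
sub parse_block_dump_line {
    my ($line) = @_;
    if ($line =~ /^(?:\[\s*[\d.]+\]\s*)?(.+?)\(\d+\):\s+(dirtied|READ|WRITE)\b/i) {
        return ($1, lc $2);
    }
    return;
}

# Invented examples in the general shape of block_dump messages,
# plus one unrelated kernel message that should be ignored.
my @sample = (
    'kjournald(1923): WRITE block 25536 on sda1',
    'perl(4711): READ block 1048 on sda1',
    '[ 5012.334210] cp(4802): dirtied inode 53219 (notes.txt) on sda1',
    'eth0: link up, 100Mbps, full-duplex',
);

# Tally events per process and per event type.
my %count;
foreach my $line (@sample) {
    my ($name, $type) = parse_block_dump_line($line);
    next unless defined $name;
    $count{$name}{$type}++;
}

# Print "name type count", sorted by name and then by type.
foreach my $name (sort keys %count) {
    foreach my $type (sort keys %{ $count{$name} }) {
        print "$name $type $count{$name}{$type}\n";
    }
}
</pre>
<p>Run as-is, it prints one line per process and event type (for example <em>kjournald write 1</em>), and the last sample line is skipped because it isn&#8217;t a block_dump message.  Note that a lot of write-back can be issued by kernel threads such as kjournald, so the dirtied counts may be the better clue to which application is actually generating writes.</p>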
]]></content:encoded>
			<wfw:commentRss>http://blog.eveknows.com/2007/09/27/profiling-and-debugging-linux-disk-access/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hardware</title>
		<link>http://blog.eveknows.com/2007/07/09/hardware/</link>
		<comments>http://blog.eveknows.com/2007/07/09/hardware/#comments</comments>
		<pubDate>Tue, 10 Jul 2007 02:35:19 +0000</pubDate>
		<dc:creator>aidan</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Hardware]]></category>

		<guid isPermaLink="false">http://blog.eveknows.com/2007/07/09/hardware/</guid>
		<description><![CDATA[One of the most frequent questions I&#8217;m asked is, &#8220;What sort of servers are required to run a porn search engine?&#8221;  Everyone seems a bit surprised by my answer: thus far, nothing special.  From EveKnows.com&#8217;s testing launch in March until early July, in fact, the site ran on a single Athlon XP 2200 [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most frequent questions I&#8217;m asked is, &#8220;What sort of servers are required to run <a href="http://eveknows.com">a porn search engine</a>?&#8221;.  Everyone seems a bit surprised by my answer: thus far, nothing special.  From EveKnows.com&#8217;s testing launch in March until early July, in fact, the site ran on a single Athlon XP 2200 machine with 1GB of RAM and a standard IDE hard disk.  Nothing special indeed.  That configuration hasn&#8217;t had any trouble handling 60,000 page-views daily.  Of course, I&#8217;m running EveKnows on a highly-tuned Linux server with custom-written software; everything has been optimized to make the most of the available resources, but the point still stands: it doesn&#8217;t take much hardware to handle the current traffic load.</p>
<p>Last week I finally upgraded to a dual-core Athlon 64 X2 4200 with a SATA hard drive.  The new site design will go live sometime this week, marking the release of EveKnows.com 1.0.  With luck, I&#8217;ll get a little bit of publicity and increase the daily traffic.  The extra power is there to insure against traffic spikes; I&#8217;d hate to get an influx of new visitors and watch the server melt under the load.  At the moment, though, the load average is sitting between 0.03 and 0.08.  At least I can recompile kernels faster than ever before ;)</p>
<p>Also, I&#8217;ve added a Hardware category to this blog.  I&#8217;ll keep everyone up to date on how the new server performs and how things scale as the site increases in popularity.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.eveknows.com/2007/07/09/hardware/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
