Anatomy of a Search Engine | Development of the EveKnows.com adult search engine

Mar/07

18

Disk Optimizations

So with Caroline running full-bore, I noticed a frightening thing: the EveKnows.com server was bottlenecked by disk I/O. Between the spider downloading thumbnails and inserting galleries, the SQL server fetching them, and the Apache server handing out web pages, my machine was slowing to a crawl. Since the search database is far too large to fit into memory, each search ends up hitting the disk several times. Couple that with a constant stream of SQL INSERT statements and you get a MySQL daemon that’s constantly waiting for the hard disk to give it what it needs, and an idle CPU waiting for something to do.

After some research, I discovered a few tricks to relieve the congestion. First, I used hdparm to enable multiple-sector I/O (with a sector count of 16) and write-back caching on my disk. This alone made a huge improvement. Then I tuned Caroline to make far fewer disk hits: it had been logging data, but I don’t really need the log, so I turned that off, and I tweaked the way it opens the stop_list file so that it does so once per thread rather than once per gallery (which is something I should have done from the beginning, but hey, this is the first search engine I’ve ever written!). Finally, I set up two databases on the SQL server, one for the search engine to use and one for the spider to insert new galleries into. They are identical, save that the spider’s database doesn’t have any indexes, so adding galleries causes far fewer disk writes. Then I set up a cron job that, once per day, replaces the search database with the recently updated version from Caroline, adds the indexes needed to keep the searches efficient, and then goes back to spidering.
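For anyone who wants to see the two-database idea in miniature, here is a rough sketch. It is not Caroline's actual code: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table names (`staging`, `search`), the columns (`url`, `title`), and the function names are all made up for illustration. The spider only ever appends to an unindexed table, and a nightly job copies those rows into the search table and builds the index once, after the swap.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database holding only the unindexed staging table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: no indexes here, so each insert is a cheap append.
    conn.execute("CREATE TABLE staging (url TEXT, title TEXT)")
    return conn


def add_gallery(conn, url, title):
    """Spider side: append one gallery to the unindexed staging table."""
    conn.execute("INSERT INTO staging (url, title) VALUES (?, ?)", (url, title))
    conn.commit()


def nightly_swap(conn):
    """Nightly job: replace the search table with a fresh copy, then index it."""
    conn.execute("DROP TABLE IF EXISTS search_new")
    # Bulk-copy the spider's rows; nothing is indexed yet.
    conn.execute("CREATE TABLE search_new AS SELECT url, title FROM staging")
    # Dropping the old search table also drops the index that belonged to it.
    conn.execute("DROP TABLE IF EXISTS search")
    conn.execute("ALTER TABLE search_new RENAME TO search")
    # Build the index once, instead of updating it on every insert.
    conn.execute("CREATE INDEX idx_search_title ON search (title)")
    conn.commit()


if __name__ == "__main__":
    conn = make_demo_db()
    add_gallery(conn, "http://a.example/1", "alpha")
    add_gallery(conn, "http://b.example/2", "beta")
    nightly_swap(conn)
    print(conn.execute("SELECT COUNT(*) FROM search").fetchone()[0])
```

On a real MySQL server the details would differ (separate databases rather than tables, and different statements for the swap), but the trade-off is the same: the index is paid for once per day rather than once per inserted row.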

Thus far, the plan seems to be working well. ‘top’ no longer shows an I/O wait (wa) of 90%, and searches are completing in less than one second again, even while Caroline is running. Woohoo!
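If you would rather track that I/O wait number from a script than watch ‘top’, the sketch below shows one way to do it. It assumes a Linux machine where the first line of /proc/stat lists cumulative CPU counters in the order user, nice, system, idle, iowait, irq, softirq, steal; the function names are made up for illustration, and as far as I know this is the same kernel data that the wa figure comes from.

```python
import time


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line from /proc/stat into a list of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent waiting on I/O between two samples."""
    old, new = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [n - o for o, n in zip(old, new)]
    # Only the first eight counters are summed; guest time is already
    # counted inside user time, so adding it would double-count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter (assumed field order, see above).
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate CPU line (Linux only)."""
    with open(path) as handle:
        return handle.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(1)
    second = read_cpu_line()
    print("iowait: %.1f%%" % iowait_percent(first, second))
```

Sampling twice and taking the difference matters because the counters are totals since boot, so a single reading says nothing about what the disk is doing right now.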
