Anatomy of a Search Engine | Development of the EveKnows.com adult search engine

Mar/07

16

WWW::Mechanize Memory Management

With the new threaded model for Caroline, I started to notice memory usage getting out of control. Under the previous forking system, a typical run used about 200 MB, but the new model was easily topping 1 GB and then dying when the system ran out of physical memory. After some searching, I learned that Perl’s WWW::Mechanize module keeps every page it fetches in memory for the lifetime of the object (this is what makes Mechanize’s back() feature work). That’s cool if you’re trying to mimic a web browser, but totally unusable for a web robot like Caroline. Thankfully, WWW::Mechanize has a stack_depth() method that limits the number of pages kept. Setting it to 2 brought Caroline’s memory usage back under control. She’s off happily indexing more galleries as I type :)
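For anyone hitting the same problem, here is a minimal sketch of the fix. It is not Caroline's actual code: the new_robot_agent() helper and the URLs taken from the command line are assumptions made up for illustration. Only the stack_depth and autocheck options, plus get(), success() and content(), come from WWW::Mechanize itself.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not part of WWW::Mechanize): build an agent
# whose page history is capped so memory stays bounded on long runs.
sub new_robot_agent {
    my ($depth) = @_;
    return WWW::Mechanize->new(
        stack_depth => $depth,  # maximum number of pages kept for back()
        autocheck   => 0,       # report failures via success() instead of dying
    );
}

my $mech = new_robot_agent(2);

# URLs are read from the command line purely for illustration.
for my $url (@ARGV) {
    $mech->get($url);
    next unless $mech->success;
    printf "%s: %d bytes\n", $url, length $mech->content;
}
```

The depth can also be changed on an existing object with $mech->stack_depth(2). According to the module's documentation, a depth of 0 keeps no history at all, which may suit a robot that never calls back().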
