Anatomy of a Search Engine | Development of the EveKnows.com adult search engine

Archive for July 2007

Today marks the release of EveKnows.com 1.0, the Internet’s first Web 2.0 porn search engine! EveKnows is different from other sex search engines: rather than searching adult sites in general, it focuses solely on free picture and movie galleries. We aim to revolutionize porn searches the way Google changed mainstream searching, with an emphasis on clean, quality results and absolutely no pay-for-placement galleries. This means you, the user, find exactly the porn you’re looking for, rather than the same tired content from companies with large advertising budgets.

EveKnows.com utilizes the latest Web 2.0 technologies, making it a true joy to use. These include:

  • Advanced Search Algorithm: Much like Google, EveKnows tracks which sites link to each gallery, giving preference to galleries displayed on multiple popular sites.
  • Search Suggestions: Start typing a model’s name, and similar queries will automatically be displayed.
  • Custom RSS Feeds: Use your web browser’s RSS reader to watch feeds of your favorite models. The feeds are updated in real time, so you’ll always see their latest published galleries.
  • Popular Search Cloud: See what everyone else is searching for! The more popular a model’s name is, the darker the link will be.
  • Automated Indexing: Our web robot, Caroline, is crawling the Internet for fresh porn 24 hours a day. Every clean gallery she finds is added to the site in real time, resulting in an enormous database of the absolute newest porn anywhere!
  • Standards-compliant Interface: EveKnows uses standard XHTML, CSS, and JavaScript, so it will work properly and look fantastic on a wide range of devices, from desktop PCs and Apple notebooks, to PSPs and iPhones.
  • Open Development: The Anatomy of a Search Engine blog details the development of EveKnows.com, eliciting comments and suggestions for improvements from the site’s users.

More information can be found in the About Eve section of the site, along with advanced usage instructions. TGP owners and gallery submitters are welcome to request that their own sites be indexed by our web robot, Caroline.

So go ahead, try it out and see what you think!


Jul/07

9

Hardware

One of the most frequent questions I’m asked is, “What sort of servers are required to run a porn search engine?” Everyone seems a bit surprised by my answer: thus far, nothing special. From EveKnows.com’s testing launch in March until early July, in fact, the site ran on a single Athlon XP 2200 machine with 1GB of RAM and a standard IDE hard disk. Nothing special indeed. That configuration hasn’t had any trouble handling 60,000 page views daily. Of course, I’m running EveKnows on a highly tuned Linux server with custom-written software; everything has been optimized to make the most of the available resources, but the point still stands: it doesn’t take much hardware to handle the current traffic load.

Last week I finally upgraded to a dual-core Athlon 64 X2 4200 with a SATA hard drive. The new site design will go live sometime this week, marking the release of EveKnows.com 1.0. With luck, I’ll get a little bit of publicity and increase the daily traffic. The extra power is designed to insure against any possible spikes; I’d hate to get an influx of new visitors and watch the server melt under the load. At the moment, though, the load is sitting between 0.03 and 0.08. At least I can recompile kernels faster than ever before ;)

Also, I’ve added a Hardware category to this blog. I’ll keep everyone up to date on how the new server performs and how things scale as the site increases in popularity.


Jul/07

2

Thumbnail Generation

This weekend someone asked how EveKnows generates thumbnails of each indexed porn gallery. The simple answer is ImageMagick, an amazingly full-featured command-line image processor. Read on for the gory details…

When Caroline, the EveKnows.com web crawler, finds a porn gallery, it downloads three images. Before anyone asks, yes, I’m only using the first of these images for the time being. The others will be showing up in a refresh of the search engine’s interface later this summer. ;)

Anyway, once Caroline verifies that it was able to download the images, ImageMagick kicks into action. Since Caroline is written in Perl, I use PerlMagick (the Image::Magick module), the Perl binding for ImageMagick, but the same concepts apply to command-line usage. First we pull the width and height from the image like so:
my ($w, $h) = $image->Get ('width', 'height');
If either $w or $h is undefined, then our image is invalid and we stop parsing. In order to provide high-quality search results, EveKnows does not index any galleries which block automated retrieval of images.

Now that we have the dimensions, we verify that the image is large enough to crop (we don’t want to further shrink small images like movie thumbnails), and then we subtract X pixels from both the width and the height. X depends on the size of the source images; for example, an 800×600 image would have 64 pixels cut off of each dimension, leaving us with a 736×536 photo. This removes any borders or watermarks located at the edge of the image.
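The trimming step can be sketched in a few lines. This is only an illustration, written in Python rather than Caroline's Perl. The 8% ratio is inferred from nothing more than the 800×600 example (64px out of 800px), and the 200px minimum size, the even split of the trim between opposite edges, and the name trim_box are all assumptions.

```python
MIN_SIDE = 200     # assumed cutoff; the post does not say how small is "too small"
TRIM_PERCENT = 8   # assumed: 64px / 800px, taken from the single example given


def trim_box(width, height, min_side=MIN_SIDE, percent=TRIM_PERCENT):
    """Return (left, top, new_width, new_height) for the border trim,
    or None when the image is too small to be worth cropping."""
    if min(width, height) < min_side:
        return None
    # X pixels come off each dimension; X is assumed to scale with the longer side.
    trim = max(width, height) * percent // 100
    half = trim // 2  # assumed: the trim is split evenly between opposite edges
    return (half, half, width - trim, height - trim)


print(trim_box(800, 600))  # (32, 32, 736, 536)
print(trim_box(160, 120))  # None
```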

Next Caroline resizes the image so that its shortest side is 110px. For our 736×536px example, this would result in a thumbnail 151×110px. The code to do this will look like
$image->Resize (geometry=>'x110', support=>0.9);
or
$image->Resize (geometry=>'110x', support=>0.9);
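depending on whether the width or the height was larger: the first form pins the height for landscape images, and the second pins the width for portrait images.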
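To make the choice between the two geometry strings concrete, here is a small Python sketch (not Caroline's code) that picks the string and predicts the resulting size. The function name resize_plan and the use of ordinary rounding are assumptions; ImageMagick's own rounding may differ by a pixel in some cases.

```python
THUMB_SIDE = 110  # target length of the shortest side, as described in the post


def resize_plan(width, height, side=THUMB_SIDE):
    """Return (geometry, new_width, new_height) so the shortest side equals `side`."""
    if width >= height:
        # Landscape or square: the height is the short side, so pin the height.
        return (f"x{side}", round(width * side / height), side)
    # Portrait: the width is the short side, so pin the width.
    return (f"{side}x", side, round(height * side / width))


print(resize_plan(736, 536))  # ('x110', 151, 110)
print(resize_plan(536, 736))  # ('110x', 110, 151)
```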

Finally, we crop the image to a 110×110px square. If the orientation of the photo was landscape, we base our crop at the center of the image. Using our 151×110px example, this would chop about 20px off of the left and right sides of the image. For portrait-oriented images, we try to focus on the model’s face. This is usually in the top third of the photo, so we shift the center of the crop up by 1/3 of the image’s height.
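The crop offsets described above work out as follows. This Python sketch is one reading of the description, not the actual implementation: the name square_crop_offset is made up, and clamping the window at the top edge is an assumption about what happens when the upward shift would run past the image.

```python
def square_crop_offset(width, height, side=110):
    """Return (left, top) of a side x side crop window inside the thumbnail."""
    left = (width - side) // 2
    top = (height - side) // 2
    if height > width:
        # Portrait: move the window up by a third of the image height,
        # but never past the top edge (the clamp is an assumption).
        top = max(0, top - height // 3)
    return (left, top)


print(square_crop_offset(151, 110))  # (20, 0)
print(square_crop_offset(110, 165))  # (0, 0)
print(square_crop_offset(110, 400))  # (0, 12)
```

Under this reading, any portrait thumbnail up to 330px tall ends up cropped flush with the top edge, since a third of the height is at least as large as the centered offset.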

That’s about it! We then save the resulting file and serve it up to searchers. I play a couple of tricks with contrast to attempt to make the thumbnails stand out a bit, and I also strip them of extra info to reduce file size, but both of these techniques are well-documented on the ImageMagick website. One bug I’ve noticed with the current version of PerlMagick is that the cropping isn’t very exact, so I’ve modified Caroline to save the image to disk before we make our 110×110px crop, then use the command-line version of ImageMagick to run mogrify on the resulting file, cropping it to the desired size. Hopefully a future version of PerlMagick will resolve this issue; rendering the image to a file twice noticeably reduces the quality of the final thumbnail.
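The exact mogrify invocation isn't given here, so the following Python sketch only shows what such a call might look like. The file name thumb.jpg and the helper name are hypothetical; -crop and +repage are standard ImageMagick options, but whether Caroline passes these particular ones is an assumption.

```python
def mogrify_crop_command(path, left, top, side=110):
    """Build the argument list for an in-place ImageMagick crop."""
    geometry = f"{side}x{side}+{left}+{top}"
    # +repage discards the virtual-canvas offset that -crop leaves behind.
    return ["mogrify", "-crop", geometry, "+repage", path]


cmd = mogrify_crop_command("thumb.jpg", 20, 0)  # hypothetical file name
print(" ".join(cmd))  # mogrify -crop 110x110+20+0 +repage thumb.jpg
# To actually run it (requires ImageMagick on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```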

