Thumbnail Generation
This weekend someone asked how EveKnows generates thumbnails of each indexed porn gallery. The simple answer is ImageMagick, an amazingly full-featured command-line image processor. Read on for the gory details…
When Caroline, the EveKnows.com web crawler, finds a porn gallery, it downloads three images. Before anyone asks, yes, I’m only using the first of these images for the time being. The others will be showing up in a refresh of the search engine’s interface later this summer. ;)
Anyway, once Caroline verifies that it was able to download the images, ImageMagick kicks into action. Since Caroline is written in Perl, I use PerlMagick (the Image::Magick module), ImageMagick's Perl binding, but the same concepts apply to command-line usage. First we pull the width and height from the image like so:
my ($w, $h) = $image->Get ('width', 'height');
If either $w or $h is undefined, the image is invalid and we stop processing it. To keep search results high-quality, EveKnows does not index galleries that block automated retrieval of images.
Now that we have the dimensions, we verify that the image is large enough to crop (we don't want to shrink small images like movie thumbnails any further), and then subtract X pixels from both the width and the height. X depends on the size of the source image; for example, an 800×600 image would have 64 pixels trimmed from each dimension, leaving a 736×536 photo. This removes any borders or watermarks along the edges of the image.
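The trim step can be sketched in plain Perl as a function that turns the dimensions into an ImageMagick crop geometry string. The post gives only one data point (800×600 loses 64px), so the 8%-of-width rule, the 200px minimum size, and the even split of X between opposite edges are all hypothetical choices, as is the helper name trim_geometry.

```perl
use strict;
use warnings;

# Hypothetical values: the post only says 800x600 loses 64px per dimension.
my $TRIM_PERCENT = 8;     # 64 / 800 = 8% of the width
my $MIN_SIDE     = 200;   # skip small images such as movie thumbnails

# Return a crop geometry "WxH+X+Y" that removes $trim pixels from each
# dimension, half from each edge, or undef if the image is invalid or
# too small to trim.
sub trim_geometry {
    my ($w, $h) = @_;
    return if !$w || !$h;                          # invalid image
    return if $w < $MIN_SIDE || $h < $MIN_SIDE;    # too small to crop
    my $trim = int($w * $TRIM_PERCENT / 100);      # integer math avoids float drift
    my $off  = int($trim / 2);
    return sprintf('%dx%d+%d+%d', $w - $trim, $h - $trim, $off, $off);
}

print trim_geometry(800, 600), "\n";   # 736x536+32+32
```

The resulting string could then be passed to something like `$image->Crop(geometry => $geom)`; treat the thresholds as placeholders to tune against real gallery images.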
Next, Caroline resizes the image so that its shorter side is 110px. For our 736×536px example, this results in a 151×110px thumbnail. The code to do this looks like
$image->Resize (geometry=>'x110', support=>0.9);
or
$image->Resize (geometry=>'110x', support=>0.9);
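depending on whether the width or the height is larger: 'x110' fixes the height at 110px for landscape images, while '110x' fixes the width for portrait ones.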
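Choosing between the two geometries, and predicting the size that results, is simple arithmetic. This is a minimal sketch with hypothetical helper names (resize_geometry, scaled_size); it rounds the scaled side to the nearest pixel, and ImageMagick's own rounding may differ by a pixel.

```perl
use strict;
use warnings;

my $THUMB = 110;   # target length of the shorter side, from the post

# 'x110' pins the height (landscape or square); '110x' pins the width (portrait).
sub resize_geometry {
    my ($w, $h) = @_;
    return $w >= $h ? "x$THUMB" : "${THUMB}x";
}

# Predict the dimensions after resizing, rounding the longer side.
sub scaled_size {
    my ($w, $h) = @_;
    return $w >= $h
        ? (int($w * $THUMB / $h + 0.5), $THUMB)
        : ($THUMB, int($h * $THUMB / $w + 0.5));
}

my ($sw, $sh) = scaled_size(736, 536);
print resize_geometry(736, 536), " -> ${sw}x$sh\n";   # x110 -> 151x110
```

In Caroline's terms this would become `$image->Resize(geometry => resize_geometry($w, $h), support => 0.9);`, replacing the if/else between the two calls.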
Finally, we crop the image to a 110×110px square. If the photo is landscape-oriented, we center the crop horizontally; in our 151×110px example, that trims about 20px from each side (41px in total). For portrait-oriented images, we try to focus on the model's face. This is usually in the top third of the photo, so we shift the center of the crop up by 1/3 of the image's height.
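The crop offsets can be computed the same way. This sketch reads the portrait rule literally (crop center at half the height minus a third of it) and then clamps the square so it stays inside the image; the clamp and the helper name square_crop are assumptions. For a typical 3:4 portrait scaled to 110×147px, the clamp pins the crop to the top edge.

```perl
use strict;
use warnings;

my $THUMB = 110;

# Final square crop for an image whose shorter side is already $THUMB.
# Landscape: centered horizontally. Portrait: center moved up by a third
# of the height, then clamped to the image bounds (assumed behavior).
sub square_crop {
    my ($w, $h) = @_;
    my ($x, $y) = (0, 0);
    if ($w >= $h) {
        $x = int(($w - $THUMB) / 2);
    } else {
        my $center = $h / 2 - $h / 3;          # shifted-up crop center
        $y = int($center - $THUMB / 2);
        $y = 0 if $y < 0;
        $y = $h - $THUMB if $y > $h - $THUMB;
    }
    return sprintf('%dx%d+%d+%d', $THUMB, $THUMB, $x, $y);
}

print square_crop(151, 110), "\n";   # 110x110+20+0
```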
That's about it! We then save the resulting file and serve it up to searchers. I play a couple of tricks with contrast to make the thumbnails stand out a bit, and I also strip out extra metadata to reduce file size, but both techniques are well documented on the ImageMagick website. One bug I've noticed with the current version of PerlMagick is that its cropping isn't very exact, so I've modified Caroline to save the image to disk before the final 110×110px crop and then run the command-line mogrify tool on the resulting file to crop it to the desired size. Hopefully a future version of PerlMagick will resolve this issue; rendering the image to a file twice noticeably reduces the quality of the final thumbnail.
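The command-line workaround might look like the sketch below, which builds a mogrify argument list and hands it to Perl's list form of system (no shell quoting involved). `-crop`, `+repage` (which discards the virtual-canvas offset that -crop leaves behind) and `-strip` are real mogrify options, but whether Caroline uses exactly these flags is an assumption, and mogrify_crop_args is a hypothetical helper.

```perl
use strict;
use warnings;

# Build the mogrify command that crops the saved file in place.
# '+repage' resets the page offset after cropping; '-strip' drops
# profiles and comments to shrink the file.
sub mogrify_crop_args {
    my ($geometry, $file) = @_;
    return ('mogrify', '-crop', $geometry, '+repage', '-strip', $file);
}

# Usage (requires ImageMagick's mogrify on the PATH):
# my @cmd = mogrify_crop_args('110x110+20+0', 'thumb.jpg');
# system(@cmd) == 0 or die "mogrify failed: $?";
```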