Not much has changed on EveKnows.com recently, but I’m not dead! I’ve been working on a number of under-the-hood improvements, the first of which is being rolled out today. Caroline, the search spider that builds the EveKnows database, has been significantly upgraded: it is now faster, smarter about detecting galleries, and better at producing thumbnails. The new galleries Caroline uncovers will be integrated into EveKnows over the coming months, but you can get a sneak peek at them on the Latest Galleries page.
This upgrade has been a long time coming. The previous version of Caroline didn’t scale well, so when EveKnows hit the 1,000,000-gallery mark, things started to slow down substantially. This new version should make the site much more responsive.
For those of you who submit galleries, rest assured that they are being indexed, but they won’t appear in the search results until I adjust the website to work with the new version of Caroline.
EveKnows 1.4 – Photo Slideshow!
Posted by aidan in Announcements, Development, Software
Today some very exciting features were added to everyone’s favorite porn search engine. I think the coolest by far is the new photo slideshow (more on that in a minute!), but here’s the full list so you can judge for yourself.
Quality of Results
The gallery ranking system has been drastically overhauled and now gives significantly better results. A gallery’s date plays an important role in the new algorithm, which should keep fresh results near the top of the listings. The new algorithm also breaks up the tendency of search results to ‘group’ by site: previously, a search for ‘redhead’ would return lots of results from a single site, then a batch from a different site, then results from a third, and so on. Now results from different sites are mixed together much more evenly.
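The post doesn’t describe the new ranking formula, so the following is only a hypothetical Python sketch of the two ideas it mentions: boosting recent galleries and breaking up runs of results from one site. The exponential decay, the 30-day half-life, the `Gallery` fields, and the greedy interleaving are all assumptions, not EveKnows’ actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Gallery:
    site: str
    relevance: float  # hypothetical text-match score from the index
    age_days: float   # days since the gallery was first seen


def fresh_score(g: Gallery, half_life_days: float = 30.0) -> float:
    """Relevance discounted by age: the weight halves every half_life_days (assumed)."""
    return g.relevance * 0.5 ** (g.age_days / half_life_days)


def rank(galleries: list[Gallery]) -> list[Gallery]:
    """Sort by fresh_score, then avoid placing two results from the same site back to back."""
    pending = sorted(galleries, key=fresh_score, reverse=True)
    ranked: list[Gallery] = []
    while pending:
        last_site = ranked[-1].site if ranked else None
        # Take the best remaining gallery from a different site; fall back to the best overall.
        pick = next((g for g in pending if g.site != last_site), pending[0])
        pending.remove(pick)
        ranked.append(pick)
    return ranked
```

With three equally fresh results scored 0.9 and 0.8 from one site and 0.7 from another, `rank` returns them as site A, site B, site A instead of A, A, B. A greedy pass like this is cheap, but it can still leave a run at the end of the list when one site dominates the matches.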
More Thumbnails
New galleries will have three thumbnail images instead of one, so you can get a better idea of whether a gallery contains the type of photos you’re interested in. Hover your mouse over a gallery and its thumbnail will rotate, displaying each photo for a few seconds. Please note that JavaScript must be enabled for this feature to work.
Photo Slideshow
New galleries now support a slideshow of their images! Yes, that’s right: EveKnows will automatically run through a complete slideshow of a gallery’s photos, all with a single click of your mouse! Not all galleries support this feature, and not all browsers are compatible with it. Notably, Safari can’t display most galleries as slideshows (although it does work with a few), while Firefox, Internet Explorer, and Opera all work properly. If a search result has an associated slideshow, you can view it by clicking its Slideshow link; galleries without slideshows won’t have this link. If your web browser blocks pop-up windows, you will need to either press the ‘Play’ button to start the slideshow or turn off the pop-up blocker for EveKnows.com.
As always, feel free to send me any comments or criticisms!
Today the EveKnows.com search suggestion module got an update. Previously, it was based on the words detected in galleries; while this seemed to make sense, it didn’t work out well in practice. I use aspell to handle the dictionary lookup, and that dictionary didn’t weight words by their popularity, so common terms got no priority over obscure ones. The new system still uses aspell, but it draws its dictionary from previous user queries. The premise is that more people will search for the correct spelling of a porn star’s name or a phrase than for any particular misspelling, which keeps the desired suggestions on top. If you notice any problems, please leave a comment explaining the issue.
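The post doesn’t show how aspell is wired in, so here is a hedged Python sketch of the same premise that swaps in the standard library’s `difflib` for aspell: candidate spellings come only from past queries, and among close matches the most frequently searched spelling wins. The `QuerySuggester` class, the 0.8 similarity cutoff, and the lowercase normalization are assumptions for illustration.

```python
import difflib
from collections import Counter


class QuerySuggester:
    """Suggest spellings drawn from past user queries, preferring popular ones.

    difflib stands in for aspell here; this is not EveKnows' real pipeline.
    """

    def __init__(self, past_queries):
        # Normalize case so "Redhead" and "redhead" count as the same query.
        self.counts = Counter(q.strip().lower() for q in past_queries if q.strip())

    def suggest(self, query, cutoff=0.8):
        """Return a better-known spelling of `query`, or None if there is none."""
        query = query.strip().lower()
        # Similar strings (by difflib's ratio) from the vocabulary of past queries.
        candidates = difflib.get_close_matches(query, list(self.counts), n=10, cutoff=cutoff)
        if not candidates:
            return None
        # Among similar spellings, the most frequently searched one is assumed correct.
        best = max(candidates, key=lambda c: self.counts[c])
        return best if best != query else None
```

For example, after five searches for "redhead" and one for "redhed", `suggest("redhed")` returns "redhead", while `suggest("redhead")` returns `None` because the query is already the popular spelling. Misspellings can live in the vocabulary without harm, since they lose on frequency.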
Recently I made some adjustments to EveKnows, resulting in release 1.3. The changes have mostly been behind the scenes, but should lead to more accurate searches. If you notice any problems with the site, please leave a comment explaining the issue so that it can be resolved.
Also, I’ve added a Recent Searches feature, allowing you to view the 1,000 latest queries. They’re sorted chronologically, not by popularity.
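A fixed-size buffer is a natural fit for a "latest 1,000 queries" list. This Python sketch uses `collections.deque` with `maxlen`, so the oldest query drops off automatically once the limit is reached; the `RecentSearches` name and the newest-first ordering are assumptions about how such a page might present the queries.

```python
from collections import deque


class RecentSearches:
    """Keep only the most recent `limit` queries, in arrival order."""

    def __init__(self, limit=1000):
        # A deque with maxlen discards the oldest entry when a new one arrives and it is full.
        self._queries = deque(maxlen=limit)

    def record(self, query):
        self._queries.append(query)

    def latest(self):
        """Return the stored queries newest first (chronological, not by popularity)."""
        return list(reversed(self._queries))
```

Both `record` and trimming the oldest entry are O(1), so logging every search costs almost nothing.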
Today I updated EveKnows to release 1.2. This update adds a Report Bad Gallery feature. If you find a gallery that is no longer available, is inaccurately described, installs spyware, or has any other problems, please use the Report Bad Gallery link to report it.
Other changes include some small user interface fixes. If you notice any problems with the site, please leave a comment explaining the issue so that it can be resolved.
Profiling and Debugging Linux Disk Access
Posted by aidan in Development, Hardware, Linux, Software
EveKnows.com is 100% Linux-powered. The free (as in speech!) system has proved perfect for our needs: it’s fast, stable, and customizable, exactly what you look for in a platform for running fresh, cutting-edge applications such as EveKnows.
One of the harder tasks I’ve faced is tuning disk access. The search engine currently runs on a Debian 4.0 system with SATA hard drives. The UNIX utility top reports 10-20% I/O wait (its wa figure, a good indicator of disk activity) almost all the time. When I turn on the Caroline search spider, that figure spikes to 50%. At the moment this isn’t a big deal, but as the site’s popularity grows, disk access will eventually become a bottleneck and severely limit performance.
Thus, I’ve been trying to learn how to profile disk access on Linux systems. Maybe I’ve just been looking in the wrong places, but I haven’t found any tools that show which applications are causing the heavy I/O load. Some digging revealed that when /proc/sys/vm/block_dump is set to 1, the kernel logs individual block reads and writes to dmesg, but that raw information is essentially useless on its own. To make sense of it, I wrote a small Perl script that totals the I/O statistics and displays a pretty table of results. If anyone is interested in using it, the code is below.
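top’s wa figure is derived from the iowait counter on the aggregate `cpu` line of /proc/stat. As an illustration that is not part of the original setup, this Python sketch computes the same percentage from two samples. It sums only the first eight counters (user through steal) and skips the guest columns, which the kernel also counts in user and nice time.

```python
import time


def read_cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal (guest fields skipped)
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Share of CPU time spent waiting on I/O between two samples (index 4 is iowait)."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two readings `interval` seconds apart on a live Linux system."""
    with open(path) as f:
        first = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_cpu_times(f.read())
    return iowait_percent(first, second)
```

Keep in mind that iowait only says the CPU was idle while I/O was outstanding; it shows that the disk is busy, not which process is keeping it busy.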
Update: HTML tends to screw up Perl code, so copying/pasting the below code probably won’t work; if you just want to download the script for your own use, you can find it here.
```perl
#!/usr/bin/perl
#
# Copyright 2007 Aidan Trent
# Released under the terms of the GNU GPL
# Usage: SCRIPT_NAME
```
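Only the header of the Perl script survives in this post, so here is a hedged Python sketch of the same idea. It is not the author’s code. It assumes the 2.6-era block_dump message formats, `name(pid): READ block N on dev` and `name(pid): dirtied inode N (file) on dev`, optionally preceded by a dmesg timestamp. Any trailing text, such as a sector count, is ignored. Recent kernels have dropped block_dump in favor of block-layer tracing, so check what your own dmesg prints first.

```python
import re
import sys
from collections import Counter

# Assumed formats, e.g.
#   "kjournald(1234): WRITE block 123456 on sda1"
#   "perl(4321): dirtied inode 98765 (caroline.log) on sda1"
IO_RE = re.compile(
    r"^(?:\[\s*[\d.]+\]\s*)?"                # optional dmesg timestamp
    r"(?P<comm>.+?)\((?P<pid>\d+)\): "      # process name and PID
    r"(?:(?P<op>READ|WRITE) block \d+"      # block read/write
    r"|dirtied inode \d+ \(.*?\))"          # or an inode being dirtied
    r" on (?P<dev>\S+)"
)


def tally(lines):
    """Count READ, WRITE and dirtied-inode events per (process, device, op)."""
    counts = Counter()
    for line in lines:
        m = IO_RE.match(line.strip())
        if not m:
            continue  # unrelated kernel messages
        op = m.group("op") or "DIRTY"
        counts[(m.group("comm"), m.group("dev"), op)] += 1
    return counts


def print_table(counts, out=sys.stdout):
    """Print one row per process/device, busiest first."""
    rows = {}
    for (comm, dev, op), n in counts.items():
        rows.setdefault((comm, dev), Counter())[op] += n
    out.write(f"{'PROCESS':<20}{'DEVICE':<10}{'READ':>8}{'WRITE':>8}{'DIRTY':>8}\n")
    for (comm, dev), c in sorted(rows.items(), key=lambda kv: -sum(kv[1].values())):
        out.write(f"{comm:<20}{dev:<10}{c['READ']:>8}{c['WRITE']:>8}{c['DIRTY']:>8}\n")


if __name__ == "__main__":
    # As root: echo 1 > /proc/sys/vm/block_dump; let it run; then: dmesg | python3 this_script.py
    print_table(tally(sys.stdin))
```

Remember to write 0 back to /proc/sys/vm/block_dump afterwards; the logging is verbose enough to flood the kernel ring buffer.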
Today EveKnows was upgraded to version 1.1. Changes include:
- Pop-up menus that display more data for each search result
- Find Similar Sets command, which executes a search for similar galleries
- View More from this Site command, which displays every gallery in the EveKnows database from the same site
- Miscellaneous user interface fixes
The EveKnows.com database is now approaching 600,000 unique porn galleries. The vast majority of these have been pulled by our own web spider, Caroline. I’d like to invite every TGP/MGP owner to submit their own sites for indexing. Within one week your galleries will begin to show up in EveKnows.com’s search results, resulting in more traffic for your site. Caroline will also regularly re-index your site to pick up any new galleries you’ve posted.
One of the most frequent questions I’m asked is, “What sort of servers are required to run a porn search engine?” Everyone seems a bit surprised by my answer: so far, nothing special. In fact, from EveKnows.com’s testing launch in March until early July, the site ran on a single Athlon XP 2200 machine with 1GB of RAM and a standard IDE hard disk. Nothing special indeed. That configuration had no trouble handling 60,000 page views daily. Of course, I’m running EveKnows on a highly tuned Linux server with custom-written software, and everything has been optimized to make the most of the available resources, but the point still stands: it doesn’t take much hardware to handle the current traffic load.
Last week I finally upgraded to a dual-core Athlon 64 X2 4200 with a SATA hard drive. The new site design will go live sometime this week, marking the release of EveKnows.com 1.0. With luck, I’ll get a little bit of publicity and increase the daily traffic. The extra power is meant to insure against traffic spikes; I’d hate to get an influx of new visitors and watch the server melt under the load. At the moment, though, the load average is sitting between 0.03 and 0.08. At least I can recompile kernels faster than ever before ;)
Also, I’ve added a Hardware category to this blog. I’ll keep everyone up to date on how the new server performs and how things scale as the site increases in popularity.
This weekend someone asked how EveKnows generates thumbnails of each indexed porn gallery. The simple answer is ImageMagick, an amazingly full-featured command-line image processor. Read on for the gory details…
When Caroline, the EveKnows.com web crawler, finds a porn gallery, it downloads three images. Before anyone asks, yes, I’m only using the first of these images for the time being. The others will be showing up in a refresh of the search engine’s interface later this summer. ;)
Anyway, once Caroline verifies that it was able to download the images, ImageMagick kicks into action. Since Caroline is written in Perl, I use PerlMagick (the Image::Magick module), ImageMagick’s Perl binding, but the same concepts apply to command-line usage. First, we pull the width and height from the image like so:
my ($w, $h) = $image->Get ('width', 'height');
If either $w or $h is undefined, the image is invalid and we stop processing it. In order to provide high-quality search results, EveKnows does not index any galleries that block automated retrieval of images.
Now that we have the dimensions, we verify that the image is large enough to crop (we don’t want to shrink small images, such as movie thumbnails, any further), and then we subtract X pixels from both the width and the height. X depends on the size of the source image; for example, an 800×600 image would have 64 pixels cut off each dimension, leaving a 736×536 photo. This trims away any borders or watermarks located at the edges of the image.
Next, Caroline resizes the image so that its shortest side is 110px. For our 736×536px example, this results in a 151×110px thumbnail. The code to do this looks like
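The post doesn’t say how X is derived from the image size, so this Python sketch of the arithmetic uses a hypothetical rule: X is 8% of the longer side, which reproduces the 64px of the 800×600 example. It also assumes the trim is split evenly between opposite edges and that images under a hypothetical 200px minimum are left whole. The actual crop would then be done through PerlMagick.

```python
MIN_SIDE = 200        # hypothetical: don't trim images smaller than this
TRIM_FRACTION = 0.08  # hypothetical: 0.08 * 800 = 64, matching the example


def border_trim(width, height):
    """Return (x, y, new_width, new_height) describing the border crop."""
    if not width or not height:
        # Mirrors the check above: missing dimensions mean an invalid image.
        raise ValueError("invalid image: missing dimensions")
    if min(width, height) < MIN_SIDE:
        return (0, 0, width, height)  # too small (e.g. a movie thumbnail): keep it whole
    x = round(TRIM_FRACTION * max(width, height))
    # Take half of the trim from each opposite edge (an assumption).
    return (x // 2, x // 2, width - x, height - x)
```

`border_trim(800, 600)` gives `(32, 32, 736, 536)`: the 736×536 photo from the example, offset 32px from the top-left corner.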
$image->Resize (geometry=>'x110', support=>0.9);
or
$image->Resize (geometry=>'110x', support=>0.9);
depending on whether the width or the height is larger: 'x110' fixes the height at 110px (for landscape images), while '110x' fixes the width at 110px (for portrait images).
Finally, we crop the image to a 110×110px square. If the photo is landscape-oriented, we center the crop on the image; with our 151×110px example, this chops about 20px off each of the left and right sides (41px in total). For portrait-oriented images, we try to focus on the model’s face. The face is usually in the top third of the photo, so we shift the center of the crop up by 1/3 of the image’s height.
That’s about it! We then save the resulting file and serve it up to searchers. I play a couple of tricks with contrast to make the thumbnails stand out a bit, and I strip out extra metadata to reduce file size; both techniques are well documented on the ImageMagick website. One bug I’ve noticed in the current version of PerlMagick is that its cropping isn’t very exact, so I’ve modified Caroline to save the image to disk before making the 110×110px crop, and then use the command-line version of ImageMagick to run mogrify on the resulting file, cropping it to the desired size. Hopefully a future version of PerlMagick will resolve this issue; writing the image to a file twice noticeably reduces the quality of the final thumbnail.
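The choice between the two geometries fits in a small function. This Python sketch is an illustration, not Caroline’s code. It returns the geometry string to pass to Resize along with the size it should produce. ImageMagick’s own rounding may differ by a pixel.

```python
THUMB = 110


def resize_geometry(width, height, short_side=THUMB):
    """Pick the ImageMagick geometry that scales the shorter side to short_side.

    'x110' fixes the height (landscape or square); '110x' fixes the width (portrait).
    """
    if width >= height:
        new_w = round(width * short_side / height)
        return f"x{short_side}", (new_w, short_side)
    new_h = round(height * short_side / width)
    return f"{short_side}x", (short_side, new_h)
```

For the 736×536 example this returns `("x110", (151, 110))`, since 736 × 110 / 536 ≈ 151.04.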
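The crop offsets can be computed the same way. This Python sketch follows the rules described above: the crop is centered for landscape images, and for portrait images the crop’s center moves up by one third of the height. Clamping the box inside the image is an assumption of this sketch. With typical portrait proportions the clamp simply pins the crop to the top edge.

```python
def crop_offsets(width, height, size=110):
    """Return the (x, y) top-left corner of a size-by-size crop of a resized image."""
    if width >= height:
        # Landscape (or square): center the crop.
        return (width - size) // 2, (height - size) // 2
    # Portrait: move the crop's center up by a third of the height (face heuristic).
    center_y = height / 2 - height / 3
    y = round(center_y - size / 2)
    y = max(0, min(y, height - size))  # keep the box inside the image (assumed)
    return (width - size) // 2, y
```

For the 151×110 example the result is `(20, 0)`: 20px come off the left and the remaining 21px off the right.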
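The command-line workaround might look like the following Python sketch, which shells out to mogrify. `-crop` and `+repage` are standard ImageMagick options (`+repage` discards the leftover virtual-canvas offset after cropping). The function names are hypothetical, and ImageMagick must be installed for `run_crop` to work.

```python
import subprocess


def mogrify_crop_args(path, x, y, size=110):
    """Build a mogrify command that crops `path` in place to size-by-size at (x, y)."""
    return ["mogrify", "-crop", f"{size}x{size}+{x}+{y}", "+repage", path]


def run_crop(path, x, y, size=110):
    """Run mogrify; raises CalledProcessError if ImageMagick reports a failure."""
    subprocess.run(mogrify_crop_args(path, x, y, size), check=True)
```

Passing an argument list rather than a shell string avoids quoting problems with unusual file names. Note that mogrify overwrites the file in place.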
