The EveKnows.com database is now approaching 600,000 unique porn galleries. The vast majority of these have been pulled by our own web spider, Caroline. I’d like to invite every TGP/MGP owner to submit their own sites for indexing. Within one week your galleries will begin to show up in EveKnows.com’s search results, resulting in more traffic for your site. Caroline will also regularly re-index your site to pick up any new galleries you’ve posted.
Today marks the release of EveKnows.com 1.0, the Internet’s first Web 2.0 porn search engine! EveKnows is different from other sex search engines: rather than searching adult sites in general, it focuses solely on free picture and movie galleries. We aim to revolutionize porn searches the way Google changed mainstream searching, with an emphasis on clean, quality results and absolutely no pay-for-placement galleries. This means you, the user, find exactly the porn you’re looking for, rather than the same, tired content from companies with large advertising budgets.
EveKnows.com utilizes the latest Web 2.0 technologies, making it a true joy to use. These include:
- Advanced Search Algorithm: Much like Google, EveKnows tracks which sites link to each gallery, giving preference to galleries displayed on multiple popular sites.
- Search Suggestions: Start typing a model’s name, and similar queries will automatically be displayed.
- Custom RSS Feeds: Use your web browser’s RSS reader to watch feeds of your favorite models. The feeds are updated in real-time, so you’ll always see their latest published galleries.
- Popular Search Cloud: See what everyone else is searching for! The more popular a model’s name is, the darker the link will be.
- Automated Indexing: Our web robot, Caroline, crawls the Internet for fresh porn 24 hours a day. Every clean gallery she finds is added to the site in real-time, resulting in an enormous database of the absolute newest porn anywhere!
- Standards-compliant Interface: EveKnows uses standard XHTML, CSS, and JavaScript, so it will work properly and look fantastic on a wide range of devices, from desktop PCs and Apple notebooks to PSPs and iPhones.
- Open Development: The Anatomy of a Search Engine blog details the development of EveKnows.com, eliciting comments and suggestions for improvements from the site’s users.
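The link-based preference described under Advanced Search Algorithm can be sketched in Perl, the language Caroline is written in. This is a minimal illustration under stated assumptions, not EveKnows’ actual ranking code: the hypothetical `score_gallery` simply sums a popularity weight for each distinct site that links to a gallery (unknown sites count as 1), and all site names and weights are invented.

```perl
use strict;
use warnings;

# Hypothetical scoring: add the popularity weight of every distinct site
# that links to a gallery; sites without a known weight count as 1.
sub score_gallery {
    my ($linking_sites, $popularity) = @_;
    my %seen;
    my $score = 0;
    for my $site (@$linking_sites) {
        next if $seen{$site}++;    # a site linking twice still counts once
        $score += exists $popularity->{$site} ? $popularity->{$site} : 1;
    }
    return $score;
}

# Order gallery IDs by descending score, breaking ties alphabetically.
sub rank_galleries {
    my ($links_by_gallery, $popularity) = @_;
    my %score;
    for my $gallery (keys %$links_by_gallery) {
        $score{$gallery} = score_gallery($links_by_gallery->{$gallery}, $popularity);
    }
    return sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
}

# Invented example data.
my %popularity = ('tgp-a.example' => 10, 'tgp-b.example' => 4);
my %links = (
    gallery1 => ['tgp-a.example', 'tgp-b.example'],
    gallery2 => ['tgp-b.example', 'tgp-b.example'],
    gallery3 => ['small.example'],
);

print join(', ', rank_galleries(\%links, \%popularity)), "\n";
# gallery1, gallery2, gallery3
```

Counting each linking site once keeps a single TGP from inflating a gallery’s rank just by listing it repeatedly.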
More information, along with advanced usage instructions, can be found in the About Eve section of the site. TGP owners and gallery submitters are welcome to request that their own sites be indexed by our web robot, Caroline.
So go ahead: try it out and see what you think!
One of the most frequent questions I’m asked is, “What sort of servers are required to run a porn search engine?” Everyone seems a bit surprised by my answer: thus far, nothing special. In fact, from EveKnows.com’s testing launch in March until early July, the site ran on a single Athlon XP 2200 machine with 1GB of RAM and a standard IDE hard disk. Nothing special indeed. That configuration hasn’t had any trouble handling 60,000 page-views daily. Of course, I’m running EveKnows on a highly tuned Linux server with custom-written software, and everything has been optimized to make the most of the available resources, but the point still stands: it doesn’t take much hardware to handle the current traffic load.
Last week I finally upgraded to a dual-core Athlon 64 X2 4200 with a SATA hard drive. The new site design will go live sometime this week, marking the release of EveKnows.com 1.0. With luck, I’ll get a little bit of publicity and increase the daily traffic. The extra power is there to guard against sudden traffic spikes; I’d hate to get an influx of new visitors and watch the server melt under the load. At the moment, though, the load average is sitting between 0.03 and 0.08. At least I can recompile kernels faster than ever before ;)
Also, I’ve added a Hardware category to this blog. I’ll keep everyone up to date on how the new server performs and how things scale as the site increases in popularity.
This weekend someone asked how EveKnows generates thumbnails of each indexed porn gallery. The simple answer is ImageMagick, an amazingly full-featured command-line image processor. Read on for the gory details…
When Caroline, the EveKnows.com web crawler, finds a porn gallery, it downloads three images. Before anyone asks, yes, I’m only using the first of these images for the time being. The others will be showing up in a refresh of the search engine’s interface later this summer. ;)
Anyway, once Caroline verifies that it was able to download the images, ImageMagick kicks into action. Since Caroline is written in Perl, I use PerlMagick (the Image::Magick module), ImageMagick’s Perl binding, but the same concepts apply to command-line usage. First we pull the width and height from the image like so:
my ($w, $h) = $image->Get ('width', 'height');
If either $w or $h is undefined, the image is invalid and we stop parsing. In order to provide high-quality search results, EveKnows does not index any galleries that block automated retrieval of images.
Now that we have the dimensions, we verify that the image is large enough to crop (we don’t want to further shrink small images like movie thumbnails), and then we subtract X pixels from both the width and the height. X depends on the size of the source image; for example, an 800×600 image would have 64 pixels trimmed from each dimension, leaving us with a 736×536 photo. This removes any borders or watermarks located at the edges of the image.
Next, Caroline resizes the image so that its shortest side is 110px. For our 736×536px example, this results in a 151×110px thumbnail. The code to do this looks like
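A minimal sketch of the trimming step follows. The post gives only one data point (800×600 loses 64px per dimension), so the rule here is an assumption: X is taken as 8% of the longer side, images whose shorter side is under a hypothetical 200px threshold are left alone, and the trim is split evenly between opposite edges. `trim_geometry` and both thresholds are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical thresholds: the post only says an 800x600 image loses 64px.
my $min_side_for_trim = 200;   # leave small images (e.g. movie thumbs) alone
my $trim_percent      = 8;     # 8% of the longer side: 800 * 8 / 100 = 64

# Returns ($new_w, $new_h, $x_off, $y_off) for a centred trim, or the
# original size with zero offsets when the image is too small to crop.
sub trim_geometry {
    my ($w, $h) = @_;
    my ($short, $long) = $w < $h ? ($w, $h) : ($h, $w);
    return ($w, $h, 0, 0) if $short < $min_side_for_trim;
    my $x    = int($long * $trim_percent / 100);
    my $half = int($x / 2);    # take X/2 from each edge
    return ($w - $x, $h - $x, $half, $half);
}

my ($new_w, $new_h, $x_off, $y_off) = trim_geometry(800, 600);
print "${new_w}x${new_h}+${x_off}+${y_off}\n";    # 736x536+32+32
# With PerlMagick this could then be applied as
# $image->Crop(geometry => "${new_w}x${new_h}+${x_off}+${y_off}");
```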
$image->Resize (geometry=>'x110', support=>0.9);
or
$image->Resize (geometry=>'110x', support=>0.9);
depending on whether the width or the height is larger. In ImageMagick geometry syntax, 'x110' fixes the height at 110px and scales the width proportionally (used for landscape images), while '110x' fixes the width (used for portrait images).
Finally, we crop the image to a 110×110px square. If the photo is landscape-oriented, we center the crop on the image. In our 151×110px example, that means trimming 41px of width, roughly 20px from each of the left and right sides. For portrait-oriented images, we try to focus on the model’s face. This is usually in the top third of the photo, so we shift the center of the crop up by 1/3 of the image’s height.
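The orientation check that picks between the two Resize calls can be written as a small helper. `resize_geometry` and `resized_size` are hypothetical names; the size prediction rounds to the nearest pixel, which may differ by a pixel from ImageMagick’s own rounding.

```perl
use strict;
use warnings;

my $thumb = 110;   # length of the thumbnail's shortest side, per the post

# Pick the geometry that scales the *shorter* side to 110px:
# 'x110' fixes the height (landscape or square), '110x' fixes the width (portrait).
sub resize_geometry {
    my ($w, $h) = @_;
    return $w >= $h ? "x$thumb" : "${thumb}x";
}

# Predict the resized dimensions, rounding the long side to the nearest
# pixel (ImageMagick's own rounding may differ by one pixel).
sub resized_size {
    my ($w, $h) = @_;
    return $w >= $h
        ? (int($w * $thumb / $h + 0.5), $thumb)
        : ($thumb, int($h * $thumb / $w + 0.5));
}

my ($w, $h) = (736, 536);
my $geom = resize_geometry($w, $h);
my ($rw, $rh) = resized_size($w, $h);
print "$geom -> ${rw}x${rh}\n";    # x110 -> 151x110
# With PerlMagick: $image->Resize(geometry => $geom, support => 0.9);
```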
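The crop placement described above could be computed as follows. This is a sketch under assumptions: `crop_offset` is hypothetical, the landscape offset is truncated to a whole pixel (giving a 20px left trim in the example), and portrait offsets are clamped so the 110×110 window never leaves the image. The printed string uses ImageMagick’s WxH+X+Y crop geometry format.

```perl
use strict;
use warnings;

my $thumb = 110;   # final square size, per the post

# Return the (x, y) offset of the 110x110 crop within a resized image.
# Landscape/square: centre the crop horizontally.
# Portrait: move the crop's centre up by 1/3 of the image height (toward
# the model's face), then clamp so the crop stays inside the image.
sub crop_offset {
    my ($w, $h) = @_;
    return (int(($w - $thumb) / 2), 0) if $w >= $h;
    my $centre = $h / 2 - $h / 3;
    my $y = int($centre - $thumb / 2);
    $y = 0           if $y < 0;
    $y = $h - $thumb if $y > $h - $thumb;
    return (0, $y);
}

my ($x, $y) = crop_offset(151, 110);
print "${thumb}x${thumb}+${x}+${y}\n";    # 110x110+20+0
```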
That’s about it! We then save the resulting file and serve it up to searchers. I play a couple of tricks with contrast to make the thumbnails stand out a bit, and I also strip them of extra metadata to reduce file size; both techniques are well documented on the ImageMagick website. One bug I’ve noticed in the current version of PerlMagick is that its cropping isn’t very exact, so I’ve modified Caroline to save the image to disk before making the 110×110px crop, and then use the command-line version of ImageMagick to run mogrify on the resulting file, cropping it to the desired size. Hopefully a future version of PerlMagick will resolve this issue; rendering the image to a file twice noticeably reduces the quality of the final thumbnail.
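For the command-line crop, a Perl script can hand mogrify its arguments as a list, which avoids shell quoting problems. In this sketch `mogrify_args` and the file name are hypothetical, and folding -strip into the same call is an assumption rather than the author’s documented pipeline; -crop, +repage and -strip are standard ImageMagick options.

```perl
use strict;
use warnings;

# Hypothetical helper: build the mogrify argument list for the final
# 110x110 crop. +repage discards the virtual-canvas offset that -crop
# leaves behind, and -strip removes profiles and comments.
sub mogrify_args {
    my ($file, $x, $y) = @_;
    return ('mogrify', '-crop', "110x110+$x+$y", '+repage', '-strip', $file);
}

my @cmd = mogrify_args('thumb.jpg', 20, 0);
print "@cmd\n";    # mogrify -crop 110x110+20+0 +repage -strip thumb.jpg

# List-form system() runs mogrify directly, without a shell:
# system(@cmd) == 0 or die "mogrify failed: $?";
```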
I’m putting together a roadmap of ideas for the future development of EveKnows.com, and I want your help. Please feel free to comment with any feature suggestions. Currently, my thinking is as follows:
- Continue to build up the gallery database: I’m still shooting for 1,000,000+ indexed porn galleries by the end of summer.
- Quality control: I’d like to add an easy method for people to report broken or mislabeled galleries.
- Easier RSS access: There should be an easier way for people to get the RSS links for their favorite searches. Maybe I could also build in an RSS aggregator so that all of a user’s queries could be viewed from a single page.
- Interface redesign: I’m obviously no graphic artist; the look of EveKnows needs a lot of work.
Well, that’s what I intend to be working on next. Let me know what else you’d like to see!
So last weekend the eveknows.com server lost a hard drive. I had a recent backup, so not much was lost. It did, however, set us back about 20,000 porn galleries that had been indexed between the time of the backup and the moment the drive died. While I was getting the server back online, I noticed that a number of duplicate galleries had crept into the database, so I cleaned those out as well. That knocked us down another 30,000 or so. The good news is that the server is now back online with a faster drive system and no known duplicate galleries. The Caroline spider is back at work and has already brought us up to 270,000 pages of sexy photos and videos. She seems to be averaging about 8,000 to 10,000 new galleries each day, so by the end of June we should be nearing the half-million mark. I’m also looking into a mirrored RAID setup for the drives, to prevent this sort of problem going forward.
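As a quick check of that estimate, the remaining 230,000 galleries at the quoted crawl rate work out to roughly three to four weeks:

```latex
\[
\frac{500{,}000 - 270{,}000}{10{,}000\ \text{galleries/day}} = 23\ \text{days},
\qquad
\frac{500{,}000 - 270{,}000}{8{,}000\ \text{galleries/day}} \approx 28.75\ \text{days}
\]
```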
One thing that I did lose in the crash was a blog post about a new feature, the EveKnows.com Search Plugin. Firefox 2 and IE 7 users can now add EveKnows to the list of search providers on their browser’s integrated search bar. Pretty cool, right? :)
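Firefox 2 and IE 7 both read OpenSearch description documents for their search bars. Below is a minimal sketch of generating one from Perl; the search URL and its `q` parameter are placeholders, not EveKnows’ real query format, and `opensearch_xml` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper. {searchTerms} is the OpenSearch template parameter
# the browser replaces with whatever the user typed. If the URL ever
# carries several parameters, escape each & as &amp; for valid XML.
sub opensearch_xml {
    my ($short_name, $search_url) = @_;
    return <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>$short_name</ShortName>
  <Description>Search $short_name</Description>
  <InputEncoding>UTF-8</InputEncoding>
  <Url type="text/html" template="$search_url"/>
</OpenSearchDescription>
XML
}

# Placeholder search URL, not the site's actual one.
print opensearch_xml('EveKnows', 'http://eveknows.com/search?q={searchTerms}');
```

Pages usually advertise such a file with a `<link rel="search" type="application/opensearchdescription+xml" title="..." href="...">` element in their head, and the file itself should be served with the application/opensearchdescription+xml content type.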
Today I migrated EveKnows.com from mod_cgi to mod_perl, and switched from printing HTML directly from the Perl code to using the HTML::Template module to handle formatting. The move to HTML::Template has been a long time coming. I’m a big fan of separating logic from presentation, and this finally allows me to do that with EveKnows. Editing the HTML was getting messy since it was all embedded in the Perl code, and small changes on my staging server were breaking the program logic. Now the search engine is split into a Perl back-end and a simple HTML front-end, which should allow me to easily upgrade the interface as the engine matures.
While I was researching HTML::Template, I came across some benchmarks showing the benefits of using Apache’s mod_perl rather than running regular CGI processes, which is how EveKnows was originally designed. Besides advantages such as shared memory caching, mod_perl is generally an order of magnitude faster at rendering an HTML page from a Perl script, largely because the Perl interpreter and compiled code stay loaded between requests instead of a new process starting for each one. EveKnows.com didn’t get *that* much of a performance improvement from the migration (most of its processing time is spent searching and sorting data), but throughput still rose from about 2 requests/second to about 10, a fivefold increase. For anyone attempting a similar migration, I’d like to point out a couple of issues I ran into.
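The logic/presentation split described above looks roughly like this with HTML::Template. The template text, parameter names and result rows are hypothetical; a real setup would load the template from its own file with `filename =>` rather than the inline `scalarref =>` used here to keep the sketch self-contained.

```perl
use strict;
use warnings;
use HTML::Template;

# Presentation: plain HTML plus TMPL_* tags. Normally this would live in
# its own file and be loaded with filename => 'results.tmpl' (hypothetical).
my $markup = <<'HTML';
<h1>Results for <TMPL_VAR NAME=query ESCAPE=HTML></h1>
<ul>
<TMPL_LOOP NAME=results>  <li><a href="<TMPL_VAR NAME=url ESCAPE=HTML>"><TMPL_VAR NAME=title ESCAPE=HTML></a></li>
</TMPL_LOOP></ul>
HTML

my $template = HTML::Template->new(scalarref => \$markup);

# Logic: the Perl back-end only supplies data (invented rows here).
$template->param(
    query   => 'example search',
    results => [
        { url => 'http://gallery.example/1', title => 'First gallery' },
        { url => 'http://gallery.example/2', title => 'Second gallery' },
    ],
);

# Header for a CGI or ModPerl::Registry script, then the rendered page.
print "Content-Type: text/html\n\n", $template->output;
```

Changing the markup now only means editing the template; the Perl code that fills in `query` and `results` stays untouched.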
First, Apache needs to be told to use mod_perl rather than mod_cgi. This means editing your httpd.conf file (or virtual host file if you use those) and adding the following section:
<FilesMatch "\.cgi$">
    SetHandler perl-script
    PerlResponseHandler ModPerl::Registry
    PerlOptions +ParseHeaders
    Options ExecCGI FollowSymLinks
</FilesMatch>
Restart Apache, and any scripts ending in .cgi will now be run by mod_perl through ModPerl::Registry.
The second problem stemmed from CGI::Simple. For whatever reason, this module does not play nicely with mod_perl, so I had to switch to the regular CGI module to get things back to normal.
Anyway, once these fixes were in place, EveKnows.com was working faster and cleaner than ever!
This week marks the release of EveKnows.com 1.0 Beta. The past three months have seen a crazy pace of development, and I believe the results speak for themselves:
Wicked-fast searching of porn galleries? Check.
Incredibly accurate sex search results? Check.
Custom RSS feeds for every set of search results? Check.
Predictive suggestions when entering search terms? Check.
Suggestions of more popular words for searches with few results? Check.
Integration with TGPs and MGPs for site searching? Check.
An awesome Web 2.0 tag cloud of popular search terms? Check!
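Predictive suggestions of the kind listed above can be approximated with a prefix match over past queries, ordered by popularity. This is a hypothetical sketch, not the engine’s real implementation: the query counts are invented, `suggest` is a hypothetical name, and a linear scan stands in for whatever index the site actually uses.

```perl
use strict;
use warnings;

# Hypothetical data: how often each past query was searched.
my %query_count = (
    'anna' => 120, 'annabelle' => 45, 'anne' => 80, 'bella' => 200,
);

# Return up to $limit past queries that start with what the user has typed,
# most popular first. A linear scan is fine for a sketch; a larger index
# would use a sorted list or a trie.
sub suggest {
    my ($prefix, $limit) = @_;
    $prefix = lc $prefix;
    my @hits = grep { index($_, $prefix) == 0 } keys %query_count;
    @hits = sort { $query_count{$b} <=> $query_count{$a} || $a cmp $b } @hits;
    splice(@hits, $limit) if @hits > $limit;
    return @hits;
}

print join(', ', suggest('ann', 2)), "\n";    # anna, anne
```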
Since the site interface seems to be under control, I’m focusing on building up a giant database of porn pictures and movies to search through. It’s nearing 200,000 galleries now and growing daily; my goal is to get 1,000,000 searchable galleries into the database in the next two weeks.
Anyway, check it out at http://eveknows.com and share the site with your friends :)
EveKnows.com is currently being upgraded to version 1.0-beta. The site may not be 100% available for the next few hours, but I’ll try to keep it working while the update is in progress.
Wow, busy week! I’ve started to rebuild the primary index three times now, and each time my testing has revealed a few new bugs that needed to be fixed in Caroline, our search spider. Last night I believe I fixed the last of them; Caroline has now been running for 24 hours without issue and has indexed 50,000 galleries. Remarkably, even with such a small data set, the new search engine is returning far better results than the current version. If this keeps up, I’ll try to get the new engine live this weekend. Excitement!
Today I spent some time updating the ‘About Eve’ section of the site. It now includes a basic usage guide for the advanced features of the new engine. Source and Site searches are also making a return, and their use is explained as well. There will also be a nifty embeddable search box for TGP/MGP owners, which will let their surfers use EveKnows to search all of the porn galleries listed on that TGP/MGP. Let me know how it works out :)
