Anatomy of a Search Engine | Development of the EveKnows.com adult search engine

Jun/07

19

EveKnows.com Roadmap

I’m putting together a roadmap of ideas for the future development of EveKnows.com, and I want your help. Please feel free to comment with any feature suggestions. Currently, my thinking is as follows:

  • Continue to build up gallery database: I’m still shooting for 1,000,000+ indexed porn galleries by the end of summer
  • Quality control: I’d like to add an easy method for people to report broken/mislabeled galleries
  • Easier RSS access: There should be an easier way for people to get the RSS links for their favorite searches. Maybe I could also build in an RSS aggregator so that all of a user’s queries could be viewed from a single page.
  • Interface redesign: I’m obviously no graphics artist; the look of EveKnows needs a lot of work.

Well, that’s what I intend to be working on next. Let me know what else you’d like to see!

Today I migrated EveKnows.com from mod_cgi to mod_perl and switched from generating HTML directly in Perl to using the HTML::Template module for formatting. The move to HTML::Template has been a long time coming: I’m a big fan of separating logic from presentation, and this finally allows me to do that with EveKnows. Editing the HTML was getting messy since it was all embedded in the Perl code, and small markup changes on my staging server kept breaking the program logic. Now the search engine is split into a Perl back-end and a simple HTML front-end, which should let me upgrade the interface easily as the engine matures.
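
To make the logic/presentation split concrete, here is a minimal sketch of the idea. It uses Python’s standard string.Template as a stand-in for HTML::Template (whose template files use tags such as TMPL_VAR and TMPL_LOOP instead), with the loop done in code; find_galleries, the example.com URLs, and the markup are hypothetical placeholders, not EveKnows code.

```python
from html import escape
from string import Template

# Presentation: in a real setup this markup would live in its own template
# file, so it can be edited without touching the search logic.
PAGE = Template("<h1>Results for $query</h1>\n<ul>\n$items</ul>\n")
ITEM = Template('  <li><a href="$url">$title</a></li>\n')


def find_galleries(query):
    """Hypothetical stand-in for the search back-end: returns plain data only."""
    return [
        {"url": "http://example.com/a", "title": "Gallery A"},
        {"url": "http://example.com/b", "title": "Gallery B"},
    ]


def render(query):
    """Glue code: feed back-end data into the templates, escaping values as HTML."""
    items = "".join(
        ITEM.substitute(url=escape(g["url"]), title=escape(g["title"]))
        for g in find_galleries(query)
    )
    return PAGE.substitute(query=escape(query), items=items)


print(render("beach & sunset"))
```

The back-end only hands back plain data, so the markup can change freely without touching the search code; in HTML::Template, the ESCAPE=HTML option on a TMPL_VAR tag plays the role that html.escape plays here.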

While I was researching HTML::Template, I came across some benchmarks showing the benefits of using Apache’s mod_perl rather than running regular CGI processes, which is how EveKnows was originally designed. Besides cool advantages such as shared memory caching, mod_perl is often an order of magnitude faster at generating HTML from a Perl script, largely because the Perl interpreter and the compiled script stay loaded between requests instead of being started from scratch each time. EveKnows.com didn’t get *that* much of a performance improvement from the migration (most of its processing time is spent searching and sorting data), but throughput still rose from ~2 requests/second to ~10 requests/second, a fivefold increase. For anyone attempting a similar migration, I’d like to point out a couple of issues I ran into.

First, Apache needs to be told to use mod_perl rather than mod_cgi. This means editing your httpd.conf file (or virtual host file if you use those) and adding the following section:

<FilesMatch "\.cgi$">
    SetHandler perl-script
    PerlResponseHandler ModPerl::Registry
    PerlOptions +ParseHeaders
    Options ExecCGI FollowSymLinks
</FilesMatch>

Restart Apache, and any scripts ending in .cgi will now be processed by mod_perl.

The second problem stemmed from CGI::Simple. For whatever reason, this module did not play nicely with mod_perl in my setup, so I had to switch to the regular CGI module to get things back to normal.

Anyway, once these fixes were in place, EveKnows.com was working faster and cleaner than ever!

Jun/07

5

EveKnows 1.0 Beta

This week marks the release of EveKnows.com 1.0 Beta. The past three months have seen a crazy pace of development, and I believe the results speak for themselves:

Wicked-fast searching of porn galleries? Check.
Incredibly accurate sex search results? Check.
Custom RSS feeds for every set of search results? Check.
Predictive suggestions when entering search terms? Check.
Suggestions of more popular words for searches with few results? Check.
Integration with TGPs and MGPs for site searching? Check.
An awesome Web 2.0 tag cloud of popular search terms? Check!

Since the site interface seems to be under control, I’m focusing on building up a giant database of porn pictures and movies to search through. It’s nearing 200,000 galleries now and growing daily; my goal is to get 1,000,000 searchable galleries in the database in the next two weeks.

Anyway, check it out at http://eveknows.com and share the site with your friends :)

Jun/07

3

Update in Progress…

EveKnows.com is currently being upgraded to version 1.0-beta. The site may not be 100% available for the next few hours, but I’ll try to keep it working while the update is in progress.

May/07

31

Progress Towards EveKnows 1.0

Wow, busy week! I’ve started to rebuild the primary index three times now, and each time my testing has revealed a few new bugs that needed to be fixed in Caroline, our search spider. Last night I believe I fixed the last of these; Caroline has now been running for 24 hours without issue and has indexed 50,000 galleries. Remarkably, even with such a small data set, the new search engine is returning far better results than the current version. If this keeps up, I’ll try to get the new engine live this weekend. Excitement!

Today I spent some time updating the ‘About Eve’ section of the site. It now includes a basic usage guide for the advanced features of the new engine. Source and Site searches are also making a return, and their use is explained as well. There will also be a nifty embeddable search box for TGP/MGP owners; the box will allow their surfers to use EveKnows to search all of the porn galleries of the TGP/MGP. Let me know how it works out :)

May/07

28

Upcoming Changes

It’s been a while since my last post. I wanted to let everyone know that development of EveKnows.com is progressing at a breakneck pace: this past month has seen some tremendous improvements behind the scenes. Searches are running faster and with more accurate results than ever before, and I believe the site is finally ready to handle an increased load. The new engine will remain on the staging server for another week or two while I rebuild the search database. Stay tuned for some major changes coming in June…

After yesterday’s post about slow negative search terms and MySQL’s disregard for the EXCEPT operator, I came upon a decent solution for EveKnows.com’s problem. With some (slightly) clever use of LEFT JOINs, I was able to halve the running time of queries with a single negated term, and queries with multiple negated terms now run about an order of magnitude faster. W00t! The trick was to build a temporary table of the IDs of galleries that contain the negated terms, then LEFT JOIN the galleries matching the desired terms against that temporary table. This gives us a result with two columns, matches.gallery_id and neg_matches.gallery_id; any row with a non-NULL value for neg_matches.gallery_id is then dropped, leaving the proper set of matches (a pattern usually called an anti-join). A fairly simple solution; I feel pretty dumb for not seeing it earlier.
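
Here is a minimal sketch of that LEFT JOIN trick. It runs against SQLite rather than MySQL so it is self-contained, borrows the ReverseIndex(ri_word, ri_location) columns from the NOT EXISTS snippet in yesterday’s post, and uses made-up words, gallery IDs, and a hypothetical search_excluding helper; the real EveKnows schema and queries will differ.

```python
import sqlite3

# Toy index using the ReverseIndex(ri_word, ri_location) layout; the words
# and gallery IDs are made up for illustration.
SAMPLE = [("outdoor", 1), ("outdoor", 2), ("outdoor", 3), ("outdoor", 4),
          ("blonde", 2), ("blonde", 4), ("blonde", 5), ("redhead", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ReverseIndex (ri_word TEXT, ri_location INTEGER)")
conn.executemany("INSERT INTO ReverseIndex VALUES (?, ?)", SAMPLE)


def search_excluding(conn, wanted, negated):
    """IDs of galleries containing `wanted` but none of the `negated` words."""
    # Step 1: temporary table of every gallery that contains a negated term.
    conn.execute("DROP TABLE IF EXISTS temp.neg_matches")
    conn.execute("CREATE TEMP TABLE neg_matches (gallery_id INTEGER)")
    if negated:
        marks = ",".join("?" * len(negated))
        conn.execute(
            "INSERT INTO neg_matches SELECT DISTINCT ri_location "
            f"FROM ReverseIndex WHERE ri_word IN ({marks})",
            negated,
        )
    # Step 2: LEFT JOIN the positive matches against that table. Galleries
    # with no negated term get NULL in neg_matches.gallery_id, so keeping
    # only the NULL rows drops every gallery that contains a negated term.
    rows = conn.execute(
        """
        SELECT matches.gallery_id
        FROM (SELECT DISTINCT ri_location AS gallery_id
              FROM ReverseIndex WHERE ri_word = ?) AS matches
        LEFT JOIN neg_matches
               ON matches.gallery_id = neg_matches.gallery_id
        WHERE neg_matches.gallery_id IS NULL
        ORDER BY matches.gallery_id
        """,
        (wanted,),
    ).fetchall()
    return [gallery_id for (gallery_id,) in rows]


print(search_excluding(conn, "outdoor", ["blonde"]))             # [1, 3]
print(search_excluding(conn, "outdoor", ["blonde", "redhead"]))  # [1]
```

The negated IDs are collected once up front, so adding more negated terms only grows the temporary table instead of adding work to a subquery that runs per result row.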

While I was working on this, I noticed that the existing src: and site: query modifiers were not functioning properly. Due to the new SQL database schema, a quick fix isn’t possible. I’ve dropped these modifiers for the time being, but intend to support them both at some point in the future.

Today I discovered a wonderful new SQL operator: EXCEPT. This neat operator combines the results of two queries, returning all of the rows from the first that do not appear in the second. One of the slowest operations for EveKnows is handling queries with negated terms (such as ‘teens -blonde’ to search for non-blonde teen porn); this is because the SQL code includes a NOT EXISTS subquery which gets run for each and every result, verifying that the galleries containing the word ‘teen’ do not also contain the word ‘blonde’. The relevant part of the query looks something like this, assuming ri1 is a table full of galleries matching the word ‘teen’:

NOT EXISTS (SELECT * FROM ReverseIndex AS ri2 WHERE ri1.ri_location = ri2.ri_location AND ri2.ri_word IN ('blonde'))

Each subquery is fast on its own, but because it runs once per matching row, its total execution time climbs quickly when a search matches lots of rows (say, 120,000). Right now, popular searches with negated words take 3-5 seconds to process, compared to the 0.5-second average for other searches. Obviously, something needs to be done.

I thought I’d found the answer in the EXCEPT operator. It seems like it would be perfect: we put the IDs of rows matching our search terms in temporary table1 and the IDs of rows to be negated in temporary table2, then take table1 EXCEPT table2 and use the resulting list of IDs as the galleries to fetch. It turns out, however, that MySQL doesn’t support EXCEPT. The oft-suggested workaround is an exclusion self-join, but I challenge anyone to self-join a 10-million-row table; it’s simply not practical.
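
SQLite does implement EXCEPT, so it can illustrate the idea. The sketch below uses toy data, hypothetical helper names (search_except, search_not_exists), and the same ReverseIndex columns as the snippet above; it computes the set difference with EXCEPT and compares it with the NOT EXISTS version. Both should return the same gallery IDs, and EXCEPT also removes duplicate rows on its own.

```python
import sqlite3

# Toy ReverseIndex(ri_word, ri_location) data; words and IDs are made up.
SAMPLE = [("outdoor", 1), ("outdoor", 2), ("outdoor", 3), ("outdoor", 4),
          ("blonde", 2), ("blonde", 4), ("blonde", 5), ("redhead", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ReverseIndex (ri_word TEXT, ri_location INTEGER)")
conn.executemany("INSERT INTO ReverseIndex VALUES (?, ?)", SAMPLE)


def search_except(conn, wanted, negated):
    """Set difference: galleries matching `wanted` EXCEPT those matching `negated`.

    `negated` should be non-empty (most engines reject an empty IN () list).
    """
    marks = ",".join("?" * len(negated))
    sql = f"""
        SELECT ri_location FROM ReverseIndex WHERE ri_word = ?
        EXCEPT
        SELECT ri_location FROM ReverseIndex WHERE ri_word IN ({marks})
        ORDER BY 1
    """
    return [row[0] for row in conn.execute(sql, [wanted, *negated])]


def search_not_exists(conn, wanted, negated):
    """The correlated NOT EXISTS subquery approach, for comparison."""
    marks = ",".join("?" * len(negated))
    sql = f"""
        SELECT DISTINCT ri1.ri_location FROM ReverseIndex AS ri1
        WHERE ri1.ri_word = ?
          AND NOT EXISTS (SELECT * FROM ReverseIndex AS ri2
                          WHERE ri1.ri_location = ri2.ri_location
                            AND ri2.ri_word IN ({marks}))
        ORDER BY 1
    """
    return [row[0] for row in conn.execute(sql, [wanted, *negated])]


print(search_except(conn, "outdoor", ["blonde"]))      # [1, 3]
print(search_not_exists(conn, "outdoor", ["blonde"]))  # [1, 3]
```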

So, it looks like I’m back to square one: still searching for a fast way to handle negated searches without migrating to a different RDBMS.

May/07

7

EveKnows 0.7

Development of the EveKnows.com porn search engine is progressing swiftly! Today sees the addition of sorting search results based on either date or relevancy. Like most search engines, EveKnows assigns a score for each word on a webpage based on the number of times it occurs, the word’s location on a page, and other factors. During a search, these scores are compared and the galleries with the highest score for the query words get displayed at the top of the search results. This is known as sorting by relevancy; in theory, the most relevant results will be displayed first.

Sometimes, however, people are interested in recent galleries. To that end, I’ve added the ability to sort the search results with the newest galleries first. To make use of this, click the ‘Show newest galleries first’ link above the search results. If you want to get back to the previous view, click the ‘Show most relevant galleries first’ link. When sorted by date, the quality of the search results may be significantly lower (especially with broad search terms, like ‘naked teens’, which return thousands of results). The default sorting will still be based on relevancy, but the recent sort option is now available for those who are interested.
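
A toy sketch of the two sort orders follows. EveKnows’ real scoring formula isn’t described beyond the number of occurrences, location on the page, and other factors, so the weights (a title hit counts 3, a body hit counts 1), the rule that every query word must appear, and all names and sample data are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical weights: a word in the title counts three times as much as
# a word in the body text.
TITLE_WEIGHT = 3
BODY_WEIGHT = 1


@dataclass
class Gallery:
    gid: int
    title: str
    body: str
    added: date


def word_score(gallery, word):
    """Score one query word for one gallery from its occurrences and location."""
    title_hits = gallery.title.lower().split().count(word)
    body_hits = gallery.body.lower().split().count(word)
    return TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits


def search(galleries, query, newest_first=False):
    """Return matching gallery IDs, by relevancy (default) or newest first."""
    words = query.lower().split()
    results = []
    for g in galleries:
        scores = [word_score(g, w) for w in words]
        if all(s > 0 for s in scores):  # assumption: every query word must appear
            results.append((sum(scores), g))
    if newest_first:
        results.sort(key=lambda pair: pair[1].added, reverse=True)
    else:
        results.sort(key=lambda pair: pair[0], reverse=True)
    return [g.gid for _, g in results]


GALLERIES = [
    Gallery(1, "beach sunset", "sunset photos", date(2007, 5, 1)),
    Gallery(2, "city lights", "beach trip at sunset", date(2007, 5, 6)),
    Gallery(3, "beach", "beach beach", date(2007, 4, 20)),
]

print(search(GALLERIES, "beach sunset"))                     # [1, 2]
print(search(GALLERIES, "beach sunset", newest_first=True))  # [2, 1]
```

The date sort promotes gallery 2, which only mentions both words in passing, above the much stronger match in gallery 1; that is the quality trade-off of the newest-first view.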

Today I updated EveKnows.com to support searching for video galleries, photo galleries, or both. This feature exists in most other porn search engines and helps bring EveKnows up to feature-parity with them. To make use of this new ability, check the desired options (either ‘Photo galleries’ or ‘Video galleries’) beneath the search box, then click ‘Ask Eve’ as usual. By default, EveKnows will search for both photo and video porn galleries.
