Anatomy of a Search Engine | Development of the EveKnows.com adult search engine

Archive for May 2007

May/07

31

Progress Towards EveKnows 1.0

Wow, busy week! I’ve started to rebuild the primary index three times now, and each time my testing has revealed a few new bugs that needed to be fixed in Caroline, our search spider. Last night I believe I fixed the last of these; Caroline has now been running for 24 hours without issue and has indexed 50,000 galleries. Remarkably, even with such a small data set, the new search engine is returning far better results than the current version. If this keeps up, I’ll try to get the new engine live this weekend. Excitement!

Today I spent some time updating the ‘About Eve’ section of the site. It now includes a basic usage guide for the advanced features of the new engine. Source and Site searches are also making a return, and their use is explained as well. There will also be a nifty embeddable search box for TGP/MGP owners; the box will allow their surfers to use EveKnows to search all of the porn galleries of the TGP/MGP. Let me know how it works out :)

May/07

28

Upcoming Changes

It’s been a while since my last post. I wanted to let everyone know that development of EveKnows.com is progressing at a break-neck pace–this past month has seen some tremendous improvements behind the scenes. Searches are running faster and with more accurate results than ever before, and I believe the site is finally ready to handle an increased load. The new engine will remain on the staging server for another week or two while I rebuild the search database. Stay tuned for some major changes coming in June…

After yesterday’s post about slow negative search terms and MySQL’s disregard for the EXCEPT operator, I came upon a decent solution for EveKnows.com’s problem. With some (slightly) clever use of LEFT JOINs, I was able to cut the running time of queries with a single negated term in half, and that run time drops by an order of magnitude for queries involving multiple negated terms. W00t! The trick was to build a temporary table of gallery IDs which contain the negated terms, then take the LEFT JOIN of galleries matching the desired terms with the temporary table. This gives us a resulting table with two columns, matches.gallery_id and neg_matches.gallery_id; any rows with a non-NULL value for neg_matches.gallery_id are then dropped, resulting in the proper set of matches. A fairly simple solution; I feel pretty dumb for not seeing it earlier.
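
Here is a minimal sketch of that LEFT JOIN trick, run against SQLite through Python's sqlite3 module rather than MySQL. The ReverseIndex columns follow the NOT EXISTS fragment quoted in the previous post; the sample rows and the exact query text are assumptions for illustration, not the production schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified stand-in for the reverse index: one row per (word, gallery) pair.
# The sample data is made up for this example.
cur.execute("CREATE TABLE ReverseIndex (ri_word TEXT, ri_location INTEGER)")
cur.executemany(
    "INSERT INTO ReverseIndex VALUES (?, ?)",
    [("teen", 1), ("teen", 2), ("teen", 3), ("blonde", 2), ("redhead", 3)],
)

# Step 1: temporary table of gallery IDs that contain a negated term.
cur.execute(
    "CREATE TEMP TABLE neg_matches AS "
    "SELECT DISTINCT ri_location AS gallery_id "
    "FROM ReverseIndex WHERE ri_word IN ('blonde')"
)

# Step 2: LEFT JOIN the wanted matches against the temporary table, then
# keep only the rows where the right-hand side came back NULL.
cur.execute(
    "SELECT matches.gallery_id "
    "FROM (SELECT DISTINCT ri_location AS gallery_id "
    "      FROM ReverseIndex WHERE ri_word IN ('teen')) AS matches "
    "LEFT JOIN neg_matches ON matches.gallery_id = neg_matches.gallery_id "
    "WHERE neg_matches.gallery_id IS NULL "
    "ORDER BY matches.gallery_id"
)
result = [row[0] for row in cur.fetchall()]
print(result)  # [1, 3]
```

Gallery 2 contains both words, so it is the only row with a non-NULL neg_matches.gallery_id and the only one dropped.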

While I was working on this, I noticed that the existing src: and site: query modifiers were not functioning properly. Due to the new SQL database schema, a quick fix isn’t possible. I’ve dropped these modifiers for the time being, but intend to support them both at some point in the future.

Today I discovered a wonderful new SQL operator: EXCEPT. This neat operator allows one to combine the results of two SELECTs (call them table1 and table2), with the result being all of the rows in table1 which are not in table2. One of the slowest operations for EveKnows is handling queries with negated terms (such as ‘teens -blonde’ to search for non-blonde teen porn); this is because the SQL code includes a NOT EXISTS subquery which gets run for each and every result, verifying that the galleries containing the word ‘teen’ do not also contain the word ‘blonde’. The exact code looks something like this, assuming ri1 is a table full of galleries matching the word ‘teen’:

NOT EXISTS (SELECT * FROM ReverseIndex AS ri2 WHERE ri1.ri_location=ri2.ri_location AND ri2.ri_word IN ('blonde'))

That subquery is fast, but if the search matches lots of rows (say, for example, 120,000), then its execution time begins to climb upward. Right now, popular searches with negated words take 3-5 seconds to process, compared to the 0.5-second average of other searches. Obviously, something needs to be done.

I thought I found the answer in the EXCEPT operator. It seems like it would be perfect; we take the ID of rows matching our search terms in temp table1 and the ID of rows to be negated in temp table2, then take table1 EXCEPT table2 and use the resulting list of IDs as the galleries to fetch. It turns out, however, that MySQL doesn’t support EXCEPT. The oft-suggested method for getting around this is doing an exclusion self-join, but I challenge anyone to self-join a 10-million row table–it’s simply not practical.
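
For comparison, here is a small sketch of the two formulations side by side on an engine that does support EXCEPT (SQLite, via Python's sqlite3 module). The table and column names follow the fragment above; the sample rows and the full query text around the NOT EXISTS clause are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Made-up sample data: one row per (word, gallery) pair.
cur.execute("CREATE TABLE ReverseIndex (ri_word TEXT, ri_location INTEGER)")
cur.executemany(
    "INSERT INTO ReverseIndex VALUES (?, ?)",
    [("teen", 1), ("teen", 2), ("teen", 3), ("blonde", 2), ("redhead", 3)],
)

# Correlated NOT EXISTS: the subquery is evaluated per candidate row.
cur.execute(
    "SELECT DISTINCT ri1.ri_location FROM ReverseIndex AS ri1 "
    "WHERE ri1.ri_word IN ('teen') AND NOT EXISTS ("
    "  SELECT * FROM ReverseIndex AS ri2 "
    "  WHERE ri1.ri_location = ri2.ri_location "
    "  AND ri2.ri_word IN ('blonde')) "
    "ORDER BY 1"
)
via_not_exists = [row[0] for row in cur.fetchall()]

# EXCEPT: a set difference between the two lists of gallery IDs.
cur.execute(
    "SELECT ri_location FROM ReverseIndex WHERE ri_word IN ('teen') "
    "EXCEPT "
    "SELECT ri_location FROM ReverseIndex WHERE ri_word IN ('blonde') "
    "ORDER BY 1"
)
via_except = [row[0] for row in cur.fetchall()]

print(via_not_exists, via_except)  # [1, 3] [1, 3]
```

Both queries return the same gallery IDs; the difference that matters here is how the engine executes them, which this toy data set cannot show.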

So, it looks like I’m back to square one: still searching for a fast way to handle negated searches without migrating to a different RDBMS.

May/07

7

EveKnows 0.7

Development of the EveKnows.com porn search engine is progressing swiftly! Today sees the addition of sorting search results based on either date or relevancy. Like most search engines, EveKnows assigns a score for each word on a webpage based on the number of times it occurs, the word’s location on a page, and other factors. During a search, these scores are compared and the galleries with the highest score for the query’s words get displayed at the top of the search results. This is known as sorting by relevancy; theoretically, the most relevant results will be displayed first.
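
To make the scoring idea concrete, here is a toy sketch in Python. The location names, the weights, and the score_words helper are all hypothetical; the real scoring also uses "other factors" that aren't modelled here.

```python
from collections import defaultdict

# Hypothetical weights: a word in the title counts for more than one in the body.
LOCATION_WEIGHTS = {"title": 3.0, "body": 1.0}


def score_words(page):
    """Return {word: score} for a page given as {location: text}."""
    scores = defaultdict(float)
    for location, text in page.items():
        weight = LOCATION_WEIGHTS.get(location, 1.0)
        # Every occurrence adds the weight of the location it was found in.
        for word in text.lower().split():
            scores[word] += weight
    return dict(scores)


page = {
    "title": "Beach sunset gallery",
    "body": "sunset photos from the beach at sunset",
}
scores = score_words(page)
print(scores["sunset"], scores["beach"], scores["photos"])  # 5.0 4.0 1.0
```

Here ‘sunset’ appears once in the title (3.0) and twice in the body (2 x 1.0), so it outscores ‘beach’, which appears once in each.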

Sometimes, however, people are interested in recent galleries. To that end, I’ve added the ability to sort the search results with the newest galleries first. To make use of this, click the ‘Show newest galleries first’ link above the search results. If you want to get back to the previous view, click the ‘Show most relevant galleries first’ link. When sorted by date, the quality of the search results may be significantly lower (especially with broad search terms, like ‘naked teens’, which return thousands of results). The default sorting will still be based on relevancy, but the recent sort option is now available for those who are interested.
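
The two sort orders can be sketched like this in Python. The gallery records, the per-word scores, and the rule that a gallery must contain every query word are assumptions made up for the example, not the engine's actual data model.

```python
from datetime import date

# Made-up galleries with precomputed per-word scores.
galleries = [
    {"id": 1, "added": date(2007, 5, 1), "scores": {"beach": 4.0, "sunset": 5.0}},
    {"id": 2, "added": date(2007, 5, 6), "scores": {"beach": 1.0}},
    {"id": 3, "added": date(2007, 4, 20), "scores": {"beach": 2.0, "sunset": 1.0}},
]


def search(query, newest_first=False):
    """Return matching gallery IDs, sorted by relevancy or by date."""
    words = query.lower().split()
    # Assumption: a gallery matches only if it contains every query word.
    hits = [g for g in galleries if all(w in g["scores"] for w in words)]
    if newest_first:
        sort_key = lambda g: g["added"]
    else:
        # Relevancy: sum of the gallery's scores for the query's words.
        sort_key = lambda g: sum(g["scores"][w] for w in words)
    return [g["id"] for g in sorted(hits, key=sort_key, reverse=True)]


print(search("beach"))                     # [1, 3, 2]
print(search("beach", newest_first=True))  # [2, 1, 3]
```

Gallery 2 is the weakest match for ‘beach’ but the most recent, which is why it jumps from last to first when sorting by date.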

Today I updated EveKnows.com to support searching for video galleries, photo galleries, or both. This feature exists in most other porn search engines and helps bring EveKnows up to feature-parity with them. To make use of this new ability, check the desired options (either ‘Photo galleries’ or ‘Video galleries’) beneath the search box, then click ‘Ask Eve’ as usual. By default, EveKnows will search for both photo and video porn galleries.

Today I hacked a new feature into EveKnows.com–the ability to see where our galleries were found. Search results now include a line of the form “X Photos from thehun.net” or “Y videos from cutegirlsdaily.com”. This lets everyone know the site which we used to find the gallery, and provides searchers with an easy link to the source for more porn they may enjoy. The links will open in a new window, so don’t worry about following one and losing your search results.

I’ve also been tweaking the Suggestion Dictionary. Suggestions are more accurate than ever and are now offered for small search results. For example, searching for ‘lezbian’ currently returns 8 results, plus a link saying, “Perhaps you meant ‘lesbian’?”. The suggestion dictionary is built from our own database of search results, so it will always suggest the words that appear most frequently in porn galleries.
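
A rough sketch of how such a suggestion lookup might work, using Python's difflib for the fuzzy matching. The word counts (other than the 8 results for ‘lezbian’), the small-result threshold, the similarity cutoff, and the suggest helper are all hypothetical; this is not the actual Suggestion Dictionary code.

```python
import difflib

# Hypothetical dictionary: word -> number of galleries containing it.
word_counts = {"lesbian": 5200, "lezbian": 8, "brunette": 3100, "lingerie": 2700}

SMALL_RESULT_THRESHOLD = 20  # assumed cutoff for a "small" result set


def suggest(term):
    """Return a more common, similarly spelled word, or None."""
    hits = word_counts.get(term, 0)
    if hits >= SMALL_RESULT_THRESHOLD:
        return None  # plenty of results, no suggestion needed
    close = difflib.get_close_matches(term, list(word_counts), n=5, cutoff=0.8)
    # Only suggest words that are more common than what was typed.
    better = [w for w in close if w != term and word_counts[w] > hits]
    if not better:
        return None
    return max(better, key=word_counts.get)


print(suggest("lezbian"))  # lesbian
print(suggest("lesbian"))  # None
```

Picking the most frequent close match is what keeps the suggestions biased toward words that actually appear in the indexed galleries.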
