Anatomy of a Search Engine | Development of the EveKnows.com adult search engine

May/07

8

MySQL and the EXCEPT Operator

Today I discovered a wonderful new SQL operator: EXCEPT. This neat operator allows one to join two tables, with the result being all of the rows in table1 which are not in table2. One of the slowest operations for EveKnows is handling queries with negated terms (such as ‘teens -blonde’ to search for non-blonde teen porn); this is because the SQL code includes a NOT EXISTS SQL subquery which gets run for each and every result, verifying that the galleries containing the word ‘teen’ do not also contain the word ‘blonde’. The exact code looks something like this, assuming ri1 is a table full of galleries matching the word ‘teen’:

NOT EXISTS (SELECT * FROM ReverseIndex AS ri2 WHERE ri1.ri_location=ri2.ri_location AND ri2.ri_word IN ('blonde')

That subquery is fast, but if the search matches lots of rows (say, for example, 120,000), then its execution time begins to climb upward. Right now, popular searches with negated words take 3-5 seconds to process, compared to the 0.5-second average of other searches. Obviously, something needs to be done.

I thought I found the answer in the EXCEPT operator. It seems like it would be perfect; we take the ID of rows matching our search terms in temp tabel1 and the ID of rows to be negated in temp tabel2, then take table1 EXCEPT table2 and use the resulting list of IDs as the galleries to fetch. It turns out, however, that MySQL doesn’t support EXCEPT. The oft-suggested method for getting around this is doing an exclusion self-join, but I challenge anyone to self-join a 10-million row table–it’s simply not practical.

So, it looks like I’m back to square one: still searching for a fast way to handle negated searches without migrating to a different RDBMS.

RSS Feed

No comments yet.

Leave a comment!

«

»

Theme Design by devolux.org