Archive for the ‘Development’ category

Hardware Upgrade

March 7th, 2007

Those of you testing out the initial release of EveKnows likely noticed the slow response times; the problem was not the web server but the SQL server. To keep costs low I was using a shared server, but five to ten seconds to generate results for a query against only a few thousand galleries was unacceptable. This afternoon I purchased a dedicated machine and found a few ways to optimize the queries, bringing search times down to the half-second range. I imagine there is still room for improvement, but this should do for the testing period.

Last night I integrated records of incoming links into EveKnows. The spider now tracks the text of links to galleries (or the alt text in the case of image links), but with reduced weighting and a character limit. This makes keyword spamming far less productive, while giving galleries without much embedded text more accurate rankings. Sadly, spamming seems rampant in the porn industry, perhaps more so than in other Internet subcultures. I’m attempting to program the spider to detect natural English phrases and rate them higher than lists of keywords. Feedback on the quality of search results is most welcome, as well as ideas on how to combat the constant problem of gallery spamming.
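For anyone curious how link-text scoring along these lines might look, here is a rough Python sketch (the engine itself is written in Perl, and this is not its actual code). Everything specific is an assumption made for illustration: the 60-character limit, the halving of weight for each further link, the function-word list, and the idea of using the share of function words as a stand-in for "reads like natural English".

```python
import re

# All constants below are illustrative assumptions, not values used by EveKnows.
MAX_CHARS = 60        # assumed character limit applied to each link text
DECAY = 0.5           # assumed: each further link to a gallery counts half as much
PHRASE_BONUS = 2.0    # assumed boost for text that reads like a natural phrase
FUNCTION_WORDS = frozenset(
    {"a", "an", "and", "at", "by", "for", "from", "in", "is", "of", "on", "the", "to", "with"}
)


def function_word_ratio(text):
    """Share of words that are common function words (0.0 for empty text)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in FUNCTION_WORDS) / len(words)


def score_link_texts(link_texts):
    """Accumulate a {term: score} map from the link texts pointing at one gallery."""
    scores = {}
    weight = 1.0
    for text in link_texts:
        clipped = text[:MAX_CHARS]  # anything past the limit is ignored
        words = re.findall(r"[a-z0-9']+", clipped.lower())
        # Keyword lists contain few function words, so they miss the bonus.
        bonus = PHRASE_BONUS if function_word_ratio(clipped) >= 0.2 else 1.0
        # Each distinct term counts once per link, so repetition gains nothing.
        for word in set(words) - FUNCTION_WORDS:
            scores[word] = scores.get(word, 0.0) + weight * bonus
        weight *= DECAY
    return scores
```

With these assumed settings, a link reading "photos of the beach at sunset" contributes more to each of its terms than a link reading "beach beach beach sunset sand", and repeating a keyword within one link earns no extra credit.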

Search Terms

March 4th, 2007

After a day of messing around with SQL queries, I’ve finally got a handle on doing a logical AND search against the reverse index. Previously the search terms were OR’ed together, so searching for Liz Vicious would return any gallery that matched the word Liz and then every gallery containing the word Vicious. Turns out that there’s a lot of stuff with the word Vicious in it that has absolutely nothing to do with the sexy goth redhead I was looking for, so I knew the searching algorithm needed work. The trick is in SQL’s HAVING clause; the Eve engine does a COUNT(*) on the returned results, which are grouped by URL. The result of the COUNT(*) function is the number of matching terms; a quick HAVING COUNT(*)=$n_terms line in the SQL SELECT statement cleaned up the mess.
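Here is a minimal, self-contained sketch of the GROUP BY / HAVING trick, written in Python with SQLite so it can be run anywhere. The table name term_index, its columns, and the example URLs are hypothetical; the real engine runs on MySQL and its schema is not shown here. The approach relies on the index holding at most one row per (term, URL) pair, so that COUNT(*) really is the number of distinct matching terms.

```python
import sqlite3


def search_all_terms(conn, terms):
    """Return URLs whose index rows match every search term (logical AND)."""
    # Deduplicate terms so the count compared in HAVING is meaningful.
    terms = sorted({t.lower() for t in terms})
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT url
        FROM term_index
        WHERE term IN ({placeholders})
        GROUP BY url
        HAVING COUNT(*) = ?
        ORDER BY url
    """
    return [row[0] for row in conn.execute(sql, (*terms, len(terms)))]


def demo():
    """Build a tiny in-memory index and run a two-term AND search."""
    conn = sqlite3.connect(":memory:")
    # The primary key guarantees one row per (term, url) pair.
    conn.execute(
        "CREATE TABLE term_index (term TEXT, url TEXT, PRIMARY KEY (term, url))"
    )
    conn.executemany(
        "INSERT INTO term_index VALUES (?, ?)",
        [
            ("liz", "http://example.com/a"),
            ("vicious", "http://example.com/a"),
            ("vicious", "http://example.com/b"),
            ("liz", "http://example.com/c"),
        ],
    )
    return search_all_terms(conn, ["Liz", "Vicious"])
```

In the demo only http://example.com/a is returned, because it is the only URL with a row for both terms. Dropping the HAVING line gives back the old OR behaviour, where all three URLs match.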

I also made some changes to the spider so that it pulls search terms from incoming links; this should help improve search quality, but the change means the existing gallery database is worthless. I’ve scrapped it and started spidering from scratch. In a day or two I’ll take whatever’s been spidered and move it to the production server at EveKnows.com. Stay tuned for some quality porn searches!

Indexing Progress

March 4th, 2007

The spider component of EveKnows.com has been progressing swiftly. It’s become very reliable at examining a webpage and determining whether or not the page is a photo gallery. In the past 48 hours it has indexed over 15,000 galleries, which is a decent amount to begin testing the searching and sorting algorithms on.

The user interface has been uploaded to http://eveknows.com, but the database will not be put online till its performance has been tested. You’ll notice that, unlike just about every other search engine on the Web, there is no Advertise link. There will be no paid-placement galleries on EveKnows.com. Rest assured, every search result will be a genuine, organic gallery linked to by popular TGP sites. Feel free to comment on the look of the interface and include any features you’d like to see added to the search engine.

Introducing Eve

February 28th, 2007

EveKnows.com is an experimental search engine, currently under heavy development. It is the brain-child of a Computer Science major looking for a real-life application of the mathematics and computational theory absorbed during university, with the neat addition of providing an excuse to look at boobies much of the day. This blog is intended to be a place to discuss development. Community feedback is very much welcome, whether bug reports, feature suggestions, or questions about the search engine’s architecture. This is as much a learning experience as anything else; we’re not starting out with an enterprise-class data center, just a few machines performing dedicated indexing tasks and a shared server for publishing the results.

The main focus of EveKnows.com is gallery searching. For the time being, this is limited to picture galleries; movie galleries will be added later. The idea is to create a search engine which is always up to date with the latest promotional galleries, making it incredibly easy to find the freshest content of your favorite porn stars.

For those interested, here’s what we have thus far:

  • A reverse-index engine using keyword scoring.
  • Weighted parsing includes URLs and titles for added accuracy.
  • The engine is currently written in Perl. Perl is a wonderful language which, following the engine’s design, allowed a prototype to be developed in 24 hours.
  • The search database is currently housed on a dedicated MySQL 5 server. We’ll see how this scales as the database grows…
  • A spider which searches TGPs for gallery links. When a gallery is found, it is indexed, thumbnailed, and added to the database.
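To give a feel for the first two items in the list, here is a small Python sketch of a keyword-scored index with per-field weights. It is only an illustration under stated assumptions, not the Perl engine itself: the field weights, the tokenizer, and the names build_index and search are all made up for the example.

```python
import re
from collections import defaultdict

# Assumed weights: a term found in the URL or title counts for more than body text.
FIELD_WEIGHTS = {"url": 3.0, "title": 2.0, "body": 1.0}


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(galleries):
    """Map each term to {url: score}, given {url: {"title": ..., "body": ...}}."""
    index = defaultdict(dict)
    for url, fields in galleries.items():
        texts = {
            "url": url,
            "title": fields.get("title", ""),
            "body": fields.get("body", ""),
        }
        for field, text in texts.items():
            for term in tokenize(text):
                # Every occurrence adds the weight of the field it was found in.
                index[term][url] = index[term].get(url, 0.0) + FIELD_WEIGHTS[field]
    return index


def search(index, term):
    """Return URLs containing the term, highest score first (ties broken by URL)."""
    postings = index.get(term.lower(), {})
    return sorted(postings, key=lambda u: (-postings[u], u))
```

A lookup is then a single dictionary access per term, which is the main attraction of indexing by keyword rather than scanning every gallery at query time.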

Once the prototype is working reasonably well, it will be moved to http://eveknows.com and made available for public use.