Tuesday, May 11, 2004

Google and Semantic Categorization

Wired News: Dropping the Bomb on Google

This article from Wired discusses an ongoing controversy over Google listing an anti-Semitic web site as the first result for a search on the term 'jew'. Examined neutrally, there are logical reasons for this: there are probably a large number of anti-Semitic people in the world, and they are probably active on the Internet because, as a medium largely devoid of censorship and restriction, it lends itself well to that purpose. Also, as the article points out (quoting the anti-Semitic site, I believe), actual Jewish people are unlikely to search on the term 'jew', and probably use 'jews' or 'jewish' (or 'judaism') instead. That explanation is a little thin, but it's at least possible.

There are solutions to this, but whether they're technologically feasible is a different matter. The ultimate solution is to categorize search results and display them in some clustered fashion. I am envisioning a star-shaped graph with the search term in the center, surrounded by circles of varying sizes. Each circle represents one category and contains the title and an excerpt from the highest ranked page within that category. Larger circles correspond to categories that rank higher collectively for the term, and the largest two or three circles may contain multiple site titles and excerpts. Clicking on a title would lead to a single page; clicking elsewhere within the circle would narrow the displayed results to that category (with subcategories shown within it). It might even be possible to find the most common phrase among the sites within a category that does not appear in the sites of any other category, and thus derive an automatic 'title' for the category.
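
To make that last idea a bit more concrete, here is a rough sketch in Python of how an automatic category title might be derived. It assumes the pages have already been grouped into categories, which is the hard part. The function names, the choice of two-word phrases, and the sample data (a made-up 'jaguar' search) are my own inventions for illustration, not anything Google or the article describes.

```python
from collections import Counter


def phrases(text, n=2):
    """Return the set of n-word phrases (lowercased) found in text.

    Simplification: words are split on whitespace only, so punctuation
    stays attached to the word it touches.
    """
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def category_title(categories, name, n=2):
    """Pick the phrase that appears in the most pages of category `name`
    and in no page of any other category. Returns None if there is none."""
    # Collect every phrase that occurs anywhere outside the target category.
    seen_elsewhere = set()
    for other, pages in categories.items():
        if other != name:
            for page in pages:
                seen_elsewhere |= phrases(page, n)

    # Count how many pages of the target category contain each remaining phrase.
    counts = Counter()
    for page in categories[name]:
        counts.update(phrases(page, n) - seen_elsewhere)

    if not counts:
        return None
    # Break ties alphabetically so the result is deterministic.
    best = max(counts.values())
    return min(p for p, c in counts.items() if c == best)


# Hypothetical, hand-grouped results for the search term 'jaguar'.
sample = {
    "animals": [
        "the jaguar is a big cat found in south america",
        "this big cat hunts at night",
    ],
    "cars": [
        "the jaguar is a luxury car made in britain",
        "used luxury car for sale",
    ],
}

print(category_title(sample, "animals"))  # big cat
print(category_title(sample, "cars"))     # luxury car
```

Phrases such as 'the jaguar' are shared by both groups and so are thrown out, which is the point: the title should be whatever distinguishes one category from the rest. Real pages would need stop-word filtering and some tolerance for phrases that show up only rarely in other categories, but the basic idea seems workable.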

But this all assumes that we can categorize similar results. I'm not sure how far along that technology is.

