Saturday, September 30, 2006

Large Scale Query Analysis for Personalization Opportunities

This is a research paper by Omid Madani of Yahoo! Research that presents a large-scale analysis of query logs to find opportunities for personalization. The query logs cover 1.35 million browser cookies over a period of 6 months. The paper tries to show the following three results:

  • There is significantly more history available for the user behind a random query than for a random user.
  • Users exhibit consistent topical interests, and these interests vary between users.
  • Users' clicks can reveal their special interests.

The paper starts with

“Interacting with search engines has traditionally been an impersonal affair, with the returned results a function only of the query entered”

which is actually the problem. Search engines have to learn to show different results to different people, depending on each user's interests. For this, the paper proposes a two-fold model:

“We define contextualization as integrating a user’s history into the results ranking. We break contextualization into two types: 1) personalization is considering a user’s long-term interests, and 2) adjustment is reacting to the user’s short-term action history. Personalization and adjustment are complementary approaches to integrating user history”.

“With adjustment, a search engine could quickly react to a user’s actions”

“integrating longterm interests could address the “cold-start” situation when a user searches after a period of inactivity, or begins a new search need”


They measure user activity as follows:

The first graph plots the number of queries N against the number of users who issued that many queries. The second graph plots N against the percentage of queries that come from users who performed at least N searches. In other words, the second graph takes a random query from the sample and gives the probability that the user behind it performed at least N searches. This second metric seemed more promising than the first one.
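To make the difference between the two views concrete, here is a minimal sketch in Python. It is not the paper's code: the toy log, the function name `history_coverage`, and the choice to report the user-level number cumulatively (users with at least N queries) are all my own assumptions for illustration.

```python
from collections import Counter

def history_coverage(queries_per_user, n):
    """Return two fractions for a threshold n:
    (share of users with at least n queries,
     share of queries issued by users with at least n queries)."""
    total_users = len(queries_per_user)
    total_queries = sum(queries_per_user.values())
    heavy = [c for c in queries_per_user.values() if c >= n]
    return len(heavy) / total_users, sum(heavy) / total_queries

# Hypothetical toy log: one entry per query, labeled by cookie id.
log = ["a"] * 10 + ["b"] * 3 + ["c", "d", "e"]
counts = Counter(log)

user_frac, query_frac = history_coverage(counts, 3)
print(user_frac, query_frac)  # 0.4 0.8125
```

In this toy log only 40% of users have at least 3 queries, yet about 81% of queries come from such users, which is the kind of gap the first result is about.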


The second section of the paper focuses on finding the areas of interest of the user.

“ The idea of personalization is grounded on two related assumptions about user behavior. Our first assumption is that users have reasonably consistent interests. A user’s history will only be useful if her previous actions help us predict her future interests. This will be true if her interests are consistent over time. Our second assumption is that users have different interests from one another. After all, if everyone has the same interests there is no need for personalization!”

“Note that knowing the range of topics that a user is primarily interested can have a number of personalization oriented applications, such as reordering returned pages as well as displaying ads of interest”

A query categorizer is employed to determine the topics of each query, and based on these topics the interest distribution of the user is estimated. The paper finds that a user's interest distribution converges as more of her history is observed, and that it differs from the distributions of other users.
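A rough sketch of the idea in Python follows. Everything here is assumed for illustration: the keyword table stands in for the real query categorizer, the sample queries are invented, and total variation distance is just one convenient way to compare two distributions, not necessarily the measure used in the paper.

```python
from collections import Counter

# Hypothetical stand-in for the query categorizer: a keyword-to-topic table.
TOPIC_KEYWORDS = {"nba": "sports", "score": "sports",
                  "python": "computing", "linux": "computing",
                  "recipe": "food"}

def categorize(query):
    """Return the set of topics whose keywords appear in the query."""
    return {TOPIC_KEYWORDS[w] for w in query.lower().split() if w in TOPIC_KEYWORDS}

def interest_distribution(queries):
    """Normalized topic counts over a list of queries."""
    counts = Counter(t for q in queries for t in categorize(q))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

def distance(p, q):
    """Total variation distance: 0 means identical, 1 means disjoint."""
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q))

# Invented histories: the same user at two points in time, and another user.
alice_early = interest_distribution(["python tutorial", "linux kernel", "nba finals"])
alice_late = interest_distribution(["python list sort", "linux install",
                                    "bash linux", "nba trade"])
bob = interest_distribution(["cake recipe", "nba score tonight"])

same_user = distance(alice_early, alice_late)
other_user = distance(alice_early, bob)
print(same_user < other_user)  # True
```

The two assumptions from the paper map onto the two distances: consistency means the same-user distance stays small, and diversity means the distance to other users stays large.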


The third section focuses on the information contained in user clicks. It is noteworthy that a click on portal.acm.com reveals more information than a click on Yahoo!. Also, when a user returns to a URL she clicked before, it suggests that the URL contained the right information for her, rather than having been of no use. Combining these two features gives Rare and Sticky domains: rare in the sense that very few users click on them, and sticky in the sense that users return to them. Many such domains fall into a category that may be called special interest: they are not generally popular, but people intend to return to them. This click information could be applied immediately for reranking the results. For example, a single click on Citeseer could dramatically improve results for the rest of your search, even without any prior user history.
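Here is one way the rare-and-sticky idea could be sketched in Python. The click data, the function name `rare_and_sticky`, and both thresholds are hypothetical; the paper's actual definitions of rarity and stickiness may differ.

```python
from collections import defaultdict

def rare_and_sticky(clicks, max_user_share=0.5, min_return_rate=0.5):
    """clicks: iterable of (user_id, domain) pairs, one pair per click.
    A domain counts as rare if at most max_user_share of all users clicked it,
    and as sticky if at least min_return_rate of its users clicked it more
    than once. Both thresholds are illustrative assumptions."""
    per_domain = defaultdict(lambda: defaultdict(int))
    users = set()
    for user, domain in clicks:
        per_domain[domain][user] += 1
        users.add(user)

    result = []
    for domain, by_user in per_domain.items():
        user_share = len(by_user) / len(users)
        returning = sum(1 for c in by_user.values() if c > 1)
        return_rate = returning / len(by_user)
        if user_share <= max_user_share and return_rate >= min_return_rate:
            result.append(domain)
    return sorted(result)

# Invented click log: everyone clicks yahoo.com once, one user keeps
# coming back to portal.acm.com, and one user tries example.org once.
clicks = [("u1", "yahoo.com"), ("u2", "yahoo.com"),
          ("u3", "yahoo.com"), ("u4", "yahoo.com"),
          ("u1", "portal.acm.com"), ("u1", "portal.acm.com"),
          ("u1", "portal.acm.com"), ("u2", "example.org")]

special = rare_and_sticky(clicks)
print(special)  # ['portal.acm.com']
```

The popular domain fails the rarity test and the one-off domain fails the stickiness test, so only the special-interest domain survives.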

Clearly, the interest convergence part serves the long-term personalization approach, while the click information is better suited to short-term adjustment.
