Saturday, September 30, 2006

Personalized Web Search Using Query Expansion

This post is about a paper written by Dr. Pabitra Mitra, of IIT Kharagpur. It talks about presenting personalized search results to the users by expanding the queries based on probabilistic expansion and collaborative filtering to make the ambiguous queries more meaningful.

For example, given a search keyword ”apple”, a user might be looking for the fruit apple or might be searching for apple computers. A typical search engine returns the same results regardless of who submitted the query. Hence, the need arises to have personalized web search system which gives results relevant to the user as highly ranked pages.

Query expansion removes the ambiguity in the query given by the user by adding the words capturing user’s intention.


The proposed personalization structure consists of the following steps:

  • The user gives a query to the search engine.
  • Query expansion(personalization) based on user’s profile is done. This helps to disambiguate the meaning of the given query in order to provide relevant results to the user.
  • The expanded query is submitted to the search engine.
  • Search engine results on the expanded query are represented to user.
  • The documents retrieved by the user for the given query is marked to update the user profile.
  • Update the user profile in a offline process periodically.

The Algorithm consists of an offline phase and an online phase.

The offline phase deals with building up the user profile probabilistically. Initially Query sessions for a user are formed by expanding the existing user document space by documents which are relevant for the particular query. Similar users are found by taking the cosine similarity between the document space of different users. This phase also performs pseudo query selection and updates the profile to adapt to users interests.

The online phase deals with the real-time query expansion. The query is expanded by ‘n’ document terms whose cohesion weight for the given query is above certain threshold.

Experimental evaluation is done using precision and relative recall as performance metrics. Generally, user is interestedin top results and do not go beyond the first result page (i.e. top 10 results). So, the user is requested to submit a query and to judge the top 10 documents retrieved for the original query and the personalized query. User marks documents as relevant or irrelevant. Here, the precision is computed as ratio of the number of relevant documents marked by the user to the number of documents presented to the user, and the relative recall is computed as ratio of the number of relevant documents marked by the user to the total number of distinct relevant documents retrieved for the original query and the expanded query.

Overall, there is significant increase in the average precision values without degradation in the average relative recall. The cases where the system performed poorly or gave no improvement in the performance were:

  • user’s frequent change of interests,
  • very specific queries by users, and
  • No correlation of queries to the user’s profile learnt by the system.

Mainly, the system was able to deal with the short queries and expand them correctly, to increase information content of the query to reflect user’s intention.


The major advantages of the system are that it does not require explicit feedback, it performs collaborative filtering and user clustering, and also pseudo query term selection for better query expansion, even from sparse browsing history.

0 Comments:

Post a Comment

<< Home