Several web-based search engines facilitate searching for information on the internet.
However, a search usually returns a lengthy list of links to web pages that the user must
traverse to find the information. Moreover, a user might consult more than one search engine to get
relevant results. This makes searching time-consuming. This project aims to reduce the search effort by
reducing the overhead of manually filtering search results.
The project relies on implicit feedback: we present the search results to the user
and monitor which links the user clicks on. We then refine the search results based on the links that
the user opened. To further enhance the search results, the proposed system incorporates techniques
such as query expansion. I have a working implementation of the initial idea, as described in the next section.
The current implementation provides a web interface for entering a search query. The query is
relayed to two search engines, Google and Bing. The results returned by these search engines are
ranked and presented to the user. Figure 1 illustrates the operation of the meta-search system.
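The method used to rank the combined results is not specified here. As an illustration only, one common way to merge ranked lists from several engines is reciprocal rank fusion (RRF). The sketch below assumes each engine returns an ordered list of URLs; the function name fuse_rankings, the constant k=60 and the example URLs are assumptions, not part of the implemented system.

```python
from collections import defaultdict


def fuse_rankings(result_lists, k=60):
    """Merge several ranked URL lists with reciprocal rank fusion (RRF).

    Each URL scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists. k=60 is a conventional
    default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists standing in for the two engines' responses.
google_results = ["a.example", "b.example", "c.example"]
bing_results = ["b.example", "d.example", "a.example"]
merged = fuse_rankings([google_results, bing_results])
```

A page returned by both engines accumulates score from both lists, so it tends to rise above pages that only one engine returned.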
Once the results are displayed, the user opens web pages from the result list. The pages that the
user opens for a specific query are recorded. Once the user has visited all relevant web pages,
clicking the Improve button prompts the system to refine the search results.
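The recording of opened pages can be pictured as a small per-query click log. This is a minimal sketch under assumptions: the class name ClickLog, its methods, and the normalisation of the query string are hypothetical and only illustrate the idea of treating opened pages as implicit relevance feedback.

```python
from collections import defaultdict


class ClickLog:
    """Remember which result pages a user opened for each query."""

    def __init__(self):
        self._opened = defaultdict(list)

    @staticmethod
    def _key(query):
        # Assumption: queries differing only in case or outer spaces are the same.
        return query.strip().lower()

    def record(self, query, url):
        """Store a click; repeated clicks on the same page are kept once."""
        pages = self._opened[self._key(query)]
        if url not in pages:
            pages.append(url)

    def relevant_pages(self, query):
        """Return the pages opened for this query, in click order."""
        return list(self._opened.get(self._key(query), []))


log = ClickLog()
log.record("Jaguar", "https://a.example/cats")
log.record("jaguar ", "https://b.example/habitat")
log.record("jaguar", "https://a.example/cats")  # duplicate click, ignored
```

When the Improve button is clicked, relevant_pages("jaguar") would supply the set of pages to process.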
During improvement of the search results, all relevant pages (the ones opened by the user) are
considered for query expansion. The expanded query is submitted as a new search query, and its
results are presented to the user. The current system processes web pages in real time, i.e., all
processing is performed on the fly when a user submits a query. Moreover, the system currently works
only for web pages in English.
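How expansion terms are selected is not detailed here. As one simple possibility, the sketch below picks the most frequent non-stopword terms from the text of the opened pages and appends them to the query. The function name expand_query, the tiny stopword list and the frequency-based scoring are assumptions for illustration; the implemented system may select terms differently.

```python
import re
from collections import Counter

# Deliberately tiny English stopword list, assumed for illustration.
STOPWORDS = {"the", "is", "a", "an", "to", "in", "of", "and", "for", "on"}


def expand_query(query, page_texts, num_terms=3):
    """Append the most frequent terms of the relevant pages to the query."""
    query_terms = set(re.findall(r"[a-z]+", query.lower()))
    counts = Counter()
    for text in page_texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            # Skip stopwords, very short tokens, and terms already in the query.
            if token in STOPWORDS or token in query_terms or len(token) < 3:
                continue
            counts[token] += 1
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return " ".join([query] + ranked[:num_terms])


pages = [
    "The jaguar is a large cat native to the Americas.",
    "Jaguar habitat: the cat lives in rainforest habitat.",
]
expanded = expand_query("jaguar", pages)
```

Because the extraction happens when the user asks for improved results, the cost of this step grows with the number and size of the opened pages.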
The efficiency of the system can be increased in general. This section discusses possibilities for increasing it in both the diverse and the focused mode. In the diverse mode, the diversity of the results for a given query can be increased by computing all the topics the query may refer to and then sending one query per topic. This requires the system to have a dictionary or disambiguation list that, given a word, lists all the topics the word may refer to.
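The per-topic querying idea can be sketched as follows. The disambiguation list here is a hand-written dictionary with made-up entries, and the function name diversify is hypothetical; a real system would need a much larger resource.

```python
# Hypothetical disambiguation list: word -> topics the word may refer to.
SENSES = {
    "jaguar": ["animal", "car"],
    "python": ["snake", "programming language"],
}


def diversify(query, senses=SENSES):
    """Return one query per topic of each ambiguous word in the query.

    If no word of the query is in the disambiguation list, the original
    query is returned unchanged as the only element.
    """
    queries = []
    for word in query.lower().split():
        for topic in senses.get(word, []):
            queries.append(f"{query} {topic}")
    return queries or [query]


topic_queries = diversify("jaguar speed")
```

Each generated query would then be relayed to the search engines separately, so the merged results cover every listed topic of the ambiguous word.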
In the focused mode, the system's efficiency depends heavily on the content and size of the pages opened by the user (the relevant web pages). For example, many web pages contain images and videos, which are currently ignored. Images and videos can be used to determine keywords by incorporating the name, anchor text, and tags associated with an image or a video. During query expansion, it is also possible to use synonyms when computing the expansion terms, since each synonym of a word may yield a different set of results.
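To make the image and video idea concrete, the sketch below collects text associated with media elements using Python's standard html.parser module. Which attributes count as the "name" and "tags" of an image is an assumption (alt, title and name are used here), as are the class and function names; anchor text is taken to be the text of a link that wraps a media element.

```python
from html.parser import HTMLParser

MEDIA_TAGS = ("img", "video")
TEXT_ATTRS = ("alt", "title", "name")  # assumed to carry descriptive text


class MediaTextExtractor(HTMLParser):
    """Collect descriptive text attached to <img> and <video> elements."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._in_anchor = False
        self._anchor_text = []
        self._anchor_has_media = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._anchor_text = []
            self._anchor_has_media = False
        elif tag in MEDIA_TAGS:
            if self._in_anchor:
                self._anchor_has_media = True
            for name, value in attrs:
                if name in TEXT_ATTRS and value:
                    self.texts.append(value.strip())

    def handle_data(self, data):
        if self._in_anchor and data.strip():
            self._anchor_text.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            # Keep the link text only if the link wrapped an image or video.
            if self._anchor_has_media and self._anchor_text:
                self.texts.append(" ".join(self._anchor_text))
            self._in_anchor = False


def media_keywords(html):
    """Return the text snippets associated with media in an HTML page."""
    parser = MediaTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.texts


sample = (
    '<a href="/cats"><img src="j.png" alt="Jaguar resting" title="big cat">'
    "Photo gallery</a>"
    '<video title="Jaguar hunting"></video><p>Unrelated text</p>'
)
snippets = media_keywords(sample)
```

The returned snippets could be fed into keyword extraction alongside the page's body text, so pages dominated by images or videos still contribute expansion terms.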