Chapter 9
9.1 Relevance feedback and pseudo-relevance feedback
Goal: involve the user in the retrieval process so as to
improve the final result set
Procedure:
The user issues a (short, simple) query.
The system returns an initial set of retrieval results.
The user marks some returned documents as relevant or
nonrelevant.
The system computes a better representation of the
information need based on the user feedback.
The system displays a revised set of retrieval results.
Algorithms:
Rocchio algorithms
Probabilistic algorithms
Some search engines offer relevance feedback
How to evaluate feedback strategies
Pseudo-relevance feedback
Indirect relevance feedback
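The Rocchio update can be sketched as follows. This is a minimal illustration, not a production implementation: documents and queries are represented as simple term-weight dictionaries (e.g., tf-idf weights), and the parameter values alpha=1.0, beta=0.75, gamma=0.15 are common textbook defaults, not prescribed by this chapter.

```python
def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query modification: move the query vector toward the
    centroid of the relevant documents and away from the centroid of
    the nonrelevant ones. Vectors are dicts mapping term -> weight."""
    def centroid(docs):
        c = {}
        for d in docs:
            for term, w in d.items():
                c[term] = c.get(term, 0.0) + w / len(docs)
        return c

    pos = centroid(relevant) if relevant else {}
    neg = centroid(nonrelevant) if nonrelevant else {}
    new_q = {}
    for t in set(query_vec) | set(pos) | set(neg):
        w = (alpha * query_vec.get(t, 0.0)
             + beta * pos.get(t, 0.0)
             - gamma * neg.get(t, 0.0))
        if w > 0:  # negative weights are usually clipped to zero
            new_q[t] = w
    return new_q
```

Marking one document containing "ford" as relevant and one containing "railway" as nonrelevant pulls "ford" into the query and drops "railway" entirely, which is the intended effect of the update.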
Other uses of relevance feedback include:
Following a changing information need (e.g., names of car
models of interest change over time)
Maintaining an information filter (e.g., for a news feed).
Active learning (deciding which examples it would be most
useful to label, in order to reduce annotation costs).
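Pseudo-relevance feedback, listed above, automates the feedback step: the system simply assumes the top-ranked documents are relevant and expands the query from them, with no user input. A minimal sketch (the choice of k, the number of expansion terms, and plain term-frequency scoring are all simplifying assumptions for illustration):

```python
from collections import Counter

def pseudo_relevance_feedback(query_terms, ranked_docs, k=3, n_expansion=2):
    """Pseudo-relevance feedback: treat the top-k retrieved documents
    as relevant, pick the most frequent non-query terms from them,
    and append those terms to the query. ranked_docs is a list of
    token lists, already ordered by the initial retrieval score."""
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(doc)
    for t in query_terms:          # do not re-add original query terms
        counts.pop(t, None)
    expansion = [term for term, _ in counts.most_common(n_expansion)]
    return list(query_terms) + expansion
```

The danger, noted in the literature, is query drift: if the top-k documents are not actually relevant, the expanded query moves away from the information need.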
9.2 Global methods for query reformulation
Vocabulary tools for query reformulation
Users give additional input on query words or phrases
Automatic thesaurus generation by analyzing a collection of
documents: exploit word co-occurrence, or use a shallow grammatical analysis of
the text to exploit grammatical relations or grammatical dependencies.
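The co-occurrence approach can be sketched as follows: each term gets a vector of the documents it occurs in, and two terms are treated as related when their vectors are similar. This is a simplified illustration (raw counts and whole-document contexts; real systems typically use windows, weighting, and the grammatical cues mentioned above):

```python
from collections import defaultdict
from math import sqrt

def cooccurrence_thesaurus(docs, top_n=1):
    """Relate terms by cosine similarity of their document-occurrence
    vectors: terms that tend to occur in the same documents are
    treated as candidate thesaurus entries for each other."""
    vectors = defaultdict(lambda: defaultdict(int))  # term -> {doc_id: count}
    for doc_id, tokens in enumerate(docs):
        for t in tokens:
            vectors[t][doc_id] += 1

    def cosine(u, v):
        dot = sum(w * v.get(k, 0) for k, w in u.items())
        nu = sqrt(sum(x * x for x in u.values()))
        nv = sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    thesaurus = {}
    for t in vectors:
        sims = [(cosine(vectors[t], vectors[u]), u) for u in vectors if u != t]
        sims.sort(reverse=True)
        thesaurus[t] = [u for _, u in sims[:top_n]]
    return thesaurus
```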
Improving the effectiveness of information retrieval with
local context analysis
Proposes a new technique, called local context analysis, which
selects expansion terms based on co-occurrence with the query terms within the
top-ranked documents. Experiments on a number of collections, both English and
non-English, show that local context analysis offers more effective and
consistent retrieval results.
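The idea behind local context analysis can be sketched as follows. This is a much simplified illustration, not the scoring formula from the paper: candidate terms are scored by how often they co-occur, within the same top-ranked document, with each query term, and the product rewards terms related to all query terms rather than just one.

```python
def lca_expansion(query_terms, top_docs, n_expansion=2):
    """Simplified local-context-analysis-style term selection: rank
    candidate terms from the top-ranked documents by their document
    co-occurrence with EVERY query term, multiplying per-query-term
    counts so terms related to all query terms are preferred."""
    query = set(query_terms)
    candidates = {t for doc in top_docs for t in doc} - query
    scores = {}
    for c in candidates:
        score = 1.0
        for w in query:
            co = sum(1 for doc in top_docs if c in doc and w in doc)
            score *= 1 + co        # +1 keeps one missing term from zeroing out
        scores[c] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_expansion]
```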
A study of methods for negative relevance feedback
Conducts a systematic study of methods for negative relevance
feedback.
Compares a set of representative negative feedback methods,
covering vector-space models and language models, as well as several special
heuristics for negative feedback. Evaluating negative feedback methods requires
a test set with sufficiently many difficult topics, but there are few naturally
difficult topics in existing test collections, so the authors use two sampling
strategies to adapt a test collection with easy topics to the evaluation of
negative feedback. Experimental results on several TREC collections show that
language-model-based negative feedback methods are generally more effective
than those based on vector-space models, and that using multiple negative
models is an effective heuristic for negative feedback.
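The multiple-negative-model heuristic can be illustrated roughly as follows. This sketch is an assumption-laden reading of the idea, not the paper's method: instead of averaging all nonrelevant documents into one negative centroid, each candidate is penalized by its similarity to the closest single negative document, so that different reasons for nonrelevance are each captured. Vectors are term-weight dictionaries, and the penalty weight is arbitrary.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rerank_with_negative_models(results, negatives, weight=0.5):
    """Penalize each scored document by its similarity to the NEAREST
    negative document (one negative model per nonrelevant example).
    results: list of (score, doc_vector) pairs; negatives: doc vectors."""
    reranked = []
    for score, doc in results:
        penalty = max((cosine(doc, n) for n in negatives), default=0.0)
        reranked.append((score - weight * penalty, doc))
    reranked.sort(key=lambda pair: pair[0], reverse=True)
    return reranked
```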
Relevance feedback revisited
Experiments were performed at NIST to complete some of the
missing links found in using the probabilistic retrieval model. These
experiments, using the Cranfield 1400 collection, showed the importance of
query expansion in addition to query reweighting, and showed that adding as few
as 20 well-selected terms could result in performance improvements of over
100%.