Friday, February 28, 2014

Week 8: Reading Notes

MIR Chapter 10
10.1
Covers the human side of the information-seeking process and the aspects of this process that can best be supported by the user interface.

10.2
Design principles for human-computer interaction:

  • Offer informative feedback
  • Reduce working memory load
  • Provide alternative interfaces for novice and expert users

Notions related to information visualization:

  • Using icons and color highlighting
  • Brushing and linking
  • Panning and zooming
  • Focus-plus-context
  • Magic lenses
  • Use of animation

An important aspect of human-computer interaction is the methodology for evaluation of user interface techniques.
Precision and recall measures have been widely used for comparing the ranking results of non-interactive systems.
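These set-based measures are easy to compute directly. A minimal sketch (the document IDs below are made up for illustration):

```python
def precision_recall(retrieved, relevant):
    """Precision = fraction of retrieved docs that are relevant;
    recall = fraction of relevant docs that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 2 of the 4 retrieved docs are relevant; 2 of the 3 relevant docs were found.
p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"],
                        relevant=["d2", "d4", "d7"])
```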

10.3 The Information Access Process
Steps:

  1. Start with an information need
  2. Select a system and collections to search on
  3. Formulate a query
  4. Send the query to the system
  5. Receive the results in the form of information items
  6. Scan, evaluate, and interpret the results
  7. Either stop, or
  8. Reformulate the query and go to step 4
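The loop in steps 1-8 can be sketched as a toy program. The tiny corpus, the naive substring "system", and the reformulation rule below are all illustrative stand-ins, not anything from the chapter:

```python
# Toy in-memory collection (step 2: the "system and collections").
CORPUS = {
    "d1": "smoothing methods for language models",
    "d2": "relevance feedback in information retrieval",
    "d3": "query reformulation and expansion techniques",
}

def search(query):
    """Steps 4-5: send the query, receive matching items.
    Naive substring matching, purely for illustration."""
    terms = query.lower().split()
    return [d for d, text in CORPUS.items() if any(t in text for t in terms)]

def information_access(query, wanted, max_rounds=3):
    """Steps 3-8: query, scan results, stop if satisfied,
    otherwise reformulate and try again."""
    results = []
    for _ in range(max_rounds):
        results = search(query)          # steps 4-5
        if wanted in results:            # steps 6-7: evaluate, then stop
            return results
        query += " retrieval"            # step 8: reformulate, go to step 4
    return results
```

Here `information_access("ranking", wanted="d2")` finds nothing on the first pass, reformulates, and succeeds on the second.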

10.4 Many lists of collections

Week 8: Muddiest Point

The recall/precision graph is drawn this way:

When interpolating it, it is an empirical fact that, on average, precision decreases as recall increases, so the approach is:

So theoretically the point with smaller recall should have higher precision; why doesn't the graph behave this way (with the dotted line) below?
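One way to see what the interpolation step does: interpolated precision at recall level r is defined as the maximum precision observed at any recall ≥ r, which by construction makes the curve monotonically non-increasing and lifts any low-precision point at small recall. A minimal sketch (the (recall, precision) points are made up):

```python
def interpolate(points):
    """points: list of (recall, precision) pairs from a ranking.
    Interpolated precision at recall r = max precision at any
    recall >= r, so the curve never rises as recall grows."""
    points = sorted(points)
    out, best = [], 0.0
    for r, p in reversed(points):   # sweep from high recall to low
        best = max(best, p)
        out.append((r, best))
    return list(reversed(out))

raw = [(0.2, 0.5), (0.4, 0.67), (0.6, 0.5), (0.8, 0.4)]
smoothed = interpolate(raw)
# the sawtooth dip at recall 0.2 is lifted to 0.67
```

This is exactly why an empirical (raw) point with smaller recall can sit below a later point: the raw curve is sawtoothed, and interpolation removes the sawtooth.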

Friday, February 21, 2014

Week 7: Reading Notes

Chapter 9

9.1 Relevance feedback and pseudo-relevance feedback
Goal: involve the user in the retrieval process so as to improve the final result set
Procedure:

  1. The user issues a (short, simple) query.
  2. The system returns an initial set of retrieval results.
  3. The user marks some returned documents as relevant or nonrelevant.
  4. The system computes a better representation of the information need based on the user feedback.
  5. The system displays a revised set of retrieval results.
Algorithms:

  • Rocchio algorithm
  • Probabilistic algorithms

Further notes:

  • Some search engines offer relevance feedback
  • How to evaluate feedback strategies
  • Pseudo-relevance feedback
  • Indirect relevance feedback

Other uses of relevance feedback include:

  • Following a changing information need (e.g., names of car models of interest change over time)
  • Maintaining an information filter (e.g., for a news feed)
  • Active learning (deciding which examples it is most useful to know the class of, to reduce annotation costs)
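A sketch of the Rocchio reformulation mentioned above, assuming vector-space term-weight vectors and the usual α, β, γ weights (the vectors and weight values below are illustrative):

```python
import numpy as np

def rocchio(q, rel, nonrel, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward the centroid of the relevant
    documents and away from the centroid of the nonrelevant ones:
    q' = alpha*q + beta*mean(rel) - gamma*mean(nonrel)."""
    q = alpha * np.asarray(q, dtype=float)
    if len(rel):
        q += beta * np.mean(rel, axis=0)
    if len(nonrel):
        q -= gamma * np.mean(nonrel, axis=0)
    return np.maximum(q, 0.0)   # negative term weights are usually clipped to 0

q0 = [1.0, 0.0, 0.0]                                    # original query
rel = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]])      # judged relevant
nonrel = np.array([[0.0, 0.0, 1.0]])                    # judged nonrelevant
q1 = rocchio(q0, rel, nonrel)
```

The revised query keeps the original term, gains weight on terms shared by the relevant documents, and is pushed away from the nonrelevant one.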

9.2 Global methods for query reformulation
Vocabulary tools for query reformulation
Users give additional input on query words or phrases
Automatic thesaurus generation by analyzing a collection of documents: exploit word co-occurrence, or use a shallow grammatical analysis of the text to exploit grammatical relations or dependencies.
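A toy sketch of the co-occurrence idea: count how often term pairs appear in the same document and treat the strongest co-occurring terms as thesaurus candidates for expansion (the documents and function names below are made up):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count, for each unordered term pair, how many documents
    contain both terms."""
    pair_counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc.lower().split())), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def related(term, pair_counts, k=3):
    """Thesaurus candidates for `term`: terms ranked by how often
    they co-occur with it."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [t for t, _ in scores.most_common(k)]

docs = ["query expansion improves retrieval",
        "thesaurus based query expansion",
        "retrieval with relevance feedback"]
pc = cooccurrence_counts(docs)
candidates = related("query", pc)
```

On this tiny corpus, "expansion" co-occurs with "query" in two documents, so it ranks first as an expansion candidate.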


Improving the effectiveness of information retrieval with local context analysis
Proposes a new technique, called local context analysis, which selects expansion terms based on co-occurrence with the query terms within the top-ranked documents. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.

A study of methods for negative relevance feedback
Conducts a systematic study of methods for negative relevance feedback.
Compares a set of representative negative feedback methods, covering vector-space models and language models, as well as several special heuristics for negative feedback. Evaluating negative feedback methods requires a test set with sufficiently difficult topics, but there are not many naturally difficult topics in the existing test collections, so two sampling strategies are used to adapt a test collection with easy topics to evaluating negative feedback. Experimental results on several TREC collections show that language-model-based negative feedback methods are generally more effective than those based on vector-space models, and that using multiple negative models is an effective heuristic for negative feedback.

Relevance feedback revisited

Experiments were performed at NIST to complete some of the missing links found in using the probabilistic retrieval model. These experiments, using the Cranfield 1400 collection, showed the importance of query expansion in addition to query reweighting, and showed that adding as few as 20 well-selected terms could result in performance improvements of over 100%. 



Thursday, February 20, 2014

Week 7: Muddiest point

For the kappa measure of inter-judge agreement, P(A) is the observed proportion of agreement, and P(E) is the proportion of agreement expected by chance. What does the kappa equation [P(A) - P(E)] / [1 - P(E)] mean as an evaluation of performance?

And if the number of judges is larger than 2, say 5, how are the pairwise kappas averaged? Is it C(5,2), that is, compute kappa for every pair of judges and then take the average?
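On the first question: kappa rescales the observed agreement so that 0 means no better than chance and 1 means perfect agreement, by subtracting out the chance component P(E) from both numerator and denominator. On the second, yes, averaging kappa over all C(n,2) pairs is one common convention. A sketch for binary relevance labels (the judgments below are made up):

```python
from itertools import combinations

def kappa(j1, j2):
    """Cohen's kappa for two judges' binary relevance labels."""
    n = len(j1)
    p_a = sum(a == b for a, b in zip(j1, j2)) / n      # observed agreement
    p1_yes, p2_yes = sum(j1) / n, sum(j2) / n
    # chance agreement: both say "yes" or both say "no" independently
    p_e = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    return (p_a - p_e) / (1 - p_e)

def average_pairwise_kappa(judges):
    """With more than 2 judges: kappa for every C(n,2) pair, then the mean."""
    pairs = list(combinations(judges, 2))
    return sum(kappa(a, b) for a, b in pairs) / len(pairs)

j1 = [1, 1, 0, 0, 1]   # judge 1: relevant / nonrelevant per document
j2 = [1, 0, 0, 0, 1]   # judge 2
k = kappa(j1, j2)      # P(A) = 0.8, P(E) = 0.48, so kappa = 0.32 / 0.52
```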

Friday, February 14, 2014

Week 6: Muddiest point

In Jelinek-Mercer smoothing, cross-validation is used to set λ (train, held-out, test).
In Dirichlet prior smoothing, how is the value of μ set?
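Both smoothed estimates can be written down in a few lines; a minimal sketch using maximum-likelihood counts (λ here weights the collection model, and the toy documents are illustrative). As for μ: like λ, it is normally tuned on held-out data, and a commonly cited default is on the order of 2000.

```python
from collections import Counter

def jelinek_mercer(word, doc, collection, lam=0.7):
    """p(w|d) = (1 - lam) * p_ml(w|d) + lam * p(w|C),
    a fixed linear mix of document and collection models."""
    p_ml = Counter(doc)[word] / len(doc)
    p_c = Counter(collection)[word] / len(collection)
    return (1 - lam) * p_ml + lam * p_c

def dirichlet(word, doc, collection, mu=2000):
    """p(w|d) = (c(w, d) + mu * p(w|C)) / (|d| + mu);
    unlike lam, the effective mix depends on document length."""
    p_c = Counter(collection)[word] / len(collection)
    return (Counter(doc)[word] + mu * p_c) / (len(doc) + mu)

doc = "a a b".split()
collection = "a a b b c".split()
p_jm = jelinek_mercer("a", doc, collection, lam=0.5)
p_dir = dirichlet("a", doc, collection, mu=2)
```

The Dirichlet form shows why μ is read as a pseudo-count: it acts like μ extra word occurrences drawn from the collection model, so longer documents rely on it less.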