Friday, February 14, 2014

Week 6: Reading Notes

Chapter 8
8.1
Effectiveness of IR system: consider


  • A document collection
  • A test suite of information needs, expressible as queries
  • A set of relevance judgments, standardly a binary assessment of either relevant or nonrelevant for each query-document pair (referred to as the gold standard or ground truth judgment of relevance)



8.2
The test collections that are most often used for this purpose:


  • The Cranfield collection
  • Text REtrieval Conference (TREC)
  • NII Test Collections for IR Systems (NTCIR)
  • Cross Language Evaluation Forum (CLEF)
  • Reuters-21578 and Reuters-RCV1
  • 20 Newsgroups



8.3
This section covers the straightforward notion of relevant and nonrelevant documents and the formal evaluation methodology that has been developed for evaluating unranked retrieval results.
Precision and Recall

F measure: the weighted harmonic mean of precision and recall
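A minimal sketch of these three unranked-set measures; the document sets below are invented for illustration:

```python
# Precision, recall, and the weighted F measure for an unranked result set.

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant documents that are retrieved."""
    return len(retrieved & relevant) / len(relevant)

def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 gives the balanced F1."""
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

retrieved = {"d1", "d2", "d3", "d4"}   # hypothetical retrieved set
relevant = {"d2", "d4", "d5"}          # hypothetical gold-standard judgments
p, r = precision(retrieved, relevant), recall(retrieved, relevant)
print(p, r, f_measure(p, r))
```

Setting beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.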



8.4
Develop measures to evaluate ranked retrieval results
The MAP value for a test collection is the arithmetic mean of average precision values for individual information needs.
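The MAP computation can be sketched as follows; the rankings and relevance sets are invented for illustration:

```python
# Mean Average Precision (MAP): for each query, average the precision at
# each rank where a relevant document is retrieved (relevant documents
# never retrieved contribute precision 0), then take the arithmetic mean
# over all queries.

def average_precision(ranking, relevant):
    hits, precisions = 0, []
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Hypothetical (ranking, relevant-set) pairs for two information needs.
queries = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7"], {"d6"}),
]
map_score = sum(average_precision(rk, rel) for rk, rel in queries) / len(queries)
print(map_score)
```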


An alternative measure is R-precision.
Another concept used in evaluation is the ROC curve (Receiver Operating Characteristics).

For a set of queries Q, let R(j,d) be the relevance score assessors gave to document d for query j. Then

NDCG(Q, k) = (1/|Q|) Σ_{j=1..|Q|} Z_kj Σ_{m=1..k} (2^R(j,m) − 1) / log2(1 + m)

where Z_kj is a normalization factor calculated so that a perfect ranking's NDCG at k for query j is 1.

8.5
Reliable and informative test collections.
In social sciences, a common measure for agreement between judges is the kappa statistic. It is designed for categorical judgments and corrects a simple agreement rate for the rate of chance agreement.
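A sketch of kappa for two judges making binary relevance judgments, using pooled marginals to estimate chance agreement (the judgment vectors below are invented):

```python
# Kappa statistic: kappa = (P(A) - P(E)) / (1 - P(E)), where P(A) is the
# observed agreement rate and P(E) is the agreement rate expected by
# chance, estimated here from the two judges' pooled marginals.

def kappa(j1, j2):
    n = len(j1)
    p_agree = sum(a == b for a, b in zip(j1, j2)) / n
    # Pooled marginal: probability that either judge says "relevant".
    p_rel = (sum(j1) + sum(j2)) / (2 * n)
    p_chance = p_rel**2 + (1 - p_rel)**2
    return (p_agree - p_chance) / (1 - p_chance)

# Hypothetical binary judgments (1 = relevant) from two assessors.
j1 = [1, 1, 0, 1, 0, 0, 1, 0]
j2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(kappa(j1, j2))
```

Kappa is 1 for total agreement, 0 when agreement is exactly what chance would predict, and negative when agreement is worse than chance.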
One way to approach measuring this is by using distinct facts or entities as evaluation units.


8.6
User utility and how it is approximated by the use of document relevance.


  • System issues
  • User utility
  • Refining a deployed system


8.7
Short summary of the document: two basic kinds of summaries are static and dynamic

What's the value of TREC: is there a gap to jump or a chasm to bridge? 
This note argues for addressing not further generalisation across information-seeking contexts but context-driven particularisation. It develops this argument from an analysis of TREC work, applying notions taken from discussions of evaluation for language and information processing in general.

Cumulated gain-based evaluation of IR techniques ACM Transactions on Information Systems
Several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position.
The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor to the relevance scores in order to devalue late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. These novel measures are defined and discussed, and their use is demonstrated in a case study using TREC data.
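The three measures described above (cumulated gain, discounted cumulated gain, and normalized DCG) can be sketched as follows; the graded relevance scores and ranking are assumptions for illustration:

```python
import math

def cg(gains):
    """Cumulated gain: running sum of relevance scores down the ranked list."""
    return [sum(gains[:i + 1]) for i in range(len(gains))]

def dcg(gains):
    """Discounted cumulated gain: divide each gain by log2 of its rank
    (no discount at ranks 1-2, as in the original formulation)."""
    total, out = 0.0, []
    for i, g in enumerate(gains, start=1):
        total += g / max(1.0, math.log2(i))
        out.append(total)
    return out

def ndcg(gains):
    """Normalize DCG at each rank by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True))
    return [d / i if i else 0.0 for d, i in zip(dcg(gains), ideal)]

gains = [3, 2, 3, 0, 1]  # hypothetical graded relevance scores (0-3) by rank
print(cg(gains))
print(ndcg(gains))
```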



