Generalizing from Relevance Feedback using Named Entity Wildcards
Adaptive filtering is the task of predicting the relevance of documents online, as they arrive.
It is related to question answering, information extraction, text categorization, and thesaurus-based query expansion.
Approach:
1. Other facts of the same type
2. Facts of other types
Nugget-level adaptive filtering has important implications for the filtering task:
1. Given a nugget-level evaluation, the user is assumed to be looking for specific facts.
2. The system must determine the novelty of each candidate passage with respect to passages delivered to the user in the past.
3. Relevance feedback at the passage level might be too specific to learn.
Components of the system:
1. Adaptive Filtering Component
2. Novelty Detection Component
Information Retrieval on the Semantic Web
Answering queries on the web
Event ontology
Information Extraction
Inference System and Information Retrieval System
Hybrid Information Retrieval
Friday, April 25, 2014
Week 14: Muddiest Point
Since I'm interested in Human-Computer Interaction, and one of the new frontiers in IR is users and interactive IR, I was wondering how HCI theory applies to information retrieval models.
Saturday, April 19, 2014
Week 13: Muddiest Point
In evaluating clustering with purity and entropy, we first need to know the ground truth. What if no ground truth is provided? Can purity and entropy still be used for evaluation?
Friday, April 11, 2014
Week 12: Muddiest Points
When discussing personalized search, the professor talked about one way to obtain more user information beyond the query: the user's interaction history, which can be used as implicit feedback. But I was wondering whether the interaction history means implicit feedback the user gave before, or ongoing feedback. If it is the former, where is such history information collected?
Week 12: Reading Notes
IIR Chapter 13
Text classification and Naive Bayes
1. The text classification problem
TRAINMULTINOMIALNB(C, D)
 1 V ← EXTRACTVOCABULARY(D)
 2 N ← COUNTDOCS(D)
 3 for each c ∈ C
 4 do Nc ← COUNTDOCSINCLASS(D, c)
 5    prior[c] ← Nc / N
 6    textc ← CONCATENATETEXTOFALLDOCSINCLASS(D, c)
 7    for each t ∈ V
 8    do Tct ← COUNTTOKENSOFTERM(textc, t)
 9    for each t ∈ V
10    do condprob[t][c] ← (Tct + 1) / Σt′ (Tct′ + 1)
11 return V, prior, condprob
APPLYMULTINOMIALNB(C, V, prior, condprob, d)
1 W ← EXTRACTTOKENSFROMDOC(V, d)
2 for each c ∈ C
3 do score[c] ← log prior[c]
4    for each t ∈ W
5    do score[c] += log condprob[t][c]
6 return argmax c∈C score[c]
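As my own sketch (not code from the book), the two pseudocode listings above can be written in Python; the names mirror the pseudocode, and the add-one (Laplace) smoothing matches line 10 of TRAINMULTINOMIALNB:

```python
from collections import Counter, defaultdict
import math

def train_multinomial_nb(classes, docs):
    """docs: list of (class_label, token_list) pairs. Returns the vocabulary,
    class priors, and add-one-smoothed conditional probabilities."""
    vocab = {t for _, tokens in docs for t in tokens}
    n_docs = len(docs)
    prior, condprob = {}, defaultdict(dict)
    for c in classes:
        class_docs = [tokens for label, tokens in docs if label == c]
        prior[c] = len(class_docs) / n_docs
        counts = Counter(t for tokens in class_docs for t in tokens)
        denom = sum(counts[t] + 1 for t in vocab)  # Σt′ (Tct′ + 1)
        for t in vocab:
            condprob[t][c] = (counts[t] + 1) / denom
    return vocab, prior, condprob

def apply_multinomial_nb(classes, vocab, prior, condprob, doc_tokens):
    """Score each class with log probabilities and return the argmax."""
    scores = {}
    for c in classes:
        scores[c] = math.log(prior[c])
        for t in doc_tokens:
            if t in vocab:  # terms outside the vocabulary are ignored
                scores[c] += math.log(condprob[t][c])
    return max(scores, key=scores.get)
```

On the worked example from IIR 13.1 (three "china" training documents, one not), the test document "Chinese Chinese Chinese Tokyo Japan" is classified as china.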
2. Relation to multinomial unigram language model
3. The Bernoulli model
4. Properties of Naive Bayes
4.1 A variant of the multinomial model
5. Feature selection: the process of selecting a subset of the terms occurring in the training set and using only this subset as features in text classification.
5.1 Mutual information
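As my own illustration of mutual-information feature selection, the expected mutual information between a term and a class can be computed from the usual 2x2 document-count contingency table:

```python
import math

def mutual_information(n11, n10, n01, n00):
    """Expected mutual information of a term and a class from document counts:
    n11: in-class docs containing the term; n10: out-of-class docs containing it;
    n01: in-class docs without it;        n00: out-of-class docs without it."""
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # Each tuple is (cell count, term marginal, class marginal).
    for n_tc, n_t, n_c in [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]:
        if n_tc > 0:  # 0 * log 0 is taken as 0
            mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi
```

A sanity check on the definition: if term and class are independent the score is 0, and a term that occurs in exactly the in-class documents of a balanced collection scores 1 bit.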
Friday, April 4, 2014
Week 11: Reading Notes
User Profiles for Personalized Information Access
Many research projects are exploring the use of personalized applications that manage this deluge by tailoring the information presented to individual users.
As the amount of available information grows, it causes information overload, and the demand for personalized approaches to information access increases. This paper surveys how user profiles are specifically designed to provide personalized information access.
1.
Profiles that can be modified or augmented are considered dynamic, in contrast to static profiles.
Short-term profiles represent the user's current interests,
while long-term profiles indicate interests that are not subject to frequent change over time.
2. Collecting Information About users
The system must be able to uniquely identify users.
Five basic approaches to user identification:
software agents
logins
enhanced proxy servers
cookies
session ids
3. Methods for User Information Collection
Explicit User Information Collection
Implicit User Information Collection
4. User Profile Representations
Keyword Profiles
Semantic Network Profiles
Concept Profiles
5. User Profile Construction
Building Keyword Profiles
Building Semantic Network Profiles
Building Concept Profiles
Week 11: Muddiest Point
I have a question about this lecture. MLIR sometimes uses word-level alignments: it translates nouns to nouns and verbs to verbs, but after translation the word order may differ between languages. Also, word-to-word translation is sometimes inaccurate and ambiguous. How is this problem solved?
Saturday, March 29, 2014
Friday, March 21, 2014
Week 9: Reading Notes
IIR Chapter 19
19.2 Web Characteristics
1. Web graph
We can view the static Web, consisting of static HTML pages together with the hyperlinks between them, as a directed graph in which each web page is a node and each hyperlink a directed edge.
2. Spam
19.3 Advertising as the economic model
19.4 The search user experience:
1. User query needs
19.5 Index size and estimation
Chapter 21
21.1 The Web as a graph
1. Anchor text and the web graph
21.2 PageRank
1. Markov chains
2. The PageRank computation
3. Topic-specific PageRank
21.3 Hubs and Authorities
1. Choosing the subset of the Web
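The PageRank computation via its Markov-chain interpretation can be sketched with power iteration (my own illustration, not code from the chapter; the damping factor 0.85 is the conventional default):

```python
def pagerank(adjacency, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph.
    adjacency: dict mapping each node to the list of nodes it links to."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}          # start from a uniform distribution
    for _ in range(iterations):
        new = {u: (1 - damping) / n for u in nodes}  # teleportation mass
        for u in nodes:
            out = adjacency[u]
            if out:   # distribute this node's rank along its outgoing links
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:     # dangling node: spread its rank uniformly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank
```

On a tiny graph where A and B link to each other and C links only to A, the ranks sum to 1 and A ends up highest, since it collects in-links from both B and C.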
Week 9: Muddiest Point
In this lecture I have one question. When talking about document surrogates, the length and complexity of the URL were mentioned. But as we know, many URLs are generated automatically from database data, and in general users only copy and paste URLs; few people type a whole URL or change its query string. In this case, how would the length of a URL influence the document surrogate?
Friday, February 28, 2014
Week 8: Reading Notes
MIR Chapter 10
10.1
The human side of the information seeking process and the aspects of this process that can best be supported by the user interface.
10.2
Design principles for human-computer interaction:
- Offer information feedback
- Reduce working memory load
- Provide alternative interfaces for novice and expert users
Notions related to information visualization:
- Using icons and color highlighting
- brushing and linking
- panning and zooming
- focus-plus-context
- magic lenses
- use of animation
An important aspect of human-computer interaction is the methodology for evaluation of user interface techniques.
Precision and recall measures have been widely used for comparing the ranking results of non-interactive systems.
10.3 The Information Access Process
Steps:
- Start with an information need
- Select a system and collections to search on
- Formulate a query
- Send the query to the system
- Receive the results in the form of information items
- Scan, evaluate, and interpret the results
- Either stop, or,
- Reformulate the query and go to step 4
10.4 Many lists of collection
Week 8: Muddiest Point
The recall/precision graph is drawn this way: [graph not reproduced here]
But when interpolating it, it is an empirical fact that, on average, precision decreases as recall increases, so the approach is: [graph not reproduced here]
So theoretically a point with smaller recall should have higher precision; why doesn't the graph below behave this way (with the dotted line)? [graph not reproduced here]
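For reference, interpolated precision at recall level r is usually defined as the maximum precision observed at any recall greater than or equal to r, which is exactly what forces the interpolated curve to be non-increasing even though the raw curve is a sawtooth. A minimal sketch (my own illustration):

```python
def interpolated_precision(points):
    """points: list of (recall, precision) pairs from a ranked result list.
    Returns a function giving interpolated precision at any recall level r:
    the maximum precision at any recall >= r."""
    def interp(r):
        candidates = [p for rec, p in points if rec >= r]
        return max(candidates) if candidates else 0.0
    return interp
```

So a dip in raw precision at some recall level is "filled in" by a higher precision that occurs later, which is why the plotted interpolated curve never rises as recall increases.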
Friday, February 21, 2014
Week 7: Reading Notes
Chapter 9
9.1 Relevance feedback and pseudo relevance feedback
Goal: involve the user in the retrieval process so as to improve the final result set.
Procedure:
- The user issues a (short, simple) query.
- The system returns an initial set of retrieval results.
- The user marks some returned documents as relevant or nonrelevant.
- The system computes a better representation of the information need based on the user feedback.
- The system displays a revised set of retrieval results.
Algorithms:
- Rocchio algorithm
- Probabilistic algorithms
Some search engines offer relevance feedback.
How to evaluate feedback strategies
Pseudo relevance feedback
Indirect relevance feedback
Other uses of relevance feedback include:
- Following a changing information need (e.g., names of car models of interest change over time)
- Maintaining an information filter (e.g., for a news feed)
- Active learning (deciding which examples it is most useful to know the class of, to reduce annotation costs)
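The Rocchio update can be sketched as follows (my own illustration, with the standard α, β, γ weighting; vectors are represented as term-to-weight dicts):

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query modification: move the query vector toward the centroid
    of relevant documents and away from the centroid of nonrelevant ones."""
    def centroid(docs):
        c = {}
        for d in docs:
            for t, w in d.items():
                c[t] = c.get(t, 0.0) + w / len(docs)
        return c
    rel, nonrel = centroid(relevant), centroid(nonrelevant)
    terms = set(query) | set(rel) | set(nonrel)
    new_query = {}
    for t in terms:
        w = (alpha * query.get(t, 0.0)
             + beta * rel.get(t, 0.0)
             - gamma * nonrel.get(t, 0.0))
        if w > 0:  # negative weights are conventionally clipped to zero
            new_query[t] = w
    return new_query
```

With the usual defaults, terms from relevant documents are pulled into the query with weight β, terms from nonrelevant documents are pushed down by γ, and any weight driven negative is dropped.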
9.2 Global methods for query reformulation
- Vocabulary tools for query reformulation
- Users give additional input on query words or phrases
- Automatic thesaurus generation by analyzing a collection of documents: exploit word co-occurrence, use a shallow grammatical analysis of the text, and exploit grammatical relations or dependencies
Improving the effectiveness of information retrieval with local context analysis
Proposes a new technique, called local context analysis, which selects expansion terms based on co-occurrence with the query terms within the top-ranked documents. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.
A study of methods for negative relevance feedback
Conducts a systematic study of methods for negative relevance feedback. Compares a set of representative negative feedback methods, covering vector-space models and language models, as well as several special heuristics for negative feedback. Evaluating negative feedback methods requires a test set with sufficiently difficult topics, but there are not many naturally difficult topics in existing test collections, so the authors use two sampling strategies to adapt a test collection with easy topics to evaluate negative feedback. Experimental results on several TREC collections show that language-model-based negative feedback methods are generally more effective than those based on vector-space models, and that using multiple negative models is an effective heuristic for negative feedback.
Relevance feedback revisited
Experiments were performed at NIST to complete some of the missing links found in using the probabilistic retrieval model. These experiments, using the Cranfield 1400 collection, showed the importance of query expansion in addition to query reweighting, and showed that adding as few as 20 well-selected terms could result in performance improvements of over 100%.
Thursday, February 20, 2014
Week 7: Muddiest point
For the kappa measure of inter-judge agreement, P(A) is the observed agreement proportion, and P(E) is the agreement proportion expected by chance. What does the kappa equation [P(A) - P(E)] / [1 - P(E)] mean as a measure of performance?
And if the number of judges is larger than 2, say 5, how do we average the pairwise kappas? Is that C(5,2), that is, kappa is computed for every pair of judges and then averaged?
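A sketch of the two-judge kappa computation (my own illustration):

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges: (P(A) - P(E)) / (1 - P(E)).
    P(A) is the observed agreement; P(E) is the agreement expected by
    chance from each judge's marginal label frequencies."""
    n = len(labels_a)
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in categories)
    return (p_a - p_e) / (1 - p_e)
```

Intuitively, the numerator is how much the judges agree beyond chance, and the denominator is the maximum possible agreement beyond chance, so kappa is 1 for perfect agreement and 0 for chance-level agreement. For more than two judges, averaging the kappas over all pairs (C(5,2) = 10 pairs for 5 judges) is indeed a common approach; Fleiss' kappa is a direct multi-rater generalization.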
Friday, February 14, 2014
Week 6: Muddiest point
In Jelinek-Mercer smoothing,
cross-validation is used to set λ (train, held-out, test).
In Dirichlet prior smoothing,
how is the value of μ set?
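Both smoothing formulas can be sketched directly (my own illustration). As with λ, μ is typically set by cross-validation on held-out data; empirically, values on the order of a few thousand often work well:

```python
def jelinek_mercer(tf_d, doc_len, p_collection, lam=0.5):
    """Jelinek-Mercer: linear interpolation of the document's maximum-likelihood
    estimate with the collection model, weighted by lambda."""
    return (1 - lam) * (tf_d / doc_len) + lam * p_collection

def dirichlet_prior(tf_d, doc_len, p_collection, mu=2000):
    """Dirichlet prior: the collection model acts as mu pseudo-counts, so the
    amount of smoothing automatically shrinks for longer documents."""
    return (tf_d + mu * p_collection) / (doc_len + mu)
```

One design difference worth noting: Jelinek-Mercer smooths every document by the same fixed λ, while the Dirichlet prior smooths short documents more heavily than long ones, since μ is compared against the document length in the denominator.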