Information Retrieval_INFSCI2140: April 2014

Friday, April 25, 2014

Week 14: Reading Notes

Generalizing from Relevance Feedback using Named Entity Wildcards
Adaptive filtering is the task of predicting online, as documents arrive, which of them are relevant.

Related to question answering, information extraction, text categorization, and thesaurus-based query expansion.

Approach:
1. Other facts of the same type
2. Facts of other types

Nugget-level adaptive filtering has important implications for the filtering task:
1. Given a nugget-level evaluation, the user is assumed to be looking for specific facts.
2. The system must determine the novelty of each candidate passage with respect to passages delivered to the user in the past.
3. Relevance feedback at the passage level might be too specific to learn.

Components of the system:
1. Adaptive Filtering Component
2. Novelty Detection Component
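
The novelty requirement above (judging each candidate passage against what was already delivered) can be illustrated with a minimal sketch. This is not the method from the paper: it assumes a simple bag-of-words cosine similarity, and the names `tokenize`, `cosine`, and `novelty` are hypothetical helpers made up for this note.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty(candidate, delivered):
    """Return 1 minus the highest similarity to any previously delivered passage."""
    if not delivered:
        return 1.0  # nothing delivered yet, so everything counts as new
    cand_vec = Counter(tokenize(candidate))
    return 1.0 - max(cosine(cand_vec, Counter(tokenize(p))) for p in delivered)


# Assumed example data, only for illustration.
delivered = ["the company reported record profits in 2013"]
repeat_score = novelty("the company reported record profits in 2013", delivered)
fresh_score = novelty("heavy rain flooded a river valley", delivered)
```

A passage that repeats a delivered one scores close to 0, while a passage sharing no terms scores 1. A real system would pick a threshold on this score, and might compare nuggets or named entities rather than raw words.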

Information Retrieval on the Semantic Web
Answering queries on the web
Event ontology
Information Extraction
Inference System and Information Retrieval System
Hybrid Information Retrieval

Week 14: Muddiest Point

I'm interested in human-computer interaction, and one of the new frontiers in IR is users and interactive IR. I was wondering how HCI theory is applied to information retrieval models.

Saturday, April 19, 2014

Week 13: Muddiest Point

When evaluating a clustering with purity and entropy, we first need the ground truth. What if no ground truth is provided? Can we still use purity and entropy for evaluation?
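
To make the dependence on ground truth concrete, here is a small sketch of both measures. It assumes each cluster is given as a list of the true class labels of its members, which is exactly the information that is missing when no ground truth exists. The function names are my own, and the entropy shown is the size-weighted average of per-cluster label entropy, one common definition.

```python
import math
from collections import Counter


def purity(clusters):
    """Fraction of items that carry the majority class label of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(c).values()) for c in clusters if c)
    return majority / total


def cluster_entropy(clusters):
    """Size-weighted average of the class-label entropy (in bits) of each cluster."""
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        if not c:
            continue  # an empty cluster contributes nothing
        h = -sum((n / len(c)) * math.log2(n / len(c)) for n in Counter(c).values())
        result += (len(c) / total) * h
    return result


# Assumed toy data: three clusters over 17 items with true labels x, o, d.
clusters = [
    ["x"] * 5 + ["o"],
    ["x"] + ["o"] * 4 + ["d"],
    ["x"] * 2 + ["d"] * 3,
]
p = purity(clusters)           # (5 + 4 + 3) / 17
e = cluster_entropy(clusters)
```

Both functions read the true labels directly, so neither can be computed from the cluster assignments alone.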

Friday, April 11, 2014

Week 12: Muddiest Points

When discussing personalized search, the professor talked about one way to provide more information about the user than the query alone: the user's interaction history, which can be used as implicit feedback. I was wondering whether the interaction history means implicit feedback the user gave in the past or feedback from the ongoing session. If it is the former, where is that history collected?

Week 12: Reading Notes

IIR Chapter 13
Text classification and Naive Bayes
1. The text classification problem

TRAINMULTINOMIALNB(C, D)
  V ← EXTRACTVOCABULARY(D)
  N ← COUNTDOCS(D)
  for each c ∈ C
  do N_c ← COUNTDOCSINCLASS(D, c)
     prior[c] ← N_c / N
     text_c ← CONCATENATETEXTOFALLDOCSINCLASS(D, c)
     for each t ∈ V
     do T_ct ← COUNTTOKENSOFTERM(text_c, t)
     for each t ∈ V
     do condprob[t][c] ← (T_ct + 1) / Σ_t′ (T_ct′ + 1)
  return V, prior, condprob

APPLYMULTINOMIALNB(C, V, prior, condprob, d)
  W ← EXTRACTTOKENSFROMDOC(V, d)
  for each c ∈ C
  do score[c] ← log prior[c]
     for each t ∈ W
     do score[c] += log condprob[t][c]
  return argmax_{c ∈ C} score[c]
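
The two procedures above can be sketched in Python roughly as follows. This is my own minimal translation, not code from the book. It assumes documents are already tokenized into lists of strings and that every class has at least one training document (otherwise the log of the prior is undefined). The toy collection and the class labels `china` and `japan` are only illustrative.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(classes, docs):
    """docs: list of (tokens, class_label) pairs. Returns (vocab, prior, condprob)."""
    vocab = {t for tokens, _ in docs for t in tokens}
    n_docs = len(docs)
    prior = {}
    condprob = defaultdict(dict)
    for c in classes:
        class_docs = [tokens for tokens, label in docs if label == c]
        prior[c] = len(class_docs) / n_docs
        # Token counts over the concatenated text of all documents in class c.
        counts = Counter(t for tokens in class_docs for t in tokens)
        denom = sum(counts[t] + 1 for t in vocab)  # add-one smoothing
        for t in vocab:
            condprob[t][c] = (counts[t] + 1) / denom
    return vocab, prior, condprob


def apply_multinomial_nb(classes, vocab, prior, condprob, tokens):
    """Return the class with the highest log score for the given token list."""
    known = [t for t in tokens if t in vocab]  # ignore out-of-vocabulary tokens
    scores = {}
    for c in classes:
        scores[c] = math.log(prior[c]) + sum(math.log(condprob[t][c]) for t in known)
    return max(classes, key=lambda c: scores[c])


# Assumed toy training set: three "china" documents and one "japan" document.
classes = ["china", "japan"]
train = [
    ("Chinese Beijing Chinese".split(), "china"),
    ("Chinese Chinese Shanghai".split(), "china"),
    ("Chinese Macao".split(), "china"),
    ("Tokyo Japan Chinese".split(), "japan"),
]
vocab, prior, condprob = train_multinomial_nb(classes, train)
predicted = apply_multinomial_nb(
    classes, vocab, prior, condprob, "Chinese Chinese Chinese Tokyo Japan".split()
)
```

Scores are summed in log space, as in the pseudocode, to avoid floating-point underflow when many small probabilities are multiplied.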

2. Relation to multinomial unigram language model

3. The Bernoulli model

4. Properties of Naive Bayes

4.1 A variant of the multinomial model

5. Feature selection: the process of selecting a subset of the terms occurring in the training set and using only this subset as features in text classification.

5.1 Mutual information
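
Mutual information scores how much the presence or absence of a term tells us about membership in a class. As a rough sketch, it can be computed from a 2x2 table of document counts. The function name and argument names are my own, and cells with a zero count are skipped (treating 0 log 0 as 0), which is an assumption of this sketch.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """MI in bits between term presence and class membership.

    n11: docs that contain the term and are in the class
    n10: docs that contain the term and are not in the class
    n01: docs that lack the term and are in the class
    n00: docs that lack the term and are not in the class
    """
    n = n11 + n10 + n01 + n00
    # Each entry: (joint count, term marginal, class marginal).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    total = 0.0
    for joint, term_marginal, class_marginal in cells:
        if joint > 0:  # treat 0 * log(0) as 0
            total += (joint / n) * math.log2(n * joint / (term_marginal * class_marginal))
    return total


# Assumed counts for illustration only.
independent = mutual_information(25, 25, 25, 25)  # term says nothing about the class
perfect = mutual_information(50, 0, 0, 50)        # term determines the class
```

For feature selection, one would compute this score for every term and class and keep only the highest-scoring terms.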

Friday, April 4, 2014

Week 11: Reading Notes

User Profiles for Personalized Information Access

The amount of information available causes information overload, so the demand for personalized approaches to information access is increasing.

Many research projects are exploring personalized applications that manage this deluge by tailoring the information presented to individual users. The reading covers how user profiles are designed specifically to provide personalized information access.

1.
Profiles that can be modified or augmented are considered dynamic, as opposed to static profiles.
Short-term profiles represent the user's current interests.
Long-term profiles indicate interests that are not subject to frequent change over time.

2. Collecting Information About Users
The system must be able to uniquely identify users.

5 basic approaches to user identification:
software agents
logins
enhanced proxy servers
cookies
session ids

3. Methods for User Information Collection
Explicit User Information Collection
Implicit User Information Collection

4. User Profile Representations
Keyword Profiles
Semantic Network Profiles
Concept Profiles

5. User Profile Construction
Building Keyword Profiles
Building Semantic Network Profiles
Building Concept Profiles

Week 11: Muddiest Point

I have a question about this lecture. MLIR sometimes uses word-level alignments, translating between two languages noun to noun and verb to verb, although the word order after translation may differ from language to language. But word-to-word translation is sometimes inaccurate and can be ambiguous. How can this problem be solved?