
Information retrieval


 

Information retrieval (IR) is the art and science of searching for information in documents, searching for documents themselves, searching for metadata that describes documents, or searching within databases, whether stand-alone relational databases or hypertext-networked databases such as the Internet or intranets, for text, sound, images or data. Data retrieval, document retrieval, information retrieval, and text retrieval are commonly confused, however, and each has its own body of literature, theory, praxis and technologies.

Performance measures

There are various ways to measure how well the retrieved information matches the intended information:

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Precision

The proportion of retrieved documents that are relevant:

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

:P = (number of relevant documents retrieved) / (number of documents retrieved)

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

In binary classification, precision is analogous to positive predictive value.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Precision can also be evaluated at a given cut-off rank, denoted P@n, considering only the top n results rather than all retrieved documents.
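
As a minimal sketch of the two definitions above, the following Python computes precision over a full result list and at a cut-off rank. The function names, the document identifiers, and the convention of returning 0.0 for an empty result list are assumptions made for illustration, not part of the definitions.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # assumed convention: precision of an empty result list is 0
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)


def precision_at(retrieved, relevant, n):
    """P@n: precision computed over only the top n ranked results.

    If fewer than n documents were retrieved, this sketch divides by the
    number actually retrieved; some evaluations divide by n instead.
    """
    return precision(retrieved[:n], relevant)


# Hypothetical ranked result list and relevance judgments.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d5"}

print(precision(ranked, relevant))        # 2 of 5 retrieved are relevant: 0.4
print(precision_at(ranked, relevant, 2))  # 1 of the top 2 is relevant: 0.5
```

Note that P@n depends on the order of the results, whereas precision over the whole list does not.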

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Recall

The proportion of all relevant documents available that are retrieved:

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

:R = (number of relevant documents retrieved) / (number of relevant documents)

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

In binary classification, recall is called sensitivity.
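
The recall formula can be sketched in Python as follows. The function name, the sample identifiers, and the choice to return 0.0 when no relevant documents exist are illustrative assumptions.

```python
def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumed convention: undefined case reported as 0
    hits = len(relevant & set(retrieved))
    return hits / len(relevant)


# Hypothetical result list and relevance judgments.
retrieved = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d5"}

print(recall(retrieved, relevant))  # 2 of 3 relevant documents found, about 0.667
```

Retrieving every document in the collection would give a recall of 1, which is why recall is normally reported together with precision.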

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

F-measure

The harmonic mean of precision and recall:

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

:F = 2 \times \mathrm{precision} \times \mathrm{recall} / (\mathrm{precision} + \mathrm{recall})
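
A short Python sketch of this harmonic mean follows. The function name and the handling of the case where both inputs are zero are assumptions for illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention: avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


print(f_measure(0.5, 0.5))    # equal inputs give the same value back: 0.5
print(f_measure(0.4, 2 / 3))  # approximately 0.5
print(f_measure(1.0, 0.0))    # 0.0
```

Because it is a harmonic mean, the result is pulled toward the smaller of the two values, so a system cannot score well by maximizing only one of them.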

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Mean average precision

Over a set of queries, mean average precision is the mean of the per-query average precisions, where the average precision of a query is the average of the precision values computed after each relevant document is retrieved.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

With r the rank, N the number of documents retrieved, rel(r) a binary function equal to 1 if the document at rank r is relevant and 0 otherwise, and P(r) the precision at cut-off rank r:

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

: \operatorname{AveP} = \frac{\sum_{r=1}^N (P(r) \times \mathrm{rel}(r))}{\mbox{number of relevant documents}}

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

This method emphasizes returning more relevant documents earlier.
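
The average precision formula and its mean over several queries can be sketched in Python as below. The function names, the sample queries, and the assumption that each ranked list contains no duplicate documents are illustrative, not part of the definition.

```python
def average_precision(ranked, relevant):
    """Sum of P(r) * rel(r) over ranks, divided by the number of relevant documents."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumed convention when a query has no relevant documents
    hits = 0
    total = 0.0
    for r, doc in enumerate(ranked, start=1):
        if doc in relevant:    # rel(r) = 1
            hits += 1
            total += hits / r  # P(r): precision over the top r results
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of the average precisions over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Two hypothetical queries with their ranked results and relevance judgments.
runs = [
    (["d3", "d7", "d1", "d9", "d4"], {"d1", "d3", "d5"}),  # (1/1 + 2/3) / 3 = 5/9
    (["a", "b"], {"b"}),                                   # (1/2) / 1 = 0.5
]

print(mean_average_precision(runs))  # (5/9 + 0.5) / 2, about 0.528
```

In the first query, the relevant document d5 is never retrieved, so it contributes nothing to the sum but still counts in the denominator, which lowers the score.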

~ ~ ~ ~ ~ ~ ~ ~ ~ ~