루씬 2.4.3 Field options for term vectors

Elastic/Elasticsearch 2013. 1. 23. 19:02
퇴근 하면서 급하게 올리다 보니 아무런 내용도 없이 그냥 스크랩 내용만 등록을 했내요.

[요약하면]
Term vectors are a mix between an indexed field and a stored field. They’re similar to a stored field because you can quickly retrieve all term vector fields for a given document: term vectors are keyed first by document ID . But then, they’re keyed secondarily by term, meaning they store a miniature inverted index for that one document. Unlike a stored field, where the original

[어떤 경우에 사용하지]
Sometimes when you index a document you’d like to retrieve all its unique terms at search time. One common use is to speed up highlighting the matched tokens in stored fields. (Highlighting is covered more in sections 8.3 and 8.4.) Another use is to enable a link, “Find similar documents,” that when clicked runs a new search using the salient terms in an original document. Yet another example is automatic categorization of documents. Section 5.9 shows concrete examples of using term vectors once they’re in your index.


2.4.3 Field options for term vectors
Sometimes when you index a document you’d like to retrieve all its unique terms at search time. One common use is to speed up highlighting the matched tokens in stored fields. (Highlighting is covered more in sections 8.3 and 8.4.) Another use is to enable a link, “Find similar documents,” that when clicked runs a new search using the salient terms in an original document. Yet another example is automatic categorization of documents. Section 5.9 shows concrete examples of using term vectors once they’re in your index.
But what exactly are term vectors? Term vectors are a mix between an indexed field and a stored field. They’re similar to a stored field because you can quickly retrieve all term vector fields for a given document: term vectors are keyed first by document ID . But then, they’re keyed secondarily by term, meaning they store a miniature inverted index for that one document. Unlike a stored field, where the original
String content is stored verbatim, term vectors store the actual separate terms that were produced by the analyzer, allowing you to retrieve all terms for each field, and the frequency of their occurrence within the document, sorted in lexicographic order. Because the tokens coming out of an analyzer also have position and offset information (see section 4.2.1), you can choose separately whether these details are also stored in your term vectors by passing these constants as the fourth argument to the Field constructor:
TermVector.YES —Records the unique terms that occurred, and their counts, in each document, but doesn’t store any positions or offsets information
TermVector.WITH_POSITIONS —Records the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets
TermVector.WITH_OFFSETS —Records the unique terms and their counts, with the offsets (start and end character position) of each occurrence of every term, but no positions
TermVector.WITH_POSITIONS_OFFSETS —Stores unique terms and their counts, along with positions and offsets
TermVector.NO —Doesn’t store any term vector information
Note that you can’t index term vectors unless you’ve also turned on indexing for the field. Stated more directly: if Index.NO is specified for a field, you must also specify
TermVector.NO .

We’re done with the detailed options to control indexing, storing, and term vec-tors. Now let’s see how you can create a field with values other than String .


: