Lucene 2.4.3 Field options for term vectors
2013. 1. 23. 19:02
Sometimes when you index a document you’d like to retrieve all of its unique terms at search time. One common use is to speed up highlighting of the matched tokens in stored fields. (Highlighting is covered in more detail in sections 8.3 and 8.4.) Another use is a “Find similar documents” link that, when clicked, runs a new search using the salient terms of the original document. Yet another example is automatic categorization of documents. Section 5.9 shows concrete examples of using term vectors once they’re in your index.
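To make the “Find similar documents” idea more concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the original document’s term vector is already available as a map from term to frequency, and it treats the most frequent terms as the “salient” ones. That is a simplification: a real implementation would usually also discount terms that are common across the whole index. The class SimilarTermsPicker, its methods, and the OR query string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class): chooses "salient" terms from one
// document's term vector, given as term -> frequency, to seed a
// "Find similar documents" search. Ranking by raw frequency is a
// simplification; real systems usually also weigh terms by their rarity.
class SimilarTermsPicker {

    // Returns up to n terms, most frequent first; ties are broken
    // alphabetically so the result is deterministic.
    static List<String> topTerms(Map<String, Integer> termVector, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(termVector.entrySet());
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue()); // descending frequency
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    // Joins the chosen terms into an illustrative OR query string.
    static String buildQuery(List<String> terms) {
        return String.join(" OR ", terms);
    }
}
```

For a term vector {lucene=3, index=2, search=2, the=1} and n = 3, topTerms returns [lucene, index, search], and buildQuery turns that into “lucene OR index OR search”.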
But what exactly are term vectors? Term vectors are a mix between an indexed field and a stored field. They’re similar to a stored field because you can quickly retrieve all term vector fields for a given document: term vectors are keyed first by document ID. But then they’re keyed secondarily by term, meaning they store a miniature inverted index for that one document. Unlike a stored field, where the original String content is stored verbatim, term vectors store the individual terms produced by the analyzer, so you can retrieve all terms for each field, along with how often each term occurs within the document, sorted in lexicographic order. Because the tokens coming out of an analyzer also carry position and offset information (see section 4.2.1), you can choose separately whether these details are also stored in your term vectors by passing one of these constants as the last argument to the Field constructor:
TermVector.YES: Records the unique terms that occurred, and their counts, in each document, but doesn’t store any position or offset information
TermVector.WITH_POSITIONS: Records the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets
TermVector.WITH_OFFSETS: Records the unique terms and their counts, with the offsets (start and end character positions) of each occurrence of every term, but no positions
TermVector.WITH_POSITIONS_OFFSETS: Stores the unique terms and their counts, along with positions and offsets
TermVector.NO: Doesn’t store any term vector information
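The definition and the five options above can be modeled without Lucene. In Lucene 2.4 through 3.x the option is chosen with a call such as new Field("contents", text, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS); the sketch below is only a toy model of what each option records, not Lucene’s implementation. TermVectorOption merely mirrors the option names (it is not Field.TermVector), and the regex tokenizer with lowercasing is a hypothetical stand-in for an analyzer. A TreeMap keyed by term plays the role of the miniature per-document inverted index, so terms come out in lexicographic order; counts are always kept, while positions and offsets are recorded only when the option asks for them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of the five options; the names mirror the text, but this enum
// is NOT Lucene's Field.TermVector.
enum TermVectorOption {
    NO(false, false, false),
    YES(true, false, false),
    WITH_POSITIONS(true, true, false),
    WITH_OFFSETS(true, false, true),
    WITH_POSITIONS_OFFSETS(true, true, true);

    final boolean enabled;   // record a term vector at all?
    final boolean positions; // keep token positions (0, 1, 2, ...)?
    final boolean offsets;   // keep start/end character offsets?

    TermVectorOption(boolean enabled, boolean positions, boolean offsets) {
        this.enabled = enabled;
        this.positions = positions;
        this.offsets = offsets;
    }
}

// Everything recorded for one term within one document's field.
class TermEntry {
    int freq;
    final List<Integer> positions = new ArrayList<>();
    final List<int[]> offsets = new ArrayList<>(); // {start, end}, end exclusive
}

class ToyTermVector {
    // Hypothetical analyzer stand-in: runs of letters/digits, lowercased.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    // Builds a per-document "mini inverted index" for one field.
    // The TreeMap keeps terms in lexicographic order.
    static Map<String, TermEntry> record(String text, TermVectorOption option) {
        Map<String, TermEntry> vector = new TreeMap<>();
        if (!option.enabled) {
            return vector; // NO: nothing is recorded
        }
        Matcher m = TOKEN.matcher(text);
        int position = 0;
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            TermEntry entry = vector.computeIfAbsent(term, t -> new TermEntry());
            entry.freq++; // counts are kept by every option except NO
            if (option.positions) {
                entry.positions.add(position);
            }
            if (option.offsets) {
                entry.offsets.add(new int[] {m.start(), m.end()});
            }
            position++;
        }
        return vector;
    }
}
```

For example, recording “to be or not to be” with WITH_POSITIONS yields the terms be, not, or, to in that order, and the entry for “to” has frequency 2 and positions [0, 4] but no offsets. With WITH_OFFSETS, the second “be” carries the character offsets 16 to 18 instead, and no positions.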
Note that you can’t store term vectors for a field unless you’ve also turned on indexing for that field. Stated more directly: if Index.NO is specified for a field, you must also specify TermVector.NO.
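The rule above is easy to enforce up front. This is a hypothetical sketch: FieldSpec, IndexOption, and TermVectorChoice are made-up names that only echo the option names in the text, and IndexOption is trimmed to three values for brevity. The constructor rejects any term vector setting other than NO on a field that is not indexed.

```java
// Hypothetical names; they echo the option names in the text but are not
// Lucene's Field.Index / Field.TermVector (IndexOption is trimmed to 3 values).
enum IndexOption { NO, ANALYZED, NOT_ANALYZED }

enum TermVectorChoice { NO, YES, WITH_POSITIONS, WITH_OFFSETS, WITH_POSITIONS_OFFSETS }

// An immutable field description that refuses term vectors on an unindexed field.
final class FieldSpec {
    final String name;
    final IndexOption index;
    final TermVectorChoice termVector;

    FieldSpec(String name, IndexOption index, TermVectorChoice termVector) {
        // Term vectors come from the tokens produced while indexing,
        // so a field that is not indexed cannot have them.
        if (index == IndexOption.NO && termVector != TermVectorChoice.NO) {
            throw new IllegalArgumentException(
                "field '" + name + "': term vectors require an indexed field");
        }
        this.name = name;
        this.index = index;
        this.termVector = termVector;
    }
}
```

Checking in the constructor means an invalid combination fails as soon as the field is described, rather than surfacing later as missing term vectors.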
We’re done with the detailed options to control indexing, storing, and term vectors. Now let’s see how you can create a field with values other than String.