[Lucene] 4.9.0 analyzer & tokenizer
Elastic/Elasticsearch | 2014. 11. 5. 13:05
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html
https://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/TokenStream.html
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.Version;

Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
// Get the attribute reference before consuming the stream.
OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
try {
  ts.reset(); // Resets this stream to the beginning. (Required)
  while (ts.incrementToken()) {
    // Use AttributeSource.reflectAsString(boolean) for token stream debugging.
    System.out.println("token: " + ts.reflectAsString(true));
    System.out.println("token start offset: " + offsetAtt.startOffset());
    System.out.println("  token end offset: " + offsetAtt.endOffset());
  }
  ts.end();   // Perform end-of-stream operations, e.g. set the final offset.
} finally {
  ts.close(); // Release resources associated with this stream.
}
The workflow of the new TokenStream API is as follows:
- Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
- The consumer calls reset().
- The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
- The consumer calls incrementToken() until it returns false, consuming the attributes after each call.
- The consumer calls end() so that any end-of-stream operations can be performed.
- The consumer calls close() to release any resource when finished using the TokenStream.
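To make the reset() → incrementToken() → end() → close() contract concrete without needing Lucene on the classpath, here is a minimal toy sketch that only mimics the pull-style iteration: a whitespace "token stream" whose term and offset fields play the role of attributes. The class and method names echo the Lucene API, but ToyTokenStream and tokenize are illustrative inventions, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for the pull-style TokenStream contract (NOT Lucene itself).
class ToyTokenStream {
    private final String text;
    private int pos = 0;
    // "Attributes" the consumer reads after each successful incrementToken().
    String term;
    int startOffset, endOffset;

    ToyTokenStream(String text) { this.text = text; }

    void reset() { pos = 0; } // must be called before consuming (Required)

    // Advances to the next whitespace-separated token; false at end of stream.
    boolean incrementToken() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= text.length()) return false;
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
        term = text.substring(start, pos);
        startOffset = start;
        endOffset = pos;
        return true;
    }

    void end()   { /* end-of-stream work, e.g. record the final offset */ }
    void close() { /* release resources */ }
}

public class ToyTokenStreamDemo {
    // Consumer loop following the documented workflow.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        ToyTokenStream ts = new ToyTokenStream(text);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term + "[" + ts.startOffset + "," + ts.endOffset + "]");
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("some text goes here"));
        // [some[0,4], text[5,9], goes[10,14], here[15,19]]
    }
}
```

The real Lucene stream works the same way from the consumer's side, except that attributes are shared objects obtained via addAttribute() rather than plain fields.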
Some things have changed from previous versions, so be sure to check. :)