[Lucene] 4.9.0 analyzer & tokenizer....
Elastic/Elasticsearch 2014. 11. 5. 13:05http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html
https://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/TokenStream.html
Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here")); OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class); try { ts.reset(); // Resets this stream to the beginning. (Required) while (ts.incrementToken()) { // Use
AttributeSource.reflectAsString(boolean)
// for token stream debugging. System.out.println("token: " + ts.reflectAsString(true)); System.out.println("token start offset: " + offsetAtt.startOffset()); System.out.println(" token end offset: " + offsetAtt.endOffset()); } ts.end(); // Perform end-of-stream operations, e.g. set the final offset. } finally { ts.close(); // Release resources associated with this stream. }
The workflow of the new TokenStream
API is as follows:
- Instantiation of
TokenStream
/TokenFilter
s which add/get attributes to/from theAttributeSource
. - The consumer calls
reset()
. - The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
- The consumer calls
incrementToken()
until it returns false consuming the attributes after each call. - The consumer calls
end()
so that any end-of-stream operations can be performed. - The consumer calls
close()
to release any resource when finished using theTokenStream
.
이전 버전이랑 바뀐 내용이 있으니 확인하셔야 합니다. :)