[Lucene] 4.9.0 analyzer & tokenizer....

Elastic/Elasticsearch 2014. 11. 5. 13:05

http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html

https://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/TokenStream.html

    Version matchVersion = Version.LUCENE_XY; // Substitute desired Lucene version for XY
    Analyzer analyzer = new StandardAnalyzer(matchVersion); // or any other analyzer
    TokenStream ts = analyzer.tokenStream("myfield", new StringReader("some text goes here"));
    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);

    try {
      ts.reset(); // Resets this stream to the beginning. (Required)
      while (ts.incrementToken()) {
        // Use AttributeSource.reflectAsString(boolean)
        // for token stream debugging.
        System.out.println("token: " + ts.reflectAsString(true));

        System.out.println("token start offset: " + offsetAtt.startOffset());
        System.out.println("  token end offset: " + offsetAtt.endOffset());
      }
      ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
    } finally {
      ts.close(); // Release resources associated with this stream.
    }


The workflow of the new TokenStream API is as follows:

  1. Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
  2. The consumer calls reset().
  3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
  4. The consumer calls incrementToken() until it returns false, consuming the attributes after each call.
  5. The consumer calls end() so that any end-of-stream operations can be performed.
  6. The consumer calls close() to release any resources when finished using the TokenStream.
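
The call order in the list above can be tried out without a Lucene dependency. The following is only a rough sketch: SimpleTokenStream, its term()/startOffset()/endOffset() accessors, and the consume() helper are hypothetical stand-ins written for this post, not Lucene classes, and the whitespace splitting is far simpler than what a real analyzer does. The sketch assumes a stream should reject incrementToken() when reset() has not been called yet.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the consumer workflow; none of this is Lucene API. */
public class TokenStreamWorkflowSketch {

    /** Minimal whitespace token stream that insists on reset() before incrementToken(). */
    static final class SimpleTokenStream implements AutoCloseable {
        private final String text;
        private int pos = -1;      // -1 means reset() has not been called yet
        private String term;       // current token text (plays the role of an attribute)
        private int startOffset;
        private int endOffset;
        private boolean closed;

        SimpleTokenStream(String text) { this.text = text; }

        void reset() {
            if (closed) throw new IllegalStateException("stream is closed");
            pos = 0;
        }

        boolean incrementToken() {
            if (closed) throw new IllegalStateException("stream is closed");
            if (pos < 0) throw new IllegalStateException("reset() must be called first");
            // Skip whitespace before the next token.
            while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
            if (pos >= text.length()) return false;
            startOffset = pos;
            // Advance to the end of the token.
            while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
            endOffset = pos;
            term = text.substring(startOffset, endOffset);
            return true;
        }

        /** End-of-stream work: here, just set the final offset to the input length. */
        void end() { startOffset = endOffset = text.length(); }

        @Override public void close() { closed = true; }

        String term() { return term; }
        int startOffset() { return startOffset; }
        int endOffset() { return endOffset; }
    }

    /** Follows the workflow: reset(), loop on incrementToken(), end(), then close(). */
    static List<String> consume(String text) {
        List<String> out = new ArrayList<>();
        try (SimpleTokenStream ts = new SimpleTokenStream(text)) { // close() runs on exit
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term() + "[" + ts.startOffset() + "," + ts.endOffset() + "]");
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume("some text goes here"));
    }
}
```

Using try-with-resources here replaces the explicit try/finally from the sample. Lucene's own TokenStream implements Closeable, so the same pattern should carry over, but verify that against the version you actually use.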



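Some of this has changed from earlier versions (note that reset() is marked as required in the sample above), so be sure to check. :)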
