[Elasticsearch] lucene arirang analyzer 플러그인 적용 on elasticsearch 2.0

Elastic/Elasticsearch 2015. 11. 4. 15:37

elasticsearch 2.0 GA 기념으로 수명님의 lucene arirang 한글분석기 적용방법을 알아 보도록 하겠습니다.

이전에 작성된 elasticsearch analyzer arirang 은 아래 글 참고 부탁 드립니다.


http://jjeong.tistory.com/958


[Requirement]

elasticsearch 2.0

jdk 1.7 이상 (elastic 에서 추천 하는 버전은 1.8 이상입니다.)

maven 3.1 이상

arirang.lucene-analyzer-5.0-1.0.0.jar (http://cafe.naver.com/korlucene/1274)

arirang-morph-1.0.0.jar (http://cafe.naver.com/korlucene/1274)


[Analysis Plugins]

https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/analysis.html


[Plugin 작성 시 변경 내용 - 하나]

- es-plugin.properties 파일이 없어 지고 plugin-descriptor.properties 가 생겼습니다.

- plugin-descriptor.properties 내용은 아래와 같습니다.


classname=org.elasticsearch.plugin.analysis.arirang.AnalysisArirangPlugin

name=arirang

jvm=true

java.version=1.7

site=false

isolated=true

description=Arirang plugin

version=${project.version}

elasticsearch.version=${elasticsearch.version}

hash=${buildNumber}

timestamp=${timestamp}


▶ 자세한 설명을 원하시는 분들은 아래 링크 참고 하시면 됩니다.

https://www.elastic.co/guide/en/elasticsearch/plugins/current/plugin-authors.html#_plugin_descriptor_file


[Plugin 작성 시 변경 내용 - 둘]

- 기존에 상속 받았던 AbstractPlugin이 없어지고 Plugin을 상속 받아 구현하도록 변경 되었습니다.

From.

public class AnalysisArirangPlugin extends AbstractPlugin {...}


To.

public class AnalysisArirangPlugin extends Plugin {...}


그 밖에는 변경된 내용은 아래 arirang 에서 바뀐 부분이 적용된 내용이 전부 입니다.


[Arirang 변경 내용]

- KoreanAnalyzer 에서 lucene version 정보를 받았으나 이제는 정보를 받지 않습니다.

From.

analyzer = new KoreanAnalyzer(Lucene.VERSION.LUCENE_47);


To.

analyzer = new KoreanAnalyzer();


- KoreanTokenizer 에서는 기존에 reader 정보를 받았으나 이제는 정보를 받지 않습니다.

From.

return new KoreanTokenizer(reader);


To.

return new KoreanTokenizer();


[2.0 적용 시 바뀐 내용]

- assemblies/plugin.xml을 수정 하였습니다.

plugins 폴더에 zip 파일 올려 두고 압축 풀면 바로 동작 할 수 있도록 구성을 변경 하였습니다.


<?xml version="1.0"?>

<assembly>

  <id>plugin</id>

  <formats>

      <format>zip</format>

  </formats>

  <includeBaseDirectory>false</includeBaseDirectory>


  <files>

    <file>

      <source>lib/arirang.lucene-analyzer-5.0-1.0.0.jar</source>

      <outputDirectory>analysis-arirang</outputDirectory>

    </file>

    <file>

      <source>lib/arirang-morph-1.0.0.jar</source>

      <outputDirectory>analysis-arirang</outputDirectory>

    </file>

    <file>

      <source>target/elasticsearch-analysis-arirang-1.0.0.jar</source>

      <outputDirectory>analysis-arirang</outputDirectory>

    </file>

    <file>

      <source>${basedir}/src/main/resources/plugin-descriptor.properties</source>

      <outputDirectory>analysis-arirang</outputDirectory>

      <filtered>true</filtered>

    </file>

  </files>

</assembly>

코드는 직관적이라서 쉽게 이해 하실 수 있을 거라 생각 합니다.

필요한 jar 파일들과 properties 파일을 analysis-arirang 이라는 폴더로 묶는 것입니다.

<filtered>true</filtered> 옵션은 아래 링크 참고 하세요. (해당 파일이 filtering 되었는지 확인 하는 것입니다.)

https://maven.apache.org/plugins/maven-assembly-plugin/assembly.html#class_file


여기서 plugin-descriptor.properties 파일이 포함이 안되어 있게 되면 elasticsearch 실행 시 에러가 발생하고 실행이 안됩니다.

주의 하셔야 하는 부분(?) 입니다.


- plugin-descriptor.properties 파일 없을 때 에러 메시지


[2015-11-04 12:34:14,522][INFO ][node                     ] [Lady Jacqueline Falsworth Crichton] initializing ...


Exception in thread "main" java.lang.IllegalStateException: Unable to initialize plugins

Likely root cause: java.nio.file.NoSuchFileException: /Users/hwjeong/server/app/elasticsearch/elasticsearch-2.0.0/plugins/analysis-arirang/plugin-descriptor.properties

at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)

at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)

at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)

at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)

at java.nio.file.Files.newByteChannel(Files.java:315)

at java.nio.file.Files.newByteChannel(Files.java:361)

at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:380)

at java.nio.file.Files.newInputStream(Files.java:106)

at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:86)

at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:306)

at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:112)

at org.elasticsearch.node.Node.<init>(Node.java:144)

at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)

at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170)

at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)

at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)

Refer to the log for complete error details.


- Test Code 추가

뭐가 coverage 를 올리기 위한 그런 테스트 코드는 아닙니다. ;;


▶ ArirangAnalysisTest.java


이 테스트는 elasticsearch에서 실제 작성된 플러그인이 제대로 module 로 등록 되고 등록된 module에 대한 service 를 가져 오는지 보는 것입니다.

elasticsearch 소스코드를 내려 받으시면 plugins 에 들어 있는 코드 그대로 copy & paste 한 것입니다.


▶ ArirangAnalyzerTest.java


이 테스트는 _analyze 에 대한 REST API 와 실제 index.analysis 세팅 사이에 구성이 어떻게 코드로 반영 되는지 상호 맵핑 하기 위해 작성 되었습니다.

analyzer, tokenizer, tokenfilter 에 대해서 어떻게 동작 하는지 그나마 쉽게 이해 하시는데 도움이 될까 싶어 작성된 코드 입니다.


※ Elasticsearch Test Suite 이슈 - 자답(?)

현재 master branch 는 문제 없이 잘 됩니다.

다만 2.0 branch 에서는 아래와 같은 또는 다른 문제가 발생을 합니다.

그냥 master 받아서 테스트 하시길 권장 합니다.


※ Elasticsearch Test Suite 이슈.

이건 제가 잘못해서 발생 한 것일 수도 있기 때문에 혹시 해결 하신 분이 계시면 공유 좀 부탁 드립니다.


▶ 발생 에러

/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/bin/java -ea -Didea.launcher.port=7533 "-Didea.launcher.bin.path=/Applications/IntelliJ IDEA 14 CE.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath "/Applications/IntelliJ IDEA 14 CE.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ IDEA 14 CE.app/Contents/plugins/junit/lib/junit-rt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/dt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/javafx-doclet.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/javafx-mx.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/jconsole.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/sa-jdi.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/tools.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/deploy.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/htmlconverter.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/javaws.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/management-agent.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/plugin.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/ant-javafx.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/rt.jar:/Users/hwjeong/git/elasticsearch-analysis-arirang/target/test-classes:/Users/hwjeong/git/elasticsearch-analysis-arirang/target/classes:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-core/5.2.1/lucene-core-5.2.1.jar:/Users/hwjeong/.m2/repository/org/elasticsearch/elasticsearch/2.0.0/elasticsearch-2.0.0.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-backward-codecs/5.2.1/lucene-backward-codecs-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-analyzers-common/5.2.1/lucene-analyzers-common-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-queries/5.2.1/lucene-queries-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-memory/5.2.1/lucene-memory-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-highlighter/5.2.1/lucene-highlighter-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-queryparser/5.2.1/lucene-queryparser-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-sandbox/5.2.1/lucene-sandbox-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-suggest/5.2.1/lucene-suggest-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-misc/5.2.1/lucene-misc-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-join/5.2.1/lucene-join-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-grouping/5.2.1/lucene-grouping-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-spatial/5.2.1/lucene-spatial-5.2.1.jar:/Users/hwjeong/.m2/repository/com/spatial4j/spatial4j/0.4.1/spatial4j-0.4.1.jar:/Users/hwjeong/.m2/repository/com/google/guava/guava/18.0/guava-18.0.jar:/Users/hwjeong/.m2/repository/com/carrotsearch/hppc/0.7.1/hppc-0.7.1.jar:/Users/hwjeong/.m2/repository/joda-time/joda-time/2.8.2/joda-time-2.8.2.jar:/Users/hwjeong/.m2/repository/org/joda/joda-convert/1.2/joda-convert-1.2.jar:/Users/hwjeong/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar:/Users/hwjeong/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-smile/2.5.3/jackson-dataformat-smile-2.5.3.jar:/Users/hwjeong/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-yaml/2.5.3/jackson-dataformat-yaml-2.5.3.jar:/Users/hwjeong/.m2/repository/org/yaml/snakeyaml/1.12/snakeyaml-1.12.jar:/Users/hwjeong/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-cbor/2.5.3/jackson-dataformat-cbor-2.5.3.jar:/Users/hwjeong/.m2/repository/io/netty/netty/3.10.5.Final/netty-3.10.5.Final.jar:/Users/hwjeong/.m2/repository/com/ning/compress-lzf/1.0.2/compress-lzf-1.0.2.jar:/Users/hwjeong/.m2/repository/com/tdunning/t-digest/3.0/t-digest-3.0.jar:/Users/hwjeong/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.6/HdrHistogram-2.1.6.jar:/Users/hwjeong/.m2/repository/commons-cli/commons-cli/1.3.1/commons-cli-1.3.1.jar:/Users/hwjeong/.m2/repository/com/twitter/jsr166e/1.1.0/jsr166e-1.1.0.jar:/Users/hwjeong/.m2/repository/log4j/log4j/1.2.16/log4j-1.2.16.jar:/Users/hwjeong/.m2/repository/org/slf4j/slf4j-api/1.6.2/slf4j-api-1.6.2.jar:/Users/hwjeong/.m2/repository/org/slf4j/slf4j-log4j12/1.6.2/slf4j-log4j12-1.6.2.jar:/Users/hwjeong/git/elasticsearch-analysis-arirang/lib/arirang-morph-1.0.0.jar:/Users/hwjeong/git/elasticsearch-analysis-arirang/lib/arirang.lucene-analyzer-5.0-1.0.0.jar:/Users/hwjeong/.m2/repository/junit/junit/4.11/junit-4.11.jar:/Users/hwjeong/.m2/repository/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar:/Users/hwjeong/.m2/repository/com/carrotsearch/randomizedtesting/randomizedtesting-runner/2.1.16/randomizedtesting-runner-2.1.16.jar:/Users/hwjeong/.m2/repository/org/hamcrest/hamcrest-all/1.3/hamcrest-all-1.3.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-test-framework/5.2.1/lucene-test-framework-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-codecs/5.2.1/lucene-codecs-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar:/Users/hwjeong/.m2/repository/org/elasticsearch/elasticsearch/2.0.0/elasticsearch-2.0.0-tests.jar:/Users/hwjeong/.m2/repository/net/java/dev/jna/jna/4.1.0/jna-4.1.0.jar" com.intellij.rt.execution.application.AppMain com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 org.elasticsearch.index.analysis.ArirangAnalysisTest,testArirangAnalysis

log4j:WARN No appenders could be found for logger (org.elasticsearch.bootstrap).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


java.lang.RuntimeException: found jar hell in test classpath

at org.elasticsearch.bootstrap.BootstrapForTesting.<clinit>(BootstrapForTesting.java:63)

at org.elasticsearch.test.ESTestCase.<clinit>(ESTestCase.java:106)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:270)

at com.carrotsearch.randomizedtesting.RandomizedRunner$1.run(RandomizedRunner.java:573)

Caused by: java.lang.IllegalStateException: jar hell!

class: org.hamcrest.BaseDescription

jar1: /Users/hwjeong/.m2/repository/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar

jar2: /Users/hwjeong/.m2/repository/org/hamcrest/hamcrest-all/1.3/hamcrest-all-1.3.jar

at org.elasticsearch.bootstrap.JarHell.checkClass(JarHell.java:267)

at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:185)

at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:86)

at org.elasticsearch.bootstrap.BootstrapForTesting.<clinit>(BootstrapForTesting.java:61)

... 4 more


: