[Elasticsearch] lucene arirang analyzer 플러그인 적용 on elasticsearch 2.0
Elastic/Elasticsearch 2015. 11. 4. 15:37elasticsearch 2.0 GA 기념으로 수명님의 lucene arirang 한글분석기 적용방법을 알아 보도록 하겠습니다.
이전에 작성된 elasticsearch analyzer arirang 은 아래 글 참고 부탁 드립니다.
[Requirement]
elasticsearch 2.0
jdk 1.7 이상 (elastic 에서 추천 하는 버전은 1.8 이상입니다.)
maven 3.1 이상
arirang.lucene-analyzer-5.0-1.0.0.jar (http://cafe.naver.com/korlucene/1274)
arirang-morph-1.0.0.jar (http://cafe.naver.com/korlucene/1274)
[Analysis Plugins]
https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/analysis.html
[Plugin 작성 시 변경 내용 - 하나]
- es-plugin.properties 파일이 없어 지고 plugin-descriptor.properties 가 생겼습니다.
- plugin-descriptor.properties 내용은 아래와 같습니다.
classname=org.elasticsearch.plugin.analysis.arirang.AnalysisArirangPlugin
name=arirang
jvm=true
java.version=1.7
site=false
isolated=true
description=Arirang plugin
version=${project.version}
elasticsearch.version=${elasticsearch.version}
hash=${buildNumber}
timestamp=${timestamp}
▶ 자세한 설명을 원하시는 분들은 아래 링크 참고 하시면 됩니다.
[Plugin 작성 시 변경 내용 - 둘]
- 기존에 상속 받았던 AbstractPlugin이 없어지고 Plugin을 상속 받아 구현하도록 변경 되었습니다.
From.
public class AnalysisArirangPlugin extends AbstractPlugin {...}
To.
public class AnalysisArirangPlugin extends Plugin {...}
그 밖에는 변경된 내용은 아래 arirang 에서 바뀐 부분이 적용된 내용이 전부 입니다.
[Arirang 변경 내용]
- KoreanAnalyzer 에서 lucene version 정보를 받았으나 이제는 정보를 받지 않습니다.
From.
analyzer = new KoreanAnalyzer(Lucene.VERSION.LUCENE_47);
To.
analyzer = new KoreanAnalyzer();
- KoreanTokenizer 에서는 기존에 reader 정보를 받았으나 이제는 정보를 받지 않습니다.
From.
return new KoreanTokenizer(reader);
To.
return new KoreanTokenizer();
[2.0 적용 시 바뀐 내용]
- assemblies/plugin.xml을 수정 하였습니다.
plugins 폴더에 zip 파일 올려 두고 압축 풀면 바로 동작 할 수 있도록 구성을 변경 하였습니다.
<?xml version="1.0"?>
<assembly>
<id>plugin</id>
<formats>
<format>zip</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<files>
<file>
<source>lib/arirang.lucene-analyzer-5.0-1.0.0.jar</source>
<outputDirectory>analysis-arirang</outputDirectory>
</file>
<file>
<source>lib/arirang-morph-1.0.0.jar</source>
<outputDirectory>analysis-arirang</outputDirectory>
</file>
<file>
<source>target/elasticsearch-analysis-arirang-1.0.0.jar</source>
<outputDirectory>analysis-arirang</outputDirectory>
</file>
<file>
<source>${basedir}/src/main/resources/plugin-descriptor.properties</source>
<outputDirectory>analysis-arirang</outputDirectory>
<filtered>true</filtered>
</file>
</files>
</assembly>
코드는 직관적이라서 쉽게 이해 하실 수 있을 거라 생각 합니다.
필요한 jar 파일들과 properties 파일을 analysis-arirang 이라는 폴더로 묶는 것입니다.
<filtered>true</filtered> 옵션은 아래 링크 참고 하세요. (해당 파일이 filtering 되었는지 확인 하는 것입니다.)
https://maven.apache.org/plugins/maven-assembly-plugin/assembly.html#class_file
여기서 plugin-descriptor.properties 파일이 포함이 안되어 있게 되면 elasticsearch 실행 시 에러가 발생하고 실행이 안됩니다.
주의 하셔야 하는 부분(?) 입니다.
- plugin-descriptor.properties 파일 없을 때 에러 메시지
[2015-11-04 12:34:14,522][INFO ][node ] [Lady Jacqueline Falsworth Crichton] initializing ...
Exception in thread "main" java.lang.IllegalStateException: Unable to initialize plugins
Likely root cause: java.nio.file.NoSuchFileException: /Users/hwjeong/server/app/elasticsearch/elasticsearch-2.0.0/plugins/analysis-arirang/plugin-descriptor.properties
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:315)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:380)
at java.nio.file.Files.newInputStream(Files.java:106)
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:86)
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:306)
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:112)
at org.elasticsearch.node.Node.<init>(Node.java:144)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
Refer to the log for complete error details.
- Test Code 추가
뭐가 coverage 를 올리기 위한 그런 테스트 코드는 아닙니다. ;;
▶ ArirangAnalysisTest.java
이 테스트는 elasticsearch에서 실제 작성된 플러그인이 제대로 module 로 등록 되고 등록된 module에 대한 service 를 가져 오는지 보는 것입니다.
elasticsearch 소스코드를 내려 받으시면 plugins 에 들어 있는 코드 그대로 copy & paste 한 것입니다.
▶ ArirangAnalyzerTest.java
이 테스트는 _analyze 에 대한 REST API 와 실제 index.analysis 세팅 사이에 구성이 어떻게 코드로 반영 되는지 상호 맵핑 하기 위해 작성 되었습니다.
analyzer, tokenizer, tokenfilter 에 대해서 어떻게 동작 하는지 그나마 쉽게 이해 하시는데 도움이 될까 싶어 작성된 코드 입니다.
※ Elasticsearch Test Suite 이슈 - 자답(?)
현재 master branch 는 문제 없이 잘 됩니다.
다만 2.0 branch 에서는 아래와 같은 또는 다른 문제가 발생을 합니다.
그냥 master 받아서 테스트 하시길 권장 합니다.
※ Elasticsearch Test Suite 이슈.
이건 제가 잘못해서 발생 한 것일 수도 있기 때문에 혹시 해결 하신 분이 계시면 공유 좀 부탁 드립니다.
▶ 발생 에러
/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/bin/java -ea -Didea.launcher.port=7533 "-Didea.launcher.bin.path=/Applications/IntelliJ IDEA 14 CE.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath "/Applications/IntelliJ IDEA 14 CE.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ IDEA 14 CE.app/Contents/plugins/junit/lib/junit-rt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/dt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/javafx-doclet.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/javafx-mx.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/jconsole.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/sa-jdi.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/tools.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/deploy.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/htmlconverter.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/javaws.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/management-agent.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/plugin.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/lib/ant-javafx.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home/jre/lib/rt.jar:/Users/hwjeong/git/elasticsearch-analysis-arirang/target/test-classes:/Users/hwjeong/git/elasticsearch-analysis-arirang/target/classes:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-core/5.2.1/lucene-core-5.2.1.jar:/Users/hwjeong/.m2/repository/org/elasticsearch/elasticsearch/2.0.0/elasticsearch-2.0.0.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-backward-codecs/5.2.1/lucene-backward-codecs-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-analyzers-common/5.2.1/lucene-analyzers-common-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-queries/5.2.1/lucene-queries-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-memory/5.2.1/lucene-memory-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-highlighter/5.2.1/lucene-highlighter-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-queryparser/5.2.1/lucene-queryparser-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-sandbox/5.2.1/lucene-sandbox-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-suggest/5.2.1/lucene-suggest-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-misc/5.2.1/lucene-misc-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-join/5.2.1/lucene-join-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-grouping/5.2.1/lucene-grouping-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-spatial/5.2.1/lucene-spatial-5.2.1.jar:/Users/hwjeong/.m2/repository/com/spatial4j/spatial4j/0.4.1/spatial4j-0.4.1.jar:/Users/hwjeong/.m2/repository/com/google/guava/guava/18.0/guava-18.0.jar:/Users/hwjeong/.m2/repository/com/carrotsearch/hppc/0.7.1/hppc-0.7.1.jar:/Users/hwjeong/.m2/repository/joda-time/joda-time/2.8.2/joda-time-2.8.2.jar:/Users/hwjeong/.m2/repository/org/joda/joda-convert/1.2/joda-convert-1.2.jar:/Users/hwjeong/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar:/Users/hwjeong/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-smile/2.5.3/jackson-dataformat-smile-2.5.3.jar:/Users/hwjeong/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-yaml/2.5.3/jackson-dataformat-yaml-2.5.3.jar:/Users/hwjeong/.m2/repository/org/yaml/snakeyaml/1.12/snakeyaml-1.12.jar:/Users/hwjeong/.m2/repository/com/fasterxml/jackson/dataformat/jackson-dataformat-cbor/2.5.3/jackson-dataformat-cbor-2.5.3.jar:/Users/hwjeong/.m2/repository/io/netty/netty/3.10.5.Final/netty-3.10.5.Final.jar:/Users/hwjeong/.m2/repository/com/ning/compress-lzf/1.0.2/compress-lzf-1.0.2.jar:/Users/hwjeong/.m2/repository/com/tdunning/t-digest/3.0/t-digest-3.0.jar:/Users/hwjeong/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.6/HdrHistogram-2.1.6.jar:/Users/hwjeong/.m2/repository/commons-cli/commons-cli/1.3.1/commons-cli-1.3.1.jar:/Users/hwjeong/.m2/repository/com/twitter/jsr166e/1.1.0/jsr166e-1.1.0.jar:/Users/hwjeong/.m2/repository/log4j/log4j/1.2.16/log4j-1.2.16.jar:/Users/hwjeong/.m2/repository/org/slf4j/slf4j-api/1.6.2/slf4j-api-1.6.2.jar:/Users/hwjeong/.m2/repository/org/slf4j/slf4j-log4j12/1.6.2/slf4j-log4j12-1.6.2.jar:/Users/hwjeong/git/elasticsearch-analysis-arirang/lib/arirang-morph-1.0.0.jar:/Users/hwjeong/git/elasticsearch-analysis-arirang/lib/arirang.lucene-analyzer-5.0-1.0.0.jar:/Users/hwjeong/.m2/repository/junit/junit/4.11/junit-4.11.jar:/Users/hwjeong/.m2/repository/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar:/Users/hwjeong/.m2/repository/com/carrotsearch/randomizedtesting/randomizedtesting-runner/2.1.16/randomizedtesting-runner-2.1.16.jar:/Users/hwjeong/.m2/repository/org/hamcrest/hamcrest-all/1.3/hamcrest-all-1.3.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-test-framework/5.2.1/lucene-test-framework-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/lucene/lucene-codecs/5.2.1/lucene-codecs-5.2.1.jar:/Users/hwjeong/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar:/Users/hwjeong/.m2/repository/org/elasticsearch/elasticsearch/2.0.0/elasticsearch-2.0.0-tests.jar:/Users/hwjeong/.m2/repository/net/java/dev/jna/jna/4.1.0/jna-4.1.0.jar" com.intellij.rt.execution.application.AppMain com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 org.elasticsearch.index.analysis.ArirangAnalysisTest,testArirangAnalysis
log4j:WARN No appenders could be found for logger (org.elasticsearch.bootstrap).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
java.lang.RuntimeException: found jar hell in test classpath
at org.elasticsearch.bootstrap.BootstrapForTesting.<clinit>(BootstrapForTesting.java:63)
at org.elasticsearch.test.ESTestCase.<clinit>(ESTestCase.java:106)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at com.carrotsearch.randomizedtesting.RandomizedRunner$1.run(RandomizedRunner.java:573)
Caused by: java.lang.IllegalStateException: jar hell!
class: org.hamcrest.BaseDescription
jar1: /Users/hwjeong/.m2/repository/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar
jar2: /Users/hwjeong/.m2/repository/org/hamcrest/hamcrest-all/1.3/hamcrest-all-1.3.jar
at org.elasticsearch.bootstrap.JarHell.checkClass(JarHell.java:267)
at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:185)
at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:86)
at org.elasticsearch.bootstrap.BootstrapForTesting.<clinit>(BootstrapForTesting.java:61)
... 4 more