[Elasticsearch] Lucene Arirang Analyzer Plugin for Elasticsearch 5.0.1

Elastic/Elasticsearch 2016. 11. 24. 19:02

우선 빌드한 플러그인 zip 파일 먼저 공유 합니다.

나중에 작업한 내용에 대해서는 github 에 올리도록 하겠습니다.

요즘 프로젝트며 운영 업무가 너무 많아서 이것도 겨우 겨우 시간 내서 작업 했내요.


elasticsearch-analysis-arirang-5.0.1.zip


설치 방법)

$ bin/elasticsearch-plugin install --verbose file:///elasticsearch-analysis-arirang/target/elasticsearch-analysis-arirang-5.0.1.zip


설치 로그)

-> Downloading file:///elasticsearch-analysis-arirang-5.0.1.zip

Retrieving zip from file:///elasticsearch-analysis-arirang-5.0.1.zip

[=================================================] 100%

- Plugin information:

Name: analysis-arirang

Description: Arirang plugin

Version: 5.0.1

 * Classname: org.elasticsearch.plugin.analysis.arirang.AnalysisArirangPlugin

-> Installed analysis-arirang


Elasticsearch 실행 로그)

$ bin/elasticsearch

[2016-11-24T18:49:09,922][INFO ][o.e.n.Node               ] [] initializing ...

[2016-11-24T18:49:10,083][INFO ][o.e.e.NodeEnvironment    ] [aDGu2B9] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [733.1gb], net total_space [930.3gb], spins? [unknown], types [hfs]

[2016-11-24T18:49:10,084][INFO ][o.e.e.NodeEnvironment    ] [aDGu2B9] heap size [1.9gb], compressed ordinary object pointers [true]

[2016-11-24T18:49:10,085][INFO ][o.e.n.Node               ] [aDGu2B9] node name [aDGu2B9] derived from node ID; set [node.name] to override

[2016-11-24T18:49:10,087][INFO ][o.e.n.Node               ] [aDGu2B9] version[5.0.1], pid[56878], build[080bb47/2016-11-11T22:08:49.812Z], OS[Mac OS X/10.12.1/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_72/25.72-b15]

[2016-11-24T18:49:11,335][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [aggs-matrix-stats]

[2016-11-24T18:49:11,335][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [ingest-common]

[2016-11-24T18:49:11,335][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [lang-expression]

[2016-11-24T18:49:11,335][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [lang-groovy]

[2016-11-24T18:49:11,335][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [lang-mustache]

[2016-11-24T18:49:11,336][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [lang-painless]

[2016-11-24T18:49:11,336][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [percolator]

[2016-11-24T18:49:11,336][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [reindex]

[2016-11-24T18:49:11,336][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [transport-netty3]

[2016-11-24T18:49:11,336][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded module [transport-netty4]

[2016-11-24T18:49:11,336][INFO ][o.e.p.PluginsService     ] [aDGu2B9] loaded plugin [analysis-arirang]

[2016-11-24T18:49:14,151][INFO ][o.e.n.Node               ] [aDGu2B9] initialized

[2016-11-24T18:49:14,151][INFO ][o.e.n.Node               ] [aDGu2B9] starting ...

[2016-11-24T18:49:14,377][INFO ][o.e.t.TransportService   ] [aDGu2B9] publish_address {127.0.0.1:9300}, bound_addresses {[fe80::1]:9300}, {[::1]:9300}, {127.0.0.1:9300}

[2016-11-24T18:49:17,511][INFO ][o.e.c.s.ClusterService   ] [aDGu2B9] new_master {aDGu2B9}{aDGu2B9mQ8KkWCe3fnqeMw}{_y9RzyKGSvqYAFcv99HBXg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)

[2016-11-24T18:49:17,584][INFO ][o.e.g.GatewayService     ] [aDGu2B9] recovered [0] indices into cluster_state

[2016-11-24T18:49:17,588][INFO ][o.e.h.HttpServer         ] [aDGu2B9] publish_address {127.0.0.1:9200}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9200}

[2016-11-24T18:49:17,588][INFO ][o.e.n.Node               ] [aDGu2B9] started


한글형태소분석 실행)

$ curl -X POST -H "Cache-Control: no-cache" -H "Postman-Token: 6d392d83-5816-71ad-556b-5cd6f92af634" -d '{

  "analyzer" : "arirang_analyzer",

  "text" : "[한국] 엘라스틱서치 사용자 그룹의 HENRY 입니다."

}' "http://localhost:9200/_analyze"


형태소분석 결과)

{

  "tokens": [

    {

      "token": "[",

      "start_offset": 0,

      "end_offset": 1,

      "type": "symbol",

      "position": 0

    },

    {

      "token": "한국",

      "start_offset": 1,

      "end_offset": 3,

      "type": "korean",

      "position": 1

    },

    {

      "token": "]",

      "start_offset": 3,

      "end_offset": 4,

      "type": "symbol",

      "position": 2

    },

    {

      "token": "엘라스틱서치",

      "start_offset": 5,

      "end_offset": 11,

      "type": "korean",

      "position": 3

    },

    {

      "token": "엘라",

      "start_offset": 5,

      "end_offset": 7,

      "type": "korean",

      "position": 3

    },

    {

      "token": "스틱",

      "start_offset": 7,

      "end_offset": 9,

      "type": "korean",

      "position": 4

    },

    {

      "token": "서치",

      "start_offset": 9,

      "end_offset": 11,

      "type": "korean",

      "position": 5

    },

    {

      "token": "사용자",

      "start_offset": 12,

      "end_offset": 15,

      "type": "korean",

      "position": 6

    },

    {

      "token": "그룹",

      "start_offset": 16,

      "end_offset": 18,

      "type": "korean",

      "position": 7

    },

    {

      "token": "henry",

      "start_offset": 20,

      "end_offset": 25,

      "type": "word",

      "position": 8

    },

    {

      "token": "입니다",

      "start_offset": 26,

      "end_offset": 29,

      "type": "korean",

      "position": 9

    }

  ]

}


: