420 posts tagged 'elasticsearch'

  1. 2021.05.11 [Elasticsearch] Building the Nori dictionary.
  2. 2021.04.30 [Elasticsearch] Added a Jamo Tokenizer API to the Arirang Plugin as a test.
  3. 2021.04.09 [Elasticsearch] esrally links.
  4. 2021.04.06 [Elasticsearch] Elastic APM quick setup
  5. 2021.04.05 [Elasticsearch] Nori Analyzer test
  6. 2021.04.05 [Elasticsearch] On document indexing
  7. 2021.02.02 [Elastic] Building an Elasticsearch Cluster based on IaC
  8. 2020.12.11 [Elasticsearch] Discovery mode summary.
  9. 2020.10.20 [Similarity] Universal Sentence Encoder/4 - links.
  10. 2020.09.23 [Elasticsearch] Running multiple nodes logically.

[Elasticsearch] Building the Nori dictionary.

Elastic/Elasticsearch 2021. 5. 11. 20:51

[Update]

https://issues.apache.org/jira/browse/SOLR-12655?focusedCommentId=16604160&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&fbclid=IwAR3jRIpaCQ497v-qhofkc3DVmNabPab1ErDQhXnOsA0LNoqpHypa5cUSpy0#comment-16604160

 

The error shown further down was resolved by changing the following line in UnknownDictionaryBuilder.java.

ivy.xml and build.xml both contain the dictionary version.

When the dictionary changed, its POS tag list changed with it, and that is what causes the error.

private static final String NGRAM_DICTIONARY_ENTRY = "NGRAM,1801,3561,3668,SY,*,*,*,*,*,*,*";

If you would rather not modify the code, use the dictionary version that matches it.

 

#유정인 from the Elasticsearch User Group helped with this.


https://github.com/jimczi/nori/blob/master/how-to-custom-dict.asciidoc

https://bitbucket.org/eunjeon/mecab-ko/src/mecab-0.996/
https://bitbucket.org/eunjeon/mecab-ko-dic/src/v2.1.1/

$ git clone https://bitbucket.org/eunjeon/mecab-ko.git
$ cd mecab-ko
$ git checkout tags/mecab-0.996
$ ./configure
$ make
$ sudo make install
$ mecab -v

https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/

$ wget https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
$ tar -xvzf mecab-ko-dic-2.1.1-20180720.tar.gz
$ cd mecab-ko-dic-2.1.1-20180720
$ brew install autoconf automake libtool
$ autoreconf
$ ./configure
$ make
$ sudo make install
$ ./tools/add-userdic.sh

$ cd ..
$ tar cvzf custom-mecab-ko-dic.tar.gz mecab-ko-dic-2.1.1-20180720
$ git clone https://github.com/apache/lucene.git
$ cd lucene
$ git checkout tags/releases/lucene-solr/8.8.1
$ vi lucene/analysis/nori/ivy.xml

~       <!--artifact name="mecab-ko-dic" type=".tar.gz" url="https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.0.3-20170922.tar.gz" /-->
+         <artifact name="mecab-ko-dic" type=".tar.gz" url="file:///Users/mzc02-henryjeong/Temp/fastcampus/analysis-nori/custom-mecab-ko-dic.tar.gz" />

 

$ vi lucene/analysis/nori/build.xml

~   <!--property name="dict.version" value="mecab-ko-dic-2.0.3-20170922" /-->
+   <property name="dict.version" value="mecab-ko-dic-2.1.1-20180720" />

 

$ cd lucene/analysis/nori
// Install Apache Ant first
$ mkdir -p ~/.ant/lib
$ ant ivy-bootstrap
$ ant regenerate
build-dict:
[delete] Deleting /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/analysis/nori/src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$buffer.dat
[delete] Deleting /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/analysis/nori/src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$fst.dat
[delete] Deleting /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/analysis/nori/src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$posDict.dat
[delete] Deleting /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/analysis/nori/src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$targetMap.dat
[java] Exception in thread "main" java.lang.AssertionError
[java] at org.apache.lucene.analysis.ko.util.BinaryDictionaryWriter.put(BinaryDictionaryWriter.java:112)
[java] at org.apache.lucene.analysis.ko.util.UnknownDictionaryWriter.put(UnknownDictionaryWriter.java:39)
[java] at org.apache.lucene.analysis.ko.util.UnknownDictionaryBuilder.readDictionaryFile(UnknownDictionaryBuilder.java:71)
[java] at org.apache.lucene.analysis.ko.util.UnknownDictionaryBuilder.readDictionaryFile(UnknownDictionaryBuilder.java:47)
[java] at org.apache.lucene.analysis.ko.util.UnknownDictionaryBuilder.build(UnknownDictionaryBuilder.java:41)
[java] at org.apache.lucene.analysis.ko.util.DictionaryBuilder.build(DictionaryBuilder.java:39)
[java] at org.apache.lucene.analysis.ko.util.DictionaryBuilder.main(DictionaryBuilder.java:52)

BUILD FAILED
$ git status .
HEAD detached at releases/lucene-solr/8.8.1
Changes not staged for commit:
(use "git add/rm ..." to update what will be committed)
(use "git restore ..." to discard changes in working directory)
modified: build.xml
modified: ivy.xml
deleted: src/resources/org/apache/lucene/analysis/ko/dict/CharacterDefinition.dat
deleted: src/resources/org/apache/lucene/analysis/ko/dict/ConnectionCosts.dat
modified: src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$buffer.dat
modified: src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$fst.dat
modified: src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$posDict.dat
modified: src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$targetMap.dat
deleted: src/resources/org/apache/lucene/analysis/ko/dict/UnknownDictionary$buffer.dat
deleted: src/resources/org/apache/lucene/analysis/ko/dict/UnknownDictionary$posDict.dat
deleted: src/resources/org/apache/lucene/analysis/ko/dict/UnknownDictionary$targetMap.dat
$ ant jar
...(omitted)...
-jar-core:
[jar] Building jar: /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/build/analysis/nori/lucene-analyzers-nori-8.8.1-SNAPSHOT.jar
...(omitted)...

/Users/mzc02-henryjeong/Works/app/apache-ant-1.10.10

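I am short on time, so I tested only this far and will look into the error properly later.

Knowing Arirang well would be enough, but I need to be able to handle Nori too...
That said, Arirang's way of managing dictionaries is more convenient.

I actually ended up here while trying to fix the Hanja dictionary.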

:

[Elasticsearch] Added a Jamo Tokenizer API to the Arirang Plugin as a test.

Elastic/Elasticsearch 2021. 4. 30. 09:25

https://github.com/HowookJeong/elasticsearch-analysis-arirang/tree/hanguel-jamo-tokenizer-7.12.0

 

Check out the branch, build it, and install the plugin.

 

$ mvn clean install -DskipTests=true
$ bin/elasticsearch-plugin install file:///Users/mzc02-henryjeong/Works/github/howookjeong/elasticsearch-analysis-arirang/target/elasticsearch-analysis-arirang-7.12.0.zip

 

[Request]
curl --location --request POST 'http://localhost:9200/_arirang/jamo?text=엘라스틱서치&token=CHOSUNG'

 

[Method]

GET / POST

 

[Response]
CHOSUNG -> ㅇㄹㅅㅌㅅㅊ
JUNGSUNG -> ㅔㅏㅡㅣㅓㅣ
JONGSUNG -> ㄹㄱ
KORTOENG -> dpffktmxlrtjcl

 

[Parameters]

  • text
    The string to analyze
  • token
    The type of analysis
    CHOSUNG (initial consonants)
    JUNGSUNG (medial vowels)
    JONGSUNG (final consonants)
    KORTOENG (Korean-to-English keyboard conversion)
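The CHOSUNG, JUNGSUNG, and JONGSUNG results in [Response] can be reproduced with plain Unicode arithmetic on precomposed Hangul syllables. The following is a minimal Python sketch of that idea, not the plugin's actual implementation; the function and constant names are made up for illustration, and KORTOENG is left out because it needs a keyboard-layout table.

```python
# Hypothetical helper names; this imitates the response above, not the plugin source.
HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable
HANGUL_LAST = 0xD7A3  # code point of the last precomposed syllable

CHOSUNG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
JUNGSUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medial vowels
# 28 finals; index 0 means "no final consonant"
JONGSUNG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(ch):
    """Return (initial, medial, final) for one Hangul syllable, or None otherwise."""
    code = ord(ch)
    if not (HANGUL_BASE <= code <= HANGUL_LAST):
        return None
    offset = code - HANGUL_BASE
    # Each initial spans 21 * 28 = 588 syllables; each medial spans 28.
    return (CHOSUNG[offset // 588], JUNGSUNG[(offset % 588) // 28], JONGSUNG[offset % 28])


def extract(text, part):
    """part: 0 = CHOSUNG, 1 = JUNGSUNG, 2 = JONGSUNG. Non-Hangul characters are skipped."""
    pieces = (decompose(ch) for ch in text)
    return "".join(p[part] for p in pieces if p is not None)


if __name__ == "__main__":
    for name, part in (("CHOSUNG", 0), ("JUNGSUNG", 1), ("JONGSUNG", 2)):
        print(name, "->", extract("엘라스틱서치", part))
```

Run as a script, it prints the three jamo strings for 엘라스틱서치 in the same order as the response listing.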

This was added as a functional test only, so its performance has not been verified.

:

[Elasticsearch] esrally links.

Elastic/Elasticsearch 2021. 4. 9. 08:31

https://github.com/elastic/rally
https://esrally.readthedocs.io/en/stable/

 

It is handy for checking the performance of an Elasticsearch cluster.

:

[Elasticsearch] Elastic APM quick setup

Elastic/Elasticsearch 2021. 4. 6. 14:39

Elastic offers a wide range of tools and services.

One of them is APM, which is a very good tool.

Here is a quick record of just the information needed.

 

[Elastic APM Server]

https://www.elastic.co/guide/en/apm/server/current/overview.html
https://www.elastic.co/downloads/apm

 

[Elastic APM Agent]

https://www.elastic.co/guide/en/apm/agent/java/current/intro.html
https://search.maven.org/search?q=g:co.elastic.apm%20AND%20a:elastic-apm-agent

 

<Register the following as VM options in IntelliJ.>
-javaagent:/Users/mzc02-henryjeong/Works/elastic/apm-agent/elastic-apm-agent-1.22.0.jar -Delastic.apm.service_name=poc-service -Delastic.apm.application_packages=com.mzc.poc -Delastic.apm.server_url=http://localhost:8200

 

<Register the index patterns in Kibana and explore them in Discover.>

apm-{version}-onboarding-*
apm-{version}-span-*
apm-{version}-error-*
apm-{version}-transaction-*
apm-{version}-profile-*
apm-{version}-metric-*

- These are created automatically as aliases.

 

Stack required beforehand)

- Elasticsearch

- Kibana

- Spring Boot Web Application

:

[Elasticsearch] Nori Analyzer test

Elastic/Elasticsearch 2021. 4. 5. 17:23

A basic test of the Nori Analyzer.

Official reference)

www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori.html

 

Default dictionary)

bitbucket.org/eunjeon/mecab-ko-dic/src/master/

 

POS Tag)

lucene.apache.org/core/8_8_0/analyzers-nori/org/apache/lucene/analysis/ko/POS.Tag.html

 

Note that the filter must be declared with stoptags, not postags.

I had mistakenly written postags. (It has been corrected.)

 

The test below calls the _analyze API over REST.

{
    "tokenizer": {
        "type": "nori_tokenizer",
        "decompound_mode": "mixed",
        "discard_punctuation": "true",
        "user_dictionary_rules": ["c++ c+ +", "C샤프", "세종", "세종시 세종 시"]
    },
    "filter": [
        {        
            "type": "nori_part_of_speech",
            "stoptags": [
                "E",
                "IC",
                "J",
                "MAG", "MAJ", "MM",
                "SP", "SSC", "SSO", "SC", "SE",
                "XPN", "XSA", "XSN", "XSV",
                "UNA", "NA", "VSV"
            ]
        },
        {
            "type": "nori_readingform"
        }
    ],
    "text": "世宗市에서 c++ 언어를 가르치는 학원이 있나요?",
    "attributes" : ["posType", "leftPOS", "rightPOS", "morphemes", "reading"],
    "explain": true        
}
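The same request can be sent from a script. Below is a minimal Python sketch using only the standard library; the endpoint http://localhost:9200 and the helper names are assumptions for illustration, and the body is a trimmed version of the JSON above (no explain, no attributes).

```python
import json
import urllib.request

# Assumed endpoint; adjust to your cluster. Helper names are illustrative only.
ES_URL = "http://localhost:9200"


def build_analyze_body(text, stoptags):
    """Build an _analyze request body with an inline nori tokenizer and filters."""
    return {
        "tokenizer": {
            "type": "nori_tokenizer",
            "decompound_mode": "mixed",
            "user_dictionary_rules": ["c++ c+ +", "세종", "세종시 세종 시"],
        },
        "filter": [
            # The setting is named "stoptags", as noted in the post.
            {"type": "nori_part_of_speech", "stoptags": list(stoptags)},
            {"type": "nori_readingform"},
        ],
        "text": text,
    }


def analyze(body, base_url=ES_URL):
    """POST the body to the _analyze API and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a node running and the analysis-nori plugin installed, `analyze(build_analyze_body(text, ["E", "J"]))` should return a response whose "tokens" list holds the surviving tokens.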

Result)

{
    "detail": {
        "custom_analyzer": true,
        "charfilters": [],
        "tokenizer": {
            "name": "__anonymous__nori_tokenizer",
            "tokens": [
                {
                    "token": "世宗",
                    "start_offset": 0,
                    "end_offset": 2,
                    "type": "word",
                    "position": 0,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": "세종",
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "市",
                    "start_offset": 2,
                    "end_offset": 3,
                    "type": "word",
                    "position": 1,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": "시",
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "에서",
                    "start_offset": 3,
                    "end_offset": 5,
                    "type": "word",
                    "position": 2,
                    "leftPOS": "J(Ending Particle)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "J(Ending Particle)"
                },
                {
                    "token": "c++",
                    "start_offset": 6,
                    "end_offset": 9,
                    "type": "word",
                    "position": 3,
                    "positionLength": 2,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": "c+/NNG(General Noun)++/NNG(General Noun)",
                    "posType": "COMPOUND",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "c+",
                    "start_offset": 6,
                    "end_offset": 8,
                    "type": "word",
                    "position": 3,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "+",
                    "start_offset": 8,
                    "end_offset": 9,
                    "type": "word",
                    "position": 4,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "언어",
                    "start_offset": 10,
                    "end_offset": 12,
                    "type": "word",
                    "position": 5,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "를",
                    "start_offset": 12,
                    "end_offset": 13,
                    "type": "word",
                    "position": 6,
                    "leftPOS": "J(Ending Particle)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "J(Ending Particle)"
                },
                {
                    "token": "가르치",
                    "start_offset": 14,
                    "end_offset": 17,
                    "type": "word",
                    "position": 7,
                    "leftPOS": "VV(Verb)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "VV(Verb)"
                },
                {
                    "token": "는",
                    "start_offset": 17,
                    "end_offset": 18,
                    "type": "word",
                    "position": 8,
                    "leftPOS": "E(Verbal endings)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "E(Verbal endings)"
                },
                {
                    "token": "학원",
                    "start_offset": 19,
                    "end_offset": 21,
                    "type": "word",
                    "position": 9,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "이",
                    "start_offset": 21,
                    "end_offset": 22,
                    "type": "word",
                    "position": 10,
                    "leftPOS": "J(Ending Particle)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "J(Ending Particle)"
                },
                {
                    "token": "있",
                    "start_offset": 23,
                    "end_offset": 24,
                    "type": "word",
                    "position": 11,
                    "leftPOS": "VA(Adjective)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "VA(Adjective)"
                },
                {
                    "token": "나요",
                    "start_offset": 24,
                    "end_offset": 26,
                    "type": "word",
                    "position": 12,
                    "leftPOS": "E(Verbal endings)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "E(Verbal endings)"
                }
            ]
        },
        "tokenfilters": [
            {
                "name": "__anonymous__nori_part_of_speech",
                "tokens": [
                    {
                        "token": "世宗",
                        "start_offset": 0,
                        "end_offset": 2,
                        "type": "word",
                        "position": 0,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "세종",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "市",
                        "start_offset": 2,
                        "end_offset": 3,
                        "type": "word",
                        "position": 1,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "시",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c++",
                        "start_offset": 6,
                        "end_offset": 9,
                        "type": "word",
                        "position": 3,
                        "positionLength": 2,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": "c+/NNG(General Noun)++/NNG(General Noun)",
                        "posType": "COMPOUND",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c+",
                        "start_offset": 6,
                        "end_offset": 8,
                        "type": "word",
                        "position": 3,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "+",
                        "start_offset": 8,
                        "end_offset": 9,
                        "type": "word",
                        "position": 4,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "언어",
                        "start_offset": 10,
                        "end_offset": 12,
                        "type": "word",
                        "position": 5,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "가르치",
                        "start_offset": 14,
                        "end_offset": 17,
                        "type": "word",
                        "position": 7,
                        "leftPOS": "VV(Verb)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VV(Verb)"
                    },
                    {
                        "token": "학원",
                        "start_offset": 19,
                        "end_offset": 21,
                        "type": "word",
                        "position": 9,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "있",
                        "start_offset": 23,
                        "end_offset": 24,
                        "type": "word",
                        "position": 11,
                        "leftPOS": "VA(Adjective)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VA(Adjective)"
                    }
                ]
            },
            {
                "name": "__anonymous__nori_readingform",
                "tokens": [
                    {
                        "token": "세종",
                        "start_offset": 0,
                        "end_offset": 2,
                        "type": "word",
                        "position": 0,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "세종",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "시",
                        "start_offset": 2,
                        "end_offset": 3,
                        "type": "word",
                        "position": 1,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "시",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c++",
                        "start_offset": 6,
                        "end_offset": 9,
                        "type": "word",
                        "position": 3,
                        "positionLength": 2,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": "c+/NNG(General Noun)++/NNG(General Noun)",
                        "posType": "COMPOUND",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c+",
                        "start_offset": 6,
                        "end_offset": 8,
                        "type": "word",
                        "position": 3,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "+",
                        "start_offset": 8,
                        "end_offset": 9,
                        "type": "word",
                        "position": 4,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "언어",
                        "start_offset": 10,
                        "end_offset": 12,
                        "type": "word",
                        "position": 5,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "가르치",
                        "start_offset": 14,
                        "end_offset": 17,
                        "type": "word",
                        "position": 7,
                        "leftPOS": "VV(Verb)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VV(Verb)"
                    },
                    {
                        "token": "학원",
                        "start_offset": 19,
                        "end_offset": 21,
                        "type": "word",
                        "position": 9,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "있",
                        "start_offset": 23,
                        "end_offset": 24,
                        "type": "word",
                        "position": 11,
                        "leftPOS": "VA(Adjective)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VA(Adjective)"
                    }
                ]
            }
        ]
    }
}
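To see what nori_part_of_speech did between the tokenizer output and the first token filter output, the filtering can be imitated offline. The Python sketch below hard-codes the tokens and the tag part of their leftPOS values from the response, then drops every token whose tag is in stoptags. It only illustrates the effect and is not how Lucene implements the filter; the names are hypothetical.

```python
# (token, POS tag) pairs copied from the tokenizer section of the response above.
TOKENIZER_OUTPUT = [
    ("世宗", "NNG"), ("市", "NNG"), ("에서", "J"), ("c++", "NNG"), ("c+", "NNG"),
    ("+", "NNG"), ("언어", "NNG"), ("를", "J"), ("가르치", "VV"), ("는", "E"),
    ("학원", "NNG"), ("이", "J"), ("있", "VA"), ("나요", "E"),
]

# The same stoptags as in the request.
STOPTAGS = {
    "E", "IC", "J", "MAG", "MAJ", "MM", "SP", "SSC", "SSO", "SC", "SE",
    "XPN", "XSA", "XSN", "XSV", "UNA", "NA", "VSV",
}


def drop_stoptags(tokens, stoptags):
    """Keep only the tokens whose POS tag is not listed in stoptags."""
    return [token for token, tag in tokens if tag not in stoptags]


if __name__ == "__main__":
    print(drop_stoptags(TOKENIZER_OUTPUT, STOPTAGS))
```

The nine tokens that remain are the ones listed under __anonymous__nori_part_of_speech in the result; the particles (J) and verbal endings (E) are gone.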

Adding a synonyms filter)

Note that words not defined in the user dictionary (user_dic.txt) may not produce the intended result.

{
    "tokenizer": {
        "type": "nori_tokenizer",
        "decompound_mode": "mixed",
        "discard_punctuation": "true",
        "user_dictionary_rules": ["c++ c+", "c샤프", "c샵", "삼성전자", "세종", "세종시 세종 시"]
    },
    "filter": [
        {
            "type": "synonym_graph",            
            "synonyms": [ 
                "삼성전자, 삼전",
                "c샤프, c샵"
            ]
        },
        {        
            "type": "nori_part_of_speech",
            "stoptags": [
                "E",
                "IC",
                "J",
                "MAG", "MAJ", "MM",
                "SP", "SSC", "SSO", "SC", "SE",
                "XPN", "XSA", "XSN", "XSV",
                "UNA", "NA", "VSV"
            ]
        },
        {
            "type": "nori_readingform"
        }        
    ],
    "text": "世宗市에서 c++, c샤프 언어를 가르치는 삼성전자 학원이 있나요?",
    "attributes" : ["posType", "leftPOS", "rightPOS", "morphemes", "reading"],
    "explain": false        
}

nori_userdict.txt)

Switching the setting from user_dictionary_rules to user_dictionary looks like this.

  • "user_dictionary": "nori_userdict.txt"
    • The file goes under the config directory of the Elasticsearch installation. Its contents:
c++ c+
c샤프
c샵
삼성전자
세종
세종시 세종 시

 

:

[Elasticsearch] On document indexing

Elastic/Elasticsearch 2021. 4. 5. 15:36

These are classes worth reading to understand indexing in Elasticsearch.

 

  • InternalEngine
    • Declared at the node level; defines most of the operations in Elasticsearch.
  • NodeClient
    • Corresponds to a node when an Elasticsearch cluster is formed.
  • IndexShard
    • Defines the operations on a physical index.
  • Translog
    • Defines the operations for indexing work that has not been committed yet.

Rough flow of a flush)

    On commit, the IndexWriter writes the operations recorded in the translog to segment files, and the translog is flushed along with a synchronized refresh.
    (With a synced flush, the refresh is performed first.)

 

:

[Elastic] Building an Elasticsearch Cluster based on IaC

Elastic 2021. 2. 2. 12:04

https://github.com/HowookJeong/ecos-installer-web

Deployment and run guide

Prerequisite)

  • The bastion machine runs Ubuntu.
  • Create key pairs for SSH tunneling to the bastion and the EC2 instances.
  • Create an access/secret key for the account so the tool can run from the local machine.

Step 1) Set up the local runtime environment

$ aws configure --profile ecos 

Put region : ap-northeast-2
Put output : json
Put access key : xxxxxxxxxxxxxx
Put secret key : xxxxxxxxxxxxxx
Put key pairs file to ~/.ssh/

 

Step 2) Configure the Terraform & Ansible environment

$ vi docker-compose.yml

 environment:
 ...(omitted)...
   - serverPort=${SERVER_PORT}
   - configWorkingPath=/tmp/home/mzc/app
   - configTerraformAwsBastionIp=${BASTION_IP}
   - configTerraformAwsSecurityGroup=sg-xxxxxxxxxxxxxxxx
   - configTerraformAwsAz=ap-northeast-2a
   - configTerraformAwsAmi=ami-061b0ee20654981ab
   - configTerraformAwsSubnet=subnet-xxxxxxxxxxxxxxxx
   - configTerraformAwsKeyName=ec2key-gw
   - configTerraformAwsPemFile=ec2key-gw.pem
   - configTerraformAwsPathElasticsearch=/tmp/home/mzc/app/terraform/_CLUSTERNAME_
   - configTerraformAwsPathKibana=/tmp/home/mzc/app/terraform/_CLUSTERNAME_/kibana
   - configTerraformAwsBackendBucket=megatoi-terraform-state
   - configTerraformAwsBackendKeyElasticsearch=_CLUSTERNAME_/terraform.tfstate
   - configTerraformAwsBackendKeyKibana=_CLUSTERNAME_/kibana/terraform.tfstate
 volumes:
 ...(omitted)...
   - /Users/<account>/.aws:/root/.aws
   - /Users/<account>/.ssh:/root/.ssh

 

Step 3) Load the distributed Docker image

$ sudo docker load -i ecos-installer-web-0.0.1.tar

 

Step 4) Start/stop the container

$ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose up -d

$ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose down

Local development environment

  • $ cd ~/.aws
  • $ aws configure --profile ecos
  • $ vi config

[profile ecos]

region = ap-northeast-2

output = json

  • $ vi credentials

[ecos]

aws_access_key_id = xxxxxxxxxxxxxxxxx

aws_secret_access_key = xxxxxxxxxxxxxxxxxx

Project Docker Compose settings

...(omitted)...
    volumes:
      - /Users/mzc02-henryjeong/.aws:/root/.aws
      - /Users/mzc02-henryjeong/.ssh:/root/.ssh
      - /Users/mzc02-henryjeong/Temp/logs:/home/mzc/logs
      - /var/run/docker.sock:/var/run/docker.sock
      - /Users/mzc02-henryjeong/Works/app/terraform:/home/mzc/backup/terraform
...(omitted)...
  • Mount the relevant paths for AWS access and SSH tunneling.

Build Step

  • $ ./gradlew clean build bootJar -Pprofile=dev -x test
  • $ docker build --build-arg BASTION_IP=xxx.xxx.xxx.xxx --tag ecos-installer-web:0.0.1 .
  • OR $ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose build
  • $ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose up
  • $ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose down
  • $ docker image ls
  • $ docker rmi -f 7f52709a6615
  • $ docker exec -it ecos-installer-web /bin/sh
  • $ sudo docker save -o ecos-installer-web-0.0.1.tar ecos-installer-web:0.0.1
  • $ sudo docker load -i ecos-installer-web-0.0.1.tar

Terraform path and Elasticsearch cluster naming conventions

  • Terraform File Path : /tmp/home/mzc/app/terraform/${CLUSTERNAME}/${TIMESTAMP}
  • Backend Key : ${CLUSTERNAME}/${TIMESTAMP}/terraform.tfstate
  • When creating a new cluster
    • Step 1) Terraform File Path : /tmp/home/mzc/app/terraform/elasticsearch
    • Step 1) Backend Key : elasticsearch/terraform.tfstate
  • When adding nodes
    • Step 1) Terraform File Path : /tmp/home/mzc/app/terraform/elasticsearch/1598860075233
    • Step 1) Backend Key : elasticsearch/1598860075233/terraform.tfstate
    • Step 2) Backend Key : elasticsearch/1598860075233/terraform.tfstate
    • The master IP must be looked up so the new nodes can join the existing cluster
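The naming convention in this list can be written down as two small functions. This is a minimal Python sketch under the assumption that a new cluster simply omits the timestamp segment, as the examples suggest; the function names are hypothetical and are not taken from the ecos-installer-web source.

```python
# Base path copied from the convention above; function names are hypothetical.
TERRAFORM_BASE = "/tmp/home/mzc/app/terraform"


def terraform_path(cluster_name, timestamp=None):
    """Working path for the Terraform files of a cluster (or of an addition to it)."""
    parts = [TERRAFORM_BASE, cluster_name]
    if timestamp is not None:
        parts.append(str(timestamp))
    return "/".join(parts)


def backend_key(cluster_name, timestamp=None):
    """Key of the terraform.tfstate object in the backend bucket."""
    parts = [cluster_name]
    if timestamp is not None:
        parts.append(str(timestamp))
    parts.append("terraform.tfstate")
    return "/".join(parts)


if __name__ == "__main__":
    print(terraform_path("elasticsearch"), backend_key("elasticsearch"))
    print(terraform_path("elasticsearch", 1598860075233),
          backend_key("elasticsearch", 1598860075233))
```

Keeping the timestamp as an optional argument means the same two functions cover both the new-cluster case and the add-nodes case.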

Creation and configuration

  • Create the AWS account access/secret key
  • Run aws configure
  • Create the bastion server
  • Configure the VPC details
    • security group
    • subnet
    • az
    • Create and register a key pair under EC2 Network & Security (keyName, keyPem)
    • ami
  • Configure Terraform
    • Terraform working path
    • Terraform backend
  • Configure the AWS cluster instances
    • node topology (node types)
    • instance type (CPU, memory, network performance)
    • instance size (scale of the nodes)
    • disk volume size (Elasticsearch storage)
  • Configure the Elasticsearch cluster
    • cluster name
    • Elasticsearch version to install
    • ports (http, tcp)
    • path.data/logs
  • Configure Ansible
    • working path
    • bastion ip

Service Flow

  • TerraformService

    • terraform
      • createTerraformS3Backend
      • readTerraformTemplateForElasticsearch
      • writeTerraformTemplateForElasticsearch
      • runTerraformTemplateForElasticsearch
      • backupTerraformTemplateStateForElasticsearch (if it is not s3 backend)
  • ElasticsearchService

    • docker
      • createDockerComposeConfiguration
    • ansible
      • createAnsibleInventories
      • createAnsibleRoles
      • runAnsiblePlaybook

:

[Elasticsearch] Discovery mode summary.

Elastic/Elasticsearch 2020. 12. 11. 11:34

Reference)

www.elastic.co/guide/en/elasticsearch/reference/7.x/modules-discovery.html

 

 

1. Single-node setup

discovery.type=single-node

 

2. Cluster setup

discovery.seed_hosts=e1,e2,e3

cluster.initial_master_nodes=e1,e2,e3

 

A single-node setup as in option 1 cannot form a cluster in any way.

Also, several single-node instances cannot be run side by side on the same instance or local environment.

 

A cluster setup as in option 2 needs at least two nodes,

and at least two of them must be master-eligible.

 

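With 3 nodes)
If the master node dies, one of the master-eligible nodes is elected master and the cluster keeps serving.

With 2 nodes)
If the master node dies, the cluster cannot serve.

If the node other than the master dies, the cluster cannot serve either.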
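The difference between the 2-node and 3-node cases follows from simple majority arithmetic. The Python sketch below assumes a plain majority vote over master-eligible nodes; Elasticsearch 7.x manages its voting configuration by itself, so treat this as arithmetic that explains the point of the post, not as a model of the actual election. The function names are hypothetical.

```python
def majority(master_eligible):
    """Smallest number of master-eligible nodes that forms a strict majority."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """How many master-eligible nodes can be lost while a majority still exists."""
    return master_eligible - majority(master_eligible)


if __name__ == "__main__":
    for nodes in (2, 3):
        print(nodes, "nodes: quorum", majority(nodes),
              "tolerated failures", tolerated_failures(nodes))
```

With 2 master-eligible nodes the quorum is 2, so no node may be lost; with 3 the quorum is still 2, so one node may be lost.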

 

I wrote this up because the quorum of master nodes is sometimes misunderstood when a cluster is set up.

 

:

[Similarity] Universal Sentence Encoder/4 - links.

ITWeb/Search (General) 2020. 10. 20. 17:32

https://tfhub.dev/google/universal-sentence-encoder/4

https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch

 

I am linking these because they are needed for similarity search with the dense_vector type in Elasticsearch.

:

[Elasticsearch] Running multiple nodes logically.

Elastic/Elasticsearch 2020. 9. 23. 16:53

This is how to download elasticsearch-version.tar.gz, extract it, and run several nodes on a single instance to form a cluster.

 

Case 1)

$ ES_PATH_CONF=config bin/elasticsearch -Epath.data=data1 -Epath.logs=logs1 -d -p 1.pid

$ ES_PATH_CONF=config bin/elasticsearch -Epath.data=data2 -Epath.logs=logs2 -d -p 2.pid

$ ES_PATH_CONF=config bin/elasticsearch -Epath.data=data3 -Epath.logs=logs3 -d -p 3.pid

 

Case 2)

$ ES_PATH_CONF=config1 bin/elasticsearch -Epath.data=data1 -Epath.logs=logs1 -d -p 1.pid

$ ES_PATH_CONF=config2 bin/elasticsearch -Epath.data=data2 -Epath.logs=logs2 -d -p 2.pid

$ ES_PATH_CONF=config3 bin/elasticsearch -Epath.data=data3 -Epath.logs=logs3 -d -p 3.pid
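The two cases differ only in the ES_PATH_CONF value, so the command lines can be generated. This is a small Python sketch that builds the strings shown above; it only formats text and does not start any process, and the function name is hypothetical.

```python
def node_command(index, shared_config=True):
    """Build the start command for node `index`.

    shared_config=True  -> Case 1: every node reads the same config directory.
    shared_config=False -> Case 2: node N reads its own configN directory.
    """
    conf = "config" if shared_config else f"config{index}"
    return (
        f"ES_PATH_CONF={conf} bin/elasticsearch "
        f"-Epath.data=data{index} -Epath.logs=logs{index} -d -p {index}.pid"
    )


if __name__ == "__main__":
    for shared in (True, False):
        for i in (1, 2, 3):
            print("$", node_command(i, shared_config=shared))
```

Each node still gets its own data directory, log directory, and pid file in both cases.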

 

There is nothing special here: Case 1 shares one config directory and Case 2 uses one per node, so it comes down to how you prefer to use it.

: