498 posts in 'Elastic'

  1. 2021.05.11 [Elasticsearch] Building the Nori dictionary.
  2. 2021.04.30 [Elasticsearch] Added a Jamo Tokenizer API to the Arirang plugin as a test.
  3. 2021.04.09 [Elasticsearch] esrally links.
  4. 2021.04.06 [Elasticsearch] Elastic APM quick setup
  5. 2021.04.05 [Elasticsearch] Nori Analyzer test
  6. 2021.04.05 [Elasticsearch] About document indexing
  7. 2021.02.15 [Metricbeat] Module enable/disable.
  8. 2021.02.09 [Elastic] Enterprise Search - App Search JSON Document.
  9. 2021.02.04 [Elastic] Workplace Search ...
  10. 2021.02.02 [Elastic] Building an Elasticsearch cluster based on IaC

[Elasticsearch] Building the Nori dictionary.

Elastic/Elasticsearch 2021. 5. 11. 20:51

[Update]

https://issues.apache.org/jira/browse/SOLR-12655?focusedCommentId=16604160&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&fbclid=IwAR3jRIpaCQ497v-qhofkc3DVmNabPab1ErDQhXnOsA0LNoqpHypa5cUSpy0#comment-16604160

 

The error shown further down was resolved by changing the following line in UnknownDictionaryBuilder.java.

Both ivy.xml and build.xml contain the dictionary version.

When the dictionary changed, its POS tag list changed as well, and that is what causes the error.

private static final String NGRAM_DICTIONARY_ENTRY = "NGRAM,1801,3561,3668,SY,*,*,*,*,*,*,*";

If you would rather not modify the code, use the dictionary version that matches it.
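To make the constant above easier to read, here is a minimal Python sketch that splits it into fields. It assumes the usual MeCab dictionary CSV layout (surface form, left context id, right context id, word cost, then feature columns); the `DictEntry` type and `parse_entry` helper are hypothetical names used only for illustration and are not part of Lucene. The context ids refer to tables that ship with a particular dictionary release, which is presumably why the entry has to agree with the dictionary version.

```python
from typing import List, NamedTuple


class DictEntry(NamedTuple):
    """One dictionary line, assuming the MeCab CSV column order."""
    surface: str
    left_id: int
    right_id: int
    word_cost: int
    features: List[str]


def parse_entry(line: str) -> DictEntry:
    # Columns: surface, left context id, right context id, cost, features...
    fields = line.split(",")
    return DictEntry(
        surface=fields[0],
        left_id=int(fields[1]),
        right_id=int(fields[2]),
        word_cost=int(fields[3]),
        features=fields[4:],
    )


# The value quoted in the post.
NGRAM_DICTIONARY_ENTRY = "NGRAM,1801,3561,3668,SY,*,*,*,*,*,*,*"

entry = parse_entry(NGRAM_DICTIONARY_ENTRY)
print(entry.surface, entry.left_id, entry.right_id, entry.word_cost, entry.features[0])
```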

 

#유정인 from the Elasticsearch User Group helped with this.


https://github.com/jimczi/nori/blob/master/how-to-custom-dict.asciidoc

https://bitbucket.org/eunjeon/mecab-ko/src/mecab-0.996/
https://bitbucket.org/eunjeon/mecab-ko-dic/src/v2.1.1/

$ git clone https://bitbucket.org/eunjeon/mecab-ko.git
$ cd mecab-ko
$ git checkout tags/mecab-0.996
$ ./configure
$ make
$ sudo make install
$ mecab -v

https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/

$ wget https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
$ tar -xvzf mecab-ko-dic-2.1.1-20180720.tar.gz
$ cd mecab-ko-dic-2.1.1-20180720
$ brew install autoconf automake libtool
$ autoreconf
$ ./configure
$ make
$ sudo make install
$ ./tools/add-userdic.sh

$ cd ..
$ tar cvzf custom-mecab-ko-dic.tar.gz mecab-ko-dic-2.1.1-20180720
$ git clone https://github.com/apache/lucene.git
$ cd lucene
$ git checkout tags/releases/lucene-solr/8.8.1
$ vi lucene/analysis/nori/ivy.xml

~       <!--artifact name="mecab-ko-dic" type=".tar.gz" url="https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.0.3-20170922.tar.gz" /-->
+         <artifact name="mecab-ko-dic" type=".tar.gz" url="file:///Users/mzc02-henryjeong/Temp/fastcampus/analysis-nori/custom-mecab-ko-dic.tar.gz" />

 

$ vi lucene/analysis/nori/build.xml

~   <!--property name="dict.version" value="mecab-ko-dic-2.0.3-20170922" /-->
+   <property name="dict.version" value="mecab-ko-dic-2.1.1-20180720" />

 

$ cd lucene/analysis/nori
// install Apache Ant
$ mkdir -p ~/.ant/lib
$ ant ivy-bootstrap
$ ant regenerate
build-dict:
[delete] Deleting /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/analysis/nori/src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$buffer.dat
[delete] Deleting /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/analysis/nori/src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$fst.dat
[delete] Deleting /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/analysis/nori/src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$posDict.dat
[delete] Deleting /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/analysis/nori/src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$targetMap.dat
[java] Exception in thread "main" java.lang.AssertionError
[java] at org.apache.lucene.analysis.ko.util.BinaryDictionaryWriter.put(BinaryDictionaryWriter.java:112)
[java] at org.apache.lucene.analysis.ko.util.UnknownDictionaryWriter.put(UnknownDictionaryWriter.java:39)
[java] at org.apache.lucene.analysis.ko.util.UnknownDictionaryBuilder.readDictionaryFile(UnknownDictionaryBuilder.java:71)
[java] at org.apache.lucene.analysis.ko.util.UnknownDictionaryBuilder.readDictionaryFile(UnknownDictionaryBuilder.java:47)
[java] at org.apache.lucene.analysis.ko.util.UnknownDictionaryBuilder.build(UnknownDictionaryBuilder.java:41)
[java] at org.apache.lucene.analysis.ko.util.DictionaryBuilder.build(DictionaryBuilder.java:39)
[java] at org.apache.lucene.analysis.ko.util.DictionaryBuilder.main(DictionaryBuilder.java:52)

BUILD FAILED
$ git status .
HEAD detached at releases/lucene-solr/8.8.1
Changes not staged for commit:
(use "git add/rm ..." to update what will be committed)
(use "git restore ..." to discard changes in working directory)
modified: build.xml
modified: ivy.xml
deleted: src/resources/org/apache/lucene/analysis/ko/dict/CharacterDefinition.dat
deleted: src/resources/org/apache/lucene/analysis/ko/dict/ConnectionCosts.dat
modified: src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$buffer.dat
modified: src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$fst.dat
modified: src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$posDict.dat
modified: src/resources/org/apache/lucene/analysis/ko/dict/TokenInfoDictionary$targetMap.dat
deleted: src/resources/org/apache/lucene/analysis/ko/dict/UnknownDictionary$buffer.dat
deleted: src/resources/org/apache/lucene/analysis/ko/dict/UnknownDictionary$posDict.dat
deleted: src/resources/org/apache/lucene/analysis/ko/dict/UnknownDictionary$targetMap.dat
$ ant jar
...(omitted)...
-jar-core:
[jar] Building jar: /Users/mzc02-henryjeong/Temp/analysis-nori/lucene/lucene/build/analysis/nori/lucene-analyzers-nori-8.8.1-SNAPSHOT.jar
...(omitted)...

/Users/mzc02-henryjeong/Works/app/apache-ant-1.10.10

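I'm short on time, so I'll stop testing here for now and look into the error properly later.

Being good at Arirang would be enough, but I need to know how to work with Nori too...
That said, Arirang's way of managing dictionaries is easier and nicer.

To be honest, I only ended up here because I was trying to fix the Hanja dictionary.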

:

[Elasticsearch] Added a Jamo Tokenizer API to the Arirang plugin as a test.

Elastic/Elasticsearch 2021. 4. 30. 09:25

https://github.com/HowookJeong/elasticsearch-analysis-arirang/tree/hanguel-jamo-tokenizer-7.12.0

 

Check out the branch, build it, and install the plugin.

 

$ mvn clean install -DskipTests=true
$ bin/elasticsearch-plugin install file:///Users/mzc02-henryjeong/Works/github/howookjeong/elasticsearch-analysis-arirang/target/elasticsearch-analysis-arirang-7.12.0.zip

 

[Request]
curl --location --request POST 'http://localhost:9200/_arirang/jamo?text=엘라스틱서치&token=CHOSUNG'

 

[Method]

GET / POST

 

[Response]
CHOSUNG -> ㅇㄹㅅㅌㅅㅊ
JUNGSUNG -> ㅔㅏㅡㅣㅓㅣ
JONGSUNG -> ㄹㄱ
KORTOENG -> dpffktmxlrtjcl

 

[Parameters]

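  • text
    The string to analyze
  • token
    The analysis type
    CHOSUNG (initial consonants)
    JUNGSUNG (medial vowels)
    JONGSUNG (final consonants)
    KORTOENG (Korean-to-English keyboard conversion)

This was added only as a functional test, so its performance has not been verified.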
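The CHOSUNG, JUNGSUNG, and JONGSUNG results above can be reproduced with the standard Unicode arithmetic for precomposed Hangul syllables. The following Python sketch is not the plugin's code; `decompose` and `extract` are hypothetical helpers, and skipping non-Hangul characters is an assumption made here for simplicity.

```python
from typing import Optional, Tuple

HANGUL_BASE = 0xAC00  # first precomposed syllable, "가"
HANGUL_LAST = 0xD7A3  # last precomposed syllable, "힣"

CHOSUNG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNGSUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 medial vowels
# 28 finals; index 0 means "no final consonant".
JONGSUNG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(ch: str) -> Optional[Tuple[str, str, str]]:
    """Split one Hangul syllable into (initial, medial, final)."""
    code = ord(ch)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return None  # not a precomposed Hangul syllable
    offset = code - HANGUL_BASE
    cho, rest = divmod(offset, 21 * 28)
    jung, jong = divmod(rest, 28)
    return CHOSUNG[cho], JUNGSUNG[jung], JONGSUNG[jong]


def extract(text: str, token: str) -> str:
    """Concatenate one jamo position for every Hangul syllable in text."""
    index = {"CHOSUNG": 0, "JUNGSUNG": 1, "JONGSUNG": 2}[token]
    parts = (decompose(ch) for ch in text)
    return "".join(p[index] for p in parts if p is not None)


for mode in ("CHOSUNG", "JUNGSUNG", "JONGSUNG"):
    print(mode, "->", extract("엘라스틱서치", mode))
```

Run on the sample text, this prints the same three strings listed under [Response].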

:

[Elasticsearch] esrally links.

Elastic/Elasticsearch 2021. 4. 9. 08:31

https://github.com/elastic/rally
https://esrally.readthedocs.io/en/stable/

 

It's a handy tool for checking the performance of an Elasticsearch cluster.

:

[Elasticsearch] Elastic APM quick setup

Elastic/Elasticsearch 2021. 4. 6. 14:39

Elastic provides a wide range of tools and services.

One of them is APM, which is a very good tool.

Here are quick notes with just the information you need.

 

[Elastic APM Server]

https://www.elastic.co/guide/en/apm/server/current/overview.html
https://www.elastic.co/downloads/apm

 

[Elastic APM Agent]

https://www.elastic.co/guide/en/apm/agent/java/current/intro.html
https://search.maven.org/search?q=g:co.elastic.apm%20AND%20a:elastic-apm-agent

 

<Register these as VM options in IntelliJ.>
-javaagent:/Users/mzc02-henryjeong/Works/elastic/apm-agent/elastic-apm-agent-1.22.0.jar -Delastic.apm.service_name=poc-service -Delastic.apm.application_packages=com.mzc.poc -Delastic.apm.server_url=http://localhost:8200
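If the same options need to be generated for several services, they can be assembled programmatically. This is only a convenience sketch: `build_apm_jvm_options` is a hypothetical helper, and the three `elastic.apm.*` properties are simply the ones that appear in the line above.

```python
from typing import List


def build_apm_jvm_options(
    agent_jar: str,
    service_name: str,
    application_packages: str,
    server_url: str,
) -> List[str]:
    """Return the JVM options for attaching the Elastic APM Java agent."""
    return [
        f"-javaagent:{agent_jar}",
        f"-Delastic.apm.service_name={service_name}",
        f"-Delastic.apm.application_packages={application_packages}",
        f"-Delastic.apm.server_url={server_url}",
    ]


options = build_apm_jvm_options(
    agent_jar="/Users/mzc02-henryjeong/Works/elastic/apm-agent/elastic-apm-agent-1.22.0.jar",
    service_name="poc-service",
    application_packages="com.mzc.poc",
    server_url="http://localhost:8200",
)
print(" ".join(options))
```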

 

<Register the index patterns in Kibana and explore them in Discover.>

apm-{version}-onboarding-*
apm-{version}-span-*
apm-{version}-error-*
apm-{version}-transaction-*
apm-{version}-profile-*
apm-{version}-metric-*

- These are created automatically as aliases.

 

Stack required before setup)

- Elasticsearch

- Kibana

- Spring Boot Web Application

:

[Elasticsearch] Nori Analyzer test

Elastic/Elasticsearch 2021. 4. 5. 17:23

A basic test of the Nori Analyzer.

Official reference docs)

www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori.html

 

Default dictionary)

bitbucket.org/eunjeon/mecab-ko-dic/src/master/

 

POS Tag)

lucene.apache.org/core/8_8_0/analyzers-nori/org/apache/lucene/analysis/ko/POS.Tag.html

 

One thing to watch out for: when declaring the filter, use stoptags, not postags.

I had mistakenly written postags. (It has been corrected.)

 

The following was tested with a REST call to the _analyze API.

{
    "tokenizer": {
        "type": "nori_tokenizer",
        "decompound_mode": "mixed",
        "discard_punctuation": "true",
        "user_dictionary_rules": ["c++ c+ +", "C샤프", "세종", "세종시 세종 시"]
    },
    "filter": [
        {        
            "type": "nori_part_of_speech",
            "stoptags": [
                "E",
                "IC",
                "J",
                "MAG", "MAJ", "MM",
                "SP", "SSC", "SSO", "SC", "SE",
                "XPN", "XSA", "XSN", "XSV",
                "UNA", "NA", "VSV"
            ]
        },
        {
            "type": "nori_readingform"
        }
    ],
    "text": "世宗市에서 c++ 언어를 가르치는 학원이 있나요?",
    "attributes" : ["posType", "leftPOS", "rightPOS", "morphemes", "reading"],
    "explain": true        
}
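A request body like the one above can be posted to the `_analyze` endpoint with nothing but the Python standard library. This is a minimal sketch: the base URL `http://localhost:9200` is an assumption, `build_analyze_request` and `run_analyze` are hypothetical helper names, and the body is abbreviated to a single stop tag.

```python
import json
import urllib.request


def build_analyze_request(body: dict, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a POST request for the _analyze API."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_analyze(body: dict, base_url: str = "http://localhost:9200") -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_analyze_request(body, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Abbreviated version of the request body shown above.
body = {
    "tokenizer": {"type": "nori_tokenizer", "decompound_mode": "mixed"},
    "filter": [{"type": "nori_part_of_speech", "stoptags": ["J"]}],
    "text": "世宗市에서 c++ 언어를 가르치는 학원이 있나요?",
}
request = build_analyze_request(body)
print(request.get_method(), request.full_url)
```

Calling `run_analyze(body)` needs a running node with the analysis-nori plugin installed.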

Result)

{
    "detail": {
        "custom_analyzer": true,
        "charfilters": [],
        "tokenizer": {
            "name": "__anonymous__nori_tokenizer",
            "tokens": [
                {
                    "token": "世宗",
                    "start_offset": 0,
                    "end_offset": 2,
                    "type": "word",
                    "position": 0,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": "세종",
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "市",
                    "start_offset": 2,
                    "end_offset": 3,
                    "type": "word",
                    "position": 1,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": "시",
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "에서",
                    "start_offset": 3,
                    "end_offset": 5,
                    "type": "word",
                    "position": 2,
                    "leftPOS": "J(Ending Particle)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "J(Ending Particle)"
                },
                {
                    "token": "c++",
                    "start_offset": 6,
                    "end_offset": 9,
                    "type": "word",
                    "position": 3,
                    "positionLength": 2,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": "c+/NNG(General Noun)++/NNG(General Noun)",
                    "posType": "COMPOUND",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "c+",
                    "start_offset": 6,
                    "end_offset": 8,
                    "type": "word",
                    "position": 3,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "+",
                    "start_offset": 8,
                    "end_offset": 9,
                    "type": "word",
                    "position": 4,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "언어",
                    "start_offset": 10,
                    "end_offset": 12,
                    "type": "word",
                    "position": 5,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "를",
                    "start_offset": 12,
                    "end_offset": 13,
                    "type": "word",
                    "position": 6,
                    "leftPOS": "J(Ending Particle)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "J(Ending Particle)"
                },
                {
                    "token": "가르치",
                    "start_offset": 14,
                    "end_offset": 17,
                    "type": "word",
                    "position": 7,
                    "leftPOS": "VV(Verb)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "VV(Verb)"
                },
                {
                    "token": "는",
                    "start_offset": 17,
                    "end_offset": 18,
                    "type": "word",
                    "position": 8,
                    "leftPOS": "E(Verbal endings)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "E(Verbal endings)"
                },
                {
                    "token": "학원",
                    "start_offset": 19,
                    "end_offset": 21,
                    "type": "word",
                    "position": 9,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "이",
                    "start_offset": 21,
                    "end_offset": 22,
                    "type": "word",
                    "position": 10,
                    "leftPOS": "J(Ending Particle)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "J(Ending Particle)"
                },
                {
                    "token": "있",
                    "start_offset": 23,
                    "end_offset": 24,
                    "type": "word",
                    "position": 11,
                    "leftPOS": "VA(Adjective)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "VA(Adjective)"
                },
                {
                    "token": "나요",
                    "start_offset": 24,
                    "end_offset": 26,
                    "type": "word",
                    "position": 12,
                    "leftPOS": "E(Verbal endings)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "E(Verbal endings)"
                }
            ]
        },
        "tokenfilters": [
            {
                "name": "__anonymous__nori_part_of_speech",
                "tokens": [
                    {
                        "token": "世宗",
                        "start_offset": 0,
                        "end_offset": 2,
                        "type": "word",
                        "position": 0,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "세종",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "市",
                        "start_offset": 2,
                        "end_offset": 3,
                        "type": "word",
                        "position": 1,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "시",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c++",
                        "start_offset": 6,
                        "end_offset": 9,
                        "type": "word",
                        "position": 3,
                        "positionLength": 2,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": "c+/NNG(General Noun)++/NNG(General Noun)",
                        "posType": "COMPOUND",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c+",
                        "start_offset": 6,
                        "end_offset": 8,
                        "type": "word",
                        "position": 3,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "+",
                        "start_offset": 8,
                        "end_offset": 9,
                        "type": "word",
                        "position": 4,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "언어",
                        "start_offset": 10,
                        "end_offset": 12,
                        "type": "word",
                        "position": 5,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "가르치",
                        "start_offset": 14,
                        "end_offset": 17,
                        "type": "word",
                        "position": 7,
                        "leftPOS": "VV(Verb)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VV(Verb)"
                    },
                    {
                        "token": "학원",
                        "start_offset": 19,
                        "end_offset": 21,
                        "type": "word",
                        "position": 9,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "있",
                        "start_offset": 23,
                        "end_offset": 24,
                        "type": "word",
                        "position": 11,
                        "leftPOS": "VA(Adjective)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VA(Adjective)"
                    }
                ]
            },
            {
                "name": "__anonymous__nori_readingform",
                "tokens": [
                    {
                        "token": "세종",
                        "start_offset": 0,
                        "end_offset": 2,
                        "type": "word",
                        "position": 0,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "세종",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "시",
                        "start_offset": 2,
                        "end_offset": 3,
                        "type": "word",
                        "position": 1,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "시",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c++",
                        "start_offset": 6,
                        "end_offset": 9,
                        "type": "word",
                        "position": 3,
                        "positionLength": 2,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": "c+/NNG(General Noun)++/NNG(General Noun)",
                        "posType": "COMPOUND",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c+",
                        "start_offset": 6,
                        "end_offset": 8,
                        "type": "word",
                        "position": 3,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "+",
                        "start_offset": 8,
                        "end_offset": 9,
                        "type": "word",
                        "position": 4,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "언어",
                        "start_offset": 10,
                        "end_offset": 12,
                        "type": "word",
                        "position": 5,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "가르치",
                        "start_offset": 14,
                        "end_offset": 17,
                        "type": "word",
                        "position": 7,
                        "leftPOS": "VV(Verb)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VV(Verb)"
                    },
                    {
                        "token": "학원",
                        "start_offset": 19,
                        "end_offset": 21,
                        "type": "word",
                        "position": 9,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "있",
                        "start_offset": 23,
                        "end_offset": 24,
                        "type": "word",
                        "position": 11,
                        "leftPOS": "VA(Adjective)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VA(Adjective)"
                    }
                ]
            }
        ]
    }
}
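The difference between the tokenizer output and the `nori_part_of_speech` output above can be mimicked in a few lines. The sketch below assumes the filter drops a token when the tag of its `leftPOS` is in `stoptags`. That rule is consistent with the output shown, but it is a simplification and not the Lucene source. The token list is copied from the tokenizer section of the result.

```python
STOPTAGS = {
    "E", "IC", "J", "MAG", "MAJ", "MM",
    "SP", "SSC", "SSO", "SC", "SE",
    "XPN", "XSA", "XSN", "XSV",
    "UNA", "NA", "VSV",
}

# (token, leftPOS) pairs taken from the tokenizer output above.
TOKENS = [
    ("世宗", "NNG(General Noun)"),
    ("市", "NNG(General Noun)"),
    ("에서", "J(Ending Particle)"),
    ("c++", "NNG(General Noun)"),
    ("c+", "NNG(General Noun)"),
    ("+", "NNG(General Noun)"),
    ("언어", "NNG(General Noun)"),
    ("를", "J(Ending Particle)"),
    ("가르치", "VV(Verb)"),
    ("는", "E(Verbal endings)"),
    ("학원", "NNG(General Noun)"),
    ("이", "J(Ending Particle)"),
    ("있", "VA(Adjective)"),
    ("나요", "E(Verbal endings)"),
]


def pos_tag(left_pos: str) -> str:
    """Extract the bare tag, e.g. 'J' from 'J(Ending Particle)'."""
    return left_pos.split("(", 1)[0]


kept = [token for token, left_pos in TOKENS if pos_tag(left_pos) not in STOPTAGS]
print(kept)
```

The surviving tokens match the token list of the `__anonymous__nori_part_of_speech` section: the particles (J) and verbal endings (E) are gone.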

Adding a synonyms filter)

Note that words not defined in the user dictionary (user_dic.txt) may not produce the intended result.

{
    "tokenizer": {
        "type": "nori_tokenizer",
        "decompound_mode": "mixed",
        "discard_punctuation": "true",
        "user_dictionary_rules": ["c++ c+", "c샤프", "c샵", "삼성전자", "세종", "세종시 세종 시"]
    },
    "filter": [
        {
            "type": "synonym_graph",            
            "synonyms": [ 
                "삼성전자, 삼전",
                "c샤프, c샵"
            ]
        },
        {        
            "type": "nori_part_of_speech",
            "stoptags": [
                "E",
                "IC",
                "J",
                "MAG", "MAJ", "MM",
                "SP", "SSC", "SSO", "SC", "SE",
                "XPN", "XSA", "XSN", "XSV",
                "UNA", "NA", "VSV"
            ]
        },
        {
            "type": "nori_readingform"
        }        
    ],
    "text": "世宗市에서 c++, c샤프 언어를 가르치는 삼성전자 학원이 있나요?",
    "attributes" : ["posType", "leftPOS", "rightPOS", "morphemes", "reading"],
    "explain": false        
}

nori_userdict.txt)

If you switch from user_dictionary_rules to user_dictionary, the setting looks like this:

  • "user_dictionary": "nori_userdict.txt"
    • The file lives under the config directory of the Elasticsearch installation.
c++ c+
c샤프
c샵
삼성전자
세종
세종시 세종 시
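Each line of the file is a surface form, optionally followed by the pieces it should be decomposed into. A small Python sketch of that reading, where `parse_user_dictionary` is a hypothetical helper and skipping blank lines and `#` comment lines is an assumption:

```python
from typing import Dict, Iterable, List


def parse_user_dictionary(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each surface form to its decomposition (empty if none given)."""
    rules: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and comment lines are ignored
        surface, *parts = line.split()
        rules[surface] = parts
    return rules


USERDICT = """\
c++ c+
c샤프
c샵
삼성전자
세종
세종시 세종 시
"""

rules = parse_user_dictionary(USERDICT.splitlines())
print(rules["세종시"])  # ['세종', '시']
print(rules["세종"])    # []
```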

 

:

[Elasticsearch] About document indexing

Elastic/Elasticsearch 2021. 4. 5. 15:36

These are classes worth reading to understand indexing in Elasticsearch.

 

  • InternalEngine
    • Declared at the node level; defines most of the operations in Elasticsearch.
  • NodeClient
    • Corresponds to a node when an Elasticsearch cluster is set up.
  • IndexShard
    • Defines the operations on a physical index.
  • Translog
    • Defines the operations for indexing work that has not been committed yet.

Rough flow of a flush)

    On commit, the IndexWriter writes the operations held in the translog to segment files, the translog is flushed, and a refresh is synchronized.
    (With a synced flush, the refresh runs first.)
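The flow described above can be pictured with a toy model. This is not Elasticsearch code: `ToyShard` and its lists are hypothetical stand-ins for the translog, the indexing buffer, and the committed segments, and the order of the steps in `flush` simply follows the description in this post.

```python
class ToyShard:
    """Toy model of index / refresh / flush on a single shard."""

    def __init__(self) -> None:
        self.translog = []    # operations not yet covered by a commit
        self.buffer = []      # indexed, but not yet visible to search
        self.searchable = []  # visible to search after a refresh
        self.committed = []   # durable in committed segment files

    def index(self, doc: dict) -> None:
        # Every operation is recorded in the translog as well as the buffer.
        self.translog.append(doc)
        self.buffer.append(doc)

    def refresh(self) -> None:
        # Make buffered documents visible to search.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def flush(self) -> None:
        # Commit: everything indexed so far becomes durable in segments.
        self.committed = self.searchable + self.buffer
        # The translog no longer needs to replay these operations.
        self.translog.clear()
        # Finally, bring the searchable view in sync with the commit.
        self.refresh()


shard = ToyShard()
shard.index({"id": 1})
shard.index({"id": 2})
print(len(shard.translog), len(shard.searchable), len(shard.committed))
shard.flush()
print(len(shard.translog), len(shard.searchable), len(shard.committed))
```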

 

:

[Metricbeat] Module enable/disable.

Elastic/Beats 2021. 2. 15. 10:05

[Enable]
$ ./metricbeat modules enable apache mysql

 

[Disable]
$ ./metricbeat modules disable apache mysql

:

[Elastic] Enterprise Search - App Search JSON Document.

Elastic 2021. 2. 9. 11:30

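I ran into an error while indexing JSON documents in App Search, so I'm noting it here.

This is not a problem in Elasticsearch itself.

 

Notes on field names)

- Field names must be lowercase.

 

Design your documents with lowercase field names from the start.

Documents that already mix upper and lower case need extra work; I can see why people don't use it.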
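For documents that already contain mixed-case field names, one option is to normalize the keys before sending them. The sketch below only lowercases keys, which is the single rule mentioned in this post; `lowercase_keys` is a hypothetical helper, and raising on a collision (for example `Name` and `name` in the same object) is a design choice made here.

```python
from typing import Any


def lowercase_keys(value: Any) -> Any:
    """Return a copy of a JSON-like value with all object keys lowercased."""
    if isinstance(value, dict):
        result = {}
        for key, item in value.items():
            new_key = key.lower()
            if new_key in result:
                # Two keys differ only by case; refuse to silently drop one.
                raise ValueError(f"duplicate field after lowercasing: {new_key}")
            result[new_key] = lowercase_keys(item)
        return result
    if isinstance(value, list):
        return [lowercase_keys(item) for item in value]
    return value


doc = {"UserName": "henry", "Tags": ["Elastic", "Search"], "viewCount": 3}
print(lowercase_keys(doc))
```

Only the keys change; values such as `"Elastic"` keep their original case.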

:

[Elastic] Workplace Search ...

Elastic 2021. 2. 4. 11:22

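With so many collaboration tools appearing, it seems natural that a need arises to search the information and documents they produce.

Personally, since I build my development environments mostly from open source, choosing a tool that costs money is not easy; that is the other side of reality.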

 

https://www.elastic.co/guide/en/workplace-search/current/workplace-search-install.html

 

Elastic offers a product called Workplace Search for free.

It is a service provided as part of Enterprise Search.

 

I personally wanted to try it with Slack, so I set it up and planned to run a test.

ㅠ.ㅠ

 

Unfortunately, Slack happens to be one of the Content Sources that currently belong to the paid tier.

It would be great if they simply made it free.

 

It would be very convenient if it could be added for free in Workplace Search, but if that is not possible,

you can still fetch the messages with the Slack API and index them into Elasticsearch directly.

 

After all, Workplace Search in Enterprise Search does not operate on its own; it works like this:

    Slack <---> Workplace Search (Slack Content Source registered) <---> Elasticsearch
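For the direct route (Slack API to Elasticsearch), the core step is turning Slack messages into a `_bulk` payload. The sketch below covers only that transformation: the index name `slack-messages`, the `channel-ts` document id scheme, and the `slack_messages_to_bulk` helper are all hypothetical, and the sample messages are made up. It assumes each message carries the `ts`, `user`, and `text` fields that Slack message objects normally have.

```python
import json
from typing import Iterable


def slack_messages_to_bulk(
    messages: Iterable[dict],
    channel: str,
    index: str = "slack-messages",  # hypothetical index name
) -> str:
    """Convert Slack message objects into an Elasticsearch _bulk NDJSON body."""
    lines = []
    for msg in messages:
        doc_id = f"{channel}-{msg['ts']}"  # hypothetical id scheme
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(
            {
                "channel": channel,
                "user": msg.get("user"),
                "text": msg.get("text", ""),
                "ts": msg["ts"],
            },
            ensure_ascii=False,
        ))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Made-up sample data for illustration.
sample = [
    {"ts": "1612400000.000100", "user": "U123", "text": "배포 완료"},
    {"ts": "1612400060.000200", "user": "U456", "text": "확인했습니다"},
]
payload = slack_messages_to_bulk(sample, channel="C001")
print(payload)
```

The resulting string can be sent as the body of a `POST /_bulk` request with the `application/x-ndjson` content type.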

 

Even so, everything other than Slack and Gmail is free, so I think it can be a real help at work for those who need it.

 

:

[Elastic] Building an Elasticsearch cluster based on IaC

Elastic 2021. 2. 2. 12:04

https://github.com/HowookJeong/ecos-installer-web

Deployment and run guide

Prerequisite)

  • The bastion machine runs Ubuntu.
  • Create key pairs for SSH tunneling to the bastion and the EC2 instances.
  • Create an access/secret key for the account so that it can be run from the local machine.

Step 1) Set up the local runtime environment

$ aws configure --profile ecos 

Put region : ap-northeast-2
Put output : json
Put access key : xxxxxxxxxxxxxx
Put secret key : xxxxxxxxxxxxxx
Put key pairs file to ~/.ssh/

 

Step 2) Configure the Terraform & Ansible environment

$ vi docker-compose.yml

 environment:
 ...(omitted)...
   - serverPort=${SERVER_PORT}
   - configWorkingPath=/tmp/home/mzc/app
   - configTerraformAwsBastionIp=${BASTION_IP}
   - configTerraformAwsSecurityGroup=sg-xxxxxxxxxxxxxxxx
   - configTerraformAwsAz=ap-northeast-2a
   - configTerraformAwsAmi=ami-061b0ee20654981ab
   - configTerraformAwsSubnet=subnet-xxxxxxxxxxxxxxxx
   - configTerraformAwsKeyName=ec2key-gw
   - configTerraformAwsPemFile=ec2key-gw.pem
   - configTerraformAwsPathElasticsearch=/tmp/home/mzc/app/terraform/_CLUSTERNAME_
   - configTerraformAwsPathKibana=/tmp/home/mzc/app/terraform/_CLUSTERNAME_/kibana
   - configTerraformAwsBackendBucket=megatoi-terraform-state
   - configTerraformAwsBackendKeyElasticsearch=_CLUSTERNAME_/terraform.tfstate
   - configTerraformAwsBackendKeyKibana=_CLUSTERNAME_/kibana/terraform.tfstate
 volumes:
 ...(omitted)...
   - /Users/<account>/.aws:/root/.aws
   - /Users/<account>/.ssh:/root/.ssh

 

Step 3) Load the distributed docker image

$ sudo docker load -i ecos-installer-web-0.0.1.tar

 

Step 4) Start/stop the container

$ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose up -d

$ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose down

Local development environment

  • $ cd .aws
  • $ aws configure --profile ecos
  • $ vi config

[profile ecos]

region = ap-northeast-2

output = json

  • $ vi credentials

[ecos]

aws_access_key_id = xxxxxxxxxxxxxxxxx

aws_secret_access_key = xxxxxxxxxxxxxxxxxx

Project Docker Compose configuration

...(omitted)...
    volumes:
      - /Users/mzc02-henryjeong/.aws:/root/.aws
      - /Users/mzc02-henryjeong/.ssh:/root/.ssh
      - /Users/mzc02-henryjeong/Temp/logs:/home/mzc/logs
      - /var/run/docker.sock:/var/run/docker.sock
      - /Users/mzc02-henryjeong/Works/app/terraform:/home/mzc/backup/terraform
...(omitted)...
  • Mount the relevant paths for AWS access and SSH tunneling.

Build Step

  • $ ./gradlew clean build bootJar -Pprofile=dev -x test
  • $ docker build --build-arg BASTION_IP=xxx.xxx.xxx.xxx --tag ecos-installer-web:0.0.1 .
  • OR $ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose build
  • $ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose up
  • $ ENV=dev TAG=0.0.1 REDIRECT_HTTPS=true SERVER_PORT=8081 docker-compose down
  • $ docker image ls
  • $ docker rmi -f 7f52709a6615
  • $ docker exec -it ecos-installer-web /bin/sh
  • $ sudo docker save -o ecos-installer-web-0.0.1.tar ecos-installer-web:0.0.1
  • $ sudo docker load -i ecos-installer-web-0.0.1.tar

Terraform path and Elasticsearch cluster naming conventions

  • Terraform File Path : /tmp/home/mzc/app/terraform/${CLUSTERNAME}/${TIMESTAMP}
  • Backend Key : ${CLUSTERNAME}/${TIMESTAMP}/terraform.tfstate
  • When creating a new cluster
    • Step 1) Terraform File Path : /tmp/home/mzc/app/terraform/elasticsearch
    • Step 1) Backend Key : elasticsearch/terraform.tfstate
  • When adding to an existing cluster
    • Step 1) Terraform File Path : /tmp/home/mzc/app/terraform/elasticsearch/1598860075233
    • Step 1) Backend Key : elasticsearch/1598860075233/terraform.tfstate
    • Step 2) Backend Key : elasticsearch/1598860075233/terraform.tfstate
    • The master IP must be looked up in order to join the existing cluster
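The naming convention above is easy to express as two small functions. This is a sketch of the convention as listed, not code from the ecos-installer-web project; `terraform_path` and `backend_key` are hypothetical names, and the base path is the one used in the examples.

```python
from typing import Optional

BASE_PATH = "/tmp/home/mzc/app/terraform"


def terraform_path(cluster_name: str, timestamp: Optional[int] = None) -> str:
    """Working directory for a new cluster, or for an addition when timestamp is given."""
    parts = [BASE_PATH, cluster_name]
    if timestamp is not None:
        parts.append(str(timestamp))
    return "/".join(parts)


def backend_key(cluster_name: str, timestamp: Optional[int] = None) -> str:
    """Backend key of the Terraform state file, following the same rule."""
    parts = [cluster_name]
    if timestamp is not None:
        parts.append(str(timestamp))
    parts.append("terraform.tfstate")
    return "/".join(parts)


# New cluster
print(terraform_path("elasticsearch"))
print(backend_key("elasticsearch"))

# Adding to an existing cluster
print(terraform_path("elasticsearch", 1598860075233))
print(backend_key("elasticsearch", 1598860075233))
```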

Creation and configuration

  • Create the AWS account access/secret key
  • Run aws configure
  • Create the bastion server
  • Configure the VPC settings
    • security group
    • subnet
    • az
    • Create and register a key pair under EC2 Network & Security (keyName, keyPem)
    • ami
  • Configure Terraform
    • Set the Terraform working path
    • Set the Terraform backend
  • Configure the AWS cluster instances
    • Node topology (node types)
    • Instance type (CPU, memory, network performance)
    • Instance size (number of nodes)
    • Disk volume size (Elasticsearch storage)
  • Configure the Elasticsearch cluster
    • Cluster name
    • Elasticsearch version to install
    • Ports (http, tcp)
    • path.data/logs
  • Configure Ansible
    • Working path
    • Bastion IP

Service Flow

  • TerraformService

    • terraform
      • createTerraformS3Backend
      • readTerraformTemplateForElasticsearch
      • writeTerraformTemplateForElasticsearch
      • runTerraformTemplateForElasticsearch
      • backupTerraformTemplateStateForElasticsearch (if it is not s3 backend)
  • ElasticsearchService

    • docker
      • createDockerComposeConfiguration
    • ansible
      • createAnsibleInventories
      • createAnsibleRoles
      • runAnsiblePlaybook

: