jjeong

[Elasticsearch] esrally 링크.

Elastic/Elasticsearch 2021. 4. 9. 08:31

https://github.com/elastic/rally
https://esrally.readthedocs.io/en/stable/

elasticsearch cluster 성능 점검용으로 활용 하면 좋아요.

:

[Elasticsearch] Elastic APM Quick 구성

Elastic/Elasticsearch 2021. 4. 6. 14:39

Elastic 사에서 제공 하는 다양한 도구와 서비스 들이 있습니다.

APM 이라는 아주 좋은 도구도 제공 하는데요.

Quick 하게 필요한 정보만 기록해 봅니다.

[Elastic APM Server]

https://www.elastic.co/guide/en/apm/server/current/overview.html
https://www.elastic.co/downloads/apm

[Elastic APM Agent]

https://www.elastic.co/guide/en/apm/agent/java/current/intro.html
https://search.maven.org/search?q=g:co.elastic.apm%20AND%20a:elastic-apm-agent

<intellij 에서 vm 옵션으로 등록합니다.>
-javaagent:/Users/mzc02-henryjeong/Works/elastic/apm-agent/elastic-apm-agent-1.22.0.jar -Delastic.apm.service_name=poc-service -Delastic.apm.application_packages=com.mzc.poc -Delastic.apm.server_url=http://localhost:8200

<Kibana 에서 Index Pattern 등록 하고 Discover 합니다.>

apm-{versin}-onboarding-*
apm-{versin}-span-*
apm-{versin}-error-*
apm-{versin}-transaction-*
apm-{versin}-profile-*
apm-{versin}-metric-*

- alias 로 자동 생성 되어 있음.

구성 시 사전 필요한 stack 은)

- Elasticsearch

- Kibana

- Spring Boot Web Application

:

[Elasticsearch] Nori Analyzer 테스트

Elastic/Elasticsearch 2021. 4. 5. 17:23

Nori Analyzer 기본 테스트 입니다.

공홈 참고문서)

www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori.html

기본 사전)

bitbucket.org/eunjeon/mecab-ko-dic/src/master/

POS Tag)

lucene.apache.org/core/8_8_0/analyzers-nori/org/apache/lucene/analysis/ko/POS.Tag.html

여기서 주의 할 점은 filter 선언 시 ~~postags~~ 가 아닌 stoptags 로 선언 하셔야 합니다.

제가 실수로 postags 로 작성을 했었네요. (수정해 두었습니다.)

_analyze API 를 이용해서 RESTful API 호출로 테스트 한 내용입니다.

{
    "tokenizer": {
        "type": "nori_tokenizer",
        "decompound_mode": "mixed",
        "discard_punctuation": "true",
        "user_dictionary_rules": ["c++ c+ +", "C샤프", "세종", "세종시 세종 시"]
    },
    "filter": [
        {        
            "type": "nori_part_of_speech",
            "stoptags": [
                "E",
                "IC",
                "J",
                "MAG", "MAJ", "MM",
                "SP", "SSC", "SSO", "SC", "SE",
                "XPN", "XSA", "XSN", "XSV",
                "UNA", "NA", "VSV"
            ]
        },
        {
            "type": "nori_readingform"
        }
    ],
    "text": "世宗市에서 c++ 언어를 가르치는 학원이 있나요?",
    "attributes" : ["posType", "leftPOS", "rightPOS", "morphemes", "reading"],
    "explain": true        
}

실행한 결과)

{
    "detail": {
        "custom_analyzer": true,
        "charfilters": [],
        "tokenizer": {
            "name": "__anonymous__nori_tokenizer",
            "tokens": [
                {
                    "token": "世宗",
                    "start_offset": 0,
                    "end_offset": 2,
                    "type": "word",
                    "position": 0,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": "세종",
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "市",
                    "start_offset": 2,
                    "end_offset": 3,
                    "type": "word",
                    "position": 1,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": "시",
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "에서",
                    "start_offset": 3,
                    "end_offset": 5,
                    "type": "word",
                    "position": 2,
                    "leftPOS": "J(Ending Particle)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "J(Ending Particle)"
                },
                {
                    "token": "c++",
                    "start_offset": 6,
                    "end_offset": 9,
                    "type": "word",
                    "position": 3,
                    "positionLength": 2,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": "c+/NNG(General Noun)++/NNG(General Noun)",
                    "posType": "COMPOUND",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "c+",
                    "start_offset": 6,
                    "end_offset": 8,
                    "type": "word",
                    "position": 3,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "+",
                    "start_offset": 8,
                    "end_offset": 9,
                    "type": "word",
                    "position": 4,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "언어",
                    "start_offset": 10,
                    "end_offset": 12,
                    "type": "word",
                    "position": 5,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "를",
                    "start_offset": 12,
                    "end_offset": 13,
                    "type": "word",
                    "position": 6,
                    "leftPOS": "J(Ending Particle)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "J(Ending Particle)"
                },
                {
                    "token": "가르치",
                    "start_offset": 14,
                    "end_offset": 17,
                    "type": "word",
                    "position": 7,
                    "leftPOS": "VV(Verb)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "VV(Verb)"
                },
                {
                    "token": "는",
                    "start_offset": 17,
                    "end_offset": 18,
                    "type": "word",
                    "position": 8,
                    "leftPOS": "E(Verbal endings)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "E(Verbal endings)"
                },
                {
                    "token": "학원",
                    "start_offset": 19,
                    "end_offset": 21,
                    "type": "word",
                    "position": 9,
                    "leftPOS": "NNG(General Noun)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "NNG(General Noun)"
                },
                {
                    "token": "이",
                    "start_offset": 21,
                    "end_offset": 22,
                    "type": "word",
                    "position": 10,
                    "leftPOS": "J(Ending Particle)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "J(Ending Particle)"
                },
                {
                    "token": "있",
                    "start_offset": 23,
                    "end_offset": 24,
                    "type": "word",
                    "position": 11,
                    "leftPOS": "VA(Adjective)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "VA(Adjective)"
                },
                {
                    "token": "나요",
                    "start_offset": 24,
                    "end_offset": 26,
                    "type": "word",
                    "position": 12,
                    "leftPOS": "E(Verbal endings)",
                    "morphemes": null,
                    "posType": "MORPHEME",
                    "reading": null,
                    "rightPOS": "E(Verbal endings)"
                }
            ]
        },
        "tokenfilters": [
            {
                "name": "__anonymous__nori_part_of_speech",
                "tokens": [
                    {
                        "token": "世宗",
                        "start_offset": 0,
                        "end_offset": 2,
                        "type": "word",
                        "position": 0,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "세종",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "市",
                        "start_offset": 2,
                        "end_offset": 3,
                        "type": "word",
                        "position": 1,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "시",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c++",
                        "start_offset": 6,
                        "end_offset": 9,
                        "type": "word",
                        "position": 3,
                        "positionLength": 2,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": "c+/NNG(General Noun)++/NNG(General Noun)",
                        "posType": "COMPOUND",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c+",
                        "start_offset": 6,
                        "end_offset": 8,
                        "type": "word",
                        "position": 3,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "+",
                        "start_offset": 8,
                        "end_offset": 9,
                        "type": "word",
                        "position": 4,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "언어",
                        "start_offset": 10,
                        "end_offset": 12,
                        "type": "word",
                        "position": 5,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "가르치",
                        "start_offset": 14,
                        "end_offset": 17,
                        "type": "word",
                        "position": 7,
                        "leftPOS": "VV(Verb)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VV(Verb)"
                    },
                    {
                        "token": "학원",
                        "start_offset": 19,
                        "end_offset": 21,
                        "type": "word",
                        "position": 9,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "있",
                        "start_offset": 23,
                        "end_offset": 24,
                        "type": "word",
                        "position": 11,
                        "leftPOS": "VA(Adjective)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VA(Adjective)"
                    }
                ]
            },
            {
                "name": "__anonymous__nori_readingform",
                "tokens": [
                    {
                        "token": "세종",
                        "start_offset": 0,
                        "end_offset": 2,
                        "type": "word",
                        "position": 0,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "세종",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "시",
                        "start_offset": 2,
                        "end_offset": 3,
                        "type": "word",
                        "position": 1,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": "시",
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c++",
                        "start_offset": 6,
                        "end_offset": 9,
                        "type": "word",
                        "position": 3,
                        "positionLength": 2,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": "c+/NNG(General Noun)++/NNG(General Noun)",
                        "posType": "COMPOUND",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "c+",
                        "start_offset": 6,
                        "end_offset": 8,
                        "type": "word",
                        "position": 3,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "+",
                        "start_offset": 8,
                        "end_offset": 9,
                        "type": "word",
                        "position": 4,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "언어",
                        "start_offset": 10,
                        "end_offset": 12,
                        "type": "word",
                        "position": 5,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "가르치",
                        "start_offset": 14,
                        "end_offset": 17,
                        "type": "word",
                        "position": 7,
                        "leftPOS": "VV(Verb)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VV(Verb)"
                    },
                    {
                        "token": "학원",
                        "start_offset": 19,
                        "end_offset": 21,
                        "type": "word",
                        "position": 9,
                        "leftPOS": "NNG(General Noun)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "NNG(General Noun)"
                    },
                    {
                        "token": "있",
                        "start_offset": 23,
                        "end_offset": 24,
                        "type": "word",
                        "position": 11,
                        "leftPOS": "VA(Adjective)",
                        "morphemes": null,
                        "posType": "MORPHEME",
                        "reading": null,
                        "rightPOS": "VA(Adjective)"
                    }
                ]
            }
        ]
    }
}

synonyms filter 추가)

주의 할 사항은 user_dic.txt 에 정의 되지 않은 단어의 경우 의도한 결과가 나오지 않을 수 있습니다.

{
    "tokenizer": {
        "type": "nori_tokenizer",
        "decompound_mode": "mixed",
        "discard_punctuation": "true",
        "user_dictionary_rules": ["c++ c+", "c샤프", "c샵", "삼성전자", "세종", "세종시 세종 시"]
    },
    "filter": [
        {
            "type": "synonym_graph",            
            "synonyms": [ 
                "삼성전자, 삼전",
                "c샤프, c샵"
            ]
        },
        {        
            "type": "nori_part_of_speech",
            "stoptags": [
                "E",
                "IC",
                "J",
                "MAG", "MAJ", "MM",
                "SP", "SSC", "SSO", "SC", "SE",
                "XPN", "XSA", "XSN", "XSV",
                "UNA", "NA", "VSV"
            ]
        },
        {
            "type": "nori_readingform"
        }        
    ],
    "text": "世宗市에서 c++, c샤프 언어를 가르치는 삼성전자 학원이 있나요?",
    "attributes" : ["posType", "leftPOS", "rightPOS", "morphemes", "reading"],
    "explain": false        
}

nori_userdict.txt)

user_dictionary_rules 를 user_dictionary 로 변경해서 설정을 하게 되면 아래와 같습니다.

"user_dictionary": "nori_userdict.txt"
- 위 파일은 elasticsearch 가 설치된 위치의 config 경로 아래 위치 합니다.

c++ c+
c샤프
c샵
삼성전자
세종
세종시 세종 시

:

[Elasticsearch] Document Indexing 관련

Elastic/Elasticsearch 2021. 4. 5. 15:36

Elasticsearch 에서 Indexing 관련해서 봐두면 좋은 Class 입니다.

InternalEngine
- Node 레벨에서 선언 되며, Elasticsearch 에서의 대부분의 Operation 에 대한 정의가 되어 있습니다.
NodeClient
- Elasticsearch Cluster 구성 시 Node 에 해당 합니다.
IndexShard
- 물리적인 Index 의 Operation 에 대한 정의가 되어 있습니다.
Translog
- Commit 되지 않은 색인 작업 내역에 대한 Operation 정의가 되어 있습니다.

Flush 에 대한 대략적인 흐름)

Commit 하면 tranlog 를 indexWriter 가 segments 파일에 write 하고 tranlog 는 flush 되면서 refresh 동기화가 이루어 집니다.
(Synced flush 의 경우 refresh 가 먼저 수행 됩니다.)

:

jjeong

[Python] 비동기 요청 예제

[Elasticsearch] esrally 링크.

[Elasticsearch] Elastic APM Quick 구성

[Elasticsearch] Nori Analyzer 테스트

[Elasticsearch] Document Indexing 관련

티스토리툴바