Using a Synonym Dictionary in Elasticsearch
Elastic/Elasticsearch 2012. 12. 18. 22:22
[title=Elasticsearch synonym settings]
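- Synonyms must be configured when the index is created.
- The base kr_analysis (Korean analysis) must already be applied.
- Without it, Korean text is not processed.
- Create the synonym.txt file in a suitable location. A relative synonyms_path is resolved against the Elasticsearch config directory.
- Reference: http://www.elasticsearch.org/guide/reference/index-modules/analysis/synonym-tokenfilter.html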
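As a minimal sketch of the file-creation step, the commands below write a small synonym.txt under an `analysis` subdirectory of the config directory, which is where a relative `synonyms_path` such as `analysis/synonym.txt` would be looked up. The `ES_CONFIG` variable and its `./config` default are assumptions made for illustration; point it at the config directory of your own installation.

```shell
# Hypothetical location of the Elasticsearch config directory; adjust to your install.
ES_CONFIG="${ES_CONFIG:-./config}"

# A relative synonyms_path such as "analysis/synonym.txt" is resolved against this directory.
mkdir -p "$ES_CONFIG/analysis"

# Write a small synonym file using entries from the Solr sample format.
cat > "$ES_CONFIG/analysis/synonym.txt" <<'EOF'
# Explicit mapping
i-pod, i pod => ipod
# Equivalent synonyms
universe, cosmos
EOF

# Print only the rules, skipping comment lines and blank lines.
grep -v -e '^#' -e '^$' "$ES_CONFIG/analysis/synonym.txt"
```

The synonym filter reads this file when the index is created, so the file should be in place on every node before the index-creation request is sent.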
[title=synonym.txt sample]
Solr synonyms
The following is a sample format of the file:
# blank lines and lines starting with pound are comments.
#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS. These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod
sea biscuit, sea biscit => seabiscuit
#Equivalent synonyms may be separated with commas and give
#no explicit mapping. In this case the mapping behavior will
#be taken from the expand parameter in the schema. This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod
#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz
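The effect of the expand parameter described in the sample comments can be modeled with a small shell function. This is only an illustrative sketch of the documented behavior, not the Solr or Elasticsearch parser; `expand_rule` is a hypothetical helper name, and the sketch does not trim whitespace around terms.

```shell
# expand_rule RULE [EXPAND]
# Prints the explicit mapping that one synonym line is equivalent to.
# Hypothetical helper for illustration only.
expand_rule() {
  rule=$1
  expand=${2:-true}
  case $rule in
    *'=>'*)
      # Explicit mappings ignore the expand parameter.
      printf '%s\n' "$rule"
      ;;
    *)
      if [ "$expand" = true ]; then
        # expand==true: every term maps to all of the terms.
        printf '%s => %s\n' "$rule" "$rule"
      else
        # expand==false: every term maps to the first term only.
        printf '%s => %s\n' "$rule" "${rule%%,*}"
      fi
      ;;
  esac
}

expand_rule 'ipod, i-pod, i pod' true
expand_rule 'ipod, i-pod, i pod' false
```

The two calls print `ipod, i-pod, i pod => ipod, i-pod, i pod` and `ipod, i-pod, i pod => ipod`, matching the two equivalences given in the sample file.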
[title=Index creation sample code - with synonyms applied]
# Create the "test" index with a custom Korean analyzer that applies the synonym filter.
curl -XPUT 'http://localhost:9200/test' -d '{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1,
    "index" : {
      "analysis" : {
        "analyzer" : {
          "kr_analyzer" : {
            "type" : "custom",
            "tokenizer" : "kr_tokenizer",
            "filter" : ["trim", "kr_filter", "kr_synonym"]
          }
        },
        "filter" : {
          "kr_synonym" : {
            "type" : "synonym",
            "synonyms_path" : "analysis/synonym.txt"
          }
        }
      }
    }
  }
}'
[title=Index creation sample code]
curl -XPUT 'http://10.101.254.223:9200/test' -d '{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1,
    "index" : {
      "analysis" : {
        "analyzer" : {
          "synonym" : {
            "tokenizer" : "kr_tokenizer",
            "filter" : ["synonym"]
          }
        },
        "filter" : {
          "synonym" : {
            "type" : "synonym",
            "synonyms_path" : "/home/<account>/apps/elasticsearch/plugins/analysis-korean/analysis/synonym.txt"
          }
        }
      }
    }
  },
  "mappings" : {
    "docs" : {
      "properties" : {
        ................................... (see other documents for this part)
      }
    }
  }
}'