Posts in the 'Elastic/Elasticsearch' category (page 32)

[Algorithm] TF (Term Frequency)

Elastic/Elasticsearch 2013. 5. 15. 14:12

Reference URL : http://kaistwebst.blog.me/130165776517

As the document above explains, TF is the number of times a single term occurs in a single document.

Expressed as a formula:

Di : document

Wj : term

fij : number of occurrences of term Wj in document Di

TF = log2(1 + fij)

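Example)

Di : "안녕하세요 검색 관련 색인 빈도 중 Term 빈도, Document 빈도"

Wj : "빈도" (Korean for "frequency")

fij : 3

log2(1+3) = 2

DF (Document Frequency) : the number of documents per index term (that is, how many documents contain index term A).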
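The TF and DF definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization with surrounding punctuation stripped; the function names and the sample documents are made up for this example and do not come from any library.

```python
import math


def term_count(document: str, term: str) -> int:
    """Count exact occurrences of `term` among whitespace-separated tokens."""
    tokens = [token.strip(".,!?\"'") for token in document.split()]
    return sum(1 for token in tokens if token == term)


def log_tf(document: str, term: str) -> float:
    """Log-scaled term frequency: log2(1 + fij)."""
    return math.log2(1 + term_count(document, term))


def document_frequency(documents, term: str) -> int:
    """DF: the number of documents that contain `term` at least once."""
    return sum(1 for doc in documents if term_count(doc, term) > 0)


# Hypothetical sample documents, used only for illustration.
docs = [
    "term frequency and document frequency",
    "inverted index basics",
]
print(term_count(docs[0], "frequency"))
print(log_tf(docs[0], "frequency"))
print(document_frequency(docs, "frequency"))
```

A real analyzer tokenizes very differently (especially for Korean), so the counts here only illustrate the arithmetic.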

:

[Elasticsearch] What's New in Elasticsearch 0.90?

Elastic/Elasticsearch 2013. 5. 6. 11:58

0.90.0 was released recently, and there was also a video conference(?) about it.

Sharing the related video and slides.

[Video]

http://info.elasticsearch.com/Recorded_0.90_Webinar.html?mkt_tok=3RkMMJWWfF9wsRonu6rNZKXonjHpfsX56%2BQvWaaxlMI%2F0ER3fOvrPUfGjI4ATMBrI%2BSLDwEYGJlv6SgFQrHGMa1h17gOUhM%3D

[Slides]

- The page above also links to them.

What's new in 0.90 5-3-12.pdf

[Summary]

- Fuzzy Query performance improved

- Similarity is now configurable: BM25 can be used in place of the default tf/idf

- fielddata cache features and performance improved

- Multi-valued fields are somewhat heavy.

- Do not use soft_refs (it triggers frequent GC and is slow)

- Control caches at the node level

- Make use of the Common Terms Query

- Execution order: Search -> Query -> Facets -> Filter -> Rescore

- Social Filter (Graph) features improved (Parent docs - Child docs)

- Suggester (Shingle token filter) : something like term bigrams...

- Sort on multi-valued fields

- Make use of index aliases

It keeps getting better.. ^^

:

[Elasticsearch] head plugin + http auth plugin

Elastic/Elasticsearch 2013. 4. 24. 13:26

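If you use the elasticsearch head plugin and your ES server is exposed to the outside, you need to think about authentication.

Usually a proxy server sits in front and acts as a gateway, so the service can run without exposing ES externally.

If a proxy setup is hard to arrange, it is probably better to apply at least basic authentication.

A plugin for this already exists for ES, so I am sharing the links.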

[head plugin]

https://github.com/mobz/elasticsearch-head

[http basic auth plugin]

https://github.com/karussell/elasticsearch-http-basic
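Once basic authentication is in front of ES, every client request has to carry an `Authorization` header. The sketch below shows how that header is built with the Python standard library; the credentials `user` / `pass` are placeholders, and the request is only constructed, not sent.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Create (but do not send) a request that carries Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Placeholder credentials; replace them with whatever the plugin is configured with.
request = build_request("http://localhost:9200/_cluster/health", "user", "pass")
print(request.get_header("Authorization"))
```

Basic authentication only base64-encodes the credentials, so it should be combined with TLS or a private network when the server is reachable from outside.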

:

[Elasticsearch] Understanding rack and zone in the cluster settings.

Elastic/Elasticsearch 2013. 4. 23. 16:36

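This post is based on my own tests, elasticsearch.org, and the community, and is intended for information sharing.

Please point out anything that is wrong.

(The example code has not been verified for performance or security.)

[elasticsearch API review]

Original link : http://www.elasticsearch.org/guide/reference/modules/cluster/

Among the cluster settings, the document above describes the following one.

Shard Allocation Awareness

This is an additional setting that lets you control how shards and replicas are placed in a clustered environment.

In other words, it controls which nodes a shard or its replica is allocated to.

Below are example settings to aid understanding.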

[Rack example]

- 1 cluster, 4 nodes

[common to all nodes]

cluster.name: cluster_a

cluster.routing.allocation.awareness.attributes: rack_id

[node 1]

node.rack_id: rack_1

node.name: node_1

[node 2]

node.rack_id: rack_2

node.name: node_2

[node 3]

node.rack_id: rack_1

node.name: node_3

[node 4]

node.rack_id: rack_2

node.name: node_4

[Zone example]

- 1 cluster, 2 nodes

[common to all nodes]

cluster.name: cluster_a

cluster.routing.allocation.awareness.force.zone.values: zone_1, zone_2

cluster.routing.allocation.awareness.attributes: zone

[node 1]

node.zone: zone_1

node.name: node_1

[node 2]

node.zone: zone_1

node.name: node_2

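※ Rack: shard copies are distributed across nodes in different racks, so HA can be set up across racks within one IDC or across IDCs.

※ Zone: forced awareness keeps all copies of the same data from crowding into a single zone, which guards against a SPOF. With both nodes in zone_1 as above, replicas stay unassigned until a zone_2 node joins.

It is easy to understand once you try it yourself. :)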
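The difference between plain awareness and forced awareness can be seen in a toy model. The sketch below is not Elasticsearch's allocator; it is a simplified stand-in that spreads the copies of one shard (primary plus replicas) evenly over attribute values, and the function name `allocate_copies` is made up for this example.

```python
import math
from collections import defaultdict


def allocate_copies(copies, nodes, forced_values=None):
    """Toy model: spread `copies` of one shard across attribute values.

    nodes maps node name -> attribute value (rack_id or zone).
    forced_values mimics awareness.force.<attr>.values: each listed value
    keeps its share reserved even when no node with that value is running.
    Returns (assigned node names, number of unassigned copies).
    """
    by_value = defaultdict(list)
    for name in sorted(nodes):
        by_value[nodes[name]].append(name)
    values = sorted(forced_values) if forced_values else sorted(by_value)
    cap = math.ceil(copies / len(values))  # max copies per attribute value
    assigned = []
    for value in values:
        for name in by_value.get(value, [])[:cap]:
            if len(assigned) < copies:
                assigned.append(name)
    return assigned, copies - len(assigned)


rack_nodes = {"node_1": "rack_1", "node_2": "rack_2", "node_3": "rack_1", "node_4": "rack_2"}
zone_nodes = {"node_1": "zone_1", "node_2": "zone_1"}

print(allocate_copies(2, rack_nodes))                       # one copy per rack
print(allocate_copies(2, zone_nodes))                       # no forcing: both copies in zone_1
print(allocate_copies(2, zone_nodes, ["zone_1", "zone_2"]))  # forced: replica waits for zone_2
```

In the forced case one copy stays unassigned, which mirrors the zone example: the replica is held back rather than placed next to its primary in the same zone.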

:

[Elasticsearch] Per-node indexing flow for shards and replicas.

Elastic/Elasticsearch 2013. 4. 19. 16:27

To make the concept easier to grasp, here is a simple sketch of the flow used for indexing and replication when a document index request arrives.

Find Primary Shard

↓

Perform Index into the Primary Shard

↓

Perform Replica to another Node (sync by default; async when replication=async)
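The "Find Primary Shard" step boils down to hashing a routing value (the document id by default) and taking it modulo the number of primary shards. The sketch below illustrates that idea only: `zlib.crc32` is a stand-in for the hash function Elasticsearch actually uses, and both function names are hypothetical.

```python
import zlib


def primary_shard(doc_id: str, number_of_shards: int) -> int:
    """Pick a primary shard: hash(routing value) % number_of_shards.

    crc32 is only a stand-in here; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def index_flow(doc_id: str, number_of_shards: int, number_of_replicas: int):
    """List the steps of the flow for a single index request."""
    shard = primary_shard(doc_id, number_of_shards)
    steps = [
        f"find primary shard {shard}",
        f"index into primary shard {shard}",
    ]
    steps += [
        f"replicate shard {shard} to replica copy {copy}"
        for copy in range(1, number_of_replicas + 1)
    ]
    return steps


for step in index_flow("doc-1", number_of_shards=5, number_of_replicas=1):
    print(step)
```

Because the shard is derived from the hash and the shard count, the same id always lands on the same shard, which is also why the number of primary shards cannot be changed after index creation without reindexing.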

:

[Elasticsearch] Plugins - building a site plugin and a custom analyzer plugin

Elastic/Elasticsearch 2013. 4. 19. 10:55

[elasticsearch API review]

Original link : http://www.elasticsearch.org/guide/reference/modules/plugins/

The plugins people use most with elasticsearch are probably head and the kr lucene morphological analyzer.

So you may wonder how such plugins are built.

The lower part of the original document lists all of the available plugins.

You can also find them through the links below.

[git]

- https://github.com/elasticsearch

- https://github.com/search?q=elasticsearch&type=&ref=simplesearch

Let's first look at how a site plugin such as head is laid out.

This really needs no explanation. ^^;;

[_site plugin]

- plugin location : ES_HOME/plugins

- site plugin name : helloworld

- helloworld site plugin location : ES_HOME/plugins/helloworld

. Create a _site folder under the helloworld folder.

. Put the html, js, css, and other files you implemented under the _site folder, then check them at the link below.

- helloworld site plugin url

. http://localhost:9200/_plugin/helloworld/index.html

- Use ajax calls to talk to the elasticsearch server and implement whatever features you need.
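The folder layout described above can be created with a short script. This is a sketch that uses a temporary directory as a stand-in for ES_HOME; the helper names `create_site_plugin` and `plugin_url` are made up for this example.

```python
import tempfile
from pathlib import Path


def create_site_plugin(es_home: Path, plugin_name: str, html: str) -> Path:
    """Create ES_HOME/plugins/<plugin_name>/_site/index.html and return its path."""
    site_dir = es_home / "plugins" / plugin_name / "_site"
    site_dir.mkdir(parents=True, exist_ok=True)
    index_file = site_dir / "index.html"
    index_file.write_text(html, encoding="utf-8")
    return index_file


def plugin_url(plugin_name: str, host: str = "localhost", port: int = 9200) -> str:
    """URL under which a site plugin's index.html is served."""
    return f"http://{host}:{port}/_plugin/{plugin_name}/index.html"


es_home = Path(tempfile.mkdtemp())  # stand-in for a real ES_HOME
index_file = create_site_plugin(es_home, "helloworld", "<h1>hello world</h1>")
print(index_file.relative_to(es_home))
print(plugin_url("helloworld"))
```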

[kr lucene analyzer plugin]

- A plugin for this is already available.

- See the links below.

- http://cafe.naver.com/korlucene

- https://github.com/chanil1218/elasticsearch-analysis-korean

- There are two ways to apply it.

. First : install elasticsearch-analysis-korean. (A separate build may be needed to match your es version.)

. Second : build your own plugin around the lucene kr analyzer library and install it.

- The rest of this post describes the second way, building and installing a plugin.

When you use an analyzer library, taking the code kimchy wrote as a base template makes the work quick and easy.

- https://github.com/elasticsearch/elasticsearch-analysis-smartcn

- Let's build it.

[Project setup]

- Create a Maven project in Eclipse.

[Packages and resources]

- org.elasticsearch.index.analysis

. KrLuceneAnalysisBinderProcessor.java

public class KrLuceneAnalysisBinderProcessor extends AnalysisModule.AnalysisBinderProcessor {

    // Register the analyzer under the name used in the index settings.
    @Override
    public void processAnalyzers(AnalyzersBindings analyzersBindings) {
        analyzersBindings.processAnalyzer("krlucene_analyzer", KrLuceneAnalyzerProvider.class);
    }

    // Register the tokenizer.
    @Override
    public void processTokenizers(TokenizersBindings tokenizersBindings) {
        tokenizersBindings.processTokenizer("krlucene_tokenizer", KrLuceneTokenizerFactory.class);
    }

    // Register the token filter.
    @Override
    public void processTokenFilters(TokenFiltersBindings tokenFiltersBindings) {
        tokenFiltersBindings.processTokenFilter("krlucene_filter", KrLuceneTokenFilterFactory.class);
    }
}

. This class registers the analyzer, tokenizer, and filter by name.

. These are the names you specify for the analyzer, tokenizer, and filter when you write the index settings.

. In the settings, type takes the fully qualified class name.

curl -XPUT http://localhost:9200/test -d '{
    "settings" : {
        "index": {
            "analysis": {
                "analyzer": {
                    "krlucene_analyzer": {
                        "type": "org.elasticsearch.index.analysis.KrLuceneAnalyzerProvider",
                        "tokenizer" : "krlucene_tokenizer",
                        "filter" : ["trim","lowercase", "krlucene_filter"]
                    }
                }
            }
        }
    }
}'

. KrLuceneAnalyzerProvider.java

public class KrLuceneAnalyzerProvider extends AbstractIndexAnalyzerProvider<KoreanAnalyzer> {

    private final KoreanAnalyzer analyzer;

    // The analyzer is created once and shared by every get() call.
    @Inject
    public KrLuceneAnalyzerProvider(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) throws IOException {
        super(index, indexSettings, name, settings);
        analyzer = new KoreanAnalyzer(Lucene.VERSION.LUCENE_36);
    }

    @Override
    public KoreanAnalyzer get() {
        return this.analyzer;
    }
}

. KrLuceneTokenFilterFactory.java

public class KrLuceneTokenFilterFactory extends AbstractTokenFilterFactory {

    @Inject
    public KrLuceneTokenFilterFactory(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings, name, settings);
    }

    // Wrap the incoming token stream with the Korean filter.
    @Override
    public TokenStream create(TokenStream tokenStream) {
        return new KoreanFilter(tokenStream);
    }
}

. KrLuceneTokenizerFactory.java

public class KrLuceneTokenizerFactory extends AbstractTokenizerFactory {

    @Inject
    public KrLuceneTokenizerFactory(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings, name, settings);
    }

    // Create a Korean tokenizer over the given reader.
    @Override
    public Tokenizer create(Reader reader) {
        return new KoreanTokenizer(Lucene.VERSION.LUCENE_36, reader);
    }
}

- org.elasticsearch.plugin.analysis.krlucene

. AnalysisKrLucenePlugin.java

. This class registers the plugin you created with es.

. If the plugin is named analysis-krlucene, the jar file must be placed in the following path.

ES_HOME/plugins/analysis-krlucene

- src/main/assemblies/plugin.xml

<?xml version="1.0"?>
<assembly>
    <id>plugin</id>
    <formats>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <dependencySets>
        <dependencySet>
            <excludes>
                <exclude>org.elasticsearch:elasticsearch</exclude>
            </excludes>
        </dependencySet>
        <dependencySet>
            <scope>provided</scope>
        </dependencySet>
    </dependencySets>
</assembly>

- src/main/resources/es-plugin.properties

plugin=org.elasticsearch.plugin.analysis.krlucene.AnalysisKrLucenePlugin

- Build the project this way, put the generated jar file in the path mentioned above, restart ES, and test as follows.

[Test]

- Create the test index (see the creation code above)

- Test URL

. http://localhost:9200/test/_analyze?analyzer=krlucene_analyzer&text=이것은 루씬한국어 형태소 분석기 플러그인 입니다.&pretty=1

{
  "tokens" : [ {
    "token" : "이것은",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "이것",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "루씬한국어",
    "start_offset" : 4,
    "end_offset" : 9,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "루씬",
    "start_offset" : 4,
    "end_offset" : 6,
    "type" : "word",
    "position" : 4
  }, {
    "token" : "한국어",
    "start_offset" : 6,
    "end_offset" : 9,
    "type" : "word",
    "position" : 5
  }, {
    "token" : "형태소",
    "start_offset" : 10,
    "end_offset" : 13,
    "type" : "word",
    "position" : 6
  }, {
    "token" : "분석기",
    "start_offset" : 14,
    "end_offset" : 17,
    "type" : "word",
    "position" : 7
  }, {
    "token" : "분석",
    "start_offset" : 14,
    "end_offset" : 16,
    "type" : "word",
    "position" : 8
  }, {
    "token" : "플러그인",
    "start_offset" : 18,
    "end_offset" : 22,
    "type" : "word",
    "position" : 9
  }, {
    "token" : "플러그",
    "start_offset" : 18,
    "end_offset" : 21,
    "type" : "word",
    "position" : 10
  }, {
    "token" : "입니다",
    "start_offset" : 23,
    "end_offset" : 26,
    "type" : "word",
    "position" : 11
  }, {
    "token" : "입니",
    "start_offset" : 23,
    "end_offset" : 25,
    "type" : "word",
    "position" : 12
  } ]

}
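The test URL above contains raw Korean text and spaces, which a browser encodes automatically but a script or curl call may not. The sketch below builds the same `_analyze` URL with proper percent-encoding; `analyze_url` is a hypothetical helper, and the host and port are the ones used in this post.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def analyze_url(index: str, analyzer: str, text: str,
                base: str = "http://localhost:9200") -> str:
    """Build an _analyze URL whose query string is safely percent-encoded."""
    query = urlencode({"analyzer": analyzer, "text": text, "pretty": 1})
    return f"{base}/{index}/_analyze?{query}"


text = "이것은 루씬한국어 형태소 분석기 플러그인 입니다."
url = analyze_url("test", "krlucene_analyzer", text)
print(url)

# Decoding the query string gives back the original text.
print(parse_qs(urlsplit(url).query)["text"][0])
```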

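※ If you want to move lucene from 3.x to 4.x, you can do it by modifying the code yourself.

- For elasticsearch-analysis-korean, there is a fair amount to fix.

. First, the source code of the Lucene Korean morphological analyzer has to be upgraded from 3.x to 4.x.

. For the code, the Lucene Korean morphological analyzer cafe has a cvs link.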

:pserver:anonymous@lucenekorean.cvs.sourceforge.net:/cvsroot/lucenekorean

. If you also want to upgrade the es version, edit the versions in pom.xml.

<properties>

<elasticsearch.version>0.20.4</elasticsearch.version>

<lucene.version>3.6.2</lucene.version>

</properties>

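- To build and apply your own plugin, create the plugin as described above and include the Lucene Korean morphological analyzer library that matches your version.

. Just make sure the version of each library in the plugin's pom.xml matches.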

:

[Elasticsearch] Query DSL - Filters

Elastic/Elasticsearch 2013. 4. 18. 10:46

[elasticsearch API review]

Original link : http://www.elasticsearch.org/guide/reference/query-dsl/

- Covers the ones that are used most often.

[Filters]

How filters work was already introduced with the filtered query, so keep that in mind while reading.

[and/or]

- Combines additional filters with and/or operations over the query results.

- To cache the filter result, set _cache: true.
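As a small illustration of the and filter with caching turned on, the sketch below assembles the request body as a Python dictionary and prints it as JSON. The helper `and_filter` and the `name` field are made up for this example; `tag` and `age` are the fields used in the filtered query example from the docs.

```python
import json


def and_filter(filters, cache: bool = False) -> dict:
    """Build an `and` filter; `_cache` is only emitted when requested."""
    body = {"filters": list(filters)}
    if cache:
        body["_cache"] = True
    return {"and": body}


query = {
    "filtered": {
        "query": {"term": {"tag": "wow"}},
        "filter": and_filter(
            [
                {"range": {"age": {"from": 10, "to": 20}}},
                {"prefix": {"name": "ki"}},  # hypothetical field
            ],
            cache=True,
        ),
    }
}
print(json.dumps(query, indent=2))
```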

[bool]

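- Applies an additional boolean filter.

[exists]

- Filters documents in which the given field has a value.

- The result is always cached.

[ids]

- Filters documents that have the given ids.

[limit]

- Limits the number of documents per shard.

[type]

- Filters by document/mapping type.

[missing]

- Filters documents in which the given field has no value.

- It can also match fields that are mapped with a null_value.

- The example is intuitive, so I include it.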

{
    "constant_score" : {
        "filter" : {
            "missing" : { 
                "field" : "user",
                "existence" : true,
                "null_value" : true
            }
        }
    }
}

[not]

- Excludes documents matched by the given not filter from the query results.

[numeric range]

- Similar to the range filter; it works on a range of numbers.

- The parameters are as follows.

Name	Description
`from`	The lower bound. Defaults to start from the first.
`to`	The upper bound. Defaults to unbounded.
`include_lower`	Should the first from (if set) be inclusive or not. Defaults to `true`
`include_upper`	Should the last to (if set) be inclusive or not. Defaults to `true`.
`gt`	Same as setting `from` to the value, and `include_lower` to `false`.
`gte`	Same as setting `from` to the value, and `include_lower` to `true`.
`lt`	Same as setting `to` to the value, and `include_upper` to `false`.
`lte`	Same as setting `to` to the value, and `include_upper` to `true`.
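The shorthand parameters in the table can be read as a simple rewrite rule. The sketch below expands `gt`, `gte`, `lt`, and `lte` into their `from` / `to` / `include_*` equivalents; `expand_range` is a hypothetical helper written for this post and is not part of any client library.

```python
def expand_range(params: dict) -> dict:
    """Rewrite gt/gte/lt/lte into from/to plus include_lower/include_upper."""
    shorthand = {
        "gt": ("from", "include_lower", False),
        "gte": ("from", "include_lower", True),
        "lt": ("to", "include_upper", False),
        "lte": ("to", "include_upper", True),
    }
    expanded = {}
    for key, value in params.items():
        if key in shorthand:
            bound, flag, inclusive = shorthand[key]
            expanded[bound] = value
            expanded[flag] = inclusive
        else:
            expanded[key] = value  # from, to, include_* pass through unchanged
    return expanded


print(expand_range({"gt": 5}))
print(expand_range({"gte": 10, "lt": 20}))
```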

[prefix]

- Similar to the prefix query, except that it acts as a filter; see the prefix query.

[query]

- Wraps any query so that it can be used as a filter.

[range]

- See the range query.

[script]

- Applies a filter defined by a script.

[term]

- See the term query.

[terms]

- See the terms query.

- Supports execution modes.

- plain by default; bool, and, and or are also supported.

[nested]

- See the nested query.

※ Filters are nearly identical to their query counterparts,

and the purpose of this API is to apply separate filtering to the results of a query.

:

[Elasticsearch] Query DSL

Elastic/Elasticsearch 2013. 4. 17. 17:41

[elasticsearch API review]

Original link : http://www.elasticsearch.org/guide/reference/query-dsl/

- Covers the ones that are used most often.

[Query]

[match]

- Defaults to the boolean type with the or operator.

[multi match]

- A match query over multiple fields.

- Fields can be boosted when they are declared (^2 means the field is twice as important).
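The `^2` notation is just a suffix on the field name. The sketch below builds a multi_match body from a mapping of field names to boosts; the helper names are made up, and `subject` and `message` are sample field names.

```python
import json


def boosted_fields(boosts: dict) -> list:
    """Turn {"subject": 2, "message": 1} into ["subject^2", "message"]."""
    return [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]


def multi_match(text: str, boosts: dict) -> dict:
    """Build a multi_match query body with per-field boosts."""
    return {"multi_match": {"query": text, "fields": boosted_fields(boosts)}}


body = multi_match("this is a test", {"subject": 2, "message": 1})
print(json.dumps(body))
```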

[bool]

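- must : the clause must appear in matching documents.

- should : the clause should appear in matching documents; minimum_number_should_match sets how many should clauses have to match.

- must_not : the clause must not appear in matching documents. (exclusive)

[boosting]

- Can effectively demote results that match a given query.

- The example in the original docs is intuitive, so I include it.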

{
    "boosting" : {
        "positive" : {
            "term" : {
                "field1" : "value1"
            }
        },
        "negative" : {
            "term" : {
                "field2" : "value2"
            }
        },
        "negative_boost" : 0.2
    }
}

- Depending on the negative condition, you can lower the score of documents you do not want.
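The effect of negative_boost can be illustrated with plain arithmetic: a document that also matches the negative query keeps its positive score multiplied by negative_boost. The sketch below is a toy model with made-up scores, not the actual Lucene scorer.

```python
def boosting_score(positive_score: float, matches_negative: bool,
                   negative_boost: float) -> float:
    """Demote a hit that also matches the negative query."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Hypothetical hits: (doc id, positive score, matches the negative query?)
hits = [("doc_a", 2.0, True), ("doc_b", 1.0, False)]
ranked = sorted(
    ((doc_id, boosting_score(score, negative, 0.2)) for doc_id, score, negative in hits),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```

Unlike must_not, the demoted document still appears in the results; it only moves down the ranking.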

[constant_score]

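- Gives every matched document the same score.

[dis_max]

- Useful when searching for a term across multiple fields.

- When several fields contain the same term, the tie_breaker value is used to decide which result is better.

- It is a good idea to run explain and watch how the score changes with the search results.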
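How tie_breaker changes the score can be shown with the disjunction-max formula: the best sub-query score plus tie_breaker times the sum of the remaining scores. The sketch below uses made-up per-field scores purely to show the arithmetic.

```python
def dis_max_score(field_scores, tie_breaker: float = 0.0) -> float:
    """Best field score plus tie_breaker times the sum of the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


# Hypothetical per-field scores for one document.
scores = [0.5, 0.25]
print(dis_max_score(scores))                   # only the best field counts
print(dis_max_score(scores, tie_breaker=0.5))  # other fields contribute a little
print(dis_max_score(scores, tie_breaker=1.0))  # same as adding all fields
```

With tie_breaker at 0 a document matching in two fields scores the same as one matching only in its best field, which is why a small non-zero value is often used to reward extra matches.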

[field]

- A simplified version of the query_string query; most of its parameters can be used.

[filtered]

- Applies a filter to the query results.

- The example is intuitive, so I include it.

{
    "filtered" : {
        "query" : {
            "term" : { "tag" : "wow" }
        },
        "filter" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        }
    }
}

[flt : fuzzy like this][flt_field]

- Supports like (fuzzy) searches on the specified fields.

- The _field variant takes the parameters below except fields.

Parameter	Description
`fields`	A list of the fields to run the more like this query against. Defaults to the `_all` field.
`like_text`	The text to find documents like it, required.
`ignore_tf`	Should term frequency be ignored. Defaults to `false`.
`max_query_terms`	The maximum number of query terms that will be included in any generated query. Defaults to `25`.
`min_similarity`	The minimum similarity of the term variants. Defaults to `0.5`.
`prefix_length`	Length of required common prefix on variant terms. Defaults to `0`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyzer`	The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

[fuzzy]

- Supports similarity search based on the Levenshtein algorithm.
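For reference, the Levenshtein (edit) distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is the textbook dynamic-programming version, not the optimized implementation Lucene uses for fuzzy queries.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using two rolling rows."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("elastic", "elastik"))
```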

[match_all]

- Returns all documents.

[mlt : more like this][mlt_field]

- Supports like searches on the specified fields.

- Note that this is a term-level like search.

- As a result, the specified fields need to have term_vector configured.

Parameter	Description
`fields`	A list of the fields to run the more like this query against. Defaults to the `_all` field.
`like_text`	The text to find documents like it, required.
`percent_terms_to_match`	The percentage of terms to match on (float value). Defaults to `0.3` (30 percent).
`min_term_freq`	The frequency below which terms will be ignored in the source doc. The default frequency is `2`.
`max_query_terms`	The maximum number of query terms that will be included in any generated query. Defaults to `25`.
`stop_words`	An array of stop words. Any word in this set is considered “uninteresting” and ignored. Even if your Analyzer allows stopwords, you might want to tell the MoreLikeThis code to ignore them, as for the purposes of document similarity it seems reasonable to assume that “a stop word is never interesting”.
`min_doc_freq`	The frequency at which words will be ignored which do not occur in at least this many docs. Defaults to `5`.
`max_doc_freq`	The maximum frequency in which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded.
`min_word_len`	The minimum word length below which words will be ignored. Defaults to `0`.
`max_word_len`	The maximum word length above which words will be ignored. Defaults to unbounded (`0`).
`boost_terms`	Sets the boost factor to use when boosting terms. Defaults to `1`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyzer`	The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

[prefix]

- Matched documents have terms that start with the prefix. (not_analyzed)

[query_string]

- Searches using the query parser.

Parameter	Description
`query`	The actual query to be parsed.
`default_field`	The default field for query terms if no prefix field is specified. Defaults to the `index.query.default_field` index settings, which in turn defaults to `_all`.
`default_operator`	The default operator used if no explicit operator is specified. For example, with a default operator of `OR`, the query `capital of Hungary` is translated to `capital OR of OR Hungary`, and with default operator of `AND`, the same query is translated to `capital AND of AND Hungary`. The default value is `OR`.
`analyzer`	The analyzer name used to analyze the query string.
`allow_leading_wildcard`	When set, `*` or `?` are allowed as the first character. Defaults to `true`.
`lowercase_expanded_terms`	Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to `true`.
`enable_position_increments`	Set to `true` to enable position increments in result queries. Defaults to `true`.
`fuzzy_max_expansions`	Controls the number of terms fuzzy queries will expand to. Defaults to `50`
`fuzzy_min_sim`	Set the minimum similarity for fuzzy queries. Defaults to `0.5`
`fuzzy_prefix_length`	Set the prefix length for fuzzy queries. Default is `0`.
`phrase_slop`	Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is `0`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyze_wildcard`	By default, wildcards terms in a query string are not analyzed. By setting this value to `true`, a best effort will be made to analyze those as well.
`auto_generate_phrase_queries`	Default to `false`.
`minimum_should_match`	A percent value (for example `20%`) controlling how many “should” clauses in the resulting boolean query should match.
`lenient`	If set to `true` will cause format based failures (like providing text to a numeric field) to be ignored. (since 0.19.4).

Parameter	Description
`use_dis_max`	Should the queries be combined using `dis_max` (set it to `true`), or a `bool` query (set it to `false`). Defaults to `true`.
`tie_breaker`	When using `dis_max`, the disjunction max tie breaker. Defaults to `0`.

[range]

- string fields use TermRangeQuery

- numeric/date fields use NumericRangeQuery

Name	Description
`from`	The lower bound. Defaults to start from the first.
`to`	The upper bound. Defaults to unbounded.
`include_lower`	Should the first from (if set) be inclusive or not. Defaults to `true`
`include_upper`	Should the last to (if set) be inclusive or not. Defaults to `true`.
`gt`	Same as setting `from` to the value, and `include_lower` to `false`.
`gte`	Same as setting `from` to the value, and `include_lower` to `true`.
`lt`	Same as setting `to` to the value, and `include_upper` to `false`.
`lte`	Same as setting `to` to the value, and `include_upper` to `true`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.

[term]

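- Matches documents that contain the term. (the term is not_analyzed)

- It helps to think of it as a single word.

[terms]

- Provides the IN-clause equivalent of the term query.

[wildcard]

- Matches with * and ?.

- * is similar to % in SQL, and ? is similar to a single-character match (.) in regular expressions.

- Take care not to start a pattern with * or ?, since that forces a scan over many terms and is slow.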
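The meaning of the two wildcard characters can be shown by translating a pattern into a regular expression. The sketch below only illustrates the matching semantics against a single term; it says nothing about how Lucene evaluates wildcard queries internally, and the helper names are made up.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern":
    """Translate * (any run of characters) and ? (exactly one character)."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            parts.append(re.escape(char))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, term: str) -> bool:
    """True when the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("ki*y", "kimchy"))
print(wildcard_match("ki?chy", "kimchy"))
print(wildcard_match("ki?y", "kimchy"))
```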

[nested]

- Supports searching nested-mapped objects; matches return the parent document.

- The example is intuitive, so I include it.

{
    "type1" : {
        "properties" : {
            "obj1" : {
                "type" : "nested"
            }
        }
    }
}

{
    "nested" : {
        "path" : "obj1",
        "score_mode" : "avg",
        "query" : {
            "bool" : {
                "must" : [
                    {
                        "match" : {"obj1.name" : "blue"}
                    },
                    {
                        "range" : {"obj1.count" : {"gt" : 5}}
                    }
                ]
            }
        }
    }
}

[indices]

- Can run a search across multiple indices.

- Again, the example is intuitive, so I include it.

{
    "indices" : {
        "indices" : ["index1", "index2"],
        "query" : {
            "term" : { "tag" : "wow" }
        },
        "no_match_query" : {
            "term" : { "tag" : "kow" }
        }
    }
}

- Setting no_match_query to none matches no documents; setting it to all is the same as match_all.

※ There are too many Queries, so Filters are covered separately.

:

[Elasticsearch] Indices API - Cluster ....

Elastic/Elasticsearch 2013. 4. 17. 14:58

[elasticsearch API review]

Original link : http://www.elasticsearch.org/guide/reference/api/admin-cluster-*

The original docs for the cluster APIs are so easy and intuitive that I do not summarize them separately.

Please refer to the links below.

Cluster

:

[elasticsearch] Indices API - Warmers

Elastic/Elasticsearch 2013. 4. 17. 14:52

[elasticsearch API review]

Original link : http://www.elasticsearch.org/guide/reference/api/admin-indices-warmers/

Configuring this API improves performance for heavy searches, that is, facets and sorting, while during bulk indexing it is better for performance to disable it.

See the original document for details.

:

jjeong
