[Elasticsearch] Query DSL

Elastic/Elasticsearch 2013. 4. 17. 17:41

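This post is based on personal testing, elasticsearch.org, and community resources,

and is intended for sharing information.

Please point out anything that is wrong.

(The example code has not been verified for performance or security.)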

[elasticsearch API review]

Original link : http://www.elasticsearch.org/guide/reference/query-dsl/

- Only the commonly used queries are covered.

[Query]

[match]

- Defaults to the boolean type, combining the analyzed terms with the or operator.
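
- A minimal sketch of a match query, assuming a hypothetical message field. Setting operator to and overrides the default or, so every analyzed term has to match.

```json
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}
```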

[multi match]

- A match query that runs against multiple fields.

- A boost can be added when declaring a field (^2 means the field is twice as important).
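
- A minimal sketch, assuming hypothetical subject and message fields. Matches on subject count twice as much as matches on message.

```json
{
    "multi_match" : {
        "query" : "this is a test",
        "fields" : [ "subject^2", "message" ]
    }
}
```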

[bool]

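- must : the clause must appear in matching documents.

- should : the clause should appear in matching documents; minimum_number_should_match sets how many should clauses have to match.

- must_not : the clause must not appear in matching documents. (exclusive)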
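
- A minimal sketch that combines the three clause types, assuming hypothetical user, age, and tag fields. With minimum_number_should_match set to 1, at least one of the two should clauses has to match.

```json
{
    "bool" : {
        "must" : {
            "term" : { "user" : "kimchy" }
        },
        "must_not" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        },
        "should" : [
            { "term" : { "tag" : "wow" } },
            { "term" : { "tag" : "elasticsearch" } }
        ],
        "minimum_number_should_match" : 1
    }
}
```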

[boosting]

- Effectively demotes results that match a given query.

- The example from the original docs is intuitive, so it is included here.

{
    "boosting" : {
        "positive" : {
            "term" : {
                "field1" : "value1"
            }
        },
        "negative" : {
            "term" : {
                "field2" : "value2"
            }
        },
        "negative_boost" : 0.2
    }
}

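- Documents that match the negative condition are not excluded; their score is lowered (multiplied by negative_boost), which pushes unwanted documents down the ranking.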

[constant_score]

- Gives every matched document the same score.
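
- A minimal sketch, assuming a hypothetical user field. The wrapped filter decides which documents match, and each of them gets a constant score equal to boost.

```json
{
    "constant_score" : {
        "filter" : {
            "term" : { "user" : "kimchy" }
        },
        "boost" : 1.2
    }
}
```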

[dis_max]

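- Useful when searching for a word across multiple fields.

- When several fields contain the same term, the tie_breaker value lets such documents be judged better than documents that match in only one field.

- It is worth running explain to see how the score changes with the search results.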
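
- A minimal sketch, assuming hypothetical title and body fields searched for the same term. The score is the best sub-query score plus tie_breaker times the scores of the other matching sub-queries.

```json
{
    "dis_max" : {
        "tie_breaker" : 0.7,
        "queries" : [
            { "term" : { "title" : "elasticsearch" } },
            { "term" : { "body" : "elasticsearch" } }
        ]
    }
}
```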

[field]

- A simplified version of the query_string query that targets a single field; most of the query_string parameters can be used.
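
- A minimal sketch, assuming the syntax from the reference guide linked above and a hypothetical name.first field. The value uses the same syntax as query_string.

```json
{
    "field" : {
        "name.first" : "+something -else"
    }
}
```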

[filtered]

- Applies a filter to the results of a query.

- The example is intuitive, so it is included here.

{
    "filtered" : {
        "query" : {
            "term" : { "tag" : "wow" }
        },
        "filter" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        }
    }
}

[flt : fuzzy like this][flt_field]

- Finds documents that are "like" the given text in the specified fields.

- The _field variant (flt_field) accepts every parameter below except fields.

Parameter	Description
`fields`	A list of the fields to run the fuzzy like this query against. Defaults to the `_all` field.
`like_text`	The text to find documents like it, required.
`ignore_tf`	Should term frequency be ignored. Defaults to `false`.
`max_query_terms`	The maximum number of query terms that will be included in any generated query. Defaults to `25`.
`min_similarity`	The minimum similarity of the term variants. Defaults to `0.5`.
`prefix_length`	Length of required common prefix on variant terms. Defaults to `0`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyzer`	The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.
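
- A minimal sketch of both forms, assuming hypothetical name.first and name.last fields. The first block is flt with a fields list; the second is flt_field, where the field name is the key and fields is not used.

```json
{
    "fuzzy_like_this" : {
        "fields" : [ "name.first", "name.last" ],
        "like_text" : "text like this one",
        "max_query_terms" : 12
    }
}
```

```json
{
    "fuzzy_like_this_field" : {
        "name.first" : {
            "like_text" : "text like this one",
            "max_query_terms" : 12
        }
    }
}
```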

[fuzzy]

- Supports similarity search based on the Levenshtein (edit distance) algorithm.
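
- A minimal sketch, assuming a hypothetical user field and the min_similarity and prefix_length parameter names from the reference guide linked above.

```json
{
    "fuzzy" : {
        "user" : {
            "value" : "ki",
            "min_similarity" : 0.5,
            "prefix_length" : 0
        }
    }
}
```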

[match_all]

- Matches all documents.
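
- The simplest form takes an empty body:

```json
{
    "match_all" : {}
}
```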

[mlt : more like this][mlt_field]

- Finds documents that are "like" the given text in the specified fields.

- Note that the like search works at the term level.

- Consequently, the specified fields should have term_vector configured.

Parameter	Description
`fields`	A list of the fields to run the more like this query against. Defaults to the `_all` field.
`like_text`	The text to find documents like it, required.
`percent_terms_to_match`	The percentage of terms to match on (float value). Defaults to `0.3` (30 percent).
`min_term_freq`	The frequency below which terms will be ignored in the source doc. The default frequency is `2`.
`max_query_terms`	The maximum number of query terms that will be included in any generated query. Defaults to `25`.
`stop_words`	An array of stop words. Any word in this set is considered “uninteresting” and ignored. Even if your Analyzer allows stopwords, you might want to tell the MoreLikeThis code to ignore them, as for the purposes of document similarity it seems reasonable to assume that “a stop word is never interesting”.
`min_doc_freq`	The frequency at which words will be ignored which do not occur in at least this many docs. Defaults to `5`.
`max_doc_freq`	The maximum frequency in which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded.
`min_word_len`	The minimum word length below which words will be ignored. Defaults to `0`.
`max_word_len`	The maximum word length above which words will be ignored. Defaults to unbounded (`0`).
`boost_terms`	Sets the boost factor to use when boosting terms. Defaults to `1`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyzer`	The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.
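
- A minimal sketch, assuming hypothetical name.first and name.last fields. min_term_freq is lowered from its default of 2 so that terms occurring once in like_text are still used.

```json
{
    "more_like_this" : {
        "fields" : [ "name.first", "name.last" ],
        "like_text" : "text like this one",
        "min_term_freq" : 1,
        "max_query_terms" : 12
    }
}
```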

[prefix]

- Matches documents that have a term starting with the given prefix. (The prefix itself is not analyzed.)
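
- A minimal sketch, assuming a hypothetical user field. Because the prefix is not analyzed, it has to be written the way the terms are stored in the index (for example, lowercase if the field's analyzer lowercases).

```json
{
    "prefix" : { "user" : "ki" }
}
```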

[query_string]

- Parses the query text with a query parser and runs the search.

Parameter	Description
`query`	The actual query to be parsed.
`default_field`	The default field for query terms if no prefix field is specified. Defaults to the `index.query.default_field` index settings, which in turn defaults to `_all`.
`default_operator`	The default operator used if no explicit operator is specified. For example, with a default operator of `OR`, the query `capital of Hungary` is translated to `capital OR of OR Hungary`, and with default operator of `AND`, the same query is translated to `capital AND of AND Hungary`. The default value is `OR`.
`analyzer`	The analyzer name used to analyze the query string.
`allow_leading_wildcard`	When set, `*` or `?` are allowed as the first character. Defaults to `true`.
`lowercase_expanded_terms`	Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to `true`.
`enable_position_increments`	Set to `true` to enable position increments in result queries. Defaults to `true`.
`fuzzy_max_expansions`	Controls the number of terms fuzzy queries will expand to. Defaults to `50`.
`fuzzy_min_sim`	Set the minimum similarity for fuzzy queries. Defaults to `0.5`.
`fuzzy_prefix_length`	Set the prefix length for fuzzy queries. Default is `0`.
`phrase_slop`	Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is `0`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyze_wildcard`	By default, wildcards terms in a query string are not analyzed. By setting this value to `true`, a best effort will be made to analyze those as well.
`auto_generate_phrase_queries`	Defaults to `false`.
`minimum_should_match`	A percent value (for example `20%`) controlling how many “should” clauses in the resulting boolean query should match.
`lenient`	If set to `true` will cause format based failures (like providing text to a numeric field) to be ignored. (since 0.19.4).

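- When the query runs against multiple fields (the fields parameter), the following additional parameters are allowed.

Parameter	Description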
`use_dis_max`	Should the queries be combined using `dis_max` (set it to `true`), or a `bool` query (set it to `false`). Defaults to `true`.
`tie_breaker`	When using `dis_max`, the disjunction max tie breaker. Defaults to `0`.
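
- A minimal sketch of a multi-field query_string, assuming hypothetical content and name fields. The per-field queries are combined with dis_max, and name is boosted with ^5.

```json
{
    "query_string" : {
        "fields" : [ "content", "name^5" ],
        "query" : "this AND that OR thus",
        "use_dis_max" : true,
        "tie_breaker" : 0
    }
}
```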

[range]

- On string fields it runs as a TermRangeQuery.

- On numeric/date fields it runs as a NumericRangeQuery.

Name	Description
`from`	The lower bound. Defaults to start from the first.
`to`	The upper bound. Defaults to unbounded.
`include_lower`	Should the first from (if set) be inclusive or not. Defaults to `true`.
`include_upper`	Should the last to (if set) be inclusive or not. Defaults to `true`.
`gt`	Same as setting `from` to the value, and `include_lower` to `false`.
`gte`	Same as setting `from` to the value, and `include_lower` to `true`.
`lt`	Same as setting `to` to the value, and `include_upper` to `false`.
`lte`	Same as setting `to` to the value, and `include_upper` to `true`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
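
- A minimal sketch, assuming a hypothetical numeric age field. By the table above, gte and lt here are equivalent to from 10 with include_lower true and to 20 with include_upper false.

```json
{
    "range" : {
        "age" : {
            "gte" : 10,
            "lt" : 20,
            "boost" : 2.0
        }
    }
}
```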

[term]

- Matches documents that contain the given term. (The term is not analyzed.)

- In short, it helps to think of a term as a single word.
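
- A minimal sketch, assuming a hypothetical user field. The value is compared as-is against the indexed terms.

```json
{
    "term" : { "user" : "kimchy" }
}
```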

[terms]

- Provides the equivalent of an SQL IN clause for the term query.
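
- A minimal sketch, assuming a hypothetical tags field. A document matches if the field contains any of the listed terms.

```json
{
    "terms" : {
        "tags" : [ "blue", "pill" ]
    }
}
```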

[wildcard]

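- Matches using the * and ? wildcards.

- * is similar to % in SQL (any sequence of characters), and ? matches any single character, like . in a regular expression.

- Take care not to start the pattern with * or ?, because the query then has to iterate over many terms and can become extremely slow.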
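
- A minimal sketch, assuming a hypothetical user field. The wildcard is placed in the middle of the pattern rather than at the start.

```json
{
    "wildcard" : { "user" : "ki*y" }
}
```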

[nested]

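- Queries nested objects (fields mapped as nested) and returns the parent document they belong to.

- The example is intuitive, so it is included here: the mapping first, then the query.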

{
    "type1" : {
        "properties" : {
            "obj1" : {
                "type" : "nested"
            }
        }
    }
}

{
    "nested" : {
        "path" : "obj1",
        "score_mode" : "avg",
        "query" : {
            "bool" : {
                "must" : [
                    {
                        "match" : {"obj1.name" : "blue"}
                    },
                    {
                        "range" : {"obj1.count" : {"gt" : 5}}
                    }
                ]
            }
        }
    }
}

[indices]

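- When searching across multiple indices, runs query on the listed indices and no_match_query on the indices that are not listed.

- Again, the example is intuitive, so it is included here.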

{
    "indices" : {
        "indices" : ["index1", "index2"],
        "query" : {
            "term" : { "tag" : "wow" }
        },
        "no_match_query" : {
            "term" : { "tag" : "kow" }
        }
    }
}

- Setting no_match_query to none matches no documents; setting it to all behaves like match_all.

※ There are too many queries, so Filters are covered in a separate post.

jjeong
