Posts tagged 'query' (Page 2)

[Elasticsearch - The Definitive Guide] The match Query

Elastic/TheDefinitiveGuide 2015. 12. 10. 11:53

A short note summarizing how the match query differs from the term query.

Original link)

https://www.elastic.co/guide/en/elasticsearch/guide/current/match-query.html

Match Query Flow)

1. Check the field type

2. Analyze the query string (the analyzed terms are then executed as term queries.)

3. Find matching docs

4. Score each doc

Term Query Flow)

1. Find matching docs

2. Score each doc

As you can see, the term query goes through fewer steps than the match query.

In the end, a match query is rewritten into term queries anyway, so weigh the two carefully when building a search service.

Typically the front end does a first pass over the query string (often called smart query or query preprocessing), and the request actually sent to the search engine is usually in term query form.
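The preprocessing idea above can be sketched roughly as follows. This is only an illustration, not how Elasticsearch analyzes text: the `analyze` helper is a hypothetical stand-in (lowercase plus whitespace split) for a real analyzer, and the field name `title` is an assumption. It turns a raw query string into term queries, which is approximately the shape a match query ends up in.

```python
def analyze(query_string):
    """Hypothetical stand-in for an analyzer: lowercase, then split on whitespace."""
    return query_string.lower().split()


def to_term_queries(field, query_string):
    """Build term queries from a raw query string, one per analyzed token."""
    terms = analyze(query_string)
    if len(terms) == 1:
        # A single token needs no bool wrapper.
        return {"term": {field: terms[0]}}
    # Several tokens are combined with OR semantics (bool/should).
    return {"bool": {"should": [{"term": {field: t}} for t in terms]}}


print(to_term_queries("title", "Quick Fox"))
```

Doing this step in the front end means the engine can skip the analysis step, but the front end then has to tokenize the same way the index does.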

License: Attribution, NonCommercial, NoDerivatives

:

[Elasticsearch - The Definitive Guide] Validating Queries

Elastic/TheDefinitiveGuide 2015. 11. 30. 16:23

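To be honest, this is an API I do not use much myself.

Every now and then I am curious how a Query DSL request is interpreted internally as a Lucene query.

The _validate/query API makes that translation easy to see.

It may not mean much to people who are fluent in query string syntax and use it professionally, but I still think it is a useful API.

The reason: in an RDBMS, after writing a SELECT statement you always run EXPLAIN before executing it, right?

Think of it as the same kind of thing.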
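A call to this API can be sketched roughly like this. The host `http://localhost:9200`, the index name `test`, and the `title` field are assumptions for illustration; the sketch only builds the URL and body and does not send anything.

```python
import json
from urllib.parse import quote


def build_validate_request(base_url, index, query):
    """Return (url, body) for a _validate/query call with explain enabled."""
    url = f"{base_url.rstrip('/')}/{quote(index)}/_validate/query?explain=true"
    body = json.dumps({"query": query})
    return url, body


url, body = build_validate_request(
    "http://localhost:9200", "test", {"match": {"title": "quick fox"}}
)
print(url)
print(body)
```

With explain enabled, the response should describe how the query was interpreted, which is the part that resembles EXPLAIN in an RDBMS.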

[Original link]

https://www.elastic.co/guide/en/elasticsearch/guide/current/_validating_queries.html

The reference documentation has more detailed examples than the document above, so see the link below.

[References]

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-validate.html


:

[Lucene] CustomScoreQuery vs. DisjunctionMaxQuery

ITWeb/검색일반 2015. 11. 24. 15:05

When you run a search service, there are times when documents need to be boosted.

Boosting can serve many purposes.

Summed up in one sentence,

"it is used to surface specific documents at the top of the search results."

Document boosting can be implemented in the following ways.

1. Query boosting

"Query time boosting" assigns weights to specific fields or query terms in the query, so they are applied at query time.

2. Index boosting

"Index time boosting" assigns weights to specific fields or documents during indexing, so they are applied at index time.

3. Field boosting

"Field boosting" reflects that a specific field carries a higher weight or importance.

4. Document boosting

"Document boosting" reflects that a specific document itself carries a higher weight or importance.

5. Custom boosting

"Custom boosting" manipulates the weight and score of arbitrary documents.

Those are the ways boosting can be implemented.

They are implemented through the following APIs:

DisjunctionMaxQuery and CustomScoreQuery.

Of course, other APIs that Lucene provides, or the various mechanisms that Elasticsearch and Solr offer, can also be used.

What DisjunctionMaxQuery can implement seems to be roughly

1. Query boosting

3. Field boosting

CustomScoreQuery, in contrast, can be used in many more ways.

In Elasticsearch terms, FunctionScoreQuery and DisjunctionMaxQuery can be combined, so I think everything except 2. Index boosting can be implemented.

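Now let me wrap up quickly.

Terms are coined however their authors like and interpreted however readers like, so some of you will see this differently than I do.

That said, if anything here is wrong or mistaken, please let me know. ^^;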

[Dis Max Query]

- Used at query time.

- Boosts by weighting fields.

- Easy to implement.

- Leaves the processing purely to the search engine.
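As a rough illustration of field-weighted boosting with dis max, here is a minimal sketch that builds the Elasticsearch `dis_max` request body as a Python dict. The field names `title` and `body` and the weights are assumptions, not values from any real service.

```python
def dis_max_query(text, field_boosts, tie_breaker=0.0):
    """Build a dis_max query with one boosted match clause per field."""
    queries = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in field_boosts.items()
    ]
    return {"dis_max": {"queries": queries, "tie_breaker": tie_breaker}}


# Hypothetical weighting: a hit in the title counts twice as much as in the body.
query = dis_max_query("elasticsearch", {"title": 2.0, "body": 1.0}, tie_breaker=0.3)
print(query)
```

The score comes from the best matching clause, with the other clauses contributing only through tie_breaker, which is why this form stays simple and leaves the work to the engine.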

[Custom(Function) Score Query]

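- Used at query time.

- Boosts by weighting fields and documents.

- Not difficult, because many APIs are provided.

- Can be implemented in many ways.

- Used to satisfy a variety of algorithms or requirements.

- Can use specific values inside a document for boosting.

- The more complex it gets, the slower it can become.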
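Boosting by a value stored in the document could look roughly like the sketch below, which builds an Elasticsearch `function_score` body with a `field_value_factor` function. The fields `title` and `sales_count` are hypothetical, and the `log1p` modifier is just one plausible choice for dampening large values.

```python
def function_score_query(base_query, boost_field, factor=1.0):
    """Wrap a query so that a numeric field value scales the relevance score."""
    return {
        "function_score": {
            "query": base_query,
            "functions": [
                {
                    "field_value_factor": {
                        "field": boost_field,   # numeric field read from each document
                        "factor": factor,       # multiplier applied to the field value
                        "modifier": "log1p",    # dampen very large values
                    }
                }
            ],
            "boost_mode": "multiply",           # combine with the query score by multiplying
        }
    }


# Hypothetical shopping example: better-selling products rank higher.
query = function_score_query({"match": {"title": "laptop"}}, "sales_count", factor=1.2)
print(query)
```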

In the past, for bulletin-board style services that used full-text search, I used to favor dis max query + query string.

These days, custom (function) score query seems to be used more, for finer-grained boosting.

For shopping malls in particular, I think custom score is a better fit than dis max.


:

[Elasticsearch] On queries and filters..

Elastic/Elasticsearch 2014. 6. 10. 14:29

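I think I wrote this down somewhere on my blog, but I cannot find it even though I wrote it.. ㅡ.ㅡ;;

I cannot remember whether I ever shared it, so I am just writing it up again.

(To keep my memory from fading... haha)

This comes from material presented by Clinton Gormley at Berlin Buzzwords.

I have only added a little on top of it.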

[Queries]

- relevance

- full text

- not cached

- slower

[Filters]

- boolean yes/no

- exact values

- cached

- faster

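Choose between the two according to their characteristics.

That is, if you want ranking or relevance-ordered results for the documents you search, do not reach for a filter first.

A filter alone would not work for general web document search or shopping product search, for example.

- With a query, the results reflect the relevance _score.

Filters are a good fit for log-type data.

That is, they suit documents where relevance between documents does not need to be considered. They will also be much faster.

- With a filter, the results do not reflect a relevance _score.

For analytics and statistics, implementing with filters rather than queries can improve performance, so keep that in mind.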
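The difference can be sketched as two request bodies. The field names `message` and `@timestamp` are assumptions. Note also that the filter form shown here is the `bool` query with a `filter` clause used from Elasticsearch 2.0 onward; in the 1.x releases current when this post was written, the same thing was expressed with the `filtered` query.

```python
def scored_search(field, text):
    """Query context: matching documents are ranked by a relevance _score."""
    return {"query": {"match": {field: text}}}


def filtered_search(field, gte, lt):
    """Filter context: a yes/no range check with no relevance scoring."""
    return {
        "query": {
            "bool": {
                "filter": [{"range": {field: {"gte": gte, "lt": lt}}}]
            }
        }
    }


# Full-text search where ranking matters.
print(scored_search("message", "connection timeout"))

# Log-style lookup where only inclusion or exclusion matters.
print(filtered_search("@timestamp", "2014-06-01", "2014-06-02"))
```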

:

[elasticsearch] query optimizing....

Elastic/Elasticsearch 2014. 1. 23. 15:23

Original : https://speakerdeck.com/elasticsearch/query-optimization-go-more-faster-better

filters are fast, cached, composable, short-circuit
no score is calculated, only inclusion / exclusion

Replace term, terms, and range queries with term, terms, and range filters.

[from]

{
    "query" : {
        "term" : {
            "field" : "value"
        }
    }
}

[to]

{
    "query" : {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "term" : {
                    "field" : "value"
                }
            }
        }
    }
}
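The [from] to [to] rewrite above is mechanical, so it can be sketched as a small helper. This follows the `filtered` query syntax used in this post; the function name `to_filtered` is my own, and treating only term, terms, and range as convertible is an assumption taken from the sentence above.

```python
def to_filtered(search_body):
    """Move a term/terms/range query into the filter slot of a filtered query."""
    query = search_body["query"]
    (kind,) = query.keys()  # a query object is expected to have exactly one key
    if kind not in ("term", "terms", "range"):
        return search_body  # leave scoring queries untouched
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": query,
            }
        }
    }


print(to_filtered({"query": {"term": {"field": "value"}}}))
```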

Top level filter is slow.

{
"query" : { … },
"filter" : { … }
}

Don't use this unless you need it
(only useful with facets)

Using Count (faster)

[from]

/{index}/_search
{
"query" : { … },
"size" : 0
}

[to]

/{index}/_search?search_type=count
{
"query" : { … }
}

Rescore API

1. Query/filter to quickly find top N results
2. Rescore with complex logic to find top 10
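The two steps above map onto a request body roughly like the one this sketch builds. The queries, the weights, and the `window_size` value are placeholders chosen for illustration.

```python
def rescore_search(cheap_query, expensive_query, window_size=100, size=10):
    """Run a cheap query first, then rescore only the top window_size hits per shard."""
    return {
        "query": cheap_query,
        "size": size,
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": expensive_query,
                "query_weight": 1.0,          # weight of the original score
                "rescore_query_weight": 2.0,  # weight of the rescore query's score
            },
        },
    }


body = rescore_search(
    {"match": {"title": "quick fox"}},
    {"match_phrase": {"title": {"query": "quick fox", "slop": 2}}},
)
print(body)
```

The expensive phrase query only runs against the window, not against every matching document, which is where the saving comes from.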

Do not EVER use these in a search script.

[from]

_source.field

_fields.field

Both of these are slow because they are read from disk.

[to]

doc[field]

This is fast because it reads in-memory field data.

:

[Elasticsearch] Query DSL - Filters

Elastic/Elasticsearch 2013. 4. 18. 10:46

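This document was written with reference to my own tests, elasticsearch.org, and the community,

and is meant for sharing information.

Please point out anything that is wrong.

(The example code has not been verified for performance or security.)

[elasticsearch API review]

Original link : http://www.elasticsearch.org/guide/reference/query-dsl/

- Covers the ones that are used most often.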

[Filters]

How filters behave was already introduced with the filtered query, so read this with that in mind.

[and/or]

- Applies an and/or operation of additional queries to the query result.

- To cache the result, set _cache: true.

[bool]

- Applies an additional boolean query.

[exists]

- The result is always cached.

[ids]

- Filters documents that have the given ids.

[limit]

- Limits the number of documents per shard.

[type]

- Filters by document/mapping type.

[missing]

- Filters documents in which the given field has no value.

- The null_value option covers fields that are mapped with a null_value.

- The example is intuitive, so I am including it.

{
    "constant_score" : {
        "filter" : {
            "missing" : { 
                "field" : "user",
                "existence" : true,
                "null_value" : true
            }
        }
    }
}

[not]

- Excludes documents matched by the given not filter from the queried result.

[numeric range]

- Similar to the range filter; it takes a numeric range.

- The parameters are as follows.

Name	Description
`from`	The lower bound. Defaults to start from the first.
`to`	The upper bound. Defaults to unbounded.
`include_lower`	Should the first from (if set) be inclusive or not. Defaults to `true`
`include_upper`	Should the last to (if set) be inclusive or not. Defaults to `true`.
`gt`	Same as setting `from` and `include_lower` to `false`.
`gte`	Same as setting `from` and `include_lower` to `true`.
`lt`	Same as setting `to` and `include_upper` to `false`.
`lte`	Same as setting `to` and `include_upper` to `true`.
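The relationship between the shorthand parameters and from/to in the table above can be written out as a small sketch. The helper name `normalize_range` is my own; it only restates the table and is not part of any Elasticsearch client.

```python
def normalize_range(params):
    """Rewrite gt/gte/lt/lte into from/to plus include_lower/include_upper."""
    shorthand = {
        "gt": ("from", "include_lower", False),
        "gte": ("from", "include_lower", True),
        "lt": ("to", "include_upper", False),
        "lte": ("to", "include_upper", True),
    }
    # Keep everything that is not a shorthand key as it is.
    out = {k: v for k, v in params.items() if k not in shorthand}
    for key, (bound, flag, inclusive) in shorthand.items():
        if key in params:
            out[bound] = params[key]
            out[flag] = inclusive
    return out


print(normalize_range({"gte": 10, "lt": 20}))
```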

[prefix]

- Similar to the prefix query, but acts as a filter; see prefix query.

[query]

- Lets you use an additional query as a filter.

[range]

- See range query.

[script]

- Applies a filter using a script.

[term]

- See term query.

[terms]

- See terms query.

- Supports execution modes.

- The default is plain; bool, and, and or are also supported.

[nested]

- See nested query.

※ Filters are almost identical to what the basic queries provide,

and the purpose of this API is to apply separate filtering to the results of a query that has already been run.

:

[Elasticsearch] Query DSL

Elastic/Elasticsearch 2013. 4. 17. 17:41

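This document was written with reference to my own tests, elasticsearch.org, and the community,

and is meant for sharing information.

Please point out anything that is wrong.

(The example code has not been verified for performance or security.)

[elasticsearch API review]

Original link : http://www.elasticsearch.org/guide/reference/query-dsl/

- Covers the ones that are used most often.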

[Query]

[match]

- Defaults to the boolean type with the or operator.

[multi match]

- A match query over multiple fields.

- A boost can be added when declaring a field (^2 means the field is twice as important).
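The ^2 notation can be sketched with a small builder. The field names `subject` and `message` and the weights are assumptions for illustration.

```python
def multi_match_query(text, field_boosts):
    """Build a multi_match query; a boost other than 1 is written as field^boost."""
    fields = [
        field if boost == 1 else f"{field}^{boost}"
        for field, boost in field_boosts.items()
    ]
    return {"multi_match": {"query": text, "fields": fields}}


# subject is treated as twice as important as message.
print(multi_match_query("quick fox", {"subject": 2, "message": 1}))
```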

[bool]

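- must : the clause must appear in matching documents.

- should : the clause should appear in matching documents; minimum_number_should_match sets how many should clauses have to match.

- must_not : the clause must not appear in matching documents. (exclusive)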
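The three clause types can be combined as in the sketch below. The parameter name `minimum_number_should_match` follows the text above; as far as I know, newer releases spell it `minimum_should_match`, so check the documentation for your version. The fields `user` and `tag` are hypothetical.

```python
def bool_query(must=(), should=(), must_not=(), minimum_number_should_match=None):
    """Assemble a bool query, leaving out any clause list that is empty."""
    body = {}
    for name, clauses in (("must", must), ("should", should), ("must_not", must_not)):
        if clauses:
            body[name] = list(clauses)
    if minimum_number_should_match is not None:
        body["minimum_number_should_match"] = minimum_number_should_match
    return {"bool": body}


query = bool_query(
    must=[{"term": {"user": "kimchy"}}],
    should=[{"term": {"tag": "wow"}}, {"term": {"tag": "elasticsearch"}}],
    must_not=[{"term": {"tag": "spam"}}],
    minimum_number_should_match=1,
)
print(query)
```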

[boosting]

- Can effectively demote results that match a given query.

- The original example is intuitive, so I am including it.

{
    "boosting" : {
        "positive" : {
            "term" : {
                "field1" : "value1"
            }
        },
        "negative" : {
            "term" : {
                "field2" : "value2"
            }
        },
        "negative_boost" : 0.2
    }
}

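- Depending on the negative condition, you can lower the score of documents you do not want.

[constant_score]

- Gives every matched document the same score.

[dis_max]

- Useful when searching for a word across multiple fields.

- When multiple fields contain the same term, the tie_breaker value helps judge which result is better.

- It is a good idea to run explain and check how the score changes with the search results.

[field]

- A simplified version of the query_string query; most of its parameters can be used.

[filtered]

- Applies a filter to the query result.

- The example is intuitive, so I am including it.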

{
    "filtered" : {
        "query" : {
            "term" : { "tag" : "wow" }
        },
        "filter" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        }
    }
}

[flt : fuzzy like this][flt_field]

- Supports like-search (finding similar text) on the specified fields.

- The variant with _field in its name takes the parameters below, except fields.

Parameter	Description
`fields`	A list of the fields to run the more like this query against. Defaults to the `_all` field.
`like_text`	The text to find documents like it, required.
`ignore_tf`	Should term frequency be ignored. Defaults to `false`.
`max_query_terms`	The maximum number of query terms that will be included in any generated query. Defaults to `25`.
`min_similarity`	The minimum similarity of the term variants. Defaults to `0.5`.
`prefix_length`	Length of required common prefix on variant terms. Defaults to `0`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyzer`	The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

[fuzzy]

- Supports similarity search based on the Levenshtein algorithm.
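For reference, Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a plain textbook version for illustration, not the implementation the search engine uses.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

A smaller distance means the two terms are more similar, which is the idea behind the similarity threshold parameters of the fuzzy queries.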

[match_all]

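- Returns all documents.

[mlt : more like this][mlt_field]

- Supports like-search (finding similar text) on the specified fields.

- Keep in mind that the like-search works at the term level.

- As a result, the specified fields should have term_vector configured.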

Parameter	Description
`fields`	A list of the fields to run the more like this query against. Defaults to the `_all` field.
`like_text`	The text to find documents like it, required.
`percent_terms_to_match`	The percentage of terms to match on (float value). Defaults to `0.3` (30 percent).
`min_term_freq`	The frequency below which terms will be ignored in the source doc. The default frequency is `2`.
`max_query_terms`	The maximum number of query terms that will be included in any generated query. Defaults to `25`.
`stop_words`	An array of stop words. Any word in this set is considered “uninteresting” and ignored. Even if your Analyzer allows stopwords, you might want to tell the MoreLikeThis code to ignore them, as for the purposes of document similarity it seems reasonable to assume that “a stop word is never interesting”.
`min_doc_freq`	The frequency at which words will be ignored which do not occur in at least this many docs. Defaults to `5`.
`max_doc_freq`	The maximum frequency in which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded.
`min_word_len`	The minimum word length below which words will be ignored. Defaults to `0`.
`max_word_len`	The maximum word length above which words will be ignored. Defaults to unbounded (`0`).
`boost_terms`	Sets the boost factor to use when boosting terms. Defaults to `1`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyzer`	The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

[prefix]

- Matches documents that have a term starting with the prefix. (the prefix is not analyzed)

[query_string]

- Searches using the query parser.

Parameter	Description
`query`	The actual query to be parsed.
`default_field`	The default field for query terms if no prefix field is specified. Defaults to the `index.query.default_field` index settings, which in turn defaults to `_all`.
`default_operator`	The default operator used if no explicit operator is specified. For example, with a default operator of `OR`, the query `capital of Hungary` is translated to `capital OR of OR Hungary`, and with default operator of `AND`, the same query is translated to `capital AND of AND Hungary`. The default value is `OR`.
`analyzer`	The analyzer name used to analyze the query string.
`allow_leading_wildcard`	When set, `*` or `?` are allowed as the first character. Defaults to `true`.
`lowercase_expanded_terms`	Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to `true`.
`enable_position_increments`	Set to `true` to enable position increments in result queries. Defaults to `true`.
`fuzzy_max_expansions`	Controls the number of terms fuzzy queries will expand to. Defaults to `50`
`fuzzy_min_sim`	Set the minimum similarity for fuzzy queries. Defaults to `0.5`
`fuzzy_prefix_length`	Set the prefix length for fuzzy queries. Default is `0`.
`phrase_slop`	Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is `0`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.
`analyze_wildcard`	By default, wildcards terms in a query string are not analyzed. By setting this value to `true`, a best effort will be made to analyze those as well.
`auto_generate_phrase_queries`	Defaults to `false`.
`minimum_should_match`	A percent value (for example `20%`) controlling how many “should” clauses in the resulting boolean query should match.
`lenient`	If set to `true` will cause format based failures (like providing text to a numeric field) to be ignored. (since 0.19.4).

Parameter	Description
`use_dis_max`	Should the queries be combined using `dis_max` (set it to `true`), or a `bool` query (set it to `false`). Defaults to `true`.
`tie_breaker`	When using `dis_max`, the disjunction max tie breaker. Defaults to `0`.

[range]

- Strings use TermRangeQuery.

- Numerics/dates use NumericRangeQuery.

Name	Description
`from`	The lower bound. Defaults to start from the first.
`to`	The upper bound. Defaults to unbounded.
`include_lower`	Should the first from (if set) be inclusive or not. Defaults to `true`
`include_upper`	Should the last to (if set) be inclusive or not. Defaults to `true`.
`gt`	Same as setting `from` to the value, and `include_lower` to `false`.
`gte`	Same as setting `from` to the value, and `include_lower` to `true`.
`lt`	Same as setting `to` to the value, and `include_upper` to `false`.
`lte`	Same as setting `to` to the value, and `include_upper` to `true`.
`boost`	Sets the boost value of the query. Defaults to `1.0`.

[term]

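- Matches documents that contain the given term. (the term is not analyzed)

- Thinking of a term as a single word makes this easier to understand.

[terms]

- Provides the equivalent of an IN clause for the term query.

[wildcard]

- Matches with * and ?.

- * is similar to % in SQL, and ? matches any single character, like . in a regular expression.

- When using it, be careful not to start the pattern with * or ?.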
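The matching rules for * and ? can be illustrated by translating a wildcard pattern into a regular expression. This is a sketch of the semantics only, not how the engine evaluates wildcard queries, and the helper names are my own.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? wildcards into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters, like SQL %
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def has_leading_wildcard(pattern):
    """True when the pattern starts with * or ?, the case the note above warns about."""
    return pattern[:1] in ("*", "?")


print(bool(wildcard_to_regex("ki*y").fullmatch("kimchy")))
print(has_leading_wildcard("*chy"))
```

A leading wildcard gives the engine no fixed prefix to narrow down the candidate terms, which is the usual reason given for avoiding it.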

[nested]

- Supports searching objects mapped as nested; matches are returned as the parent document.

- The example is intuitive, so I am including it.

{
    "type1" : {
        "properties" : {
            "obj1" : {
                "type" : "nested"
            }
        }
    }
}

{
    "nested" : {
        "path" : "obj1",
        "score_mode" : "avg",
        "query" : {
            "bool" : {
                "must" : [
                    {
                        "match" : {"obj1.name" : "blue"}
                    },
                    {
                        "range" : {"obj1.count" : {"gt" : 5}}
                    }
                ]
            }
        }
    }
}

[indices]

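- Useful when searching across multiple indices: one query runs on the listed indices and no_match_query runs on the rest.

- Again, the example is intuitive, so I am including it.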

{
    "indices" : {
        "indices" : ["index1", "index2"],
        "query" : {
            "term" : { "tag" : "wow" }
        },
        "no_match_query" : {
            "term" : { "tag" : "kow" }
        }
    }
}

- Setting no_match_query to none returns no documents; setting it to all makes it a match_all.

※ There are too many Queries, so Filters are covered in a separate post.

:

A collection of Elasticsearch query URI examples.

Elastic/Elasticsearch 2012. 12. 18. 22:39

- field search : http://localhost:9200/test/_search?q=msg:채팅&pretty=true

- multi field & sort & list search : http://localhost:9200/test/_search?q=msg:과장 AND rm_title:과장&sort=rm_ymdt:asc&from=0&size=10&pretty=true

- paging search & sort : http://localhost:9200/test/_search?source={"query":{"bool":{"must":[{"term":{"msg":"과장"}}],"must_not":[],"should":[]}},"from":0,"size":50,"sort":[{"rm_ymdt":"asc"}],"facets":{}}&pretty=true

- range search : http://localhost:9200/test/_search?source={"query":{"range":{"recv_ymdt":{"from":"20120820163946", "to":"20120911160444"}}}}&pretty=true

- Queries can be generated with the structured query feature on this page: http://localhost:9200/_plugin/head/
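URIs like the ones above contain spaces, braces, quotes, and non-ASCII text, so outside a browser they usually need percent-encoding. A minimal sketch using only the Python standard library follows; the host, index, and field names are taken from the examples above, and the helper name `search_uri` is my own.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def search_uri(base_url, index, **params):
    """Build a _search URI with percent-encoded query parameters."""
    return f"{base_url.rstrip('/')}/{index}/_search?{urlencode(params)}"


# Field search, with the same parameters as the first example above.
field_search = search_uri(
    "http://localhost:9200", "test", q="msg:채팅", pretty="true"
)

# Range search, passing the JSON body through the source parameter.
range_body = {
    "query": {
        "range": {
            "recv_ymdt": {"from": "20120820163946", "to": "20120911160444"}
        }
    }
}
range_search = search_uri(
    "http://localhost:9200", "test", source=json.dumps(range_body), pretty="true"
)

print(field_search)
print(range_search)

# Decoding the URI again should give back the original q value.
print(parse_qs(urlsplit(field_search).query)["q"])
```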

:

jjeong
