'Term'에 해당되는 글 3건

  1. 2021.10.27 [Elasticsearch] Term vs Terms Query
  2. 2015.12.10 [Elasticsearch - The Definitive Guide] The match Query
  3. 2013.01.23 루씬 2.4.3 Field options for term vectors

[Elasticsearch] Term vs Terms Query

Elastic/Elasticsearch 2021. 10. 27. 16:14

보셔야 하는 클래스는

- TermQueryBuilder

- TermsQueryBuilder

입니다.

 

두 Query 의 큰 차이는 단독으로 사용 되었을 때 Scoring 이 어떻게 되느냐 인데요.

Term 은 Score 계산이 되어서 나오고 Terms 는 Constant Score Query 처럼 1.0 으로 나온다는 것입니다.

 

코드를 좀 더 따라 가다 보면 

- MapperFieldType

클래스 내 Query API 들에 대한 Interface 나 Implement 코드를 확인해 보실 수 있습니다.

 

아래는 Terms Query 에 대한 코드를 가져온 내용입니다.

    /** Build a constant-scoring query that matches all values. The default implementation uses a
     * {@link ConstantScoreQuery} around a {@link BooleanQuery} whose {@link Occur#SHOULD} clauses
     * are generated with {@link #termQuery}. */
    public Query termsQuery(Collection<?> values, @Nullable SearchExecutionContext context) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (Object value : values) {
            builder.add(termQuery(value, context), Occur.SHOULD);
        }
        return new ConstantScoreQuery(builder.build());
    }

뭐 혼자 기억 하기 위한 기록 이라서 이 정도까지만 기록해 두겠습니다.

 

:

[Elasticsearch - The Definitive Guide] The match Query

Elastic/TheDefinitiveGuide 2015. 12. 10. 11:53

Match Query 와 Term Query 가 어떻게 다른지 간단하게 정리하는 차원에서 기록 합니다.


원문링크)

https://www.elastic.co/guide/en/elasticsearch/guide/current/match-query.html


Match Query Flow)

1. Check the field type

2. Analyze the query string (term query 로 재실행 됩니다.)

3. Find matching docs

4. Score each doc


Term Query Flow)

1. Find matching docs

2. Score each doc


보시면 아시겠지만 match query 보다 term  query가 수행 단계가 적습니다.

결국 match  query 를 실행 하더라도 term query 로 query rewrite 되기 때문에 검색 서비스 개발 시 잘 판단해서 사용하시면 좋을 듯 합니다.


보통은 front end 에서 smart query 또는 query preprocessing 이라고 해서 query stirng 에 대한 1차 가공 후 실제 검색엔진으로 질의시에는 term query  형태로 사용을 많이 합니다.

:

루씬 2.4.3 Field options for term vectors

Elastic/Elasticsearch 2013. 1. 23. 19:02
퇴근 하면서 급하게 올리다 보니 아무런 내용도 없이 그냥 스크랩 내용만 등록을 했내요.

[요약하면]
Term vectors are a mix between an indexed field and a stored field. They’re similar to a stored field because you can quickly retrieve all term vector fields for a given document: term vectors are keyed first by document ID . But then, they’re keyed secondarily by term, meaning they store a miniature inverted index for that one document. Unlike a stored field, where the original

[어떤 경우에 사용하지]
Sometimes when you index a document you’d like to retrieve all its unique terms at search time. One common use is to speed up highlighting the matched tokens in stored fields. (Highlighting is covered more in sections 8.3 and 8.4.) Another use is to enable a link, “Find similar documents,” that when clicked runs a new search using the salient terms in an original document. Yet another example is automatic categorization of documents. Section 5.9 shows concrete examples of using term vectors once they’re in your index.


2.4.3 Field options for term vectors
Sometimes when you index a document you’d like to retrieve all its unique terms at search time. One common use is to speed up highlighting the matched tokens in stored fields. (Highlighting is covered more in sections 8.3 and 8.4.) Another use is to enable a link, “Find similar documents,” that when clicked runs a new search using the salient terms in an original document. Yet another example is automatic categorization of documents. Section 5.9 shows concrete examples of using term vectors once they’re in your index.
But what exactly are term vectors? Term vectors are a mix between an indexed field and a stored field. They’re similar to a stored field because you can quickly retrieve all term vector fields for a given document: term vectors are keyed first by document ID . But then, they’re keyed secondarily by term, meaning they store a miniature inverted index for that one document. Unlike a stored field, where the original
String content is stored verbatim, term vectors store the actual separate terms that were produced by the analyzer, allowing you to retrieve all terms for each field, and the frequency of their occurrence within the document, sorted in lexicographic order. Because the tokens coming out of an analyzer also have position and offset information (see section 4.2.1), you can choose separately whether these details are also stored in your term vectors by passing these constants as the fourth argument to the Field constructor:
TermVector.YES —Records the unique terms that occurred, and their counts, in each document, but doesn’t store any positions or offsets information
TermVector.WITH_POSITIONS —Records the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets
TermVector.WITH_OFFSETS —Records the unique terms and their counts, with the offsets (start and end character position) of each occurrence of every term, but no positions
TermVector.WITH_POSITIONS_OFFSETS —Stores unique terms and their counts, along with positions and offsets
TermVector.NO —Doesn’t store any term vector information
Note that you can’t index term vectors unless you’ve also turned on indexing for the field. Stated more directly: if Index.NO is specified for a field, you must also specify
TermVector.NO .

We’re done with the detailed options to control indexing, storing, and term vec-tors. Now let’s see how you can create a field with values other than String .


: