[lucene] precision_step setting
Elastic/Elasticsearch 2014. 1. 14. 17:43
When a field is a numeric type, how precision_step is set can affect search performance.
The formulas are shown below.
Update: http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/NumericRangeQuery.html
Original: http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/NumericRangeQuery.html#precisionStepDesc
Precision Step
You can choose any precisionStep when encoding values. Lower step values mean more precisions and so more terms in index (and index gets larger). The number of indexed terms per value is (those are generated by NumericTokenStream):
indexedTermsPerValue = ceil(bitsPerValue / precisionStep)
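To make the formula concrete, here is a minimal Python sketch that evaluates it for 32-bit (int) and 64-bit (long) values. The function name `indexed_terms_per_value` and the chosen step values are only for illustration and are not part of Lucene.

```python
def indexed_terms_per_value(bits_per_value: int, precision_step: int) -> int:
    """Evaluate ceil(bitsPerValue / precisionStep) using integer arithmetic."""
    return (bits_per_value + precision_step - 1) // precision_step


# Print the number of indexed terms for a few illustrative combinations.
for bits in (32, 64):
    for step in (4, 8, 16):
        terms = indexed_terms_per_value(bits, step)
        print(f"bitsPerValue={bits:2d} precisionStep={step:2d} -> {terms} terms")
```

A smaller precisionStep gives more terms per value: a 32-bit value yields 8 terms at step 4 but only 2 at step 16.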
As the lower precision terms are shared by many values, the additional terms only slightly grow the term dictionary (approx. 7% for precisionStep=4), but have a larger impact on the postings (the postings file will have more entries, as every document is linked to indexedTermsPerValue terms instead of one). The formula to estimate the growth of the term dictionary in comparison to one term per value:
termDictOverhead = sum(i = 0 .. indexedTermsPerValue - 1) of 1 / 2^(precisionStep * i)
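As a rough check of the "approx. 7%" figure, the sketch below sums a geometric series in which each coarser precision level contributes 1 / 2^precisionStep as many distinct terms as the level before it. That model and the function name `term_dict_overhead` are assumptions made for illustration. It is an estimate, not a measurement of a real index.

```python
def term_dict_overhead(bits_per_value: int, precision_step: int) -> float:
    """Estimated term dictionary size relative to one term per value.

    Assumes level i (0 = full precision) adds 1 / 2**(precision_step * i)
    distinct terms per distinct value.
    """
    levels = (bits_per_value + precision_step - 1) // precision_step
    return sum(1 / 2 ** (precision_step * i) for i in range(levels))


overhead = term_dict_overhead(32, 4)
print(f"growth for int, precisionStep=4: {(overhead - 1) * 100:.1f}%")  # 6.7%
```

Under this assumption the growth for a 32-bit value with precisionStep=4 comes out just under 7%, in line with the figure quoted above.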
On the other hand, if the precisionStep is smaller, the maximum number of terms to match reduces, which optimizes query speed. The formula to calculate the maximum number of terms that will be visited while executing the query is:
maxQueryTerms = [ (indexedTermsPerValue - 1) * (2^precisionStep - 1) * 2 ] + (2^precisionStep - 1)
For an int field, 4 bytes = 32 bits, so with precisionStep = 4:
indexedTermsPerValue = ceil(32 / 4) = 8
maxQueryTerms = [ (8 - 1) * (16 - 1) * 2 ] + (16 - 1) = 7 * 15 * 2 + 15 = 225
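The same arithmetic can be repeated for other step sizes. The Python sketch below assumes the worked example generalizes as (indexedTermsPerValue - 1) * (2^precisionStep - 1) * 2 + (2^precisionStep - 1). The function name `max_query_terms` is hypothetical and only used here.

```python
def max_query_terms(bits_per_value: int, precision_step: int) -> int:
    """Upper bound on terms visited by a range query, per the example above."""
    terms_per_value = (bits_per_value + precision_step - 1) // precision_step
    per_level = 2 ** precision_step - 1  # e.g. 16 - 1 = 15 for precisionStep=4
    return (terms_per_value - 1) * per_level * 2 + per_level


# Compare a few step sizes for a 32-bit int field.
for step in (4, 8, 16):
    print(f"precisionStep={step:2d} -> maxQueryTerms={max_query_terms(32, step)}")
```

For a 32-bit field this gives 225 at step 4, 1785 at step 8, and 196605 at step 16, which shows the trade-off: a smaller step costs more indexed terms per value but keeps the number of terms a range query may visit much lower.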