## queryWeight, queryNorm, fieldWeight, fieldNorm

Elastic/Elasticsearch, 2013.12.06 10:55

**Reference: http://grokbase.com/t/lucene/solr-user/12cdvgz48t/score-calculation**

queryWeight = the impact of the query against the field

implementation: boost(query)*idf*queryNorm

boost(query) = boost of the field at query-time

Implication: hits in fields with higher boost get a higher score

Rationale: a term in field A could be more relevant than the same term in field B

idf = inverse document frequency = measure of how often the term appears across the index for this field

implementation: log(numDocs/(docFreq+1))+1

Implication: the more documents a term occurs in, the lower its score

Rationale: common terms are less important than uncommon ones

numDocs = the total number of documents in the index, not including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index).

docFreq = the number of documents in the index which contain the term in this field. This is a constant (the same value for all documents in the index containing this field).
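The idf formula above can be tried out directly. The following is a minimal sketch, not Lucene's own code: the function name `idf` and the 1000-document index are assumptions made for illustration, and the logarithm is taken to be the natural log.

```python
import math

def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency as written above: log(numDocs/(docFreq+1)) + 1."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0

# Hypothetical index of 1000 live documents.
rare = idf(doc_freq=9, num_docs=1000)      # term found in 9 documents
common = idf(doc_freq=499, num_docs=1000)  # term found in 499 documents
print(f"rare term idf:   {rare:.3f}")
print(f"common term idf: {common:.3f}")
```

The rare term ends up with the larger idf, which matches the rationale that uncommon terms matter more.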

queryNorm = normalization factor so that queries can be compared

implementation: 1/sqrt(sumOfSquaredWeights)

Implication: doesn't impact the relevancy of this result

Rationale: queryNorm is not related to the relevance of the document, but rather tries to make scores between different queries comparable. This value is equal for all results of the query.
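As a rough illustration of queryNorm and queryWeight, here is a small sketch. It assumes that sumOfSquaredWeights is the sum of (boost(query)*idf)^2 over the query terms, which the text above does not spell out; the function names and the (boost, idf) values are hypothetical.

```python
import math

def query_norm(term_weights):
    """1/sqrt(sumOfSquaredWeights); term_weights holds boost*idf for each query term (assumed)."""
    sum_of_squared_weights = sum(w * w for w in term_weights)
    return 1.0 / math.sqrt(sum_of_squared_weights)

def query_weight(boost, idf, norm):
    """queryWeight = boost(query) * idf * queryNorm."""
    return boost * idf * norm

# Hypothetical two-term query, given as (boost, idf) per term.
terms = [(1.0, 3.0), (1.0, 4.0)]
norm = query_norm([boost * idf for boost, idf in terms])
weights = [query_weight(boost, idf, norm) for boost, idf in terms]
print(norm, weights)
```

Because the same `norm` multiplies every term of the query, it rescales all scores for that query equally and leaves the ordering of the results unchanged.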

fieldWeight = the score of a term matching the field

implementation: tf*idf*fieldNorm

tf = term frequency in a field = measure of how often a term appears in the field

implementation: sqrt(freq)

Implication: the more frequently a term occurs in a field, the greater its score

Rationale: fields that contain more occurrences of a term are generally more relevant

freq = termFreq = number of times the term occurs in the field for this document
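The square root dampens the effect of repeated terms. A short sketch (the function name `tf` and the sample frequencies are chosen only for illustration):

```python
import math

def tf(freq: int) -> float:
    """Term frequency factor as written above: sqrt(freq)."""
    return math.sqrt(freq)

# Sample frequencies: the factor grows much more slowly than the raw count.
for freq in (1, 4, 16):
    print(freq, tf(freq))
```

A term that occurs 16 times contributes a tf four times that of a single occurrence, not sixteen times.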

fieldNorm = impact of a hit in this field

implementation: lengthNorm*boost(index)

lengthNorm = measure of the importance of a term according to the total number of terms in the field

implementation: 1/sqrt(numTerms)

Implication: a term matched in a field with fewer terms gets a higher score

Rationale: a term in a field with fewer terms is more important than one in a field with more

numTerms = number of terms in a field

boost (index) = boost of the field at index-time

Implication: hits in fields with higher boost get a higher score

Rationale: a term in field A could be more relevant than the same term in field B
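The fieldNorm and fieldWeight formulas can be combined in a few lines. This is a sketch under stated assumptions: the function names, the field sizes (a 4-term title and a 100-term body) and the idf value of 2.0 are all made up for the example.

```python
import math

def length_norm(num_terms: int) -> float:
    """lengthNorm = 1/sqrt(numTerms)."""
    return 1.0 / math.sqrt(num_terms)

def field_norm(num_terms: int, index_boost: float = 1.0) -> float:
    """fieldNorm = lengthNorm * boost(index)."""
    return length_norm(num_terms) * index_boost

def field_weight(freq: int, idf: float, norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * norm

# The same term occurs once in a short title and once in a long body (hypothetical sizes).
title = field_weight(freq=1, idf=2.0, norm=field_norm(num_terms=4))
body = field_weight(freq=1, idf=2.0, norm=field_norm(num_terms=100))
print(title, body)
```

The hit in the short field gets the larger fieldWeight. Values in real EXPLAIN output may differ slightly from such hand calculations, since Lucene's classic similarity stores the norm in a compressed, lossy form.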

maxDocs = the number of documents in the index, including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index).

Implication: (probably) doesn't play a role in the scoring calculation

coord = fraction of the terms in the query that were found in the document (omitted if equal to 1)

implementation: overlap/maxOverlap

Implication: a document that contains more of the query's terms gets a higher score

Rationale: documents that match the most optional terms score highest

overlap = the number of query terms matched in the document

maxOverlap = the total number of terms in the query
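The coord factor is a simple ratio. A minimal sketch, with a hypothetical three-term query:

```python
def coord(overlap: int, max_overlap: int) -> float:
    """coord = overlap/maxOverlap: the share of query terms found in the document."""
    return overlap / max_overlap

# Hypothetical three-term query.
partial = coord(overlap=2, max_overlap=3)  # document matches two of the three terms
full = coord(overlap=3, max_overlap=3)     # document matches all terms
print(partial, full)
```

A document matching every term has a coord of 1, which is the case where the factor is left out of the explanation.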

FunctionQuery = any kind of custom ranking function whose result is added to, or multiplied with, the default rank score.

Implication: various

Look at the EXPLAIN information to see how the final score is calculated.
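To see how the pieces fit together, the sketch below recomputes a score from the factors defined in this post. It assumes that each matching term contributes queryWeight * fieldWeight and that the document score is coord times the sum of those contributions, which is how EXPLAIN output is typically laid out; it also assumes the sumOfSquaredWeights definition used for queryNorm here. All names, documents and statistics are hypothetical, and FunctionQuery is left out.

```python
import math

def idf(doc_freq, num_docs):
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def score(query_terms, doc, num_docs):
    """query_terms: {term: (query_boost, doc_freq)}
    doc: {"freqs": {term: freq}, "num_terms": int, "index_boost": float}"""
    idfs = {t: idf(df, num_docs) for t, (_, df) in query_terms.items()}
    # queryNorm = 1/sqrt(sumOfSquaredWeights), summing (boost*idf)^2 per term (assumed)
    query_norm = 1.0 / math.sqrt(
        sum((boost * idfs[t]) ** 2 for t, (boost, _) in query_terms.items()))
    # fieldNorm = lengthNorm * boost(index)
    field_norm = doc["index_boost"] / math.sqrt(doc["num_terms"])
    total, overlap = 0.0, 0
    for t, (boost, _) in query_terms.items():
        freq = doc["freqs"].get(t, 0)
        if freq == 0:
            continue  # term not in this document: no contribution
        overlap += 1
        query_weight = boost * idfs[t] * query_norm
        field_weight = math.sqrt(freq) * idfs[t] * field_norm
        total += query_weight * field_weight
    return (overlap / len(query_terms)) * total  # coord * sum of term scores

# Hypothetical query and documents in an index of 1000 documents.
query = {"lucene": (1.0, 9), "score": (1.0, 99)}
doc_a = {"freqs": {"lucene": 2, "score": 1}, "num_terms": 16, "index_boost": 1.0}
doc_b = {"freqs": {"score": 1}, "num_terms": 16, "index_boost": 1.0}
print(score(query, doc_a, 1000), score(query, doc_b, 1000))
```

Comparing the intermediate values against the EXPLAIN output of a real query is a quick way to check which factor dominates a given score.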
