queryWeight, queryNorm, fieldWeight, fieldNorm

Elastic/Elasticsearch 2013. 12. 6. 10:55

reference : http://grokbase.com/t/lucene/solr-user/12cdvgz48t/score-calculation

queryWeight = the impact of the query against the field
implementation: boost(query)*idf*queryNorm


boost(query) = boost of the field at query-time
Implication: hits in fields with higher boost get a higher score
Rationale: a term in field A could be more relevant than the same term in field B


idf = inverse document frequency = measure of how often the term appears across the index for this field
implementation: log(numDocs/(docFreq+1))+1
Implication: the greater the occurrence of a term in different documents, the lower its score
Rationale: common terms are less important than uncommon ones
numDocs = the total number of documents in the index, not including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index).
docFreq = the number of documents in the index which contain the term in this field. This is a constant (the same value for all documents in the index containing this field)
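
A minimal Python sketch of the idf formula above. The function name idf and the sample numbers are invented for illustration; only the formula itself comes from these notes.

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """idf = log(numDocs / (docFreq + 1)) + 1, using the natural logarithm."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0

# A rare term (in 9 of 1000 docs) gets a much higher idf than a
# term that appears in nearly every document (999 of 1000).
rare = idf(1000, 9)       # log(100) + 1, roughly 5.61
common = idf(1000, 999)   # log(1) + 1 = 1.0
```

The +1 inside the logarithm avoids division by zero when docFreq is 0, and the +1 outside keeps the idf of a very common term from dropping to zero.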


queryNorm = normalization factor so that queries can be compared
implementation: 1/sqrt(sumOfSquaredWeights)
Implication: does not change how the results of a single query rank against each other
Rationale: queryNorm is not related to the relevance of the document, but rather tries to make scores between different queries comparable. This value is equal for all results of the query
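
The notes do not spell out sumOfSquaredWeights. The sketch below assumes it is the sum of (boost(query) * idf)^2 over the terms of the query, which is how Lucene's classic TF-IDF similarity is usually described; treat that as an assumption. The function names are hypothetical.

```python
import math

def query_norm(terms):
    """terms: list of (query_boost, idf) pairs, one per query term.

    Assumption: sumOfSquaredWeights = sum((boost * idf) ** 2).
    """
    sum_of_squared_weights = sum((boost * idf) ** 2 for boost, idf in terms)
    return 1.0 / math.sqrt(sum_of_squared_weights)

def query_weight(boost: float, idf: float, qnorm: float) -> float:
    """queryWeight = boost(query) * idf * queryNorm."""
    return boost * idf * qnorm

# Two unboosted terms with idf 3.0 and 4.0: sum of squares is 25, norm is 0.2.
qn = query_norm([(1.0, 3.0), (1.0, 4.0)])
qw = query_weight(1.0, 3.0, qn)   # 3.0 * 0.2
```

Because queryNorm is the same for every hit of a query, dividing all scores by it changes their magnitude but not their order.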


fieldWeight = the score of a term matching the field
implementation: tf*idf*fieldNorm


tf = term frequency in a field = measure of how often a term appears in the field
implementation: sqrt(freq)
Implication: the more frequently a term occurs in a field, the greater its score
Rationale: fields that contain more occurrences of a term are generally more relevant
freq = termFreq = number of times the term occurs in the field for this document
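
A small sketch of the tf formula, showing how the square root dampens repeated occurrences. The function name tf is chosen for the example.

```python
import math

def tf(freq: int) -> float:
    """tf = sqrt(freq): more occurrences help, but with diminishing returns."""
    return math.sqrt(freq)

# 1 occurrence, 4 occurrences, 100 occurrences
values = [tf(1), tf(4), tf(100)]   # [1.0, 2.0, 10.0]
```

A term that occurs 100 times contributes only 10 times as much as a term that occurs once, so keyword stuffing has a limited effect.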


fieldNorm = impact of a hit in this field
implementation: lengthNorm*boost(index)
lengthNorm = measure of the importance of a term according to the total number of terms in the field
implementation: 1/sqrt(numTerms)
Implication: a term matched in a field with fewer terms has a higher score
Rationale: a term in a field with fewer terms is more important than one in a field with more
numTerms = number of terms in the field
boost(index) = boost of the field at index-time
Implication: hits in fields with higher boost get a higher score
Rationale: a term in field A could be more relevant than the same term in field B
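
Putting lengthNorm, boost(index), tf and idf together, fieldNorm and fieldWeight can be sketched as follows. The function names and sample values are hypothetical. The sketch computes exact values; the fieldNorm printed in explain output is usually a rounded version, because Lucene of this era stores the norm in a single byte.

```python
import math

def length_norm(num_terms: int) -> float:
    """lengthNorm = 1 / sqrt(numTerms)."""
    return 1.0 / math.sqrt(num_terms)

def field_norm(num_terms: int, index_boost: float = 1.0) -> float:
    """fieldNorm = lengthNorm * boost(index)."""
    return length_norm(num_terms) * index_boost

def field_weight(freq: int, idf: float, fnorm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * fnorm

short_field = field_norm(4)          # 1 / sqrt(4) = 0.5
long_boosted = field_norm(16, 2.0)   # (1 / sqrt(16)) * 2.0 = 0.5
fw = field_weight(4, 3.0, short_field)   # 2.0 * 3.0 * 0.5 = 3.0
```

The example shows that an index-time boost of 2.0 on a 16-term field gives the same fieldNorm as an unboosted 4-term field.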


maxDocs = the number of documents in the index, including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index)
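Implication: often assumed not to play a role in scoring, but Lucene's explain output reports idf as idf(docFreq=..., maxDocs=...), which suggests this value is what the idf calculation actually uses; deleted but not yet purged documents can therefore still influence scores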


coord = fraction of the terms in the query that were found in the document (omitted if equal to 1)
implementation: overlap/maxOverlap
Implication: a document that contains more of the query's terms gets a higher score
Rationale: documents that match the most optional terms score highest
overlap = the number of query terms matched in the document
maxOverlap = the total number of terms in the query
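
A minimal sketch of the coord factor. The function name coord and the sample values are chosen for illustration.

```python
def coord(overlap: int, max_overlap: int) -> float:
    """coord = overlap / maxOverlap: the fraction of query terms matched."""
    return overlap / max_overlap

full_match = coord(3, 3)      # 1.0, the case where explain omits coord
partial_match = coord(2, 3)   # two of three query terms matched
```

A document that matches only two of three query terms has its summed term scores multiplied by 2/3.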


FunctionQuery = can be any kind of custom ranking function, whose outcome is added to, or multiplied with, the default rank score.
Implication: various


Look at the EXPLAIN information to see how the final score is calculated.
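
To cross-check an EXPLAIN tree by hand, the pieces defined above can be combined roughly as below. This is a sketch under assumptions, not Lucene's code: it assumes the final score is coord times the sum of queryWeight * fieldWeight over the matched terms, that sumOfSquaredWeights is the sum of (boost * idf)^2 over all query terms, and it ignores the rounding Lucene applies to stored norms. All function and parameter names are hypothetical.

```python
import math

def idf(num_docs, doc_freq):
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def explain_score(query_boosts, doc_freqs, num_docs,
                  term_freqs, num_terms, index_boost=1.0):
    """Score one field of one document for a multi-term query.

    query_boosts: {term: query-time boost}
    doc_freqs:    {term: number of documents containing the term}
    term_freqs:   {term: occurrences of the term in this document's field}
    num_terms:    total number of terms in this document's field
    """
    idfs = {t: idf(num_docs, doc_freqs[t]) for t in query_boosts}
    # queryNorm = 1 / sqrt(sumOfSquaredWeights), same for every document
    query_norm = 1.0 / math.sqrt(
        sum((query_boosts[t] * idfs[t]) ** 2 for t in query_boosts))
    # fieldNorm = lengthNorm * boost(index)
    field_norm = index_boost / math.sqrt(num_terms)

    total, overlap = 0.0, 0
    for term, boost in query_boosts.items():
        freq = term_freqs.get(term, 0)
        if freq == 0:
            continue  # unmatched terms contribute nothing
        overlap += 1
        query_weight = boost * idfs[term] * query_norm
        field_weight = math.sqrt(freq) * idfs[term] * field_norm
        total += query_weight * field_weight
    return (overlap / len(query_boosts)) * total  # apply coord

# Two-term query, only "a" matches; both terms have idf 1.0 here.
score = explain_score({"a": 1.0, "b": 1.0}, {"a": 9, "b": 9}, 10,
                      {"a": 4}, num_terms=4)
```

In this example queryWeight is 1/sqrt(2), fieldWeight is 1.0 and coord is 0.5, so the score is 0.5/sqrt(2). Comparing each intermediate value with the matching line of the explain output is usually the quickest way to find where two documents' scores diverge.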
