[lucene]phrase query
Elastic/Elasticsearch 2013. 6. 19. 14:22
http://www.avajava.com/tutorials/lessons/how-do-i-query-for-words-near-each-other-with-a-phrase-query.html
Sharing this because it explains slop well.
Here are some foods that Deron likes: hamburger french fries steak mushrooms artichokes
Query: contents:"french fries"
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt

Query: contents:"hamburger steak"
Number of hits: 0

Query: contents:"hamburger steak"~1
Number of hits: 0

Query: contents:"hamburger steak"~2
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt

Query: contents:"hamburger steak"~3
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt

Searching for 'french fries' using QueryParser
Type of query: BooleanQuery
Query: contents:french contents:fries
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt

Searching for '"french fries"' using QueryParser
Type of query: PhraseQuery
Query: contents:"french fries"
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt

Searching for '"hamburger steak"~1' using QueryParser
Type of query: PhraseQuery
Query: contents:"hamburger steak"~1
Number of hits: 0

Searching for '"hamburger steak"~2' using QueryParser
Type of query: PhraseQuery
Query: contents:"hamburger steak"~2
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Let's talk briefly about the console output. The first phrase query searches for "french" and "fries" with a slop of 0, meaning that the phrase search ends up being a search for "french fries", where "french" and "fries" are next to each other. Since this exists in deron-foods.txt, we get 1 hit.
In the second query, we search for "hamburger" and "steak" with a slop of 0. Since "hamburger" and "steak" don't exist next to each other in either document, we get 0 hits. The third query also involves a search for "hamburger" and "steak", but with a slop of 1. These words are not within 1 word of each other, so we get 0 hits.
The fourth query searches for "hamburger" and "steak" with a slop of 2. In the deron-foods.txt file, we have the words "... hamburger french fries steak ...". Since "hamburger" and "steak" are within two words of each other, we get 1 hit. The fifth phrase query is the same search but with a slop of 3. Since "hamburger" and "steak" are within three words of each other (they are two words apart), we again get 1 hit.
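The slop arithmetic above can be checked with a small plain-Java sketch. This is only a simplified model for two-term phrases, not Lucene's actual implementation: the class name SlopSketch, its methods, and the naive whitespace tokenization are all assumptions made for illustration (a real Lucene analyzer tokenizes differently). The idea is that the second term is expected one position after the first, and the required slop is how far it sits from that expected position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified, hypothetical model of slop for a two-term phrase; not Lucene code.
class SlopSketch {

    // Positions of `term` in `text`, using naive whitespace tokenization (an assumption).
    static List<Integer> positions(String text, String term) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        String wanted = term.toLowerCase(Locale.ROOT);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(wanted)) {
                result.add(i);
            }
        }
        return result;
    }

    // Smallest slop at which the phrase "first second" would match,
    // or -1 if either term does not occur in the text.
    static int requiredSlop(String text, String first, String second) {
        int best = -1;
        for (int p1 : positions(text, first)) {
            for (int p2 : positions(text, second)) {
                // `second` is expected at p1 + 1; count how far away it actually is.
                int moves = Math.abs(p2 - (p1 + 1));
                if (best < 0 || moves < best) {
                    best = moves;
                }
            }
        }
        return best;
    }

    // True if the two-term phrase matches within the given slop.
    static boolean matches(String text, String first, String second, int slop) {
        int needed = requiredSlop(text, first, second);
        return needed >= 0 && needed <= slop;
    }

    public static void main(String[] args) {
        String doc = "Here are some foods that Deron likes: "
                + "hamburger french fries steak mushrooms artichokes";
        System.out.println(requiredSlop(doc, "french", "fries"));    // 0
        System.out.println(matches(doc, "hamburger", "steak", 1));   // false
        System.out.println(matches(doc, "hamburger", "steak", 2));   // true
    }
}
```

Under this model "hamburger" and "steak" need a slop of 2, which lines up with the 0 hits at slop 0 and 1 and the single hit at slop 2 and 3 in the console output.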
The next four queries utilize QueryParser. Notice that in the first of the QueryParser queries, we get a BooleanQuery rather than a PhraseQuery. This is because we passed QueryParser's parse() method "french fries" rather than "\"french fries\"". If we want QueryParser to generate a PhraseQuery, the search string needs to be surrounded by double quotes. The next query does search for "\"french fries\"" and we can see that it generates a PhraseQuery (with the default slop of 0) and gets 1 hit in response to the query.
The last two QueryParser queries demonstrate setting slop values. We can see that the slop value is set by following the closing double quote of the search string with a tilde (~) followed by the slop number.
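To make the quoting and tilde syntax concrete, here is a minimal plain-Java sketch that only builds the search string one would hand to QueryParser. The class PhraseStrings and its phrase() helper are hypothetical names, not part of Lucene, and the helper does not escape special characters inside the phrase.

```java
// Hypothetical helper that builds a phrase search string; it does not call Lucene.
class PhraseStrings {

    // Wraps the words in double quotes and appends ~slop when slop is positive.
    // Assumption: `words` contains no characters that need escaping.
    static String phrase(String field, String words, int slop) {
        String quoted = field + ":\"" + words + "\"";
        return slop > 0 ? quoted + "~" + slop : quoted;
    }

    public static void main(String[] args) {
        System.out.println(phrase("contents", "french fries", 0));    // contents:"french fries"
        System.out.println(phrase("contents", "hamburger steak", 2)); // contents:"hamburger steak"~2
    }
}
```

The resulting strings have the same shape as the "Query:" lines in the console output. Without the surrounding double quotes, the parser would treat the words as separate terms and produce a BooleanQuery instead.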
As we have seen, phrase queries are a great way to produce queries that have a degree of leeway to them in terms of the proximity and ordering of the words to be searched. The total allowed spacing between words can be controlled using the setSlop() method of PhraseQuery.