[Elasticsearch - The Definitive Guide] Deep Pagination

Elastic/TheDefinitiveGuide 2015. 11. 30. 11:55

This is basic material, but I am writing it down anyway for the record.


Original link)

https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html


Original snippet)

[Deep Paging in Distributed Systems]

To understand why deep paging is problematic, let’s imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the requesting node, which then sorts all 50 results in order to select the overall top 10.


Now imagine that we ask for page 1,000—results 10,001 to 10,010. Everything works in the same way except that each shard has to produce its top 10,010 results. The requesting node then sorts through all 50,050 results and discards 50,040 of them!
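The arithmetic in the two paragraphs above can be checked with a small sketch. This is only an illustration of the counting, not Elasticsearch code: the function name `deep_paging_cost` and its return keys are made up for this note, and the `from_offset` and `size` arguments are assumed to mirror the `from` and `size` search parameters.

```python
def deep_paging_cost(num_shards: int, from_offset: int, size: int) -> dict:
    """Count the results handled for one paged search request.

    Hypothetical helper for illustration only; it models the counting
    described in the quoted text and does not talk to Elasticsearch.
    """
    # Each shard cannot know which of its hits make the global page,
    # so it has to produce its own top (from_offset + size) results.
    per_shard = from_offset + size
    # The requesting node gathers every shard's list and sorts all of it.
    gathered = num_shards * per_shard
    # Only `size` results are returned; the rest are thrown away.
    discarded = gathered - size
    return {"per_shard": per_shard, "gathered": gathered, "discarded": discarded}


# First page (results 1 to 10) on five primary shards.
print(deep_paging_cost(num_shards=5, from_offset=0, size=10))
# Deep page (results 10,001 to 10,010) on the same index.
print(deep_paging_cost(num_shards=5, from_offset=10_000, size=10))
```

With five shards, the first call reports 50 gathered results and the second reports 50,050 gathered with 50,040 discarded, matching the numbers in the quoted text.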


You can see that, in a distributed system, the cost of sorting results grows exponentially the deeper we page. There is a good reason that web search engines don’t return more than 1,000 results for any query.


In summary,

- There is a good reason that web search engines don’t return more than 1,000 results for any query.

- Deep paging can become a problem because it wastes resources and computation on results that are never returned.


To ease this problem, you can combine Elasticsearch with a caching solution, which at least reduces the impact.

Even so, caching is not a fundamental fix.
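As a rough sketch of the caching idea, the snippet below keeps already fetched pages in memory so that a repeated request for the same deep page does not reach the search backend again. Everything here is an assumption made for illustration: `CachedSearch` and `fake_backend` are hypothetical names, the backend is a stand-in function rather than a real Elasticsearch client, and a real setup would also need expiry and invalidation.

```python
from typing import Callable, Dict, List, Tuple

# Signature assumed for illustration: (query, from_offset, size) -> list of hits.
Backend = Callable[[str, int, int], List[str]]


class CachedSearch:
    """Hypothetical wrapper that memoizes pages by (query, from_offset, size)."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend
        self._cache: Dict[Tuple[str, int, int], List[str]] = {}
        self.backend_calls = 0  # how many requests actually reached the backend

    def search(self, query: str, from_offset: int, size: int) -> List[str]:
        key = (query, from_offset, size)
        if key not in self._cache:
            # Cache miss: the backend still pays the full deep-paging cost here.
            self.backend_calls += 1
            self._cache[key] = self._backend(query, from_offset, size)
        return self._cache[key]


def fake_backend(query: str, from_offset: int, size: int) -> List[str]:
    """Stand-in for a real search call; returns placeholder hit labels."""
    return [f"{query}-hit-{i}" for i in range(from_offset + 1, from_offset + size + 1)]


cached = CachedSearch(fake_backend)
first = cached.search("elastic", 10_000, 10)
second = cached.search("elastic", 10_000, 10)  # served from the cache
print(cached.backend_calls)
```

The first request for a given page is still as expensive as before, which is why caching only softens the problem rather than removing it.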
