385 posts in 'Elastic/Elasticsearch'

  1. 2014.08.26 [ElasticSearch] I keep forgetting this.. -_-;; (preference)
  2. 2014.08.22 [ElasticSearch] start flow..
  3. 2014.08.11 [ElasticSearch] High CPU usage when idle #1940
  4. 2014.08.07 [ElasticSearch] float & double
  5. 2014.08.04 [ElasticSearch] Optimize tuning considerations when indexing large volumes of data
  6. 2014.08.04 [ElasticSearch] Refresh/Flush/Optimize - by elasticsearch.org
  7. 2014.07.17 [ElasticSearch] Tips for using primary / replica shards.
  8. 2014.07.15 [ElasticSearch] Indexing performance..
  9. 2014.07.02 [ElasticSearch] shard allocation settings.
  10. 2014.07.02 [Elasticsearch] mapping type templates (numeric, string, date)

[ElasticSearch] I keep forgetting this.. -_-;; (preference)

Elastic/Elasticsearch 2014. 8. 26. 14:26


http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-preference.html


preference

Controls a preference of which shard replicas to execute the search request on. By default, the operation is randomized between the shard replicas.

The preference is a query string parameter which can be set to:

_primary

The operation will go and be executed only on the primary
 shards.

_primary_first

The operation will go and be executed on the primary shard,
and if not available (failover), will execute on other shards.

_local

The operation will prefer to be executed on a local allocated
shard if possible.

_only_node:xyz

Restricts the search to execute only on a node with the
provided node id (xyz in this case).

_prefer_node:xyz

Prefers execution on the node with the provided node id
(xyz in this case) if applicable.

_shards:2,3

Restricts the operation to the specified shards. (2 and 3 
in this case). This preference can be combined
with other preferences but it has to appear first: _shards:2,3;_primary

Custom (string) value

A custom value will be used to guarantee that
the same shards will be used for the same custom value.
This can help with "jumping values" when hitting different
shards in different refresh states. A sample value can be
something like the web session id, or the user name.
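As a rough illustration of how the preference parameter might be attached to a search request, the sketch below only builds the request URL and sends nothing. The host `localhost:9200`, the index name `myindex`, the session id, and the helper `search_url` are assumptions made up for this example.

```python
from urllib.parse import urlencode


def search_url(base, index, preference=None):
    """Build a _search URL, optionally carrying a preference value."""
    url = f"{base}/{index}/_search"
    if preference is not None:
        # urlencode percent-encodes ':', ',' and ';'; the server decodes them.
        url += "?" + urlencode({"preference": preference})
    return url


# Hypothetical host and index name.
print(search_url("http://localhost:9200", "myindex", "_primary_first"))
print(search_url("http://localhost:9200", "myindex", "_shards:2,3;_primary"))
# A custom string such as a session id keeps one user on the same shard copies.
print(search_url("http://localhost:9200", "myindex", "session-abc123"))
```

Passing the same custom string on every request from one user is what avoids the "jumping values" described above.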



[ElasticSearch] start flow..

Elastic/Elasticsearch 2014. 8. 22. 12:34

When you start ElasticSearch, startup proceeds through the flow below.


Elasticsearch

      |

      V

Bootstrap

      |

      V

NodeBuilder

      |

      V

InternalNode


The node is actually started in start() of InternalNode.

This is where the required services and modules are registered and started.


[ElasticSearch] High CPU usage when idle #1940

Elastic/Elasticsearch 2014. 8. 11. 10:26

https://github.com/elasticsearch/elasticsearch/issues/1940


[ElasticSearch] float & double

Elastic/Elasticsearch 2014. 8. 7. 10:38

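Nothing major, but..

1.0 is a real number.
Right!

So is it a float or a double?
I tripped over exactly this silly question yesterday. -_-;;

I defined a field as float in ES and indexed the value 1.0.
ES itself returned it just fine, but I was checking the type through a JDBC ResultSet before passing the value along, and there it came out as a double, so I ended up with a logic error caused by a type mismatch.

My foundation work is shaky, which is worrying.
If a sinkhole opens up at this rate, I'm in trouble....

So I decided to just use double. :)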
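The mismatch can be illustrated outside ES. This is a minimal sketch, assuming only that a JSON parser hands back a double-precision value for 1.0 no matter what the mapping says. `to_float32` is a hypothetical helper that rounds a value through 32-bit storage, and the field name `price` is made up.

```python
import json
import struct


def to_float32(value):
    """Round-trip a Python float (a 64-bit double) through 32-bit float storage."""
    return struct.unpack("f", struct.pack("f", value))[0]


# JSON has a single number type; Python parses 1.0 as a 64-bit double.
parsed = json.loads('{"price": 1.0}')["price"]
print(type(parsed).__name__)

# 1.0 is exactly representable in both widths, so the value itself survives.
print(to_float32(1.0) == 1.0)

# 0.1 is not: the 32-bit and 64-bit approximations differ.
print(to_float32(0.1) == 0.1)
```

The value 1.0 happens to survive either way, so the only thing that breaks is code that branches on the declared type, which is what happened here.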


[ElasticSearch] Optimize tuning considerations when indexing large volumes of data

Elastic/Elasticsearch 2014. 8. 4. 18:04

The Definitive Guide on elasticsearch.org has some good material, so I'm sharing it.

It is also the approach I recently used while indexing a large volume of data.


The original article is at the link below.

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/inside-a-shard.html


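The most important part is shard sizing based on hardware specs and data size.

I'll share that another time. ^^;


Here I'll briefly summarize the approach I used, compared with the original.


[One-line summary]

- Use refresh and flush instead of optimize.


※ It is best to avoid running optimize during large-volume indexing and real-time indexing whenever possible.

- As the guide also says, merging uses a lot of I/O and CPU resources when indexing large volumes of data.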

The merging of big segments can use a lot of I/O and CPU, which can hurt search performance if left unchecked. By default, Elasticsearch throttles the merge process so that search still has enough resources available to perform well.



Be aware that merges triggered by the optimize API are not throttled at all. They can consume all of the I/O on your nodes, leaving nothing for search and potentially making your cluster unresponsive. If you plan on optimizing an index, you should use shard allocation (see Appendix A, TODO) to first move the index to a node where it is safe to run.


For these reasons, I do not run optimize for large-volume indexing and search; instead, I run bulk requests combined with refresh and flush.
(Optimize was taking nearly as long as the indexing itself, which limited how efficiently system resources could be used.)

[Bulk Request Flow]
Step 1. update settings
  - refresh_interval disable (-1)
  - replica disable (0)
Step 2. bulk request
Step 3. flush & refresh
Step 4. update settings
  - refresh_interval rollback
  - replica rollback

- If the replica rollback in step 4 is hard to run while the service is live, you can roll it back separately on a schedule.
- However, if a failure would cause serious problems for the service, apply the setting right away.
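The four steps can be sketched as an ordered list of REST calls. This only builds the calls and sends nothing. The index name `myindex` and the rollback values (`1s` refresh interval, 1 replica) are assumptions, so substitute whatever your index used before step 1.

```python
import json


def bulk_flow(index, refresh_interval="1s", replicas=1):
    """Return the ordered (method, path, body) calls for the four steps."""
    settings = f"/{index}/_settings"
    return [
        # Step 1: turn off periodic refresh and replicas before loading.
        ("PUT", settings,
         {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}),
        # Step 2: the bulk request(s); the NDJSON body is omitted here.
        ("POST", f"/{index}/_bulk", None),
        # Step 3: commit to disk, then make the new documents searchable.
        ("POST", f"/{index}/_flush", None),
        ("POST", f"/{index}/_refresh", None),
        # Step 4: restore the previous settings (assumed values).
        ("PUT", settings,
         {"index": {"refresh_interval": refresh_interval,
                    "number_of_replicas": replicas}}),
    ]


for method, path, body in bulk_flow("myindex"):
    print(method, path, json.dumps(body) if body else "")
```

Keeping the replica rollback as its own call makes it easy to defer that one step to a scheduled job, as noted above.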


[Bulk Request Size]

Source) http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/bulk.html

How big is too big?

The entire bulk request needs to be loaded into memory by the node which receives our request, so the bigger the request, the less memory available for other requests. There is an optimal size of bulk request. Above that size, performance no longer improves and may even drop off.

The optimal size, however, is not a fixed number. It depends entirely on your hardware, your document size and complexity, and your indexing and search load. Fortunately, it is easy to find this sweetspot:

Try indexing typical documents in batches of increasing size. When performance starts to drop off, your batch size is too big. A good place to start is with batches of between 1,000 and 5,000 documents or, if your documents are very large, with even smaller batches.

It is often useful to keep an eye on the physical size of your bulk requests. One thousand 1kB documents is very different than one thousand 1MB documents. A good bulk size to start playing with is around 5-15MB in size.
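One way to follow this advice is to cap each bulk body by both document count and byte size. The sketch below is a hypothetical helper, not part of any client library. The index name, type name, and default caps are assumptions, and the `_type` field in the action line reflects the ES releases of that period.

```python
import json


def bulk_batches(docs, index, doc_type, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bulk bodies capped by document count and byte size."""
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    lines, count, size = [], 0, 0
    for doc in docs:
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Close the current batch if adding this document would exceed a cap.
        if count and (count >= max_docs or size + entry_size > max_bytes):
            yield "".join(lines)
            lines, count, size = [], 0, 0
        lines.append(entry)
        count += 1
        size += entry_size
    if lines:
        yield "".join(lines)


docs = [{"id": i, "title": f"doc {i}"} for i in range(2500)]
batches = list(bulk_batches(docs, "myindex", "doc", max_docs=1000))
# Each document contributes two lines: the action line and the source line.
print([b.count("\n") // 2 for b in batches])
```

To find the sweet spot, raise `max_docs` or `max_bytes` step by step and stop when throughput no longer improves.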



[ElasticSearch] Refresh/Flush/Optimize - by elasticsearch.org

Elastic/Elasticsearch 2014. 8. 4. 17:48

Source) http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/inside-a-shard.html


1. near real-time search

refresh API

In Elasticsearch, this lightweight process of writing and opening a new segment is called a refresh. By default, every shard is refreshed automatically once every second. This is why we say that Elasticsearch has near real-time search: document changes are not visible to search immediately, but will become visible within one second.


2. making changes persistent

flush API

The action of performing a commit and truncating the translog is known in Elasticsearch as a flush. Shards are flushed automatically every 30 minutes, or when the translog becomes too big. See the translog documentation for settings that can be used to control these thresholds.


3. segment merging 

optimize API

The optimize API is best described as the forced merge API. It forces a shard to be merged down to the number of segments specified in the max_num_segments parameter. The intention is to reduce the number of segments (usually to 1) in order to speed up search performance.
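For reference, the three operations map onto three REST paths. This is a small sketch that only builds the paths. `maintenance_path` and the index name `myindex` are made up for illustration, and `_optimize` is the endpoint name as of the ES releases of that period (later releases expose the forced merge under a different name).

```python
def maintenance_path(index, operation, max_num_segments=None):
    """Return the REST path for refresh, flush, or optimize on an index."""
    if operation not in ("refresh", "flush", "optimize"):
        raise ValueError(f"unknown operation: {operation}")
    path = f"/{index}/_{operation}"
    # max_num_segments only applies to optimize (the forced merge).
    if operation == "optimize" and max_num_segments is not None:
        path += f"?max_num_segments={max_num_segments}"
    return path


for op in ("refresh", "flush"):
    print("POST", maintenance_path("myindex", op))
print("POST", maintenance_path("myindex", "optimize", max_num_segments=1))
```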




[ElasticSearch] Tips for using primary / replica shards.

Elastic/Elasticsearch 2014. 7. 17. 16:49

I found some good content among the ElasticSearch presentation slides out there, so I'm posting it here.


- More primary shards
  - faster indexing
  - scalability

- More replicas
  - faster searching
  - more failover


Source: https://speakerdeck.com/asm89/elasticsearch


[ElasticSearch] Indexing performance..

Elastic/Elasticsearch 2014. 7. 15. 14:36

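Hardware: 32 cores, 64 GB, 6 machines

File size: 8.5 GB

Number of documents: 66,661,310

Per document: 100+ fields, of which 20+ are indexed (not_analyzed) fields

Per node: 26,453 documents indexed per second.

Total indexing time: 7 minutes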
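The per-node figure follows from the other numbers. Here is a quick check, assuming the 7 minutes is exact and the load is spread evenly across the 6 machines:

```python
total_docs = 66_661_310
nodes = 6
total_seconds = 7 * 60  # total indexing time of 7 minutes

cluster_rate = total_docs / total_seconds  # documents per second, all nodes
per_node_rate = cluster_rate / nodes       # documents per second, per node

print(f"cluster: {cluster_rate:,.0f} docs/s")
print(f"per node: {per_node_rate:,.0f} docs/s")
```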


[ElasticSearch] shard allocation settings.

Elastic/Elasticsearch 2014. 7. 2. 17:36

To control shard rebalancing...


curl -XPUT localhost:19200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.enable" : "none" } }'

curl -XPUT localhost:19200/_cluster/settings -d '{ "persistent" : { "index.routing.allocation.enable" : "none" } }'
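The request body for the first command can also be generated, which makes it harder to mistype the value. This is a sketch with a hypothetical helper `allocation_body`. The list of accepted values is my understanding of `cluster.routing.allocation.enable`, so check it against the documentation for your version.

```python
import json

# Assumed set of accepted values for cluster.routing.allocation.enable.
ALLOWED = ("all", "primaries", "new_primaries", "none")


def allocation_body(mode, persistent=True):
    """Build the _cluster/settings body for cluster.routing.allocation.enable."""
    if mode not in ALLOWED:
        raise ValueError(f"mode must be one of {ALLOWED}")
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: {"cluster.routing.allocation.enable": mode}})


print(allocation_body("none"))  # stop allocation, e.g. before maintenance
print(allocation_body("all"))   # re-enable afterwards
```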


[Elasticsearch] mapping type templates (numeric, string, date)

Elastic/Elasticsearch 2014. 7. 2. 15:49

# numeric type

"" : {"type" : "long", "store" : "no", "index" : "not_analyzed", "index_options" : "docs", "ignore_malformed" : true, "include_in_all" : false},

"" : {"type" : "long", "store" : "yes", "index" : "no", "index_options" : "docs", "ignore_malformed" : true, "include_in_all" : false},


# date type

"" : {"type" : "date", "format" : "yyyyMMddHHmmss", "store" : "no", "index" : "not_analyzed", "index_options" : "docs", "ignore_malformed" : true, "include_in_all" : false},

"" : {"type" : "date", "format" : "yyyyMMddHHmmss", "store" : "yes", "index" : "no", "index_options" : "docs", "ignore_malformed" : true, "include_in_all" : false},


# string type

"" : {"type" : "string", "store" : "no", "index" : "not_analyzed", "norms": {"enabled" : false}, "index_options" : "docs", "include_in_all" : false},

"" : {"type" : "string", "store" : "yes", "index" : "no", "include_in_all" : false},

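Each pair of templates above comes down to two variants per type: searchable but not stored, or stored but not indexed. The sketch below fills in the empty "" keys with hypothetical field names and covers only the keys common to every template, so `index_options`, `ignore_malformed` and `norms` are left out.

```python
import json


def field_mapping(field_type, searchable, date_format=None):
    """Fill one template: searchable (not stored) or stored-only (not indexed)."""
    mapping = {"type": field_type}
    if date_format:
        mapping["format"] = date_format
    mapping["store"] = "no" if searchable else "yes"
    mapping["index"] = "not_analyzed" if searchable else "no"
    mapping["include_in_all"] = False
    return mapping


# Hypothetical field names filling the empty "" keys of the templates.
properties = {
    "user_id": field_mapping("long", searchable=True),
    "created_at": field_mapping("date", searchable=True,
                                date_format="yyyyMMddHHmmss"),
    "raw_payload": field_mapping("string", searchable=False),
}
print(json.dumps({"properties": properties}, indent=2))
```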