4 posts tagged 'Optimize'

  1. 2014.08.04 [ElasticSearch] Refresh/Flush/Optimize - by elasticsearch.org
  2. 2013.09.17 [Elasticsearch] Managing Data Consistency after Bulk Indexing.
  3. 2013.04.17 [elasticsearch] Indices API - Optimize
  4. 2013.02.13 This is about elasticsearch optimization.

[ElasticSearch] Refresh/Flush/Optimize - by elasticsearch.org

Elastic/Elasticsearch 2014. 8. 4. 17:48

Source: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/inside-a-shard.html


1. near real-time search

refresh API

In Elasticsearch, this lightweight process of writing and opening a new segment is called a refresh. By default, every shard is refreshed automatically once every second. This is why we say that Elasticsearch has near real-time search: document changes are not visible to search immediately, but will become visible within one second.
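To make the idea concrete, here is a toy model in Java. It is a hypothetical sketch, not Elasticsearch or Lucene code: newly indexed documents sit in a buffer and only become searchable after `refresh()` writes them into a new segment and opens it. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not Elasticsearch/Lucene code): documents go into an
// in-memory buffer first and only become searchable once a refresh writes the
// buffer out as a new segment and opens it.
final class ToyRefreshShard {
    private final List<String> buffer = new ArrayList<>();          // indexed, not yet searchable
    private final List<List<String>> segments = new ArrayList<>();  // opened, searchable segments

    void index(String doc) {
        buffer.add(doc);
    }

    // Writes the buffered documents into a new segment and opens it for search.
    void refresh() {
        if (!buffer.isEmpty()) {
            segments.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Searches only opened segments, so documents indexed since the last refresh are invisible.
    boolean search(String doc) {
        for (List<String> segment : segments) {
            if (segment.contains(doc)) {
                return true;
            }
        }
        return false;
    }

    int segmentCount() {
        return segments.size();
    }
}
```

In a real cluster the interval is controlled by the `index.refresh_interval` index setting (default `1s`), and a refresh can also be requested explicitly with `POST /{index}/_refresh`.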


2. making changes persistent

flush API

The action of performing a commit and truncating the translog is known in Elasticsearch as a flush. Shards are flushed automatically every 30 minutes, or when the translog becomes too big. See the translog documentation for settings that can be used to control these thresholds.
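A similar toy model for flush, again hypothetical rather than Elasticsearch's implementation: each operation is appended to a translog, and a flush commits the buffered data and truncates the translog. The `translogThreshold` value is an illustrative assumption standing in for the "translog becomes too big" trigger; the 30-minute timer is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical): every operation is appended to a translog; a flush
// commits the in-memory data to "disk" and truncates the translog. The
// threshold is an illustrative number, not an Elasticsearch default.
final class ToyFlushShard {
    private final int translogThreshold;
    private final List<String> translog = new ArrayList<>();
    private final List<String> uncommitted = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    ToyFlushShard(int translogThreshold) {
        this.translogThreshold = translogThreshold;
    }

    void index(String doc) {
        translog.add("index " + doc);  // durable record of the operation
        uncommitted.add(doc);          // not yet committed; only the translog protects it
        if (translog.size() >= translogThreshold) {
            flush();                   // size-based trigger
        }
    }

    // Commit the data, after which the translog entries are no longer needed.
    void flush() {
        committed.addAll(uncommitted);
        uncommitted.clear();
        translog.clear();
    }

    int translogSize() { return translog.size(); }
    int committedCount() { return committed.size(); }
}
```

A flush can also be triggered by hand with `POST /{index}/_flush`.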


3. segment merging 

optimize API

The optimize API is best described as the forced merge API. It forces a shard to be merged down to the number of segments specified in the max_num_segments parameter. The intention is to reduce the number of segments (usually to 1) in order to speed up search performance.
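The effect of `max_num_segments` can be sketched as follows. This is a simplification, assuming a strategy of repeatedly merging the two smallest segments; Lucene's real merge policy selects segments differently. Each segment is represented only by its document count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified forced merge (not Lucene's actual merge policy): repeatedly merge
// the two smallest segments until at most maxNumSegments remain. Each segment
// is represented only by its document count.
final class ToyForceMerge {
    static List<Integer> forceMerge(List<Integer> segmentDocCounts, int maxNumSegments) {
        if (maxNumSegments < 1) {
            throw new IllegalArgumentException("maxNumSegments must be >= 1");
        }
        List<Integer> segments = new ArrayList<>(segmentDocCounts);
        while (segments.size() > maxNumSegments) {
            Collections.sort(segments);                       // smallest first
            int merged = segments.remove(0) + segments.remove(0);
            segments.add(merged);                             // new, larger segment
        }
        Collections.sort(segments);
        return segments;
    }
}
```

With `maxNumSegments = 1` every segment collapses into one, which is what a full optimize aims for.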




[Elasticsearch] Managing Data Consistency after Bulk Indexing.

Elastic/Elasticsearch 2013. 9. 17. 12:35

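After bulk indexing, the document counts on individual shards sometimes seem to fluctuate.

This does not mean that Elasticsearch indexed the data incorrectly.

If you look at the status information for an index and its shards, you will find fields such as:

- num_docs

- max_docs

- deleted_docs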


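A bulk indexing job normally writes into an empty index.

If something goes wrong partway through, some documents are already in the index, and indexing into the same index again runs updateDocument for them.

In the Lucene API, add and update are effectively one operation: updateDocument deletes any existing document matching the same term and then adds the new one.

As a result, deleted_docs builds up and num_docs no longer matches max_docs.

In other words, num_docs = max_docs - deleted_docs.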
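The arithmetic can be reproduced with a toy model (hypothetical, not Elasticsearch code) in which an update marks the old copy as deleted and adds a new one, the way Lucene's `updateDocument` deletes and re-adds. The scenario numbers below are illustrative only: a first bulk run that stops after 600 of 1,000 documents, followed by a full re-run.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (hypothetical) of Lucene-style updates: indexing a document whose
// id already exists marks the old copy as deleted and adds a new copy.
// Deleted copies still count toward maxDocs until a merge expunges them.
final class ToyDocStats {
    private final Set<String> ids = new HashSet<>();
    private int maxDocs;      // all stored copies, live or deleted
    private int deletedDocs;  // copies marked deleted but not yet merged away

    // Behaves like updateDocument: add the document, replacing any previous copy.
    void upsert(String id) {
        if (!ids.add(id)) {
            deletedDocs++;    // the old copy is only marked as deleted
        }
        maxDocs++;
    }

    // A full merge (max_num_segments=1) rewrites segments without deleted copies.
    void optimize() {
        maxDocs -= deletedDocs;
        deletedDocs = 0;
    }

    int numDocs() { return maxDocs - deletedDocs; }
    int maxDocs() { return maxDocs; }
    int deletedDocs() { return deletedDocs; }
}
```

After upserting `doc-0` to `doc-599` and then `doc-0` to `doc-999`, `numDocs()` is 1000, `maxDocs()` is 1600 and `deletedDocs()` is 600; after `optimize()`, `maxDocs()` drops back to 1000 and `deletedDocs()` to 0.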


For this reason, once indexing has finished you should restore data consistency with a recovery or an optimize.

So how do you do that?


Try the commands below and refer to the ES API documentation.

(Most of the values shown are the defaults.)


[Java API]

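OptimizeRequest request = new OptimizeRequest()
    .indices(indice)          // indice: name of the index to optimize
    .flush(true)
    .onlyExpungeDeletes(false)
    .refresh(true)
    .waitForMerge(true)
    .operationThreading(BroadcastOperationThreading.THREAD_PER_SHARD)
    .maxNumSegments(1);

// Execute the request and block until it completes
client.admin().indices().optimize(request).actionGet();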



[Rest API]

curl -XPOST 'http://localhost:9200/indice/_optimize?only_expunge_deletes=false&max_num_segments=1&refresh=true&flush=true&wait_for_merge=true'
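For callers that talk to the REST endpoint from Java rather than through the client API above, the same call can be built with the JDK's `java.net.http` client (Java 11+). This sketch only builds the request; host, port and index name are placeholders, and the parameter values are assumed to be URL-safe, so no encoding is applied. On Elasticsearch 2.1 and later this endpoint was renamed `_forcemerge`, so the path would need adjusting there.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds (but does not send) POST /{index}/_optimize with the given query
// parameters. Values are assumed to be URL-safe; nothing is encoded.
final class OptimizeCall {
    static HttpRequest build(String baseUrl, String index, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        params.forEach((k, v) -> query.add(k + "=" + v));
        String url = baseUrl + "/" + index + "/_optimize"
                + (params.isEmpty() ? "" : "?" + query);
        return HttpRequest.newBuilder(URI.create(url))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Same parameters as the curl command; LinkedHashMap keeps their order stable.
    static Map<String, String> fullOptimizeParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("only_expunge_deletes", "false");
        p.put("max_num_segments", "1");
        p.put("refresh", "true");
        p.put("flush", "true");
        p.put("wait_for_merge", "true");
        return p;
    }
}
```

Sending it would look like `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.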



[Recovery]

curl -XPOST 'http://localhost:9200/indice/_close'

curl -XPOST 'http://localhost:9200/indice/_open'


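// Close the index and wait for the call to finish
client.admin()
    .indices()
    .close(new CloseIndexRequest("indice"))
    .actionGet();

// Reopen it; the shards are recovered on open
client.admin()
    .indices()
    .open(new OpenIndexRequest("indice"))
    .actionGet();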
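The close/open sequence can likewise be driven over REST from plain Java. This is a sketch that assumes the `_close` and `_open` endpoints used in the curl commands above; `closeThenOpen` only builds the two requests in order, and `run` (not exercised without a cluster) sends them one after another and stops at the first non-2xx response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Builds and optionally sends POST /{index}/_close followed by POST /{index}/_open.
final class ReopenIndex {
    static List<HttpRequest> closeThenOpen(String baseUrl, String index) {
        return List.of(post(baseUrl + "/" + index + "/_close"),
                       post(baseUrl + "/" + index + "/_open"));
    }

    // Sends the calls in order and stops at the first non-2xx response.
    static void run(HttpClient client, List<HttpRequest> calls)
            throws IOException, InterruptedException {
        for (HttpRequest call : calls) {
            HttpResponse<String> response = client.send(call, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() / 100 != 2) {
                throw new IllegalStateException(
                        call.uri() + " -> " + response.statusCode() + ": " + response.body());
            }
        }
    }

    private static HttpRequest post(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

Usage against a local node would be `ReopenIndex.run(HttpClient.newHttpClient(), ReopenIndex.closeThenOpen("http://localhost:9200", "indice"))`.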



[Reference]

http://www.elasticsearch.org/guide/reference/api/admin-indices-optimize/

http://www.elasticsearch.org/guide/reference/api/admin-indices-open-close/



[elasticsearch] Indices API - Optimize

Elastic/Elasticsearch 2013. 4. 17. 12:52

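This post is based on my own testing plus material from elasticsearch.org and the community, and its purpose is to share information.


Please point out anything that is wrong.

(The example code has not been checked for performance or security.)



[elasticsearch API review]

Original link: http://www.elasticsearch.org/guide/reference/api/admin-indices-optimize/


This API optimizes indices.

It improves search performance.


Below are the Request Parameters from the original documentation.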

Request Parameters

The optimize API accepts the following request parameters:

- max_num_segments: The number of segments to optimize to. To fully optimize the index, set it to 1. Defaults to simply checking if a merge needs to execute, and if so, executes it.

- only_expunge_deletes: Should the optimize process only expunge segments with deletes in it. In Lucene, a document is not deleted from a segment, just marked as deleted. During a merge process of segments, a new segment is created that does not have those deletes. This flag allows merging only the segments that have deletes. Defaults to false.

- refresh: Should a refresh be performed after the optimize. Defaults to true.

- flush: Should a flush be performed after the optimize. Defaults to true.

- wait_for_merge: Should the request wait for the merge to end. Defaults to true. Note, a merge can potentially be a very heavy operation, so it might make sense to run it set to false.

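- The descriptions are clear, so here is just a summary.

- For a full optimize, set max_num_segments to 1.

- With only_expunge_deletes set to true, only segments that contain deleted documents are merged.

- For large indices, set wait_for_merge to false so the request does not wait for the heavy merge to finish.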
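To illustrate the only_expunge_deletes behavior, here is a toy model (hypothetical, not Lucene's merge code) in which each segment is described by its live and deleted document counts. Only segments that contain deletes are merged, into one new segment holding just their live documents; clean segments are left alone.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical) of only_expunge_deletes=true: segments with deleted
// documents are merged into one new segment that keeps only their live
// documents. Segments without deletes are not touched.
final class ExpungeDeletes {
    static final class Segment {
        final int live;
        final int deleted;

        Segment(int live, int deleted) {
            this.live = live;
            this.deleted = deleted;
        }
    }

    static List<Segment> expungeDeletes(List<Segment> segments) {
        List<Segment> result = new ArrayList<>();
        int mergedLive = 0;
        boolean anyWithDeletes = false;
        for (Segment s : segments) {
            if (s.deleted > 0) {
                mergedLive += s.live;  // rewritten without the deleted documents
                anyWithDeletes = true;
            } else {
                result.add(s);         // clean segment, kept as-is
            }
        }
        if (anyWithDeletes) {
            result.add(new Segment(mergedLive, 0));
        }
        return result;
    }
}
```

For segments of (100 live, 0 deleted), (50, 10) and (30, 5), the result is the untouched 100-document segment plus a new 80-document segment with no deletes.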


Let's look at sample code that uses these parameters.


[Java example code]

OptimizeRequestBuilder builder = client.admin().indices().prepareOptimize();

OptimizeResponse response = builder.setIndices("blog")
    .setMaxNumSegments(1)         // 1 for a full optimize; if unset, ES only checks whether a merge is needed and runs it if so
    .setOnlyExpungeDeletes(false) // default: false
    .setWaitForMerge(true)
    .setOperationThreading(BroadcastOperationThreading.THREAD_PER_SHARD)
    .execute()
    .actionGet();

log.debug("{}", response.getSuccessfulShards());


Either OptimizeRequest or OptimizeRequestBuilder can be used to implement this.



This is about elasticsearch optimization.

Elastic/Elasticsearch 2013. 2. 13. 16:11

In summary:

- Set the JVM heap size to 50% of system memory.

- Set the index store type to mmapfs (in the case of 64-bit Solaris).

- Tune the index cache settings.

- Tune the index merge settings.

 

http://blog.bugsense.com/post/35580279634/indexing-bigdata-with-elasticsearch


http://www.slideshare.net/kucrafal/scaling-massive-elastic-search-clusters-rafa-ku-sematext

Page 27.

    index.cache.field.max_size

    index.cache.field.expire

    index.cache.field.type: soft

Page 30.

    JVM Optimization

    -XX:+UseParNewGC

    -XX:+UseConcMarkSweepGC

    -XX:+CMSParallelRemarkEnabled

    

https://gist.github.com/deverton/2970285


Preventing JVM swapping

    bootstrap.mlockall: true


http://stackoverflow.com/questions/13757398/elasticsearch-bulk-indexing-gets-slower-over-time-with-constant-number-of-indexe    

    Set the ES_HEAP_SIZE environment variable so that the JVM uses the same value for minimum and maximum memory. Configuring the JVM to have different minimum and maximum values means that each time the JVM needs additional memory (up to the maximum), it will block the Java process to allocate it. Combined with the old Java version, this explains the pauses that our nodes exhibited when introduced to higher load and continuous memory allocation when they were opened up to public searches. The elasticsearch team recommends a setting of 50% of system RAM.    
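The heap advice in the quote can be expressed as a small helper. It is a hypothetical sketch: the total RAM is passed in rather than detected, and the helper simply halves it and uses the same value for `-Xms` and `-Xmx`, which is what setting `ES_HEAP_SIZE` does for the startup script.

```java
import java.util.List;

// Hypothetical helper: applies the "50% of system RAM" rule from the quote and
// uses the same value for minimum and maximum heap, so the JVM never has to
// grow the heap (and pause) at runtime. RAM size is passed in, not detected.
final class HeapSettings {
    static long halfOfRamMb(long totalRamMb) {
        return totalRamMb / 2;
    }

    // Equal -Xms/-Xmx values avoid heap-resizing pauses.
    static List<String> jvmArgs(long totalRamMb) {
        long heapMb = halfOfRamMb(totalRamMb);
        return List.of("-Xms" + heapMb + "m", "-Xmx" + heapMb + "m");
    }

    // Environment variable form read by the Elasticsearch startup script.
    static String esHeapSizeEnv(long totalRamMb) {
        return "ES_HEAP_SIZE=" + halfOfRamMb(totalRamMb) + "m";
    }
}
```

For a machine with 16 GB (16384 MB), `jvmArgs(16384)` returns `[-Xms8192m, -Xmx8192m]` and `esHeapSizeEnv(16384)` returns `ES_HEAP_SIZE=8192m`.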

    

http://www.elasticsearch.org/guide/reference/index-modules/store.html

        Use index.store.type: mmapfs

        

http://jprante.github.com/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html  

 

       




