420 posts tagged 'elasticsearch'

  1. 2013.09.04 [elasticsearch] Things to watch for when using filter & facet.
  2. 2013.09.02 [Elasticsearch] IndexCacheModule...
  3. 2013.09.02 [Elasticsearch] FieldMapper Default.
  4. 2013.08.23 [Elasticsearch] Highlight feature.
  5. 2013.08.14 [Elasticsearch] Two timeout settings..
  6. 2013.08.14 [Elasticsearch] json bulk insert.
  7. 2013.08.13 [Elasticsearch] threadpool settings.
  8. 2013.08.12 [Elasticsearch] NetworkService.java settings.
  9. 2013.08.12 [Elasticsearch] Example of writing a facet script plugin.
  10. 2013.08.12 [Elasticsearch] 0.90.3 release..

[elasticsearch] Things to watch for when using filter & facet.

Elastic/Elasticsearch 2013. 9. 4. 18:41

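This is not so much a warning as a behavior that can catch you off guard now and then, so keep it in mind.

In elasticsearch:

- filter narrows the result set after the search query has run

- facet performs a group-by style aggregation over the results of the search query

That is the basic idea.


If you compute a facet without using a filter, the facet result naturally reflects the query results.

If you apply a filter together with the same facet, this is what happens:

- the search hits are the filtered ones

- the facet result is still computed over the result set of the original query, before filtering


If you want the facet to match the filtered hits, apply the same filter condition to the facet as a facetFilter.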
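
As a rough illustration, here is a minimal sketch of a request body where the facet is restricted by the same condition as the top-level filter. It assumes the 0.90-era REST keys `filter` and `facet_filter`; the field names `title`, `category` and `tag` and the helper `build_search_body` are made up for the example.

```python
import json


def build_search_body(query_text, category):
    """Build a search body whose facet counts match the filtered hits."""
    # Hypothetical field names: "title", "category", "tag".
    category_filter = {"term": {"category": category}}
    return {
        "query": {"match": {"title": query_text}},
        # Top-level filter: narrows the hits only, not the facets.
        "filter": category_filter,
        "facets": {
            "tags": {
                "terms": {"field": "tag"},
                # Same condition repeated so the facet sees the filtered set.
                "facet_filter": category_filter,
            }
        },
    }


body = build_search_body("elasticsearch", "search")
print(json.dumps(body, indent=2))
```

Dropping the `facet_filter` entry gives the behavior described above: filtered hits, unfiltered facet counts.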


[Elasticsearch] IndexCacheModule...

Elastic/Elasticsearch 2013. 9. 2. 17:40

Let's take a look at what the index cache module contains.


        new FilterCacheModule(settings).configure(binder());

        new IdCacheModule(settings).configure(binder());

        new QueryParserCacheModule(settings).configure(binder());

        new DocSetCacheModule(settings).configure(binder());


Looking at the list above, two of them look like caches for search queries and result sets:

        new QueryParserCacheModule(settings).configure(binder());

        new DocSetCacheModule(settings).configure(binder());


But once you read the source, you can see that neither is quite what you are after.

QueryParserCacheModule lets the Query for a request be fetched from a cache instead of being rebuilt every time, so that one seems fine.

DocSetCacheModule is a docId-based cache, so it reads cached information at the document level.

In other words, if you want to cache the result set of a search query, just use something like memcached or redis.
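
A minimal sketch of that idea, assuming the cache lives in the application in front of elasticsearch. The class `QueryResultCache` and the `search_fn` callable are hypothetical, and a plain dict stands in for memcached or redis; a real setup would also need an expiry policy.

```python
import hashlib
import json


class QueryResultCache:
    """Application-side cache keyed by a hash of the search request."""

    def __init__(self, search_fn, store=None):
        # search_fn(index, body) is whatever actually runs the search.
        self._search_fn = search_fn
        # A dict stands in for memcached/redis in this sketch.
        self._store = {} if store is None else store

    @staticmethod
    def cache_key(index, body):
        # sort_keys makes logically equal bodies produce the same key.
        canonical = json.dumps({"index": index, "body": body}, sort_keys=True)
        return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

    def search(self, index, body):
        key = self.cache_key(index, body)
        if key not in self._store:
            self._store[key] = self._search_fn(index, body)
        return self._store[key]
```

Only the first call for a given index and body reaches `search_fn`; later identical requests are served from the store.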


[Elasticsearch] FieldMapper Default.

Elastic/Elasticsearch 2013. 9. 2. 14:41

[IndexFieldMapper.java]

            FIELD_TYPE.setIndexed(true);

            FIELD_TYPE.setTokenized(false);

            FIELD_TYPE.setStored(false);

            FIELD_TYPE.setOmitNorms(true);

            FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);


As you can see,

indexed true & tokenized false means the field ends up as not_analyzed.

Naturally, store is no as well.


Just for reference.
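
For comparison, here is a sketch of what the same choices would look like if spelled out by hand on an ordinary string field in a 0.90-style mapping. The type `type1` and the field `site` are hypothetical; the comments map each key back to the FIELD_TYPE calls above.

```python
import json

# Hypothetical string field "site"; values mirror the FIELD_TYPE defaults above.
mapping = {
    "type1": {
        "properties": {
            "site": {
                "type": "string",
                "index": "not_analyzed",  # setIndexed(true) + setTokenized(false)
                "store": "no",            # setStored(false)
                "omit_norms": True,       # setOmitNorms(true)
                "index_options": "docs",  # IndexOptions.DOCS_ONLY
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```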


[Elasticsearch] Highlight feature.

Elastic/Elasticsearch 2013. 8. 23. 16:11

[lucene]

Highlighter.java

FastVectorHighlighter.java


[elasticsearch]

PlainHighlighter.java

FastVectorHighlighter.java


The method that actually wraps a term in the highlight tag is highlightTerm().

SimpleHTMLFormatter.java

GradientFormatter.java

See these for reference.


To do the highlighting, these classes need two basic pieces of information.

CharTermAttribute.java

OffsetAttribute.java


Read the source code to see exactly how they work.

In short:

1. Fetch the stored original text.

2. Rebuild the original text using the char term and offset information.

2.1 During the rebuild, highlightTerm() produces the rewritten term.
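
The steps above can be sketched roughly as follows. This is a toy model, not the Lucene implementation: `tokenize` stands in for an analyzer that exposes the term text and its offsets, `highlight_term` plays the role of the formatter's highlightTerm(), and the `<em>` tags are just an assumed default.

```python
import re


def tokenize(text):
    """Yield (term, start, end); a stand-in for CharTermAttribute + OffsetAttribute."""
    for m in re.finditer(r"\w+", text):
        yield m.group().lower(), m.start(), m.end()


def highlight_term(original, pre="<em>", post="</em>"):
    """Wrap one matched term in the highlight tags."""
    return pre + original + post


def highlight(stored_text, query_terms):
    """Rebuild the stored text, tagging every token found in query_terms."""
    out = []
    last = 0
    for term, start, end in tokenize(stored_text):
        out.append(stored_text[last:start])  # text between tokens, kept as-is
        piece = stored_text[start:end]       # original casing from the stored text
        out.append(highlight_term(piece) if term in query_terms else piece)
        last = end
    out.append(stored_text[last:])           # trailing text after the last token
    return "".join(out)


print(highlight("Quick brown fox.", {"brown"}))
```

The offsets matter because the tagged output is cut from the stored original, so casing and punctuation survive even though matching happens on the analyzed term.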


For the details, reading the source is good for your health. :)


[Elasticsearch] Two timeout settings..

Elastic/Elasticsearch 2013. 8. 14. 16:31

elasticsearch has two timeout settings that apply when running a search.

1. the timeout for aggregating data across shards

2. the timeout for executing the search itself

Used well, they help you keep response times under control. :)


[Elasticsearch] json bulk insert.

Elastic/Elasticsearch 2013. 8. 14. 12:09

Reference: http://www.elasticsearch.org/guide/reference/api/bulk/


[Run]

curl -s -XPOST 'http://192.168.56.104:9200/test/_bulk' --data-binary @test.json


[test.json]

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" , "_routing" : "1" } }

{"docid":1, "title":"title1"}

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2" , "_routing" : "1" } }

{"docid":2, "title":"title2"}

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3" , "_routing" : "1" } }

{"docid":3, "title":"title3"}
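
If you would rather generate the file than type it, a small sketch like the one below produces the same action/document line pairs. The helper `build_bulk_body` is hypothetical; the one detail worth copying is that every line, including the last one, ends with a newline.

```python
import json


def build_bulk_body(index, doc_type, docs, routing=None):
    """Return newline-delimited JSON: one action line, then one source line, per doc."""
    lines = []
    for doc in docs:
        meta = {"_index": index, "_type": doc_type, "_id": str(doc["docid"])}
        if routing is not None:
            meta["_routing"] = routing
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # Every line, including the last, must be terminated by a newline.
    return "\n".join(lines) + "\n"


docs = [{"docid": i, "title": "title%d" % i} for i in range(1, 4)]
bulk_body = build_bulk_body("test", "type1", docs, routing="1")
print(bulk_body, end="")
```

Save the printed output as test.json and send it with the curl command above; `--data-binary` is what keeps the newlines intact.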




[Elasticsearch] threadpool settings.

Elastic/Elasticsearch 2013. 8. 13. 18:51

[Server Side Setting]

- elasticsearch.yml

- http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html


[threadpool bounded]

indices.memory.index_buffer_size: 50%


# Search pool

threadpool.search.type: fixed

threadpool.search.size: 20

threadpool.search.queue_size: 100

 

# Bulk pool

threadpool.bulk.type: fixed

threadpool.bulk.size: 60

threadpool.bulk.queue_size: 300

 

# Index pool

threadpool.index.type: fixed

threadpool.index.size: 20

threadpool.index.queue_size: 100


[threadpool unbounded]

threadpool.index.queue_size: -1



[Elasticsearch] NetworkService.java settings.

Elastic/Elasticsearch 2013. 8. 12. 13:34

Let's look at just the TCP-related settings.


public static final String LOCAL = "#local#";


    private static final String GLOBAL_NETWORK_HOST_SETTING = "network.host";

    private static final String GLOBAL_NETWORK_BINDHOST_SETTING = "network.bind_host";

    private static final String GLOBAL_NETWORK_PUBLISHHOST_SETTING = "network.publish_host";


    public static final class TcpSettings {

        public static final String TCP_NO_DELAY = "network.tcp.no_delay";

        public static final String TCP_KEEP_ALIVE = "network.tcp.keep_alive";

        public static final String TCP_REUSE_ADDRESS = "network.tcp.reuse_address";

        public static final String TCP_SEND_BUFFER_SIZE = "network.tcp.send_buffer_size";

        public static final String TCP_RECEIVE_BUFFER_SIZE = "network.tcp.receive_buffer_size";

        public static final String TCP_BLOCKING = "network.tcp.blocking";

        public static final String TCP_BLOCKING_SERVER = "network.tcp.blocking_server";

        public static final String TCP_BLOCKING_CLIENT = "network.tcp.blocking_client";

        public static final String TCP_CONNECT_TIMEOUT = "network.tcp.connect_timeout";


        public static final ByteSizeValue TCP_DEFAULT_SEND_BUFFER_SIZE = null;

        public static final ByteSizeValue TCP_DEFAULT_RECEIVE_BUFFER_SIZE = null;

        public static final TimeValue TCP_DEFAULT_CONNECT_TIMEOUT = new TimeValue(30, TimeUnit.SECONDS);

    }


If you see netty blocking when running elasticsearch as a server, reviewing the settings above should help.

These settings go in elasticsearch.yml.


For example

network.tcp.blocking: true



[Elasticsearch] Example of writing a facet script plugin.

Elastic/Elasticsearch 2013. 8. 12. 10:47

https://github.com/imotov/elasticsearch-facet-script


It looks like it is implemented along the lines of hadoop M/R. :)

The example above counts how often each of the characters A~Z appears in a field.
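
To make the M/R comparison concrete, here is a plain Python sketch of the same letter count split into map, combine and reduce steps. It does not use the plugin's API; the function names, the two fake shards and the choice to fold lowercase into A~Z are all assumptions made for the illustration.

```python
from collections import Counter
from functools import reduce
import string


def map_doc(field_value):
    """Per-document step: count the A-Z letters in one field value."""
    return Counter(ch for ch in field_value.upper() if ch in string.ascii_uppercase)


def combine(counters):
    """Per-shard step: merge the per-document counts."""
    return reduce(lambda a, b: a + b, counters, Counter())


def reduce_shards(shard_results):
    """Final step: merge the per-shard counts into one facet result."""
    return reduce(lambda a, b: a + b, shard_results, Counter())


# Two pretend shards holding the title field of three documents.
shards = [["title1", "title2"], ["title3"]]
facet = reduce_shards(combine(map_doc(v) for v in values) for values in shards)
print(sorted(facet.items()))
```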


[Elasticsearch] 0.90.3 release..

Elastic/Elasticsearch 2013. 8. 12. 10:42

http://www.elasticsearch.org/download/


breaking changes:

  • Java Client: Renamed IndicesAdminClient.existsAliases() to IndicesAdminClient.aliasesExist #3330

new features:

  • Support for the pattern replace char filter has been added #3197
  • A new API to check if there are pending cluster tasks has been added #3368
  • A new completion suggestion based on prefix suggestions has been added (this is experimental) #3376

enhancements:

  • Support for named filters has been added #3097
  • The has_child query has been optimized to execute faster when matching parent count is low #3190
  • Integer field data implementations have been merged #3220
  • The rescore query now supports a score_mode #3258
  • Mget fields parameter can now be a string or an array #3270
  • XContentParser/Generator now can handle simple arrays #3279
  • Bulk deletes now contain a found field #3320
  • Zen discovery cluster events now have an urgent priority #3361
  • An own channel for pings has been added in order to be independent from huge cluster state updates #3362
  • Cluster state update APIs now respect the master_timeout much better #
  • A new dedicated thread pool for the optimize API has been added #3366
  • FastVectorHighlighter now supports complex queries (such as multi phrase queries with two terms at the same position) #3357
  • The recursion level of the hunspell filter is now configurable in the mapping #3369
  • Every distribution now contains information about its git build #3370
  • The dynamic flag in the root object mapper can now be configured dynamically on runtime #3384
  • Less cluster state changes if auto_expand_replicas is set #3399
  • Open/Close index API now supports an acknowledgement from other nodes instead of simply waiting for the change in the cluster state #3400
  • Whenever analyzing strings, elasticsearch now uses Lucene methods introduced with Lucene 4.4, which reuse internal data structures #3409
  • In addition, the formerly used methods have been deprecated #3411
  • Improved alias handling in the cluster state (much faster if you have tens of thousands of aliases) #3410
  • The delete API now waits until a shard is removed from disk #3413
  • Rerouting of shards now happens on a shard started event #3417
  • The Index Template API now is more RESTful, supports HEAD and returns a proper 404 if it does not exist #3434
  • HighlightBuilder is now consistent with REST API #3435
  • The header response (including the successful/failed shards) has been streamlined between different requests #3441

bug fixes:

  • Timestamp index settings in a mapping are now correctly returned #3174
  • Field data now supports more than 2B ordinals per segment #3189
  • TokenStreams were reset twice when highlighting #3200
  • The geo_shape filter now handles multiple shapes per document correctly #3242
  • PluginManager fixes
    • The PluginManager now parses parameters correctly again (regression from 0.90.1) #3245
    • Calling the PluginManager while having a non-existing plugins directory is now handled #3253
  • The index warmer setting is now configurable at runtime #3246
  • The order of fields in a suggest request can now be arbitrary #3247
  • More-like-this now correctly returns an error message if used with numeric fields (that error can be simply ignored as well) #3252
  • The parent option is now taken into account for delete requests #3257
  • The Update API's doc_as_upsert option is now taken into account correctly #3265
  • Mget requests do not abort completely anymore if any index is missing #3267
  • Parent is taken into account in exists request #3276
  • Removed java dependency from debian package, so arbitrarily installed java can be used #3284
  • Partial fields filtering could return false matches #3288
  • Caching of top_children, has_child and has_parent queries could lead to a ClassCastException #3290
  • Script based sorting was applied after pagination #3309
  • Unallocated indexes cannot be closed immediately to prevent indices which cannot be opened anymore #3313
  • Thai analyzer now makes use of stopwords #3342
  • Unset top level filter now behaves the same as inside a filtered query #3356
  • Pattern replace filter now has an empty default set to ensure same behaviour on upgrades #3359
  • Alias validation on adding aliases has been improved #3363
  • Uncaught exceptions on cluster state updates could lead to hanging request #3364
  • FuzzyLikeThisFieldQueryBuilder defaults are now consistent with the REST API #3374
  • Updating a mapping with ignore_conflicts could hang and timeout #3381
  • Setting index.gc_deletes on runtime is working properly now #3396
  • MoreLikeThisFieldQueryBuilder defaults are now consistent with the REST API #3402
  • Query/Filter facet counter is now 64bit #3419
  • The pid file was not properly overwritten if it already existed #3425
  • Search in a shard group while relocation final flip happens could have failed #3427
  • UpsertRequests now contain all metadata fields (parent, routing, etc.) #3444
  • Retry_on_conflict setting in a bulk request could lead to an NPE #3447

