Posts tagged 'elasticsearch' (page 36)

[elasticsearch] Things to watch for when using filter & facet.

Elastic/Elasticsearch 2013. 9. 4. 18:41

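This is hardly a real caveat, but it can catch you off guard now and then, so keep it in mind.

In elasticsearch:

- filter: filters the result set after the search query has run

- facet: performs a group-by style operation over the results of the search query

If you compute a facet without using a filter, the facet result is naturally computed over the query results.

If you apply a filter together with the same facet, the request is handled as follows:

- The search hits are the filtered results.

- The facet result is still computed over the result set of the original query, before the filter is applied.

If you want the facet to match the filtered results, apply the same filter condition to the facet as a facetFilter.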
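
To make the difference concrete, here is a minimal Python sketch that only builds the JSON request body. It assumes the 0.90-era REST DSL, where the top-level `filter` narrows the hits and `facet_filter` is the REST counterpart of `facetFilter`. The helper `build_search_body` and the field names `title`, `category`, and `tag` are hypothetical.

```python
import json


def build_search_body(query, filter_clause, facet_field, filter_facet=True):
    """Build a search body with a top-level filter and one terms facet.

    When filter_facet is True the same filter is repeated inside the facet
    as "facet_filter", so the facet counts match the filtered hits.
    """
    facet = {"terms": {"field": facet_field}}
    if filter_facet:
        # Without this, the facet is computed over the unfiltered query result.
        facet["facet_filter"] = filter_clause
    return {
        "query": query,
        "filter": filter_clause,
        "facets": {"by_" + facet_field: facet},
    }


if __name__ == "__main__":
    body = build_search_body(
        query={"match": {"title": "elasticsearch"}},   # hypothetical field
        filter_clause={"term": {"category": "blog"}},  # hypothetical field
        facet_field="tag",                             # hypothetical field
    )
    print(json.dumps(body, indent=2))
```

Passing `filter_facet=False` produces the behaviour described above: filtered hits, but facet counts over the whole query result.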

:

[Elasticsearch] IndexCacheModule...

Elastic/Elasticsearch 2013. 9. 2. 17:40

Let's take a look at what the index cache module contains.

new FilterCacheModule(settings).configure(binder());

new IdCacheModule(settings).configure(binder());

new QueryParserCacheModule(settings).configure(binder());

new DocSetCacheModule(settings).configure(binder());

Looking at the list above for something that caches search queries and their result sets,

new QueryParserCacheModule(settings).configure(binder());

new DocSetCacheModule(settings).configure(binder());

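these two look like the likely candidates.

But once you read the source, it turns out they are not really what you are after.

QueryParserCacheModule lets the Query object for a requested query be fetched from a cache instead of being built every time, which seems reasonable enough.

DocSetCacheModule is a docId-based cache, so it reads cached information per document.

In other words, if you want to cache the result set of a search query, just use something like memcached or redis.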
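
A minimal sketch of that idea in Python, with a plain dict standing in for memcached or redis. The class `SearchResultCache` and the `search_fn` callback are hypothetical names for illustration. A real setup would also need expiry and invalidation when the index changes.

```python
import hashlib
import json


class SearchResultCache:
    """In-memory stand-in for an external cache such as memcached or redis."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key_for(index, body):
        # Serialize with sorted keys so equivalent bodies map to the same key.
        payload = json.dumps({"index": index, "body": body}, sort_keys=True)
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()

    def get_or_search(self, index, body, search_fn):
        """Return the cached result, calling search_fn only on a cache miss."""
        key = self.key_for(index, body)
        if key not in self._store:
            self._store[key] = search_fn(index, body)
        return self._store[key]
```

The cache key is derived from the index name plus the whole request body, so any change to the query, filter, or paging produces a separate entry.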

:

[Elasticsearch] FieldMapper Default.

Elastic/Elasticsearch 2013. 9. 2. 14:41

[IndexFieldMapper.java]

FIELD_TYPE.setIndexed(true);

FIELD_TYPE.setTokenized(false);

FIELD_TYPE.setStored(false);

FIELD_TYPE.setOmitNorms(true);

FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);

The code speaks for itself.

indexed is true and tokenized is false, so the field ends up as not_analyzed.

And store is, of course, no.

Just for reference.
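
The reading above can be written down as a small lookup. This is only a sketch of how the Lucene field-type flags correspond to the mapping vocabulary of that era (`index`: `no` / `not_analyzed` / `analyzed`, `store`: `yes` / `no`). The function names are hypothetical and nothing here is elasticsearch code.

```python
def index_option(indexed, tokenized):
    """Translate the indexed/tokenized flags into a mapping 'index' value."""
    if not indexed:
        return "no"
    return "analyzed" if tokenized else "not_analyzed"


def describe_field_type(indexed, tokenized, stored):
    """Summarize a field type using mapping-style terms."""
    return {
        "index": index_option(indexed, tokenized),
        "store": "yes" if stored else "no",
    }


if __name__ == "__main__":
    # Defaults listed above: indexed=true, tokenized=false, stored=false
    print(describe_field_type(indexed=True, tokenized=False, stored=False))
```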

:

[Elasticsearch] Highlight feature.

Elastic/Elasticsearch 2013. 8. 23. 16:11

[lucene]

Highlighter.java

FastVectorHighlighter.java

[elasticsearch]

PlainHighlighter.java

FastVectorHighlighter.java

The method that actually wraps a term in the highlight tag is highlightTerm().

SimpleHTMLFormatter.java

GradientFormatter.java

See these for reference.

To do their work, these highlighters need two basic pieces of information:

CharTermAttribute.java

OffsetAttribute.java

Read the source code to see exactly how they work.

In short:

1. Fetch the stored original text.

2. Rebuild the original text using the char term and offset information.

2.1 During the rebuild, highlightTerm() produces the marked-up term.

For the finer details, reading the source is good for your health. :)
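
The steps above can be sketched in a few lines of Python. This is an illustration of the idea only, not the Lucene implementation: tokens are assumed to arrive as (term, start offset, end offset) tuples, the `<B>` tags mirror the default of SimpleHTMLFormatter, and `highlight` and `highlight_term` are hypothetical names.

```python
def highlight_term(original, pre_tag="<B>", post_tag="</B>"):
    """Wrap the original text of a matched term in highlight tags."""
    return pre_tag + original + post_tag


def highlight(text, tokens, query_terms):
    """Rebuild text, tagging tokens whose term is one of query_terms.

    tokens is an iterable of (term, start_offset, end_offset) tuples, i.e.
    the char term plus the offsets into the stored original text.
    """
    out = []
    pos = 0
    for term, start, end in sorted(tokens, key=lambda t: t[1]):
        if term not in query_terms or start < pos:
            continue  # not a match, or overlaps text already emitted
        out.append(text[pos:start])                   # untouched text before the term
        out.append(highlight_term(text[start:end]))   # marked-up term
        pos = end
    out.append(text[pos:])                            # remaining tail
    return "".join(out)


if __name__ == "__main__":
    stored = "The quick brown fox"
    toks = [("the", 0, 3), ("quick", 4, 9), ("brown", 10, 15), ("fox", 16, 19)]
    print(highlight(stored, toks, {"quick", "fox"}))
```

Because the offsets point into the stored text, the output keeps the original casing even though the terms themselves were lowercased during analysis.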

:

[Elasticsearch] Two timeout settings..

Elastic/Elasticsearch 2013. 8. 14. 16:31

There are two timeout settings that apply when running a search in elasticsearch.

1. The timeout for aggregating data across shards

2. The timeout for executing the search itself

Used well, they help keep response times under control. :)

:

[Elasticsearch] json bulk insert.

Elastic/Elasticsearch 2013. 8. 14. 12:09

Reference: http://www.elasticsearch.org/guide/reference/api/bulk/

[Run]

curl -s -XPOST 'http://192.168.56.104:9200/test/_bulk' --data-binary @test.json

[test.json]

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" , "_routing" : "1" } }

{"docid":1, "title":"title1"}

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2" , "_routing" : "1" } }

{"docid":2, "title":"title2"}

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3" , "_routing" : "1" } }

{"docid":3, "title":"title3"}
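
A file like test.json can also be generated rather than written by hand. The sketch below assumes the same index, type, and routing values as the example above. `build_bulk_body` is a hypothetical helper. Each document contributes one action line and one source line, and the body has to end with a newline.

```python
import json


def build_bulk_body(index, doc_type, docs, routing):
    """Return a bulk request body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        action = {
            "index": {
                "_index": index,
                "_type": doc_type,
                "_id": str(doc["docid"]),
                "_routing": routing,
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects newline-delimited JSON terminated by a final newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [{"docid": i, "title": "title%d" % i} for i in range(1, 4)]
    with open("test.json", "w") as f:
        f.write(build_bulk_body("test", "type1", docs, "1"))
```

The resulting file can be sent with the same curl command shown under [Run]. `--data-binary` matters there because it preserves the newlines.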

:

[Elasticsearch] threadpool settings.

Elastic/Elasticsearch 2013. 8. 13. 18:51

[Server Side Setting]

- elasticsearch.yml

- http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html

[threadpool bounded]

indices.memory.index_buffer_size: 50%

# Search pool

threadpool.search.type: fixed

threadpool.search.size: 20

threadpool.search.queue_size: 100

# Bulk pool

threadpool.bulk.type: fixed

threadpool.bulk.size: 60

threadpool.bulk.queue_size: 300

# Index pool

threadpool.index.type: fixed

threadpool.index.size: 20

threadpool.index.queue_size: 100

[threadpool unbounded]

threadpool.index.queue_size: -1
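
The practical difference between a bounded and an unbounded queue can be shown with a plain Python queue. This is a rough sketch under a strong assumption: a burst of requests arrives and no worker drains the queue in the meantime. `submit_burst` is a hypothetical helper and `queue.Queue` only stands in for the pool's queue.

```python
import queue


def submit_burst(num_requests, queue_size):
    """Enqueue a burst of requests; return (accepted, rejected).

    queue_size of -1 mirrors the unbounded setting above
    (Python treats maxsize 0 as infinite).
    """
    q = queue.Queue(maxsize=max(queue_size, 0))
    accepted = rejected = 0
    for request_id in range(num_requests):
        try:
            q.put_nowait(request_id)
            accepted += 1
        except queue.Full:
            # A bounded queue pushes back instead of growing without limit.
            rejected += 1
    return accepted, rejected


if __name__ == "__main__":
    print(submit_burst(150, 100))  # bounded
    print(submit_burst(150, -1))   # unbounded
```

A bounded queue trades rejected requests for predictable memory use, while an unbounded queue accepts everything and lets the backlog grow.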

:

[Elasticsearch] NetworkService.java settings.

Elastic/Elasticsearch 2013. 8. 12. 13:34

Here we only look at the TCP-related settings.

public static final String LOCAL = "#local#";

private static final String GLOBAL_NETWORK_HOST_SETTING = "network.host";

private static final String GLOBAL_NETWORK_BINDHOST_SETTING = "network.bind_host";

private static final String GLOBAL_NETWORK_PUBLISHHOST_SETTING = "network.publish_host";

public static final class TcpSettings {

public static final String TCP_NO_DELAY = "network.tcp.no_delay";

public static final String TCP_KEEP_ALIVE = "network.tcp.keep_alive";

public static final String TCP_REUSE_ADDRESS = "network.tcp.reuse_address";

public static final String TCP_SEND_BUFFER_SIZE = "network.tcp.send_buffer_size";

public static final String TCP_RECEIVE_BUFFER_SIZE = "network.tcp.receive_buffer_size";

public static final String TCP_BLOCKING = "network.tcp.blocking";

public static final String TCP_BLOCKING_SERVER = "network.tcp.blocking_server";

public static final String TCP_BLOCKING_CLIENT = "network.tcp.blocking_client";

public static final String TCP_CONNECT_TIMEOUT = "network.tcp.connect_timeout";

public static final ByteSizeValue TCP_DEFAULT_SEND_BUFFER_SIZE = null;

public static final ByteSizeValue TCP_DEFAULT_RECEIVE_BUFFER_SIZE = null;

public static final TimeValue TCP_DEFAULT_CONNECT_TIMEOUT = new TimeValue(30, TimeUnit.SECONDS);

}

If you see netty blocking when running elasticsearch as a server, reviewing the settings above is likely to help.

These settings go in elasticsearch.yml.

For example:

network.tcp.blocking: true

:

[Elasticsearch] facet script plugin example.

Elastic/Elasticsearch 2013. 8. 12. 10:47

https://github.com/imotov/elasticsearch-facet-script

It looks like it is implemented in a way similar to hadoop M/R. :)

The example above counts how many times each of the letters A~Z appears in a field.
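
The same map/reduce shape can be sketched in plain Python. This is not the plugin's code, just an illustration of the idea: each shard counts letters locally, then the partial counts are merged. `map_shard`, `reduce_shards`, and the `title` field are hypothetical.

```python
from collections import Counter


def map_shard(docs, field):
    """Count A-Z occurrences in one shard's documents (the 'map' side)."""
    counts = Counter()
    for doc in docs:
        for ch in str(doc.get(field, "")).upper():
            if "A" <= ch <= "Z":
                counts[ch] += 1
    return counts


def reduce_shards(partials):
    """Merge the per-shard counts into one result (the 'reduce' side)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


if __name__ == "__main__":
    shards = [
        [{"title": "ab"}],   # shard 0
        [{"title": "Ba!"}],  # shard 1
    ]
    print(reduce_shards(map_shard(docs, "title") for docs in shards))
```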

:

[Elasticsearch] 0.90.3 release..

Elastic/Elasticsearch 2013. 8. 12. 10:42

http://www.elasticsearch.org/download/

breaking changes:

Java Client: Renamed IndicesAdminClient.existsAliases() to IndicesAdminClient.aliasesExist #3330

new features:

Support for the pattern replace char filter has been added #3197
A new API to check if there are pending cluster tasks has been added #3368
A new completion suggestion based on prefix suggestions has been added (this is experimental) #3376

enhancements:

Support for named filters has been added #3097
The has_child query has been optimized to execute faster when matching parent count is low #3190
Integer field data implementations have been merged #3220
The rescore query now supports a score_mode #3258
Mget fields parameter can now be a string or an array #3270
XContentParser/Generator now can handle simple arrays #3279
Bulk deletes now contain a found field #3320
Zen discovery cluster events now have an urgent priority #3361
An own channel for pings has been added in order to be independent from huge cluster state updates #3362
Cluster state update APIs now respect the master_timeout much better #
A new dedicated thread pool for the optimize API has been added #3366
FastVectorHighlighter now supports complex queries (such as multi phrase queries with two terms at the same position) #3357
The recursion level of the hunspell filter is now configurable in the mapping #3369
Every distribution now contains information about its git build #3370
The dynamic flag in the root object mapper can now be configured dynamically on runtime #3384
Less cluster state changes if auto_expand_replicas is set #3399
Open/Close index API now supports an acknowledgement from other nodes instead of simply waiting for the change in the cluster state #3400
Whenever analyzing strings, elasticsearch now uses Lucene methods introduced with Lucene 4.4, which reuse internal data structures #3409
In addition, the formerly used methods have been deprecated #3411
Improved alias handling in the cluster state (much faster if you have tens of thousands of aliases) #3410
The delete API now waits until a shard is removed from disk #3413
Rerouting of shards now happens on a shard started event #3417
The Index Template API now is more RESTful, supports HEAD and returns a proper 404 if it does not exist #3434
HighlightBuilder is now consistent with REST API #3435
The header response (including the successful/failed shards) has been streamlined between different requests #3441

bug fixes:

Timestamp index settings in a mapping are now correctly returned #3174
Field data now supports more than 2B ordinals per segment #3189
TokenStreams were reset twice when highlighting #3200
The geo_shape filter now handles multiple shapes per document correctly #3242
PluginManager fixes
- The PluginManager now parses parameters correctly again (regression from 0.90.1) #3245
- Calling the PluginManager while having a non-existing plugins directory is now handled #3253
The index warmer setting is now configurable at runtime #3246
The order of fields in a suggest request can now be arbitrary #3247
More-like-this now correctly returns an error message if used with numeric fields (that error can be simply ignored as well) #3252
The parent option is now taken into account for delete requests #3257
The Update API's doc_as_upsert option is now taken into account correctly #3265
Mget requests do not abort completely anymore if any index is missing #3267
Parent is taken into account in exists request #3276
Removed java dependency from debian package, so arbitrarily installed java can be used #3284
Partial fields filtering could return false matches #3288
Caching of top_children, has_child and has_parent queries could lead to a ClassCastException #3290
Script based sorting was applied after pagination #3309
Unallocated indexes cannot be closed immediately to prevent indices which cannot be opened anymore #3313
Thai analyzer now makes use of stopwords #3342
Unset top level filter now behaves the same as inside a filtered query #3356
Pattern replace filter now has an empty default set to ensure same behaviour on upgrades #3359
Alias validation on adding aliases has been improved #3363
Uncaught exceptions on cluster state updates could lead to hanging request #3364
FuzzyLikeThisFieldQueryBuilder defaults are now consistent with the REST API #3374
Updating a mapping with ignore_conflicts could hang and timeout #3381
Setting index.gc_deletes on runtime is working properly now #3396
MoreLikeThisFieldQueryBuilder defaults are now consistent with the REST API #3402
Query/Filter facet counter is now 64bit #3419
The pid file was not properly overwritten if it already existed #3425
Search in a shard group while relocation final flip happens could have failed #3427
UpsertRequests now contain all metadata fields (parent, routing, etc.) #3444
Retry_on_conflict setting in a bulk request could lead to an NPE #3447

:

jjeong

420 posts filed under 'elasticsearch'