[Elasticsearch] A look at the 1.5.0 release

Elastic/Elasticsearch 2015. 3. 24. 10:35

It was released on March 23.

https://www.elastic.co/blog/elasticsearch-1-5-0-released

The two most notable features, both also covered in the release blog post, are the following.

1. Inner hits

This improves on a pain point of the has_child query.

It makes join-style functionality somewhat easier to implement.

old)

With the has_child query, only the parent documents were returned, so the child documents had to be fetched with an additional query.

new)

Adding the inner_hits parameter to a has_child query returns the parent documents together with their matching child documents.

See the documentation below for more details.

http://www.elastic.co/guide/en/elasticsearch/reference/1.5/search-request-inner-hits.html
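As a rough illustration of the "new" behavior, the sketch below builds a search request body in which a has_child query carries an inner_hits entry. The child type "comment" and the field "message" are hypothetical names chosen for the example, and the body is only printed, not sent to a cluster.

```python
import json


def has_child_with_inner_hits(child_type, field, text):
    """Build a search body whose has_child query also asks for inner hits."""
    return {
        "query": {
            "has_child": {
                "type": child_type,  # hypothetical child type name
                "query": {"match": {field: text}},
                # An empty object requests inner hits with default options.
                "inner_hits": {},
            }
        }
    }


body = has_child_with_inner_hits("comment", "message", "elasticsearch")
print(json.dumps(body, indent=2))
```

Posting a body like this to the _search endpoint should return each parent hit with its matching children attached, so the follow-up child query from the "old" flow is no longer needed.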

2. Shadow replica

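With a shared file system, this lets you increase search throughput just by adding nodes, without storing additional copies of a shard or indexing again on the replicas.

A shadow replica is read-only by default, and conceptually you can think of it as something like a view.

It may be worth trying if you want to improve search performance.

See the documentation below for more details.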

http://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-shadow-replicas.html
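A minimal sketch of what the index creation body might look like, assuming the setting names index.shadow_replicas and index.data_path as I recall them from the reference page linked above. The data path and the shard and replica counts are made-up values; please check the linked page (including the node-level setting that allows custom data paths) before relying on any of it.

```python
import json


def shadow_replica_index_settings(data_path, shards=1, replicas=4):
    """Build index settings that turn replicas into shadow replicas.

    Assumption: setting names follow the 1.5 shadow replica reference page.
    """
    return {
        "index": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            # Location on the shared file system (hypothetical path).
            "data_path": data_path,
            "shadow_replicas": True,
        }
    }


settings = shadow_replica_index_settings("/var/data/my_index")
print(json.dumps(settings, indent=2))
```

The point of the example is that the replicas are declared as usual, but because they read the primary's files from the shared path, adding nodes adds search capacity without extra indexing work.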

There are many other improvements as well; see the original blog post for those.

License: Attribution, NonCommercial, NoDerivatives

[Elasticsearch] index vs. indice vs. indices ????

Elastic/Elasticsearch 2015. 3. 18. 12:08

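This is nothing important or special, but the terminology may need sorting out, so I am just writing it down.

In search engines we commonly say index or collection.

You can think of it as an indexed set (?) of the documents to be searched.

So how is the term used in elasticsearch?

Not only elasticsearch but most search engines call an individual group of searchable documents an index or a collection.

For that reason elasticsearch also calls it an index, and refers to several of these indexes together as indices.

The form indice also shows up occasionally, but usage now seems to have settled on index and indices, so I plan to follow that as well.

(It is hard to say that indice is outright wrong, though.)

[Notation]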

index (O)

indice (△)

indices (O)

The following is quoted just for reference.

Ref.

http://en.wiktionary.org/wiki/indice

http://grammarist.com/usage/indexes-indices/

Indexes vs. indices

Indexes and indices are both accepted and widely used plurals of the noun index. Both appear throughout the English-speaking world, but indices prevails in varieties of English from outside North America, while indexes is more common in American and Canadian English. Meanwhile, indices is generally preferred in mathematical, financial, and technical contexts, while indexes is relatively common in general usage.

Neither form is wrong. Both have been in English many centuries (and though indexes is now most common in American English, it predates the United States by centuries). It’s true that indices is the plural of index in Latin, but index is an English word when English speakers use it—and it is a longstanding one at that—so we can pluralize it according to the conventions of English.


Log generation tool - makelogs

Elastic/Logstash 2015. 3. 11. 16:35

The generator that logstash provides was not very useful for me, so I use the tool below.

[Link]

https://www.npmjs.com/package/makelogs

to install

npm install -g makelogs

to run

makelogs --count=10m --days=-2,+10

[Command Help]

jeong-ui-MBP:makelogs hwjeong$ makelogs --help

A utility to generate sample log data.

Usage: makelogs [options]

Options:

--count, -c Total event that will be created, accepts expressions like "1m" for 1 million (b,m,t,h) [default: 14000]

--days, -d The number of days ± today or two numbers, seperated by a comma, like "-1,+10" or "-10,+100" [default: 1]

--host, -h The host name and port [default: "localhost:9200"]

--auth user:password when you want to connect to a secured elasticsearch cluster over basic auth [default: null]

--shards, -s The number of primary shards [default: 1]

--replicas, -r The number of replica shards [default: 0]

--dry Test/Parse your arguments, but don't actually do anything [default: false]

--help This help message

--reset Clear all logstash-* indices before genrating logs

--verbose Log more info to the console

--trace Log every request to elastisearch, including request bodies. BE CAREFULL

[Examples]

makelogs -c 100 -d 0 -h localhost:9200 -s 1 -r 0 --reset

makelogs -c 100 --days=-1,+1 -h localhost:9200 -s 1 -r 0 --reset

Note) -d 0 generates data for today's date only

logstash-20150311

Note) --days=-1,+1 creates one additional index on each side of today's date

logstash-20150310

logstash-20150311

logstash-20150312
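To make the --days arithmetic concrete, here is a small sketch that lists the daily index names one would expect for a given range. It assumes the logstash-YYYYMMDD naming shown in the listings above and a "today" of 2015-03-11; the naming used by a particular makelogs version may differ, and the function name is made up for this example.

```python
from datetime import date, timedelta


def expected_indices(today, days_before, days_after):
    """List daily index names from today - days_before to today + days_after."""
    return [
        "logstash-" + (today + timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(-days_before, days_after + 1)
    ]


# Equivalent of --days=-1,+1 when run on 2015-03-11.
around_today = expected_indices(date(2015, 3, 11), 1, 1)
# Equivalent of -d 0 on the same day.
today_only = expected_indices(date(2015, 3, 11), 0, 0)
print(around_today)
print(today_only)
```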


실무 예제로 배우는 Elasticsearch 검색엔진(활용편) (Elasticsearch Search Engine Learned Through Practical Examples, applications volume)

Elastic/Elasticsearch 2015. 3. 9. 18:43

The print edition comes out in April.

I will keep publishing updates for anything in the book that is lacking or incorrect.

I will also do my best to help wherever I can through the blog and the Facebook community.

Thank you.

eBook & DRM-free

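A practical guide to the Elasticsearch search engine that you can learn quickly and easily

The previous volume, 『실무 예제로 배우는 Elasticsearch 검색엔진 <기본편>』 (the basics volume), covered the basic concepts of Elasticsearch, installation, and how to set up a search service. This <활용편> (applications volume) introduces the extended features that the basics volume could not cover, ways to use them in a variety of services, and how to optimize Elasticsearch performance.

It shows how to make active use of Elasticsearch, from combining the search engine with a variety of other technologies to implementing plugins that add user-defined functionality, and it provides basic performance optimization methods and guidelines to help with handling high traffic volumes and ensuring stability.

This book does not cover basics such as installation and configuration, so if you want to learn the fundamentals of Elasticsearch, 『실무 예제로 배우는 Elasticsearch 검색엔진(기본편)』 (Hanbit Media, 2014) is the better fit.

Intended readers

Planners or developers interested in building search services
Service managers or developers who want to replace a commercial search engine with an open source one
Developers who want to learn in detail how to use Elasticsearch, including its advanced features and performance optimization

[Author] 정호욱

Over the past 13 years he has developed community, social search, and advertising search services at Yahoo Korea, NHN Technology, and Samsung Electronics, carrying out a variety of projects built on search engines. He currently works as an open source search engine developer at Gruter, a company specializing in big data. He shares information and experience on elasticsearch technology through his personal blog (http://jjeong.tistory.com).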

chapter 1 Extending search features
1.1 Autocomplete
1.2 Percolator
1.3 Join
1.4 River
1.5 Summary

chapter 2 Search data analysis
2.1 Bucket Aggregation
2.2 Metric Aggregation
2.3 Summary

chapter 3 Plugin
3.1 Building a plugin
3.2 Creating a REST plugin
3.3 Creating an Analyzer plugin
3.4 Summary

chapter 4 Hadoop integration
4.1 MapReduce integration
4.2 Hive integration
4.3 Summary

chapter 5 ELK integration
5.1 Logstash
5.2 Elasticsearch
5.3 Kibana
5.4 Summary

chapter 6 Using SQL
6.1 Elasticsearch from an RDB perspective
6.2 Defining SQL
6.3 Translating SQL
6.4 Building a JDBC driver
6.5 Summary

chapter 7 Elasticsearch performance optimization
7.1 Hardware perspective
7.2 Document perspective
7.3 Operation perspective
7.4 Summary


[Kibana] structured aggregation query...

Elastic/Elasticsearch 2015. 1. 19. 12:53

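One-line summary)

- Complex analytic queries such as sub aggregations are not supported in 3.x; they are supported in 4.x.

Those of you who use kibana a lot probably know this well.

Up through kibana 3.x, somewhat complex aggregations such as sub aggregations could not be used.

I got stuck on this myself while trying to build something with 3.x.

So if you want to use complex aggregation queries, upgrading to kibana 4.x should do it.

It is still in beta, but I expect the beta label to come off before long.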
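For readers unsure what a "sub aggregation" looks like at the Elasticsearch level, the sketch below builds a request body with a terms bucket and a stats metric nested inside each bucket. The field names "host" and "bytes" and the aggregation names are hypothetical; this only illustrates the query shape, not what kibana itself generates.

```python
import json


def terms_with_stats(group_field, metric_field, size=10):
    """Build a terms aggregation with a stats sub aggregation per bucket."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "by_group": {
                "terms": {"field": group_field, "size": size},
                # The nested "aggs" block is the sub aggregation.
                "aggs": {
                    "metric_stats": {"stats": {"field": metric_field}}
                },
            }
        },
    }


agg_body = terms_with_stats("host", "bytes")
print(json.dumps(agg_body, indent=2))
```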

Below are the aggregation types that kibana 3.x provides.

I have not used everything shown here myself.

The most commonly used ones are probably

- column

- histogram

- stats

- terms

or thereabouts. (I could be wrong. ^^;)

Of the ones I was testing, the trends panel type no longer seems to be used.

It was dropped entirely in 4.x, after all.


[Elasticsearch] Testing SearchType.SCAN

Elastic/Elasticsearch 2014. 12. 9. 18:58

For now, I am just keeping the code here for reference.

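import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.MatchAllQueryBuilder;

// With SearchType.SCAN the first response carries only the scroll id, no hits.
SearchResponse searchResponse = client.prepareSearch(INDICE_NAME)
        .setSearchType(SearchType.SCAN)
        .setQuery(new MatchAllQueryBuilder())
        .setSize(FETCH_SIZE)
        .setScroll(TimeValue.timeValueMinutes(10))
        .execute().actionGet(TimeValue.timeValueMinutes(10));

while (true) {
    // Each scroll call returns the next batch plus the scroll id for the following call.
    searchResponse = client.prepareSearchScroll(searchResponse.getScrollId())
            .setScroll(TimeValue.timeValueMinutes(10))
            .execute().actionGet(TimeValue.timeValueMinutes(10));
    if (searchResponse.getHits().hits().length == 0) {
        break; // an empty batch means everything has been read
    }
}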

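It is quite simple.

On each request, this logic fetches up to FETCH_SIZE documents per shard.

Other applications and a detailed explanation will follow another time. ^^

For now, just check the logic.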
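Because the size applies per shard, one scroll call can return more documents than FETCH_SIZE. The arithmetic can be sketched as follows; the function names and the numbers are made up for illustration, and the second function is only a lower bound since shards rarely hold exactly equal shares.

```python
def max_hits_per_scroll(fetch_size, primary_shards):
    """Upper bound on hits returned by a single scroll call in scan mode."""
    return fetch_size * primary_shards


def min_scroll_calls(total_docs, fetch_size, primary_shards):
    """Fewest non-empty scroll calls needed to read total_docs documents."""
    batch = max_hits_per_scroll(fetch_size, primary_shards)
    return (total_docs + batch - 1) // batch  # ceiling division


print(max_hits_per_scroll(10, 5))
print(min_scroll_calls(120, 10, 5))
```

With a size of 10 and 5 primary shards, a single call can return up to 50 hits, so reading 120 documents takes at least 3 non-empty calls before the empty batch that ends the loop.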


[Elasticsearch] What is a dynamic template?

Elastic/Elasticsearch 2014. 12. 5. 11:12

This is one of the convenient features that Elasticsearch provides.

To understand it, you need to look at dynamic mapping first.

[Original link]

Dynamic mapping

- http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/dynamic-mapping.html

☞ Brief explanation

This feature lets elasticsearch assign an appropriate type dynamically when no data type has been specified for the fields being indexed.

Example)

field1: "1" is mapped as string, and

field2: 1 is mapped as long.
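The idea can be mimicked in a few lines. This is a deliberately simplified sketch of type detection from JSON values, not Elasticsearch's actual implementation: it ignores, for instance, date detection in strings, and the function name is invented for the example.

```python
import json


def detect_type(value):
    """Guess a mapping type from a parsed JSON value (simplified)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise ValueError("type not covered by this sketch: %r" % (value,))


doc = json.loads('{"field1": "1", "field2": 1}')
detected = {name: detect_type(value) for name, value in doc.items()}
print(detected)
```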

Now let's look at dynamic templates.

In the elasticsearch guide they are described as a subsection under customizing dynamic mapping.

[Original link]

Dynamic template

- http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html#dynamic-templates

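What is a dynamic template?

As the name suggests, instead of writing the type mapping in advance, the mapping is built dynamically at the moment a field is defined by dynamic mapping.

Put simply, you do not declare the mapping up front; it is composed according to patterns or analysis characteristics.

There seem to be roughly two main use cases.

1) When you create and manage indices/types of the same kind as a time series and want their mappings generated automatically instead of creating them each time.

2) When you want mapping properties assigned automatically to fields that match a particular condition or pattern.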

[_default_]

- This type becomes the base type for dynamic templates.

- When a type is created dynamically without any separate configuration, its mapping is defined based on this information.

The inner properties are explained below using the elasticsearch sample.

PUT /my_index
{
    "mappings": {
        "my_type": {
            "dynamic_templates": [
                { "es": {
                      "match":              "*_es", 
                      "match_mapping_type": "string",
                      "mapping": {
                          "type":           "string",
                          "analyzer":       "spanish"
                      }
                }},
                { "en": {
                      "match":              "*", 
                      "match_mapping_type": "string",
                      "mapping": {
                          "type":           "string",
                          "analyzer":       "english"
                      }
                }}
            ]
}}}

- "my_type": dynamic templates can be configured on user-defined types as well as on the default type.

- "match" defines dynamic matching on field names, and patterns can be used. In the example it means every field whose name ends in _es.

- "match_mapping_type" checks which type dynamic mapping detected. In the example it means fields detected as string, and it must be read together with "match": the field name must match *_es and the detected type must be string for the mapping below to apply.

- "mapping" holds the properties that are assigned to each matching field when its mapping is created.

Overall, the example means: apply the english analyzer to all string fields, except that string fields whose names end in _es get the spanish analyzer.
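The matching rules described above can be sketched as a small resolver: templates are tried in order and the first one whose name pattern and detected type both match supplies the mapping. This is an illustration only; it uses Python's fnmatchcase as a stand-in for Elasticsearch's own wildcard matching, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# The two templates from the sample above, in the same order.
TEMPLATES = [
    {"es": {"match": "*_es", "match_mapping_type": "string",
            "mapping": {"type": "string", "analyzer": "spanish"}}},
    {"en": {"match": "*", "match_mapping_type": "string",
            "mapping": {"type": "string", "analyzer": "english"}}},
]


def resolve_mapping(field_name, detected_type, templates=TEMPLATES):
    """Return the mapping of the first matching template, or None."""
    for entry in templates:
        for rule in entry.values():
            name_ok = fnmatchcase(field_name, rule.get("match", "*"))
            type_ok = rule.get("match_mapping_type", detected_type) == detected_type
            if name_ok and type_ok:
                return rule["mapping"]
    return None  # fall back to plain dynamic mapping


print(resolve_mapping("title_es", "string"))
print(resolve_mapping("title", "string"))
print(resolve_mapping("count", "long"))
```

Order matters here: if the "*" template came first, it would also catch title_es and the spanish analyzer would never be applied.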

Dynamic templates and dynamic mapping can be put to very good use.

I hope this helps your understanding.


[Elasticsearch] simple dynamic template test

Elastic/Elasticsearch 2014. 12. 4. 18:01

[Settings]

{
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 0
    }
}

- Create

curl -XPUT 'http://localhost:9200/dynamic_template' -d @setting.json

[Mapping]

{
    "_source" : {
        "enabled" : "true"
    },
    "_all" : {
        "enabled" : "false"
    },
    "dynamic_templates": [
        {
            "string_match": {
                "mapping": {
                    "index": "not_analyzed",
                    "type": "string"
                },
                "match_mapping_type": "string",
                "match": "*"
            }
        }
    ],
    "properties" : {}
}

- Create

curl -XPUT http://localhost:9200/dynamic_template/test1/_mapping -d @mapping.json

curl -XPUT http://localhost:9200/dynamic_template/test2/_mapping -d @mapping.json

[Add document]

curl -XPOST http://127.0.0.1:9200/dynamic_template/test1 -d '{"docid" : "1"}'

curl -XPOST http://127.0.0.1:9200/dynamic_template/test2 -d '{"docid" : 1}'

Checking the type of the docid field in test1 and test2 afterwards gives the following.

{
    "dynamic_template": {
        "mappings": {
            "test2": {
                "dynamic_templates": [
                    {
                        "string_match": {
                            "mapping": {
                                "index": "not_analyzed",
                                "type": "string"
                            },
                            "match": "*",
                            "match_mapping_type": "string"
                        }
                    }
                ],
                "_all": {
                    "enabled": false
                },
                "properties": {
                    "docid": {
                        "type": "long"
                    }
                }
            },
            "test1": {
                "dynamic_templates": [
                    {
                        "string_match": {
                            "mapping": {
                                "index": "not_analyzed",
                                "type": "string"
                            },
                            "match": "*",
                            "match_mapping_type": "string"
                        }
                    }
                ],
                "_all": {
                    "enabled": false
                },
                "properties": {
                    "docid": {
                        "index": "not_analyzed",
                        "type": "string"
                    }
                }
            }
        }
    }
}
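Why the two types end up with different docid mappings can be reproduced with a short sketch: the string_match template only applies when the detected type is string, so the numeric docid in test2 falls through to plain dynamic mapping. The function below is a simplified model that handles only string and integer values, and its name is made up for this example.

```python
# The single template used in the test above.
STRING_MATCH = {
    "match": "*",
    "match_mapping_type": "string",
    "mapping": {"index": "not_analyzed", "type": "string"},
}


def field_mapping(value, template=STRING_MATCH):
    """Return the mapping a field would get for the given first value."""
    if isinstance(value, str):
        detected = "string"
    elif isinstance(value, int) and not isinstance(value, bool):
        detected = "long"
    else:
        raise TypeError("this sketch only handles string and integer values")
    if template["match_mapping_type"] == detected:
        return dict(template["mapping"])  # the template applies
    return {"type": detected}  # plain dynamic mapping


print(field_mapping("1"))  # the test1 document: {"docid": "1"}
print(field_mapping(1))    # the test2 document: {"docid": 1}
```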


[Elasticsearch] embedded elasticsearch server test.

Elastic/Elasticsearch 2014. 11. 19. 16:59

Use this when you want to bundle es inside your own application.

Where you actually use it is up to you. ^^

I put just the basic logic into a main function, so adapt it to your taste.

[Embedded Elasticsearch Server]

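import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

public class EmbeddedElasticsearchServer {
    public static void main(String[] args) {
        ImmutableSettings.Builder settings = ImmutableSettings.settingsBuilder();
        settings.put("node.name", "embedded-local-node");
        settings.put("path.data", "data/index");

        Node node = NodeBuilder.nodeBuilder()
                .settings(settings)
                .clusterName("embedded-local-cluster")
                .data(true)
                .local(true)
                .node();

        Client client = node.client();
        CreateIndexRequest request = Requests.createIndexRequest("embedded-index").settings(settings);
        CreateIndexResponse response = client.admin().indices().create(request).actionGet();
        client.close();

        // The embedded elasticsearch daemon stops only when node.close() is called.
        node.close();
    }
}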

As you can see, the basic flow is as follows.

1. Configure the settings that stand in for elasticsearch.yml.

2. Create a node that carries the settings from step 1.

----> This alone is enough to start the es daemon.

3. Create a client that connects to the embedded es server.

4. Create an index using the client.

5. Close the client connection.

----> Only the client is closed; the daemon keeps listening.

6. Shut down the embedded es server.

----> The daemon stops only after node.close() is called.

That covers the basics.

How you put it to use is up to each of you. ^^


[Elasticsearch] Setting up pom.xml for ElasticsearchIntegrationTest.

Elastic/Elasticsearch 2014. 11. 19. 16:15

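This works fine if you simply check out the es source and run it there.

To write tests that extend ElasticsearchIntegrationTest, however, the dependencies in the test project's pom.xml have to be set up properly.

The es documentation does not cover this.

Then again, since the error can be fixed and reflected in the code, perhaps the documentation does not need to change.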

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/using-elasticsearch-test-classes.html

Below is the pom.xml I tested with.

[pom.xml]


jjeong

498 posts in 'Elastic'
