jjeong

[Elasticsearch] REST snippets for testing shard allocation in operations

Elastic/Elasticsearch 2015. 7. 2. 18:18

I got tired of typing these out every time, so I'm leaving them here.

[shard allocation disable]

curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "cluster.routing.allocation.disable_allocation" : true
  }
}'

[Disable allocation for all kinds of shards]

curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}'

[shard allocation enable]

curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "cluster.routing.allocation.disable_allocation" : false
  }
}'

[Enable allocation for all kinds of shards]

curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "all"
  }
}'

[Disable replica shards on all indices]

curl -XPUT 'localhost:9200/*/_settings' -d '{"number_of_replicas":0}'
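For scripting the allocation toggle, the same request can be sketched in Python. This is only a minimal sketch using the standard library; the helper names allocation_settings and put_json are my own for illustration, and only the "cluster.routing.allocation.enable" setting from the curl calls above is covered.

```python
import json
import urllib.request


def allocation_settings(enable, scope="transient"):
    """Build the body for PUT /_cluster/settings that turns allocation on or off."""
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    value = "all" if enable else "none"
    return {scope: {"cluster.routing.allocation.enable": value}}


def put_json(url, body):
    """Send body as JSON with HTTP PUT and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Assuming a node is listening on localhost:9200, calling put_json("http://localhost:9200/_cluster/settings", allocation_settings(False)) before the maintenance work and again with allocation_settings(True) afterwards would mirror the two transient curl calls.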

Attribution, NonCommercial, NoDerivs


[Elasticsearch] About the IP field in NodeInfo

Elastic/Elasticsearch 2015. 7. 1. 20:23

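I'm writing this down before I forget it again.

The IP field exposed by NodeInfo cannot be changed arbitrarily.

The reason is that, when the daemon starts, it reads the node's information internally and builds the value itself.

For the relevant code, see the following two classes.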

- DiscoveryNode.java

- NetworkUtils.java

[Code snippet]

private final static InetAddress localAddress;

static {
    InetAddress localAddressX;
    try {
        localAddressX = InetAddress.getLocalHost();
    } catch (Throwable e) {
        logger.warn("failed to resolve local host, fallback to loopback", e);
        localAddressX = InetAddress.getLoopbackAddress();
    }
    localAddress = localAddressX;
}
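To make the fallback easier to see, here is the same idea in Python. It is only an analogy I wrote for illustration, not Elasticsearch code: resolve the local hostname, and fall back to loopback if that fails. The function name resolve_local_address is hypothetical, and the sketch handles IPv4 only.

```python
import socket


def resolve_local_address():
    """Return the address the local hostname resolves to, or loopback on failure."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        # Same idea as the Java fallback above: use loopback when resolution fails.
        return "127.0.0.1"


print(resolve_local_address())
```

Because the value comes from what the operating system reports for the host, it is decided at startup rather than by a setting you edit afterwards.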


[Elasticsearch] Multiple types in one Index.

Elastic/Elasticsearch 2015. 6. 30. 11:47

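Using "I was busy" as my excuse, I haven't posted a single thing in June.

So, on that note, here is one.

Today's topic is what to watch out for when you create and use multiple Types in a single Elasticsearch Index.

Elasticsearch is often compared with an RDB.

The comparison makes the concepts easier to grasp.

Let's briefly run through it once more before moving on.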

Elasticsearch	RDB
Index	Database
Type	Table
Mapping	Schema
Document	Row
Field	Column

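※ There are more, but I'll stop here.

Now for today's topic.

In Elasticsearch, you can create multiple Types in a single Index.

Likewise, an RDB lets you create multiple Tables in a single Database.

However, an Elasticsearch Type and an RDB Table differ in one respect.

In an RDB, Columns in different Tables are independent of each other in data type, index kind, and so on, even when they share a name.

In Elasticsearch, however, Fields in different Types that share a name must also share the same data type, index kind, and so on.

The reason is that, internally, Lucene treats same-named Fields across Types as a single Field.

If the Field names differ between Types, there is no problem. (Which goes without saying.)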

Example)

Suppose IndexA has Type1 and Type2.

Suppose Type1 has a field named geo whose data type is geo_point.

What happens if Type2 also has a field named geo, but its data type is string?

"mappings": {
  "type1": {
    "properties": {
      "geo": {
        "type": "geo_point"
      }
    }
  },
  "type2": {
    "properties": {
      "geo": {
        "type": "string"
      }
    }
  }
}

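This case results in an error.

That is because the same field was declared with different data types.

Do not use it this way.

The correct approach: if different types have a field with the same name, declare the same data type for it in each of them.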

"mappings": {
  "type1": {
    "properties": {
      "geo": {
        "type": "geo_point"
      }
    }
  },
  "type2": {
    "properties": {
      "geo": {
        "type": "geo_point"
      }
    }
  }
}

※ I explained this using only the data type, but I recommend declaring the other options identically as well.
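If you want to catch this before creating the index, a small check over the mappings is enough. The following is a minimal sketch under my own assumptions: find_type_conflicts is a hypothetical helper, it compares only the "type" option, and it looks only at top-level properties (no nested objects).

```python
def find_type_conflicts(mappings):
    """Return {field: {type_name: data_type}} for fields whose data type differs across types."""
    seen = {}
    for type_name, mapping in mappings.items():
        for field, spec in mapping.get("properties", {}).items():
            seen.setdefault(field, {})[type_name] = spec.get("type")
    return {
        field: by_type
        for field, by_type in seen.items()
        if len(set(by_type.values())) > 1
    }


bad = {
    "type1": {"properties": {"geo": {"type": "geo_point"}}},
    "type2": {"properties": {"geo": {"type": "string"}}},
}
good = {
    "type1": {"properties": {"geo": {"type": "geo_point"}}},
    "type2": {"properties": {"geo": {"type": "geo_point"}}},
}
print(find_type_conflicts(bad))   # {'geo': {'type1': 'geo_point', 'type2': 'string'}}
print(find_type_conflicts(good))  # {}
```

The two dictionaries correspond to the failing and the corrected mappings from the example above.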


[Apache Tajo] Integrating Apache Tajo Desktop with Zeppelin

ITWeb/Apache Tajo 2015. 5. 27. 18:35

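This post sets up an analysis environment with the Apache Tajo desktop version and Zeppelin.

Full installation and usage guides are available on each project's homepage.

I'm just collecting the logs from my own run here.

Installing Apache Tajo Desktop

1. Download and installation guide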

http://www.gruter.com/blog/getting-started-with-tajo-on-your-desktop/

2. Extract the archive

$ tar -xvzf tajo-0.11.0-desktop-3.0.tar.gz

$ ln -s tajo-0.11.0-desktop-3.0 tajo

$ cd tajo

3. Configure

$ bin/configure.sh

Enter JAVA_HOME [required]

/Library/Java/JavaVirtualMachines/jdk1.7.0_55.jdk/Contents/Home

Would you like advanced configure? [y/N]

y

Enter tajo.rootdir [default: file:///Users/hwjeong/temp/kgloballondon/tajo/data/tajo]

Enter tajo.staging.directory [default: file:///Users/hwjeong/temp/kgloballondon/tajo/data/staging]

Enter tajo.worker.tmpdir.locations [default: /Users/hwjeong/temp/kgloballondon/tajo/data/tempdir]

Enter heap size(MB) for worker [default: 1024]

Done. To start Tajo, run /Users/hwjeong/temp/kgloballondon/tajo/bin/startup.sh

- Set JAVA_HOME to match your development environment.

- You don't need the advanced configure step, but I chose "y" to see which items it offers.

- On OS X, you can look up the JAVA_HOME path as follows.

$ /usr/libexec/java_home -v 1.7

or

$ /usr/libexec/java_home -v 1.6

4. Start Tajo

$ bin/startup.sh

starting master, logging to /Users/hwjeong/temp/kgloballondon/tajo/bin/../logs/tajo-hwjeong-master-jeong-ui-MBP.out

Tajo master starting....Connection to localhost port 26003 [tcp/*] succeeded!

Tajo master started.

starting worker, logging to /Users/hwjeong/temp/kgloballondon/tajo/bin/../logs/tajo-hwjeong-worker-jeong-ui-MBP.out

Tajo worker started.

Tajo master web UI

http://localhost:26080

5. Load the test data

$ bin/make-test.sh

Databases and tables for test were successfully created.

6. Run Tajo shell commands

$ bin/tsql

default> \c tpc_h10m

You are now connected to database "tpc_h10m" as user "hwjeong".

tpc_h10m> \d

customer

lineitem

nation

orders

part

partsupp

region

supplier

tpc_h10m>

Up to this point, everything matches the "download and installation guide".

I have only recorded the log of my own run.

Installing Zeppelin

1. Download and installation guide

https://zeppelin.incubator.apache.org/docs/install/install.html

2. Clone with git

$ git clone https://github.com/apache/incubator-zeppelin.git zeppelin

Cloning into 'zeppelin'...

remote: Counting objects: 21256, done.

remote: Total 21256 (delta 0), reused 0 (delta 0), pack-reused 21256

Receiving objects: 100% (21256/21256), 10.76 MiB | 1.75 MiB/s, done.

Resolving deltas: 100% (8584/8584), done.

Checking connectivity... done.

$ cd zeppelin

3. Build in local mode

$ sudo mvn clean install -DskipTests

- I ran it with sudo to avoid permission errors while the dependencies are installed.

4. Start Zeppelin

$ bin/zeppelin-daemon.sh start

Zeppelin start [ OK ]

- To stop it, use stop instead of start.

$ bin/zeppelin-daemon.sh stop

Zeppelin stop [ OK ]

5. Open the Zeppelin web UI

http://localhost:8080/

Using Apache Tajo SQL in Zeppelin

1. Create a Note

2. Write a Tajo query

## Click Note 2AQG17JRB.

%tajo select * from tpc_h10m.nation;

%tajo
SELECT n.n_name AS nation, sum(o.o_totalprice) AS order_amount
FROM tpc_h10m.customer c, tpc_h10m.nation n, tpc_h10m.orders o
WHERE c.c_nationkey = n.n_nationkey
  AND o.o_custkey = c.c_custkey
GROUP BY c.c_nationkey, n.n_name
ORDER BY n.n_name;

- The "%tajo" prefix selects the tajo interpreter; see Zeppelin's interpreter binding information for details.

- The backslash commands ("\command") that Tajo's tsql provides are not supported here, so take care not to use them.
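If you generate paragraph text from a script, the two notes above can be captured in a tiny helper. This is a minimal sketch for illustration only; tajo_paragraph is a hypothetical function of my own, not part of Zeppelin or Tajo. It prefixes a statement with "%tajo" and refuses tsql backslash commands.

```python
def tajo_paragraph(sql):
    """Prefix a SQL statement with %tajo, refusing tsql backslash commands."""
    statement = sql.strip()
    if not statement:
        raise ValueError("empty statement")
    if statement.startswith("\\"):
        raise ValueError("tsql backslash commands are not supported: " + statement)
    return "%tajo\n" + statement


print(tajo_paragraph("select * from tpc_h10m.nation;"))
```

Passing something like "\d" raises ValueError instead of producing a paragraph that the interpreter could not run.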

3. View the query result as a graph

Click the graph icon below the command shell to see the result.

That's what I put together based on the provided documentation.

In addition, Apache Tajo and Zeppelin communicate through the JDBC driver.


[Elasticsearch] Circuit Breaker

Elastic/Elasticsearch 2015. 5. 26. 11:33

References)

https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html#circuit-breaker

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-fielddata.html#fielddata-circuit-breaker

https://www.elastic.co/guide/en/elasticsearch/guide/current/_monitoring_individual_nodes.html#_circuit_breaker

https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-update-settings.html#_field_data_circuit_breaker

https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html#_circuit_breaker_fielddata_status_done_v1_0_0

In the end, this one document is all you need.

https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-fielddata.html

[Circuit Breaker summary]

- It is a feature tied to the fielddata cache, provided as a safeguard against OOM.

- The circuit breaker limit size must be larger than the cache size.

- The circuit breaker estimates how much memory a query needs, so that you don't run into an OOM in the first place.

In other words, it stops the requested query.
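The three points above can be pictured with a toy model. This is only a conceptual sketch I wrote for illustration, not how Elasticsearch implements it; the class names, byte values, and the reserve/release methods are all hypothetical.

```python
class CircuitBreakingError(Exception):
    """Raised when a request would push estimated usage past the limit."""


class FielddataBreaker:
    def __init__(self, limit_bytes, cache_size_bytes):
        # The summary above says the breaker limit must exceed the cache size.
        if limit_bytes <= cache_size_bytes:
            raise ValueError("breaker limit must be larger than the cache size")
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, estimated_bytes):
        """Admit the request only if the estimate still fits under the limit."""
        if self.used_bytes + estimated_bytes > self.limit_bytes:
            raise CircuitBreakingError(
                "estimated %d bytes would exceed limit of %d bytes"
                % (self.used_bytes + estimated_bytes, self.limit_bytes)
            )
        self.used_bytes += estimated_bytes

    def release(self, nbytes):
        """Give back memory that is no longer held."""
        self.used_bytes = max(0, self.used_bytes - nbytes)
```

With FielddataBreaker(600, 400), a reserve(500) call is admitted, and a following reserve(200) raises CircuitBreakingError before any memory is actually used, which is the "stop the query instead of hitting OOM" behavior described above.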

[Geek Igor]

http://igor.kupczynski.info/2015/04/06/fielddata.html
