Posts in the 'Elastic/Elasticsearch' category (page 25)

[elasticsearch] Things to watch out for when configuring aliases.

Elastic/Elasticsearch 2014. 2. 24. 13:46

[Reference]

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-aliases.html

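The alias feature that elasticsearch provides is very useful.

1. Basic feature - alias

- Several separately built indices can be grouped and used as if they were a single index. You can of course query multiple indices directly at search time, but then you have to list them in every request, which is inconvenient.

- As the name suggests, an alias lets you compose indices to suit the purpose of each search.

2. Search feature - filter

- After defining an alias, you can attach a filter to it and build a virtual index without any separate indexing or data processing. Where could this be used? It looks useful for extending functionality through real-time analysis or by filtering data that has already been indexed.

- It offers the same functionality as the filters used in search requests, so the range of uses is wide.

3. Distribution/gateway feature - routing

- At index time you can collect data onto a designated shard only, and at search time you can run the query against the designated shard only.

- In other words, this feature lets you cut down on unnecessary data distribution (sharding) work.

※ When designating the shard, the value must be given as a string for the setting to take effect.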
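The three uses above (grouping, filter, routing) can be sketched as one request body for the `_aliases` endpoint. This is only a minimal sketch: the index names, alias names, and the `level` field are hypothetical, and the code builds the JSON without sending it anywhere.

```python
import json

def add_alias_action(index, alias, filter_=None, routing=None):
    """Build one "add" action for the _aliases API (names are illustrative)."""
    action = {"index": index, "alias": alias}
    if filter_ is not None:
        action["filter"] = filter_
    if routing is not None:
        # Per the note above, the routing value is always sent as a string.
        action["routing"] = str(routing)
    return {"add": action}

body = {
    "actions": [
        # 1. Plain alias: two indices searched as one.
        add_alias_action("logs_2014_01", "logs"),
        add_alias_action("logs_2014_02", "logs"),
        # 2. Filtered alias: a virtual index with error entries only.
        add_alias_action("logs_2014_02", "logs_errors",
                         filter_={"term": {"level": "error"}}),
        # 3. Routed alias: index and search operations use one routing value.
        add_alias_action("users", "user_12", routing=12),
    ]
}

# This JSON would be the body of POST /_aliases.
print(json.dumps(body, indent=2))
```

Coercing the routing value with `str()` in one place keeps an integer from slipping into the request by accident.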

As I have mentioned before, when you replace a fully reindexed index, switching it through an alias lets you apply the change with no service downtime, so keep this feature in mind.
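A zero-downtime switch of this kind could look like the sketch below: one `_aliases` request that removes the alias from the old index and adds it to the new one, so both actions are applied together. The names `products`, `products_v1`, and `products_v2` are made up for illustration.

```python
import json

def swap_alias_actions(alias, old_index, new_index):
    """Build an _aliases body that moves an alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# Hypothetical names: the application always searches "products".
swap_body = swap_alias_actions("products", "products_v1", "products_v2")
print(json.dumps(swap_body, indent=2))
```

Because clients only ever see the alias name, the old index can be dropped once the switch has gone through.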

:

[elasticsearch] The node.local setting.

Elastic/Elasticsearch 2014. 2. 18. 10:33

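Use this setting in a development environment, or when you are simply testing on your own development machine.

It avoids generating unnecessary traffic and throwing warnings, so it does help a little. :)

Set node.local: true. Besides this setting, there is also the node.mode setting.

Setting either one of the two is enough.

[Original text]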

Is the node a local node. A local node is a node that uses a local (JVM level) discovery and transport. Other (local) nodes started within the same JVM (actually, class-loader) will be discovered and communicated with. Nodes outside of the JVM will not be discovered.

:

[elasticsearch] elasticsearch.yml configuration properties...

Elastic/Elasticsearch 2014. 2. 17. 17:30

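This is based on elasticsearch 1.0.0.

Looking at the defaults alone is usually enough, but I am sharing this in case anyone is curious.

Most of the values are the default settings, so you still need to do your own tuning.

Depending on which values you tune, performance can get better or worse. :)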

#[SERVER]
bootstrap.mlockall: true            # swap avoid, ulimit -l unlimited

#[CLUSTER]
#[ClusterName, NodeBuilder, TribeService settings]
cluster.name: gsshop_genie_cluster

#[ConcurrentRebalanceAllocationDecider]
cluster.routing.allocation.cluster_concurrent_rebalance: 2

#[ThrottlingAllocationDecider]
cluster.routing.allocation.node_initial_primaries_recoveries: 4
cluster.routing.allocation.node_concurrent_recoveries: 2

#[ClusterRebalanceAllocationDecider]
cluster.routing.allocation.allow_rebalance: "indices_all_active"

#[EnableAllocationDecider]
cluster.routing.allocation.enable: "all"
index.routing.allocation.enable: "all"

#[NODE]
#[DiscoveryNode, DiscoveryNodeService]
node.name: NODE_NAME
node.master: true
node.data: true
node.mode: network

#[INDEX - IndexDynamicSettingsModule]
#[IndexMetaData settings]
index.number_of_shards: 5
index.number_of_replicas: 1

#[settings]
index.mapper.dynamic: true

#[IndexStoreModule settings]
index.store.type: "mmapfs"

#[IndexDynamicSettingsModule settings]
index.compound_format: false
index.compound_on_flush: true
index.shard.check_on_startup: false        # true/fix/false

#[IndexFieldDataService, IndicesFilterCache settings]
index.fielddata.cache: "node"            # resident(in memory), soft(OOM control), node(default)
index.cache.filter.size: -1
index.cache.filter.expire: -1

#[InternalIndexShard settings]
index.refresh_interval: "1s"

#[IndexDynamicSettingsModule, TieredMergePolicyProvider settings]
index.merge.async: true
index.merge.policy.expunge_deletes_allowed: 10
index.merge.policy.floor_segment: 2mb
index.merge.policy.max_merge_at_once: 10
index.merge.policy.max_merge_at_once_explicit: 30
index.merge.policy.max_merged_segment: 5gb
index.merge.policy.segments_per_tier: 10
index.reclaim_deletes_weight: 2.0

#[TranslogModule, TranslogService, FsTranslog settings]
index.translog.fs.type: simple
index.translog.flush_threshold_ops: 5000
index.translog.flush_threshold_size: 200mb
index.translog.flush_threshold_period: 30m

#[LocalGatewayAllocator settings]
index.recovery.list_timeout: "30s"
index.recovery.initial_shards: "quorum"

#[InternalEngine, IndexingMemoryController settings]
indices.memory.index_buffer_size: 10%

#[RecoverySettings settings]
index.shard.recovery.file_chunk_size: 512kb
index.shard.recovery.translog_ops: 1000
index.shard.recovery.translog_size: 512kb
indices.recovery.max_bytes_per_sec: 0
indices.recovery.concurrent_streams: 3
index.shard.recovery.concurrent_small_file_streams: 2

#[IndicesFieldDataCache settings]
indices.fielddata.cache.size: 20%
indices.fielddata.cache.expire: 15m

#[IndicesFilterCache settings]
indices.cache.filter.size: "20%"
indices.cache.filter.expire: 15m
indices.cache.clean_interval: "60s"

#[AutoCreateIndex settings]
action.auto_create_index: true

#[TransportNodesShutdownAction settings]
action.disable_shutdown: false        # rest api parameter : delay(default 200ms)

#[TransportShardReplicationOperationAction settings]
action.replication_type: async
action.write_consistency: quorum

#[NETWORK]
#[NetworkService settings]
network.host: NODE_IP
network.tcp.no_delay: true
network.tcp.reuse_address: true

#[Transport, NettyTransport settings]
transport.host: NODE_IP
transport.tcp.port: 9300
transport.tcp.connect_timeout: 30s
transport.tcp.compress: true

#[NettyHttpServerTransport settings]
http.port: 9200
http.max_content_length: 100mb
http.compression: true
http.compression_level: 6

#[InternalNode, TribeService settings]
http.enabled: true

#[ThreadPool default settings]
threadpool.generic.type: cached
threadpool.generic.keep_alive: "30s"
threadpool.index.type: fixed
threadpool.index.size: 2                # availableProcessors
threadpool.index.queue_size: 200
threadpool.bulk.type: fixed
threadpool.bulk.size: 2                 # availableProcessors
threadpool.bulk.queue_size: 50
threadpool.get.type: fixed
threadpool.get.size: 2                    # availableProcessors
threadpool.get.queue_size: 1000
threadpool.search.type: fixed
threadpool.search.size: 6                # availableProcessors x 3
threadpool.search.queue_size: 1000
threadpool.suggest.type: fixed
threadpool.suggest.size: 2                    # availableProcessors
threadpool.suggest.queue_size: 1000
threadpool.percolate.type: fixed
threadpool.percolate.size: 2                    # availableProcessors
threadpool.percolate.queue_size: 1000
threadpool.management.type: scaling
threadpool.management.keep_alive: "5m"
threadpool.management.size: 5
threadpool.flush.type: scaling
threadpool.flush.keep_alive: "5m"
threadpool.flush.size: 2                # Math.min( ( ( availableProcessors + 1 ) / 2 ), 5 )
threadpool.merge.type: scaling
threadpool.merge.keep_alive: "5m"
threadpool.merge.size: 2                # Math.min( ( ( availableProcessors + 1 ) / 2 ), 5 )
threadpool.refresh.type: scaling
threadpool.refresh.keep_alive: "5m"
threadpool.refresh.size: 2                # Math.min( ( ( availableProcessors + 1 ) / 2 ), 10 )
threadpool.warmer.type: scaling
threadpool.warmer.keep_alive: "5m"
threadpool.warmer.size: 5
threadpool.snapshot.type: scaling
threadpool.snapshot.keep_alive: "5m"
threadpool.snapshot.size: 5
threadpool.optimize.type: fixed
threadpool.optimize.size: 1

#[GATEWAY]
#[GatewayModule, GatewayService settings]
gateway.type: local
gateway.recover_after_nodes: 1
gateway.recover_after_time: 1m
gateway.recover_after_data_nodes: 1
gateway.recover_after_master_nodes: 1
gateway.expected_nodes: 1
gateway.expected_data_nodes: 1
gateway.expected_master_nodes: 1

#[ZenDiscovery, ZenPingService, UnicastZenPing settings]
discovery.zen.ping.multicast.enabled: false
discovery.zen.minimum_master_nodes: 2            # (N/2 + 1) when there are N master nodes
discovery.zen.ping.timeout: 3s
discovery.zen.ping.unicast.hosts: ["NODE_IP:NODE_PORT", ..., "NODE_IP:NODE_PORT"]
discovery.zen.ping.unicast.concurrent_connects: 10

:

[elasticsearch] Tribe node introduced in 1.0.0

Elastic/Elasticsearch 2014. 2. 17. 17:25

This feature arrived in 1.0.0, and it is a real gem.

I have only read the documentation so far, but it looks like something I can put to good use.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-tribe.html

Put simply, it provides federated search across clusters. :)

Now I also see why node.client exists. (It was probably not created just for this, though...)

:

[elasticsearch] Connection pool changes in 1.0.0

Elastic/Elasticsearch 2014. 2. 17. 15:43

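The connection pool settings used up to es 0.90.x have changed slightly in 1.0.0.
ping and high are unchanged.
recovery, reg, and bulk have been added: reg is the renamed middle, and bulk is the renamed low.
The names were changed to match what each pool is used for. For reference: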

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/transport/netty/NettyTransport.java

        this.connectionsPerNodeRecovery = componentSettings.getAsInt("connections_per_node.recovery", settings.getAsInt("transport.connections_per_node.recovery", 2));
        this.connectionsPerNodeBulk = componentSettings.getAsInt("connections_per_node.bulk", settings.getAsInt("transport.connections_per_node.bulk", 3));
        this.connectionsPerNodeReg = componentSettings.getAsInt("connections_per_node.reg", settings.getAsInt("transport.connections_per_node.reg", 6));
        this.connectionsPerNodeState = componentSettings.getAsInt("connections_per_node.high", settings.getAsInt("transport.connections_per_node.state", 1));
        this.connectionsPerNodePing = componentSettings.getAsInt("connections_per_node.ping", settings.getAsInt("transport.connections_per_node.ping", 1));
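The nested `getAsInt` calls above resolve each value in three steps: a component-level key first, then the `transport.connections_per_node.*` key, then the hard-coded default. Below is a small Python sketch of that lookup order. The keys and defaults are copied from the snippet, but the `transport.netty.` prefix for the component-level keys is my assumption, and `settings` is just a plain dict standing in for the real settings object.

```python
# (result name, component-level key, node-level key, default), copied from the snippet.
# Note that the state pool uses "high" at component level and "state" at node level.
POOLS = [
    ("recovery", "recovery", "recovery", 2),
    ("bulk", "bulk", "bulk", 3),
    ("reg", "reg", "reg", 6),
    ("state", "high", "state", 1),
    ("ping", "ping", "ping", 1),
]

COMPONENT_PREFIX = "transport.netty.connections_per_node."  # assumed prefix
NODE_PREFIX = "transport.connections_per_node."

def connections_per_node(settings):
    """Resolve each pool size: component key, then node key, then default."""
    resolved = {}
    for name, component_key, node_key, default in POOLS:
        fallback = settings.get(NODE_PREFIX + node_key, default)
        resolved[name] = int(settings.get(COMPONENT_PREFIX + component_key, fallback))
    return resolved

defaults = connections_per_node({})
tuned = connections_per_node({"transport.connections_per_node.bulk": "5"})

print(defaults, "total:", sum(defaults.values()))
print("bulk after override:", tuned["bulk"])
```

With nothing overridden, the defaults add up to 13 connections per node.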

:

[elasticsearch] _boost deprecated in 1.0.0

Elastic/Elasticsearch 2014. 2. 14. 18:18

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html#mapping-boost-field

_boost has also been deprecated in 1.0.0.

Instead, as the documentation says, you are supposed to use function_score, which is really just query-time boosting with a script.. heh
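As a rough illustration of that replacement, the sketch below wraps an ordinary query in a `function_score` query whose `script_score` reads a numeric field from each document, so the boost is applied at query time instead of being baked in by `_boost` at index time. The field names `title` and `popularity` are hypothetical, and the code only builds the request body.

```python
import json

def boosted_query(query, boost_field):
    """Wrap a query so its score is multiplied by a per-document numeric field."""
    return {
        "query": {
            "function_score": {
                "query": query,
                # The script returns the stored field value for each document.
                "script_score": {"script": "doc['%s'].value" % boost_field},
                # Multiply the original query score by the script result.
                "boost_mode": "multiply",
            }
        }
    }

# Hypothetical fields: "title" is searched, "popularity" holds the boost value.
search_body = boosted_query({"match": {"title": "elasticsearch"}}, "popularity")
print(json.dumps(search_body, indent=2))
```

Since the boost is computed at search time, changing how it is applied does not require reindexing.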

:

[Elasticsearch] After the 1.0.0 release... one of the things that changed..

Elastic/Elasticsearch 2014. 2. 14. 17:27

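In elasticsearch 1.0.0, a number of APIs return something different from what they returned before.

If you keep relying on the old response structure, you could run into serious trouble.

Quite a few of them were probably affected by the introduction of _cat (http://www.elasticsearch.org/blog/introducing-cat-api/).

settings, add document, get document, and others have also changed slightly.

Just keep this in mind.

As a taste, here is one API whose response changed a lot.

The output below comes from the cluster I brought up as an example for the book I am writing, so it is rather long.. I removed the duplicated parts.. ^^;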

$ curl -XGET 'http://localhost:9200/_nodes?pretty=true'
{
"cluster_name" : "cluster_node",
"nodes" : {
    "-Rr1fhFpTy2tYjQk9qWSQw" : {
      "name" : "node2",
      "transport_address" : "inet[/127.0.0.1:9301]",
      "host" : "jeong-ui-MacBook-Pro.local",
      "ip" : "192.168.0.117",
      "version" : "1.0.0",
      "build" : "a46900e",
      "http_address" : "inet[localhost/127.0.0.1:9201]",
      "attributes" : {
        "master" : "true"
      },
      "settings" : {
        "index" : {
          "mapper" : {
            "dynamic" : "true"
          },
          "number_of_replicas" : "0",
          "number_of_shards" : "5",
          "refresh_interval" : "1s"
        },
        "gateway" : {
          "type" : "local"
        },
        "pidfile" : "es.pid",
        "network" : {
          "host" : "localhost"
        },
        "node" : {
          "data" : "true",
          "master" : "true",
          "name" : "node2"
        },
        "http" : {
          "port" : "9201",
          "enabled" : "true"
        },
        "transport" : {
          "tcp" : {
            "compress" : "true",
            "port" : "9301"
          }
        },
        "name" : "node2",
        "action" : {
          "disable_shutdown" : "true",
          "auto_create_index" : "true"
        },
        "path" : {
          "logs" : "/Users/hwjeong/server/app/elasticsearch/node2/logs",
          "home" : "/Users/hwjeong/server/app/elasticsearch/node2"
        },
        "cluster" : {
          "name" : "cluster_node"
        },
        "discovery" : {
          "zen" : {
            "minimum_master_nodes" : "2",
            "ping" : {
              "unicast" : {
                "hosts" : [ "localhost:9300", "localhost:9301", "localhost:9302" ]
              },
              "multicast" : {
                "enabled" : "false"
              }
            }
          }
        },
        "foreground" : "yes"
      },
      "os" : {
        "refresh_interval" : 1000,
        "available_processors" : 8,
        "cpu" : {
          "vendor" : "Intel",
          "model" : "MacBookPro10,1",
          "mhz" : 2400,
          "total_cores" : 8,
          "total_sockets" : 8,
          "cores_per_socket" : 16,
          "cache_size_in_bytes" : 256
        },
        "mem" : {
          "total_in_bytes" : 8589934592
        },
        "swap" : {
          "total_in_bytes" : 5368709120
        }
      },
      "process" : {
        "refresh_interval" : 1000,
        "id" : 2447,
        "max_file_descriptors" : 10240,
        "mlockall" : false
      },
      "jvm" : {
        "pid" : 2447,
        "version" : "1.6.0_65",
        "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version" : "20.65-b04-462",
        "vm_vendor" : "Apple Inc.",
        "start_time" : 1392355185303,
        "mem" : {
          "heap_init_in_bytes" : 268435456,
          "heap_max_in_bytes" : 1060372480,
          "non_heap_init_in_bytes" : 24317952,
          "non_heap_max_in_bytes" : 136314880,
          "direct_max_in_bytes" : 1060372480
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      },
      "thread_pool" : {
        "generic" : {
          "type" : "cached",
          "keep_alive" : "30s"
        },
        "index" : {
          "type" : "fixed",
          "min" : 8,
          "max" : 8,
          "queue_size" : "200"
        },
        "get" : {
          "type" : "fixed",
          "min" : 8,
          "max" : 8,
          "queue_size" : "1k"
        },
        "snapshot" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 4,
          "keep_alive" : "5m"
        },
        "merge" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 4,
          "keep_alive" : "5m"
        },
        "suggest" : {
          "type" : "fixed",
          "min" : 8,
          "max" : 8,
          "queue_size" : "1k"
        },
        "bulk" : {
          "type" : "fixed",
          "min" : 8,
          "max" : 8,
          "queue_size" : "50"
        },
        "optimize" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1
        },
        "warmer" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 4,
          "keep_alive" : "5m"
        },
        "flush" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 4,
          "keep_alive" : "5m"
        },
        "search" : {
          "type" : "fixed",
          "min" : 24,
          "max" : 24,
          "queue_size" : "1k"
        },
        "percolate" : {
          "type" : "fixed",
          "min" : 8,
          "max" : 8,
          "queue_size" : "1k"
        },
        "management" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m"
        },
        "refresh" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 4,
          "keep_alive" : "5m"
        }
      },
      "network" : {
        "refresh_interval" : 5000,
        "primary_interface" : {
          "address" : "192.168.0.117",
          "name" : "en0",
          "mac_address" : "28:CF:E9:14:C5:C9"
        }
      },
      "transport" : {
        "bound_address" : "inet[/127.0.0.1:9301]",
        "publish_address" : "inet[/127.0.0.1:9301]"
      },
      "http" : {
        "bound_address" : "inet[/127.0.0.1:9201]",
        "publish_address" : "inet[/127.0.0.1:9201]",
        "max_content_length_in_bytes" : 104857600
      },
      "plugins" : [ ]
    },
    .................. From here on it is the same information repeated for the other cluster nodes.
    }
}
}
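Since the response layout differs between versions, client code that reads this output is safer if it does not assume that every key is present. The sketch below shows one defensive way to pull a few fields out of a `_nodes` response. `SAMPLE` is a heavily trimmed copy of the output above, and the choice of fields is just an example.

```python
import json

# Trimmed copy of the response shown above (one node, a few keys only).
SAMPLE = """
{
  "cluster_name": "cluster_node",
  "nodes": {
    "-Rr1fhFpTy2tYjQk9qWSQw": {
      "name": "node2",
      "version": "1.0.0",
      "http_address": "inet[localhost/127.0.0.1:9201]",
      "os": {"available_processors": 8},
      "thread_pool": {
        "search": {"type": "fixed", "min": 24, "max": 24, "queue_size": "1k"}
      }
    }
  }
}
"""

def summarize_nodes(response):
    """Collect a few fields per node, tolerating keys that are missing."""
    rows = []
    for node_id, info in response.get("nodes", {}).items():
        search_pool = info.get("thread_pool", {}).get("search", {})
        rows.append({
            "id": node_id,
            "name": info.get("name"),
            "version": info.get("version"),
            "processors": info.get("os", {}).get("available_processors"),
            "search_threads": search_pool.get("max"),
        })
    return rows

rows = summarize_nodes(json.loads(SAMPLE))
for row in rows:
    print(row)
```

A missing section then shows up as `None` in the summary instead of raising a `KeyError` deep inside the client.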

:

[elasticsearch] FieldQueryBuilder removed in 1.0.0

Elastic/Elasticsearch 2014. 2. 13. 16:58

This API disappeared with the move to es 1.0.0.
FieldQueryBuilder is gone.
It was still there up to beta2 and has been gone since RC1.

FieldQueryBuilder was effectively the same as the existing QueryStringQueryBuilder.

That is probably why there was no need to keep a duplicate API around. ^^
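At the query DSL level, that equivalence means an old `field` query can be rewritten as a `query_string` query with `default_field` set to the same field. The helper below is a sketch of such a rewrite for request bodies held as plain dicts. The function name and the example field `title` are my own, not part of any client library.

```python
def field_to_query_string(field_query):
    """Rewrite {"field": {name: text_or_options}} as a query_string query."""
    # A field query names exactly one field; unpacking fails loudly otherwise.
    (field, value), = field_query["field"].items()
    if isinstance(value, dict):
        options = dict(value)          # extended form: {"query": ..., "boost": ...}
    else:
        options = {"query": value}     # short form: just the query text
    options["default_field"] = field
    return {"query_string": options}

short_form = field_to_query_string({"field": {"title": "+elastic -solr"}})
extended_form = field_to_query_string(
    {"field": {"title": {"query": "elastic", "boost": 2.0}}}
)

print(short_form)
print(extended_form)
```

In Java code the matching change would be to build the query with QueryStringQueryBuilder and set the default field on it.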

And one more thing...

Previously, running bin/elasticsearch started it in the background, but now it simply runs in the foreground..

If you want to run it in the background, start it as follows.

bin/elasticsearch -Des.pidfile=es.pid > /dev/null 2>&1 &

:

[elasticsearch] wow 1.0.0 released!!

Elastic/Elasticsearch 2014. 2. 13. 10:48

http://www.elasticsearch.org/blog/1-0-0-released/

Ha, it has finally been released.

Below are the key features listed on the blog.

The main features available in 1.0 are:

Snapshot/Restore API

Backup or restore select indices or the whole cluster to a shared filesystem, S3 or HDFS via a simple API.

Blog: Introducing snapshot & restore, Docs: Snapshot/Restore

Aggregations

Aggregations are “facets” reborn, providing more powerful, more flexible real-time analytics. Aggregations can be combined with each other and nested, to slice and dice your data exactly the way you want it. Includes the new geohash grid aggregation for geo-clustering.

Blog: Data visualization with elasticsearch aggregations and D3, Docs: Aggregations

Distributed Percolation

Percolation is search reversed: instead of running a query to find matching docs, we can percolate a document to find matching queries. While percolation was already available before, this release makes percolation distributed, so that it will scale with your cluster. Percolation now supports highlighted search snippets, aggregations and bulk percolation.

Blog: Redesigned percolator, Docs: Percolator

cat API

Easy to read, console-based insight into what is happening in your cluster. Particularly useful to the sysadmin when the alarm goes off at 3am and JSON is too difficult to read.

Blog: Introducing the cat API, Docs: cat API

Federated search

The tribe node joins multiple clusters and acts as a federated client. Almost all operations are supported: distributed search, suggestions, percolation. You can even index into multiple clusters with the tribe node. Alternatively, you can set a tribe node to not allow any write operations, making it read-only.

Docs: Tribe node

Doc values

Some of those beautiful aggregations can use a lot of memory, especially when they involve text fields. Doc values store field values on disk rather than in memory, allowing you to run aggregations on much bigger datasets, at the cost of a little performance.

Blog: disk-based field data a.k.a. doc values, Docs: Doc Values

Circuit breaker

There are a few sharp edges in Elasticsearch, places where you can hurt yourself if you are not careful. We are working on adding “circuit breaker” functionality to prevent you from doing so. The first circuit breaker detects attempts to load too much fielddata into memory, which may cause OOM (out of memory) exceptions. More circuit breakers will follow.

Docs: Fielddata circuit breaker

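I upgraded from 0.90.11 to 1.0.0, and for now the migration went straight through without problems.

The basic search features also work fine.

For API changes I will have to look at the source code or the change log on GitHub.

So far, applying 1.0.0 to the project I was already working on has caused no problems, heh.

Good, good, good....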

:

[elasticsearch] Data visualization with Elasticsearch aggregations and D3

Elastic/Elasticsearch 2014. 2. 12. 18:12

http://www.elasticsearch.org/blog/data-visualization-elasticsearch-aggregations/

Lately there is a visible focus on data analysis.
Naturally, the facet functionality is improving quickly as a result.
On top of that, data visualization with d3 is now covered as well.
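To give a feel for how aggregation results feed a chart, here is a minimal sketch that builds a terms aggregation request and flattens the returned buckets into label/value rows of the kind a D3 bar chart could bind to. The aggregation name `by_tag`, the field `tag`, and the sample response are all made up for illustration and do not come from the linked post.

```python
import json

def terms_agg_request(name, field, size=10):
    """Search body that returns only aggregation buckets, no hits."""
    return {
        "size": 0,
        "aggs": {name: {"terms": {"field": field, "size": size}}},
    }

def buckets_to_rows(response, name):
    """Flatten terms buckets into rows suitable for a bar chart."""
    buckets = response["aggregations"][name]["buckets"]
    return [{"label": b["key"], "value": b["doc_count"]} for b in buckets]

request_body = terms_agg_request("by_tag", "tag", size=3)

# Hand-written sample response in the shape of a terms aggregation result.
sample_response = {
    "aggregations": {
        "by_tag": {
            "buckets": [
                {"key": "search", "doc_count": 12},
                {"key": "analytics", "doc_count": 7},
            ]
        }
    }
}

chart_rows = buckets_to_rows(sample_response, "by_tag")
print(json.dumps(request_body))
print(json.dumps(chart_rows))
```

The rows can then be serialized as JSON and handed to the front end as they are.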

:

jjeong

385 posts in 'Elastic/Elasticsearch'