Posts in the 'Elastic/Elasticsearch' category (page 24)

[Elasticsearch] A quick look at parent / child. (join-like)

Elastic/Elasticsearch 2014. 4. 16. 16:01

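This is one of the topics I plan to cover in the follow-up volume on practical applications, but today I'd like to offer just a small taste.

Many people want to implement something like a join with elasticsearch.

That's why I've touched on it briefly before.

In elasticsearch, the ways to get something similar are parent/child and the nested structure.

Today we'll look only at parent/child.

First, the reference docs...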

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-parent-field.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-parent-query.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-query.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-parent-filter.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html

elasticsearch blog post) A must-read!!

http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/

Please refer to these.

1. Defining parent/child.

"mappings" : {
  "parent_type" : {
    "properties" : {
      "doc_id" : {"type" : "long", "store" : "no", "index" : "not_analyzed", "index_options" : "docs", "ignore_malformed" : true, "include_in_all" : false},
      "content" : {"type" : "string", "store" : "no", "index" : "analyzed", "omit_norms" : false, "index_options" : "offsets", "term_vector" : "with_positions_offsets", "include_in_all" : false}
    }
  },
  "child_type" : {
    "_parent" : {
      "type" : "parent_type"
    },
    "properties" : {
      "keyword" : {"type" : "string", "store" : "no", "index" : "not_analyzed", "omit_norms" : true, "index_options" : "docs", "include_in_all" : false}
    }
  }
}

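As the settings above show, you define a parent type and a child type, and then specify the parent type on the child type.

This example extracts keywords from posted articles and stores them as child documents.

To explain it from an RDBMS point of view,

you query a table called child_type and then inner join the results against the parent_type table.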

2. Querying with has_child

GET /article/_search
{
  "query": {
    "has_child": {
      "type": "child_type",
      "query": {
        "term": {
          "keyword": {
            "value": "elasticsearch"
          }
        }
      }
    }
  }
}

The query above finds the documents in child_type that contain the keyword elasticsearch and returns the parent documents of those matches.
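To make the join analogy a bit more concrete, here is a minimal in-memory sketch in Python of what has_child does conceptually. The sample documents and the helper name `has_child` are made up for illustration, and this says nothing about how Elasticsearch implements the query internally.

```python
# Toy model of has_child: filter child docs, then return their distinct parents.
# All documents below are hypothetical sample data.

parents = {
    "1": {"doc_id": 1, "content": "Getting started with elasticsearch"},
    "2": {"doc_id": 2, "content": "Lucene internals"},
}

# Each child carries the id of its parent, mirroring the _parent mapping.
children = [
    {"_parent": "1", "keyword": "elasticsearch"},
    {"_parent": "1", "keyword": "search"},
    {"_parent": "2", "keyword": "lucene"},
]


def has_child(parents, children, field, value):
    """Return parent docs that have at least one child where field == value."""
    matching_parent_ids = {c["_parent"] for c in children if c.get(field) == value}
    # Only parent documents are returned; child fields are not part of the hits.
    return [parents[pid] for pid in sorted(matching_parent_ids) if pid in parents]


hits = has_child(parents, children, "keyword", "elasticsearch")
print(hits)
```

Note that the result contains only the parent document, which matches the first caveat listed later in this post.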

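Ha, since I'm not showing the results.. it's hard to get a feel for it, you say...

Then read the post below and try running it yourself. :)

This one is already well organized on the web, so I'll just share the link.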

http://joelabrahamsson.com/grouping-in-elasticsearch-using-child-documents/

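[Caveats when using parent/child]

1. Fields of the child documents are not returned in the search results.

2. It uses a lot of memory.

3. For now it is still slower than nested.

4. Cross-index is not supported. (That is, it is only supported between types within a single index.)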

:

[Promo] 실무 예제로 배우는 Elasticsearch 검색엔진 - 기본편 (Learning the Elasticsearch Search Engine Through Practical Examples: Basics), print edition...

Elastic/Elasticsearch 2014. 4. 10. 14:09

The print edition has finally been published.

I'm told it can be purchased at Yes24, Aladin, Kyobo, and Interpark.

The price is 13,200 won.

http://www.yes24.com/24/Goods/12679909?Acode=101

:

[elasticsearch] percolator... link.

Elastic/Elasticsearch 2014. 4. 8. 17:41

Original Link : http://blog.qbox.io/elasticsesarch-percolator

Our Qbox team has been asked about the Percolate API, and we’re glad to share here an introduction on that very popular Elasticsearch feature. But before we get started a small amount of setup is involved. To make sure we’re working with the same environment, we’ll start with installing v1.0.1 of Elasticsearch.

Install and Start Elasticsearch v1.0.1

http://www.elasticsearch.org/download/

Distributed Percolation is a feature of the v1.x series of Elasticsearch. We’ll be using the current version of the v1.x series, v1.0.1. If you’ve never installed Elasticsearch, take a moment to watch or read our Elasticsearch Tutorial Ep.1 for detailed instructions.

Mapping and Data

http://sense.qbox.io/gist/28ff2f1031e6d4a5904604d24d26b0bad6238720

For this introduction we've provided a sense gist with runnable code examples in the link above. Once v1.0.1 of Elasticsearch is running locally, you may map your documents, and begin using the Percolate API examples below.

Percolate


The Percolate API is a commonly used utility in Elasticsearch for alerting and monitoring documents. “Search in reverse” is a good way to conceptualize what Percolation does. Searching with Elasticsearch is usually done by querying a set of documents for relevance to the search. Percolate works the opposite way: it runs your documents against registered queries (percolators) to find matches.
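As a rough illustration of the “search in reverse” idea, here is a toy Python sketch: queries are registered first, and each incoming document is checked against all of them. The function names and the simple word-matching rule are assumptions made for this sketch; they are not the Percolate API itself.

```python
# Toy "search in reverse": queries are registered first, documents arrive later.
# Names and the matching rule are illustrative assumptions, not Elasticsearch code.

registered_queries = {}


def register(query_id, field, text):
    """Store a simple 'field contains word' query under an id."""
    registered_queries[query_id] = (field, text.lower())


def percolate(doc):
    """Return the ids of all registered queries that the document matches."""
    matches = []
    for query_id, (field, text) in sorted(registered_queries.items()):
        words = str(doc.get(field, "")).lower().split()
        if text in words:
            matches.append(query_id)
    return matches


register("1", "sport", "baseball")
register("2", "sport", "hockey")

matches = percolate({"name": "Jeff", "sport": "Baseball"})
print(matches)
```

The point of the sketch is only the direction of the lookup: the stored items are queries, and the document is the input.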


v1.0.0 of Elasticsearch brought a major change to how the Percolate API distributes its registered queries. Percolator 0.90.x and previous versions have a single shard index restriction. With a single shard, performance continues to degrade as the number of registered queries grows.

To get around this bottleneck, queries could be partitioned against multiple single shard indices, or you could manipulate Percolate queries to reduce the execution time. Using these methods, however, still caused fundamental scaling limits for any Percolator index shard. Having to “get around” this bottleneck was a concern for the Elasticsearch team who wanted to make the Percolator distributed. v1.0.0 Distributed Percolation put these issues to bed, dropping the previous _percolator index shard restriction for a .percolator type in an index.

Distributed Percolation


A .percolator type gives users a distributed Percolator API environment for full shard distribution, and you can now configure the number of shards necessary for your Percolator queries, changing from a restricted single shard execution to a parallelized execution between all shards within that index. Multiple shards means support for routing and preference, just like the other Search APIs (except the Explain API).

Dropping the old _percolator index shard restriction does break backwards compatibility with the 0.90.x Percolator, but breaking changes in Percolation are a good opportunity for renovations and new features.

Structure of a Percolator in v1.x

Registering a .percolator has changed little from a Percolator of the 0.90.x series. The more substantial change, mentioned earlier, is that the .percolator is now a type in an index, as shown in the example below. In this Percolator we register a match query for the “sport” field containing “baseball.”

curl -XPUT 'localhost:9200/sports/.percolator/1' -d '{
   "query" : { 
       "match" : {
           "sport" : "baseball"
       }
   }
}'

Default mapping for a .percolator type is a query field type of object, with “enabled” set to false. (Enabled allows disabling of parsing and indexing on a named object.) It is worth noting that this new index type could exist on a dedicated Percolator index. Remember when using a dedicated Percolator index to include the mapping of the documents you _percolate. Without the correct mapping for the documents you _percolate, .percolator queries can be (and probably will be) parsed incorrectly.

Request:

curl -XGET "http://localhost:9200/sports/_mapping"

Response:

{
  "sports" : {
    "mappings" : {
      ".percolator" : {
        "_id" : {
          "index" : "not_analyzed"
        },
        "properties" : {
          "query" : {
            "type" : "object",
            "enabled" : false
          }
        }
      }
    }
  }
}

Percolate

Running _percolate against the .percolator below returns a match when the document satisfies the registered query. There are a few ways to run documents against our Percolator. First, we will use the standard “doc” body to call the _percolate API. You would usually use this method for documents that do not already exist in the index.

Percolator:

curl -XPUT 'localhost:9200/sports/.percolator/1' -d '{
   "query" : {
       "match" : {
           "sport" : "baseball"
       }
   }
}'

Percolating on a “doc” body:

curl -XPOST "http://localhost:9200/sports/athlete/_percolate/" -d '{
  "doc": {
     "name": "Jeff",
     "birthdate": "1990-4-1",
     "sport": "Baseball",
     "rating": 2,
     "location": "46.12,-68.55"
  }
}'

This sports index has a single .percolator with “_id”:”1” that our document matches. You can see in the response below that it took 1ms, 5 out of 5 shards were successful, and we matched one Percolator in the sports index with “_id”: “1”.

Response:

{
  "took": 1,
  "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
  },
  "total": 1,
  "matches": [
     {
        "_index": "sports",
        "_id": "1"
     }
  ]
}

Bulk percolation of documents can be done with the multi-Percolate API (similar to the bulk API). Each multi-Percolate entry begins with a header specifying the index, type, and, where needed, id. The header is followed by the JSON document body. No JSON document is required when Percolating an existing document; only a reference to the “_id” of the document is needed.

Request:

curl -XGET 'localhost:9200/sports/athlete/_mpercolate' --data-binary @multi-percolate.txt; echo

multi-percolate.txt:

{"percolate" : {"index" : "sports", "type" : "athlete"}}
{"doc" : {"name":"Michael", "birthdate":"1989-10-1", "sport":"Baseball", "rating": ["5", "4"],  "location":"46.22,-68.45"}}
{"percolate" : {"index" : "twitter", "type" : "tweet", "id" : "1"}}
{}
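If you would rather generate that file than write it by hand, the header/body pairing can be sketched in Python as follows. The helper name `build_mpercolate_body` and the shortened sample documents are assumptions for illustration; the sketch only produces the text and does not send it anywhere.

```python
import json


def build_mpercolate_body(items):
    """Serialize (header, body) pairs as newline-delimited JSON.

    Each pair becomes two lines: the header line, then the body line.
    """
    lines = []
    for header, body in items:
        lines.append(json.dumps({"percolate": header}))
        lines.append(json.dumps(body))
    # End with a newline, as newline-delimited bulk-style bodies usually do.
    return "\n".join(lines) + "\n"


items = [
    ({"index": "sports", "type": "athlete"},
     {"doc": {"name": "Michael", "sport": "Baseball"}}),
    # Existing document: only its id is referenced, so the body stays empty.
    ({"index": "twitter", "type": "tweet", "id": "1"}, {}),
]

body = build_mpercolate_body(items)
print(body, end="")
```

The resulting text could be saved as multi-percolate.txt and passed to curl with --data-binary as in the request above.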

To _percolate a single existing document, simply reference the “_id” of the document in the URL:

curl -XGET 'localhost:9200/sports/athlete/1/_percolate'

Another format for the standard _percolate response is count, which only responds with the total number of matches.

curl -XPOST "http://localhost:9200/sports/athlete/_percolate/count" -d '{
  "doc": {
     "name": "Jeff",
     "birthdate": "1990-4-1",
     "sport": "Baseball",
     "rating": 2,
     "location": "46.12,-68.55"
  }
}'

Response:

{
  "took": 3,
  "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
  },
  "total": 1
}

A way to specifically Percolate athletes with the sport baseball would be a filter. We could then create a .percolator on another field about which we are curious, say, a specific birthdate.

curl -XPOST "http://localhost:9200/sports/athlete/_percolate/" -d '{
  "doc": {
     "name": "Jeff",
     "birthdate": "1990-4-1",
     "sport": "Baseball",
     "rating": 2,
     "location": "46.12,-68.55"
  },
  "filter": {
     "term": {
        "sport": "baseball"
     }
  }
}'

curl -XPUT "http://localhost:9200/sports/.percolator/2" -d '{
   "query":{
       "match": {
          "birthdate": "1990-4-1"
       }
   }
}'

Other supported options for _percolate include size, track_scores, sort, facets, aggs, and highlight. The query and filter options differ only in that query computes a score. The computed score can then be used to show the document’s score, which is based on the query’s match against the Percolate query’s metadata. You can also use highlight, facets, or aggregations in these request bodies. Use size to specify the number of matches to return (defaults to unlimited).

Distributed Percolation can be the solution for some of the most active databases in production today. Fascinating data and analytics can be gained from your real-time _percolate. With distribution, the Percolate API will only grow into more interesting use cases and ideas for Elasticsearch.

If you enjoyed this post, you’ll want to check out some of our other tutorials like An Introduction to Elasticsearch Aggregations for more.

You can use Elasticsearch v1.0.1 and 0.90.12 on Qbox today to try out Percolation on a dedicated cluster of your choosing. If you have any questions, feel free to leave a comment below or contact us. Runnable code of all the examples used in this tutorial can be found at this sense gist.

:

[Elasticsearch] Understanding replicas & shards.

Elastic/Elasticsearch 2014. 4. 7. 23:16

This is basic material when working with es, but I'm posting it for those who have misunderstood it.

It's also mentioned in my book. ^^;

http://www.hanbit.co.kr/ebook/look.html?isbn=9788968486913

A replica is, literally, a copy of an index.

Describing this replication as active/standby is incorrect.

A replica copies the shards of the physical index, which enables distributed processing, and it serves as failover against a SPOF (single point of failure).

If you search the internet for replica and shard, you'll find plenty of people who have drawn very good diagrams.

slideshare and elasticsearch.org also have many documents, so have a look at those.

A replica is basically copied from the primary.

There are two kinds of shard: primary shards and replica shards.

All indexing basically happens on the primary shard, and the replicas are copied from it.

These shards also have a big impact on search performance.

As far as I remember, in 0.90.x there were only three search preference types up to 0.90.5, but as of 1.x, or some 0.90.x release after 0.90.5 that I haven't checked..

the types below were added.

`_primary`::
The operation will go and be executed only on the primary shards.

`_primary_first`::
The operation will go and be executed on the primary shard, and if not available (failover), will execute on other shards.

`_local`::
The operation will prefer to be executed on a local allocated shard if possible.

`_only_node:xyz`::
Restricts the search to execute only on a node with the provided node id (`xyz` in this case).

`_prefer_node:xyz`::
Prefers execution on the node with the provided node id (`xyz` in this case) if applicable.

`_shards:2,3`::
Restricts the operation to the specified shards. (`2` and `3` in this case). This preference can be combined with other preferences but it has to appear first: `_shards:2,3;_primary`

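These options can be used in various ways on search requests, depending on your performance or functional requirements.

The default is random.

The settings that were valid before were _local and _primary. The other options above were added later, and I haven't tested whether they actually work.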
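For reference, the preference is passed as a query string parameter on the search request. Below is a small Python sketch that only builds such URLs; the host, the index name `article`, and the helper name `search_url` are placeholders chosen for illustration.

```python
from urllib.parse import urlencode


def search_url(base, index, preference=None):
    """Build a _search URL, optionally with a preference query parameter."""
    url = f"{base}/{index}/_search"
    if preference:
        # urlencode percent-escapes characters such as ':' ',' and ';'.
        url += "?" + urlencode({"preference": preference})
    return url


# The host and index name are placeholders for illustration.
print(search_url("http://localhost:9200", "article", "_primary_first"))
print(search_url("http://localhost:9200", "article", "_shards:2,3;_primary"))
```

Encoding the value matters mainly for the combined form, since `_shards:2,3;_primary` contains characters that are not safe to put in a query string as-is.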

To summarize,

replica)

Configures replication of the primary shards.

Refers to the replica shards.

The default is 1.

A full replica setting is node size - 1.

It is a setting for dealing with SPOF.

shard)

There are primary shards and replica shards.

Documents are indexed on the primary shard first, and the replica shards are created from it.

A shard is the unit of index that lucene uses. (Think of it as a physical index in Lucene terms.)
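The arithmetic in this summary is simple enough to sketch in Python. The function names are made up for this example, and the 5 primary shards and 3 nodes are just assumed sample values.

```python
def shard_copies(primary_shards, replicas):
    """Total number of shards (primaries plus replica copies) for an index."""
    return primary_shards * (1 + replicas)


def full_replica_setting(node_count):
    """Replica count that lets every node hold a copy of every shard."""
    return max(node_count - 1, 0)


# Assumed sample values: 5 primary shards, replica setting of 1, 3 nodes.
total = shard_copies(5, 1)
full = full_replica_setting(3)
print(total, full)
```

With a replica setting of 1, every primary shard has exactly one copy, so the index holds twice as many shards as it has primaries.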

In addition, unassigned shards can occur in a number of situations.

- They occur when there is no node to allocate them to,

- and also when recovery has failed.

- There can be various other situations as well.

If a shard is basically healthy but remains unassigned, you can allocate it manually.

You can allocate it through shard reroute. (http://jjeong.tistory.com/909)
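As a rough sketch of what such a reroute request can look like, the Python below builds a request body using the `allocate` command from the 1.x-era cluster reroute API. Command names changed in later versions, so treat this as an assumption to check against the reference for your version; the index name, shard number, and node name are hypothetical.

```python
import json


def allocate_command(index, shard, node, allow_primary=False):
    """Build a reroute request body that assigns one shard to one node."""
    command = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        # Forcing a primary can lose data, so it is opt-in here on purpose.
        command["allow_primary"] = True
    return {"commands": [{"allocate": command}]}


# Hypothetical index, shard number, and node name.
body = allocate_command("article", 0, "node-1")
print(json.dumps(body, indent=2))
# This body would be POSTed to http://localhost:9200/_cluster/reroute
```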

And the sharding algorithm basically allocates shards to the nodes one after another. (http://jjeong.tistory.com/926)

I'll stop here.

For the basic concepts of es, see the glossary on elasticsearch.org.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html

:

[Book introduction] 실무 예제로 배우는 Elasticsearch 검색엔진 (기본편)

Elastic/Elasticsearch 2014. 3. 25. 18:43

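It's a little embarrassing, but let me introduce the book I wrote.

It is a development book about Elasticsearch.

It is aimed at junior developers and helps build a basic understanding of search.

Because it is an introductory course, it does not cover every API that Elasticsearch provides or all of the related components. Those will be covered later in the practical applications volume, so please look forward to that as well.

The print edition is scheduled for release on April 10.

It follows the basic story of installation, configuration, indexing, and search, and the running example is built on data and search for a typical online shopping mall.

Publisher) Hanbit Media

Title) 실무 예제로 배우는 Elasticsearch 검색엔진 - 기본편

Link) http://www.hanbit.co.kr/ebook/look.html?isbn=9788968486913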

:

Elasticsearch) The Definitive Guide...

Elastic/Elasticsearch 2014. 3. 24. 09:55

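I heard about this book last week, and now it has been announced on the official blog.

I think it will be a big help for those who are new to Elasticsearch.

There's only an abstract so far, but it looks like it will be updated soon.

Blog : http://www.elasticsearch.org/blog/elasticsearch-definitive-guide/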

github : https://github.com/elasticsearch/elasticsearch-definitive-guide

:

[Elasticsearch] REST APIs that are useful when building a management tool

Elastic/Elasticsearch 2014. 3. 21. 01:12

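These are REST APIs that are useful when building an elasticsearch management tool.

Some of you may ask whether there's really any need to build such a thing.

There might be. Haha

The well-known open-source plugins are head, bigdesk, hq, paramedic, and so on...

and there's also marvel as the official one...

The important point is that real-time monitoring works with the plugins above.

marvel indexes and stores the monitoring data, so you can also query by time period.

But these tools have a drawback.

They ultimately have to be installed on the node you want to monitor.

Well, there are surely many ways around that.

The easiest would be to go through a proxy.

Anyway, let's wrap up and write them down.

Useful REST APIs

[REST APIs for management]

- http://localhost:9200/_cluster/health?pretty=true

- http://localhost:9200/_nodes/stats?pretty=true

- http://localhost:9200/_stats?pretty=true

Beyond these, I think I've mentioned it once before, but _cat is also worth a look.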
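As a starting point for such a tool, here is a minimal Python sketch that builds the three URLs above and pulls a few fields out of a cluster health response. The helper names are made up, and the sample response is hand-written rather than captured from a real cluster; fetching the URL over HTTP is left out so the sketch runs without a node.

```python
import json

BASE = "http://localhost:9200"  # same host as the URLs listed above

ENDPOINTS = {
    "health": "/_cluster/health",
    "node_stats": "/_nodes/stats",
    "index_stats": "/_stats",
}


def endpoint_url(name, pretty=True):
    """Return the full URL for one of the management endpoints."""
    url = BASE + ENDPOINTS[name]
    return url + "?pretty=true" if pretty else url


def summarize_health(raw_json):
    """Pick the fields a dashboard would typically show from a health response."""
    health = json.loads(raw_json)
    return {
        "status": health.get("status"),
        "nodes": health.get("number_of_nodes"),
        "unassigned": health.get("unassigned_shards"),
    }


# Hand-written sample response, not captured from a real cluster.
sample = '{"cluster_name": "demo", "status": "yellow", "number_of_nodes": 1, "unassigned_shards": 5}'
summary = summarize_health(sample)
print(endpoint_url("health"))
print(summary)
```

Because a tool like this only talks to the REST port, it can run on a separate machine, which sidesteps the need to install a plugin on the monitored node.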

:

[lucene] Extracting terms with IndexReader/Fields.

Elastic/Elasticsearch 2014. 2. 25. 11:19

Reference :

http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/IndexReader.html

https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/Fields.html

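Someone asked about this, so I'm sharing it.
It seems some of you want to extract the indexed terms from an existing index.

I'm not sure of the use or purpose, but you're presumably looking because you need it.

However, if you're trying to implement something like group by, I'd recommend just using facets.

As in the links above, this can be implemented with Lucene's IndexReader.

IndexReader.getTermVectors(docID) : term list for the whole document (null if term vectors were not indexed)

IndexReader.getTermVector(docID, field) : term list within a specific field of the document

The documentation explains it well, so please refer to it.

And in elasticsearch, ShardTermVectorService provides the related functionality.

Below is the IndexReader source code.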

package org.apache.lucene.index;

/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import java.io.Closeable;
import java.io.IOException;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.lucene.document.DocumentStoredFieldVisitor;
import org.apache.lucene.store.AlreadyClosedException;
import org.apache.lucene.util.Bits;
// javadocs

/** IndexReader is an abstract class, providing an interface for accessing an
index. Search of an index is done entirely through this abstract interface,
so that any subclass which implements it is searchable.

There are two different types of IndexReaders:
<ul>
<li>{@link AtomicReader}: These indexes do not consist of several sub-readers,
they are atomic. They support retrieval of stored fields, doc values, terms,
and postings.
<li>{@link CompositeReader}: Instances (like {@link DirectoryReader})
of this reader can only
be used to get stored fields from the underlying AtomicReaders,
but it is not possible to directly retrieve postings. To do that, get
the sub-readers via {@link CompositeReader#getSequentialSubReaders}.
Alternatively, you can mimic an {@link AtomicReader} (with a serious slowdown),
by wrapping composite readers with {@link SlowCompositeReaderWrapper}.
</ul>

IndexReader instances for indexes on disk are usually constructed
with a call to one of the static <code>DirectoryReader.open()</code> methods,
e.g. {@link DirectoryReader#open(org.apache.lucene.store.Directory)}. {@link DirectoryReader} implements
the {@link CompositeReader} interface, it is not possible to directly get postings.

 For efficiency, in this API documents are often referred to via
document numbers, non-negative integers which each name a unique
document in the index. These document numbers are ephemeral -- they may change
as documents are added to and deleted from an index. Clients should thus not
rely on a given document having the same number between sessions.


<a name="thread-safety"></a>NOTE: {@link
IndexReader} instances are completely thread
safe, meaning multiple threads can call any of its methods,
concurrently. If your application requires external
synchronization, you should not synchronize on the
<code>IndexReader</code> instance; use your own
(non-Lucene) objects instead.
*/
public abstract class IndexReader implements Closeable {

private boolean closed = false;
private boolean closedByChild = false;
private final AtomicInteger refCount = new AtomicInteger(1);

IndexReader() {
 if (!(this instanceof CompositeReader || this instanceof AtomicReader))
 throw new Error("IndexReader should never be directly extended, subclass AtomicReader or CompositeReader instead.");
}

/**
 * A custom listener that's invoked when the IndexReader
 * is closed.
 *
 * @lucene.experimental
 */
public static interface ReaderClosedListener {
 /** Invoked when the {@link IndexReader} is closed. */
 public void onClose(IndexReader reader);
}

private final Set<ReaderClosedListener> readerClosedListeners =
 Collections.synchronizedSet(new LinkedHashSet<ReaderClosedListener>());

private final Set<IndexReader> parentReaders =
 Collections.synchronizedSet(Collections.newSetFromMap(new WeakHashMap<IndexReader,Boolean>()));

/** Expert: adds a {@link ReaderClosedListener}. The
 * provided listener will be invoked when this reader is closed.
 *
 * @lucene.experimental */
public final void addReaderClosedListener(ReaderClosedListener listener) {
 ensureOpen();
 readerClosedListeners.add(listener);
}

/** Expert: remove a previously added {@link ReaderClosedListener}.
 *
 * @lucene.experimental */
public final void removeReaderClosedListener(ReaderClosedListener listener) {
 ensureOpen();
 readerClosedListeners.remove(listener);
}

/** Expert: This method is called by {@code IndexReader}s which wrap other readers
 * (e.g. {@link CompositeReader} or {@link FilterAtomicReader}) to register the parent
 * at the child (this reader) on construction of the parent. When this reader is closed,
 * it will mark all registered parents as closed, too. The references to parent readers
 * are weak only, so they can be GCed once they are no longer in use.
 * @lucene.experimental */
public final void registerParentReader(IndexReader reader) {
 ensureOpen();
 parentReaders.add(reader);
}

private void notifyReaderClosedListeners() {
 synchronized(readerClosedListeners) {
 for(ReaderClosedListener listener : readerClosedListeners) {
 listener.onClose(this);
 }
 }
}

private void reportCloseToParentReaders() {
 synchronized(parentReaders) {
 for(IndexReader parent : parentReaders) {
 parent.closedByChild = true;
 // cross memory barrier by a fake write:
 parent.refCount.addAndGet(0);
 // recurse:
 parent.reportCloseToParentReaders();
 }
 }
}

/** Expert: returns the current refCount for this reader */
public final int getRefCount() {
 // NOTE: don't ensureOpen, so that callers can see
 // refCount is 0 (reader is closed)
 return refCount.get();
}

/**
 * Expert: increments the refCount of this IndexReader
 * instance. RefCounts are used to determine when a
 * reader can be closed safely, i.e. as soon as there are
 * no more references. Be sure to always call a
 * corresponding {@link #decRef}, in a finally clause;
 * otherwise the reader may never be closed. Note that
 * {@link #close} simply calls decRef(), which means that
 * the IndexReader will not really be closed until {@link
 * #decRef} has been called for all outstanding
 * references.
 *
 * @see #decRef
 * @see #tryIncRef
 */
public final void incRef() {
 ensureOpen();
 refCount.incrementAndGet();
}

/**
 * Expert: increments the refCount of this IndexReader
 * instance only if the IndexReader has not been closed yet
 * and returns <code>true</code> iff the refCount was
 * successfully incremented, otherwise <code>false</code>.
 * If this method returns <code>false</code> the reader is either
 * already closed or is currently being closed. Either way this
 * reader instance shouldn't be used by an application unless
 * <code>true</code> is returned.
 * 
 * RefCounts are used to determine when a
 * reader can be closed safely, i.e. as soon as there are
 * no more references. Be sure to always call a
 * corresponding {@link #decRef}, in a finally clause;
 * otherwise the reader may never be closed. Note that
 * {@link #close} simply calls decRef(), which means that
 * the IndexReader will not really be closed until {@link
 * #decRef} has been called for all outstanding
 * references.
 *
 * @see #decRef
 * @see #incRef
 */
public final boolean tryIncRef() {
 int count;
 while ((count = refCount.get()) > 0) {
 if (refCount.compareAndSet(count, count+1)) {
 return true;
 }
 }
 return false;
}

/**
 * Expert: decreases the refCount of this IndexReader
 * instance. If the refCount drops to 0, then this
 * reader is closed. If an exception is hit, the refCount
 * is unchanged.
 *
 * @throws IOException in case an IOException occurs in doClose()
 *
 * @see #incRef
 */
public final void decRef() throws IOException {
 // only check refcount here (don't call ensureOpen()), so we can
 // still close the reader if it was made invalid by a child:
 if (refCount.get() <= 0) {
 throw new AlreadyClosedException("this IndexReader is closed");
 }

 final int rc = refCount.decrementAndGet();
 if (rc == 0) {
 boolean success = false;
 try {
 doClose();
 success = true;
 } finally {
 if (!success) {
 // Put reference back on failure
 refCount.incrementAndGet();
 }
 }
 reportCloseToParentReaders();
 notifyReaderClosedListeners();
 } else if (rc < 0) {
 throw new IllegalStateException("too many decRef calls: refCount is " + rc + " after decrement");
 }
}

/**
 * Throws AlreadyClosedException if this IndexReader or any
 * of its child readers is closed, otherwise returns.
 */
protected final void ensureOpen() throws AlreadyClosedException {
 if (refCount.get() <= 0) {
 throw new AlreadyClosedException("this IndexReader is closed");
 }
 // the happens before rule on reading the refCount, which must be after the fake write,
 // ensures that we see the value:
 if (closedByChild) {
 throw new AlreadyClosedException("this IndexReader cannot be used anymore as one of its child readers was closed");
 }
}

/** {@inheritDoc}
 * For caching purposes, {@code IndexReader} subclasses are not allowed
 * to implement equals/hashCode, so methods are declared final.
 * To lookup instances from caches use {@link #getCoreCacheKey} and
 * {@link #getCombinedCoreAndDeletesKey}.
 */
@Override
public final boolean equals(Object obj) {
 return (this == obj);
}

/** {@inheritDoc}
 * For caching purposes, {@code IndexReader} subclasses are not allowed
 * to implement equals/hashCode, so methods are declared final.
 * To lookup instances from caches use {@link #getCoreCacheKey} and
 * {@link #getCombinedCoreAndDeletesKey}.
 */
@Override
public final int hashCode() {
 return System.identityHashCode(this);
}

/** Retrieve term vectors for this document, or null if
 * term vectors were not indexed. The returned Fields
 * instance acts like a single-document inverted index
 * (the docID will be 0). */
public abstract Fields getTermVectors(int docID)
 throws IOException;

/** Retrieve term vector for this document and field, or
 * null if term vectors were not indexed. The returned
 * Fields instance acts like a single-document inverted
 * index (the docID will be 0). */
public final Terms getTermVector(int docID, String field)
 throws IOException {
 Fields vectors = getTermVectors(docID);
 if (vectors == null) {
 return null;
 }
 return vectors.terms(field);
}

/** Returns the number of documents in this index. */
public abstract int numDocs();

/** Returns one greater than the largest possible document number.
 * This may be used to, e.g., determine how big to allocate an array which
 * will have an element for every document number in an index.
 */
public abstract int maxDoc();

/** Returns the number of deleted documents. */
public final int numDeletedDocs() {
 return maxDoc() - numDocs();
}

/** Expert: visits the fields of a stored document, for
 * custom processing/loading of each field. If you
 * simply want to load all fields, use {@link
 * #document(int)}. If you want to load a subset, use
 * {@link DocumentStoredFieldVisitor}. */
public abstract void document(int docID, StoredFieldVisitor visitor) throws IOException;

/**
 * Returns the stored fields of the <code>n</code>th
 * <code>Document</code> in this index. This is just
 * sugar for using {@link DocumentStoredFieldVisitor}.
 * 
 * NOTE: for performance reasons, this method does not check if the
 * requested document is deleted, and therefore asking for a deleted document
 * may yield unspecified results. Usually this is not required, however you
 * can test if the doc is deleted by checking the {@link
 * Bits} returned from {@link MultiFields#getLiveDocs}.
 *
 * NOTE: only the content of a field is returned,
 * if that field was stored during indexing. Metadata
 * like boost, omitNorm, IndexOptions, tokenized, etc.,
 * are not preserved.
 *
 * @throws CorruptIndexException if the index is corrupt
 * @throws IOException if there is a low-level IO error
 */
// TODO: we need a separate StoredField, so that the
// Document returned here contains that class not
// IndexableField
public final StoredDocument document(int docID) throws IOException {
 final DocumentStoredFieldVisitor visitor = new DocumentStoredFieldVisitor();
 document(docID, visitor);
 return visitor.getDocument();
}

/**
 * Like {@link #document(int)} but only loads the specified
 * fields. Note that this is simply sugar for {@link
 * DocumentStoredFieldVisitor#DocumentStoredFieldVisitor(Set)}.
 */
public final StoredDocument document(int docID, Set<String> fieldsToLoad)
 throws IOException {
 final DocumentStoredFieldVisitor visitor = new DocumentStoredFieldVisitor(
 fieldsToLoad);
 document(docID, visitor);
 return visitor.getDocument();
}

/** Returns true if any documents have been deleted. Implementers should
 * consider overriding this method if {@link #maxDoc()} or {@link #numDocs()}
 * are not constant-time operations. */
public boolean hasDeletions() {
 return numDeletedDocs() > 0;
}

/**
 * Closes files associated with this index.
 * Also saves any new deletions to disk.
 * No other methods should be called after this has been called.
 * @throws IOException if there is a low-level IO error
 */
@Override
public final synchronized void close() throws IOException {
 if (!closed) {
 decRef();
 closed = true;
 }
}

/** Implements close. */
protected abstract void doClose() throws IOException;

/**
 * Expert: Returns the root {@link IndexReaderContext} for this
 * {@link IndexReader}'s sub-reader tree.
 * 
 * Iff this reader is composed of sub
 * readers, i.e. this reader being a composite reader, this method returns a
 * {@link CompositeReaderContext} holding the reader's direct children as well as a
 * view of the reader tree's atomic leaf contexts. All sub-
 * {@link IndexReaderContext} instances referenced from this readers top-level
 * context are private to this reader and are not shared with another context
 * tree. For example, IndexSearcher uses this API to drive searching by one
 * atomic leaf reader at a time. If this reader is not composed of child
 * readers, this method returns an {@link AtomicReaderContext}.
 * 
 * Note: Any of the sub-{@link CompositeReaderContext} instances referenced
 * from this top-level context do not support {@link CompositeReaderContext#leaves()}.
 * Only the top-level context maintains the convenience leaf-view
 * for performance reasons.
 */
public abstract IndexReaderContext getContext();

/**
 * Returns the reader's leaves, or itself if this reader is atomic.
 * This is a convenience method calling {@code this.getContext().leaves()}.
 * @see IndexReaderContext#leaves()
 */
public final List<AtomicReaderContext> leaves() {
 return getContext().leaves();
}

/** Expert: Returns a key for this IndexReader, so FieldCache/CachingWrapperFilter can find
 * it again.
 * This key must not have equals()/hashCode() methods, so "equals" means "identical". */
public Object getCoreCacheKey() {
 // Don't call ensureOpen since FC calls this (to evict)
 // on close
 return this;
}

/** Expert: Returns a key for this IndexReader that also includes deletions,
 * so FieldCache/CachingWrapperFilter can find it again.
 * This key must not have equals()/hashCode() methods, so "equals" means "identical". */
public Object getCombinedCoreAndDeletesKey() {
 // Don't call ensureOpen since FC calls this (to evict)
 // on close
 return this;
}

/** Returns the number of documents containing the
 * <code>term</code>. This method returns 0 if the term or
 * field does not exist. This method does not take into
 * account deleted documents that have not yet been merged
 * away.
 * @see TermsEnum#docFreq()
 */
public abstract int docFreq(Term term) throws IOException;

/**
 * Returns the total number of occurrences of {@code term} across all
 * documents (the sum of the freq() for each doc that has this term). This
 * will be -1 if the codec doesn't support this measure. Note that, like other
 * term measures, this measure does not take deleted documents into account.
 */
public abstract long totalTermFreq(Term term) throws IOException;

/**
 * Returns the sum of {@link TermsEnum#docFreq()} for all terms in this field,
 * or -1 if this measure isn't stored by the codec. Note that, just like other
 * term measures, this measure does not take deleted documents into account.
 *
 * @see Terms#getSumDocFreq()
 */
public abstract long getSumDocFreq(String field) throws IOException;

/**
 * Returns the number of documents that have at least one term for this field,
 * or -1 if this measure isn't stored by the codec. Note that, just like other
 * term measures, this measure does not take deleted documents into account.
 *
 * @see Terms#getDocCount()
 */
public abstract int getDocCount(String field) throws IOException;

/**
 * Returns the sum of {@link TermsEnum#totalTermFreq} for all terms in this
 * field, or -1 if this measure isn't stored by the codec (or if this field
 * omits term freq and positions). Note that, just like other term measures,
 * this measure does not take deleted documents into account.
 *
 * @see Terms#getSumTotalTermFreq()
 */
public abstract long getSumTotalTermFreq(String field) throws IOException;

}
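
The five statistics documented above are easier to tell apart on a tiny example. The following is only a sketch: ToyFieldStats is a hypothetical in-memory class for a single field, not Lucene code, and it does not model deleted documents or the -1 "not stored by the codec" case.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy single-field index used only to illustrate the statistics; not Lucene code. */
class ToyFieldStats {
    // term -> (docId -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    // ids of documents that have at least one term in this field
    private final Set<Integer> docsWithField = new HashSet<>();

    void addDocument(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(docId, 1, Integer::sum);
            docsWithField.add(docId);
        }
    }

    /** Number of documents containing the term; 0 if the term is unknown. */
    int docFreq(String term) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    /** Total number of occurrences of the term across all documents. */
    long totalTermFreq(String term) {
        Map<Integer, Integer> docs = postings.get(term);
        if (docs == null) {
            return 0;
        }
        long total = 0;
        for (int freq : docs.values()) {
            total += freq;
        }
        return total;
    }

    /** Sum of docFreq over every term in the field. */
    long getSumDocFreq() {
        long sum = 0;
        for (Map<Integer, Integer> docs : postings.values()) {
            sum += docs.size();
        }
        return sum;
    }

    /** Number of documents that have at least one term in the field. */
    int getDocCount() {
        return docsWithField.size();
    }

    /** Sum of totalTermFreq over every term in the field. */
    long getSumTotalTermFreq() {
        long sum = 0;
        for (String term : postings.keySet()) {
            sum += totalTermFreq(term);
        }
        return sum;
    }

    public static void main(String[] args) {
        ToyFieldStats stats = new ToyFieldStats();
        stats.addDocument(0, "a", "b", "a");
        stats.addDocument(1, "b", "c");
        stats.addDocument(2); // no terms in this field
        System.out.println("docFreq(a)=" + stats.docFreq("a"));
        System.out.println("totalTermFreq(a)=" + stats.totalTermFreq("a"));
        System.out.println("sumDocFreq=" + stats.getSumDocFreq());
        System.out.println("docCount=" + stats.getDocCount());
        System.out.println("sumTotalTermFreq=" + stats.getSumTotalTermFreq());
    }
}
```

In this toy data the term "a" appears twice in one document, so its docFreq and totalTermFreq differ, and the document with no terms is not counted by getDocCount.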

Below is the Fields source code.

:

[elasticsearch] A look at the internal mappers.

Elastic/Elasticsearch 2014. 2. 24. 17:01

Having covered the core types, let's now look at the internal fields.

The relevant package is:

package org.elasticsearch.index.mapper.internal;

The relevant Lucene source is the same class we looked at for the core types.

This time, though, I'll add some explanation of its setters.

package org.apache.lucene.document;

public class FieldType implements IndexableFieldType {

..........

}

▶ setIndexed(boolean)

Whether the field is indexed at all (index: yes or no).

▶ setTokenized(boolean)

The index setting: analyzed or not_analyzed.

▶ setStored(boolean)

The store setting: yes or no.

▶ setStoreTermVectors(boolean)

The term_vector setting: yes or no.

▶ setStoreTermVectorOffsets(boolean)

The term_vector setting: with_offsets.

▶ setStoreTermVectorPositions(boolean)

The term_vector setting: with_positions.

▶ setOmitNorms(boolean)

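The norms.enabled setting: true or false.

Elasticsearch renamed this setting, and the meaning is inverted: setOmitNorms(false) corresponds to norms.enabled: true.

In AbstractFieldMapper this value defaults to false, which means norms.enabled: true by default.

That is to be expected, because Elasticsearch treats fields as analyzed by default.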

▶ setIndexOptions(IndexOptions value)

The index_options setting: docs (document numbers), freqs (document numbers + term frequencies), positions (freqs + positions), or offsets (positions + offsets).

With analyzed the default is positions; with not_analyzed the default is docs.
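
To tie the settings above together, here is a rough sketch of how four of them (index, store, norms.enabled, index_options) could be translated into the corresponding flags. MappingFlags and its from() method are hypothetical names made up for this illustration. They are not part of Elasticsearch or Lucene, and the real mappers do considerably more than this.

```java
/** Hypothetical holder mirroring a few Lucene 4.x FieldType flags; not the real class. */
class MappingFlags {
    enum IndexOptions {
        DOCS_ONLY,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    boolean indexed;
    boolean tokenized;
    boolean stored;
    boolean omitNorms;
    IndexOptions indexOptions; // stays null when the field is not indexed

    /**
     * @param index        "analyzed", "not_analyzed" or "no"
     * @param store        "yes" or "no"
     * @param normsEnabled value of norms.enabled
     * @param indexOptions "docs", "freqs", "positions", "offsets", or null for the default
     */
    static MappingFlags from(String index, String store, boolean normsEnabled, String indexOptions) {
        MappingFlags flags = new MappingFlags();
        flags.indexed = !"no".equals(index);
        flags.tokenized = "analyzed".equals(index);
        flags.stored = "yes".equals(store);
        flags.omitNorms = !normsEnabled; // the two settings are inverses of each other
        if (flags.indexed) {
            if (indexOptions != null) {
                flags.indexOptions = parse(indexOptions);
            } else if (flags.tokenized) {
                flags.indexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS; // analyzed default
            } else {
                flags.indexOptions = IndexOptions.DOCS_ONLY; // not_analyzed default
            }
        }
        return flags;
    }

    static IndexOptions parse(String value) {
        switch (value) {
            case "docs":
                return IndexOptions.DOCS_ONLY;
            case "freqs":
                return IndexOptions.DOCS_AND_FREQS;
            case "positions":
                return IndexOptions.DOCS_AND_FREQS_AND_POSITIONS;
            case "offsets":
                return IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
            default:
                throw new IllegalArgumentException("unknown index_options: " + value);
        }
    }

    public static void main(String[] args) {
        MappingFlags analyzed = MappingFlags.from("analyzed", "no", true, null);
        MappingFlags keyword = MappingFlags.from("not_analyzed", "no", false, null);
        System.out.println("analyzed: " + analyzed.indexOptions + ", omitNorms=" + analyzed.omitNorms);
        System.out.println("not_analyzed: " + keyword.indexOptions + ", omitNorms=" + keyword.omitNorms);
    }
}
```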

[All field]

public static final boolean ENABLED = true;

static {
    FIELD_TYPE.setIndexed(true);
    FIELD_TYPE.setTokenized(true);
    FIELD_TYPE.freeze();
}

[Boost field]

public static final String NAME = "_boost";
public static final Float NULL_VALUE = null;

public static final FieldType FIELD_TYPE = new FieldType(NumberFieldMapper.Defaults.FIELD_TYPE);

static {
    FIELD_TYPE.setIndexed(false);
    FIELD_TYPE.setStored(false);
}

[Id field]

static {
    FIELD_TYPE.setIndexed(false);
    FIELD_TYPE.setStored(false);
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);
    FIELD_TYPE.freeze();
}

public static final String PATH = null;

[Index field]

static {
    FIELD_TYPE.setIndexed(true);
    FIELD_TYPE.setTokenized(false);
    FIELD_TYPE.setStored(false);
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);
    FIELD_TYPE.freeze();
}

public static final EnabledAttributeMapper ENABLED_STATE = EnabledAttributeMapper.DISABLED;

[Parent field]

static {
    FIELD_TYPE.setIndexed(true);
    FIELD_TYPE.setTokenized(false);
    FIELD_TYPE.setStored(true);
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);
    FIELD_TYPE.freeze();
}

[Routing field]

static {
    FIELD_TYPE.setIndexed(true);
    FIELD_TYPE.setTokenized(false);
    FIELD_TYPE.setStored(true);
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);
    FIELD_TYPE.freeze();
}

public static final boolean REQUIRED = false;
public static final String PATH = null;

[Size field]

public static final EnabledAttributeMapper ENABLED_STATE = EnabledAttributeMapper.DISABLED;

public static final FieldType SIZE_FIELD_TYPE = new FieldType(IntegerFieldMapper.Defaults.FIELD_TYPE);

static {
    SIZE_FIELD_TYPE.freeze();
}

[Source field]

public static final boolean ENABLED = true;
public static final long COMPRESS_THRESHOLD = -1;
public static final String FORMAT = null; // default format is to use the one provided

static {
    FIELD_TYPE.setIndexed(false);
    FIELD_TYPE.setStored(true);
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);
    FIELD_TYPE.freeze();
}

[Timestamp field]

public static final FieldType FIELD_TYPE = new FieldType(DateFieldMapper.Defaults.FIELD_TYPE);

static {
    FIELD_TYPE.setStored(false);
    FIELD_TYPE.setIndexed(true);
    FIELD_TYPE.setTokenized(false);
    FIELD_TYPE.freeze();
}

public static final EnabledAttributeMapper ENABLED = EnabledAttributeMapper.DISABLED;
public static final String PATH = null;
public static final FormatDateTimeFormatter DATE_TIME_FORMATTER = Joda.forPattern(DEFAULT_DATE_TIME_FORMAT);

[Ttl field]

public static final FieldType TTL_FIELD_TYPE = new FieldType(LongFieldMapper.Defaults.FIELD_TYPE);

static {
    TTL_FIELD_TYPE.setStored(true);
    TTL_FIELD_TYPE.setIndexed(true);
    TTL_FIELD_TYPE.setTokenized(false);
    TTL_FIELD_TYPE.freeze();
}

public static final EnabledAttributeMapper ENABLED_STATE = EnabledAttributeMapper.DISABLED;
public static final long DEFAULT = -1;

[Type field]

static {
    FIELD_TYPE.setIndexed(true);
    FIELD_TYPE.setTokenized(false);
    FIELD_TYPE.setStored(false);
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);
    FIELD_TYPE.freeze();
}

[Uid field]

public static final FieldType NESTED_FIELD_TYPE;

static {
    FIELD_TYPE.setIndexed(true);
    FIELD_TYPE.setTokenized(false);
    FIELD_TYPE.setStored(true);
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.setIndexOptions(FieldInfo.IndexOptions.DOCS_ONLY);
    FIELD_TYPE.freeze();

    NESTED_FIELD_TYPE = new FieldType(FIELD_TYPE);
    NESTED_FIELD_TYPE.setStored(false);
    NESTED_FIELD_TYPE.freeze();
}
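
The Uid block above relies on a pattern that recurs throughout these defaults: freeze a base FieldType, then copy it to get a mutable variant and freeze that too. Below is a minimal sketch of that behaviour. ToyFieldType is a hypothetical stand-in with a single flag, written only to show the copy-then-freeze idea; the real class is Lucene's org.apache.lucene.document.FieldType.

```java
/** Hypothetical stand-in for a freezable field type with a single flag; not Lucene code. */
class ToyFieldType {
    private boolean stored;
    private boolean frozen;

    ToyFieldType() {
    }

    /** Copies the flags of another type; the copy starts out unfrozen. */
    ToyFieldType(ToyFieldType ref) {
        this.stored = ref.stored;
    }

    void setStored(boolean value) {
        checkIfFrozen();
        this.stored = value;
    }

    boolean isStored() {
        return stored;
    }

    /** After this call every setter throws. */
    void freeze() {
        this.frozen = true;
    }

    private void checkIfFrozen() {
        if (frozen) {
            throw new IllegalStateException("this field type is frozen and cannot be changed");
        }
    }

    public static void main(String[] args) {
        ToyFieldType base = new ToyFieldType();
        base.setStored(true);
        base.freeze();

        // Same shape as NESTED_FIELD_TYPE: copy, override one flag, freeze again.
        ToyFieldType nested = new ToyFieldType(base);
        nested.setStored(false);
        nested.freeze();

        System.out.println("base stored=" + base.isStored());
        System.out.println("nested stored=" + nested.isStored());
        try {
            base.setStored(false);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Freezing the shared defaults means no mapper can change them by accident, while the copy constructor still gives each mapper a cheap way to derive its own variant.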

:

[elasticsearch] A look at the Core Type Mapper defaults.

Elastic/Elasticsearch 2014. 2. 24. 16:27

Let's look at how the default values for each core type are set up when you define the properties of an Elasticsearch document type.

First, the relevant Elasticsearch package is:

package org.elasticsearch.index.mapper.core;

For Lucene, refer to the following class.

org.apache.lucene.document.FieldType

Now let's go through the core type mappers.

All field mappers extend AbstractFieldMapper and start from its defaults.

[AbstractFieldMapper type]

static {
    FIELD_TYPE.setIndexed(true);
    FIELD_TYPE.setTokenized(true);
    FIELD_TYPE.setStored(false);
    FIELD_TYPE.setStoreTermVectors(false);
    FIELD_TYPE.setOmitNorms(false);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
    FIELD_TYPE.freeze();
}

[Number type]

static {
    FIELD_TYPE.setTokenized(false);
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);
    FIELD_TYPE.setStoreTermVectors(false);
    FIELD_TYPE.freeze();
}
public static final Explicit<Boolean> IGNORE_MALFORMED = new Explicit<Boolean>(false, false);
public static final Explicit<Boolean> COERCE = new Explicit<Boolean>(true, false);

[String type]

public static final String NULL_VALUE = null;
public static final int POSITION_OFFSET_GAP = 0;
public static final int IGNORE_ABOVE = -1;

[Binary type]

static {
    FIELD_TYPE.setIndexed(false);
    FIELD_TYPE.freeze();
}

[Boolean type]

static {
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.setIndexOptions(IndexOptions.DOCS_ONLY);
    FIELD_TYPE.setTokenized(false);
    FIELD_TYPE.freeze();
}

public static final Boolean NULL_VALUE = null;

[Byte type]

static {
    FIELD_TYPE.freeze();
}

public static final Byte NULL_VALUE = null;

[Completion type]

static {
    FIELD_TYPE.setOmitNorms(true);
    FIELD_TYPE.freeze();
}

public static final boolean DEFAULT_PRESERVE_SEPARATORS = true;
public static final boolean DEFAULT_POSITION_INCREMENTS = true;
public static final boolean DEFAULT_HAS_PAYLOADS = false;
public static final int DEFAULT_MAX_INPUT_LENGTH = 50;

[Date type]

static {
    FIELD_TYPE.freeze();
}

public static final FormatDateTimeFormatter DATE_TIME_FORMATTER = Joda.forPattern("dateOptionalTime", Locale.ROOT);
public static final String NULL_VALUE = null;
public static final TimeUnit TIME_UNIT = TimeUnit.MILLISECONDS;
public static final boolean ROUND_CEIL = true;

[Double type]

static {
    FIELD_TYPE.freeze();
}

public static final Double NULL_VALUE = null;

[Float type]

static {
    FIELD_TYPE.freeze();
}

public static final Float NULL_VALUE = null;

[Integer type]

static {
    FIELD_TYPE.freeze();
}

public static final Integer NULL_VALUE = null;

[Long type]

static {
    FIELD_TYPE.freeze();
}

public static final Long NULL_VALUE = null;

[Short type]

static {
    FIELD_TYPE.freeze();
}

public static final Short NULL_VALUE = null;

:

jjeong
