Posts in the 'Elastic' category (Page 49)

Basic Indexing Options for Lucene.

Elastic/Elasticsearch 2013. 1. 7. 14:30

The following is excerpted from Lucene in Action.

Field options for indexing

The options for indexing (Field.Index.*) control how the text in the field will be
made searchable via the inverted index. Here are the choices:
 
  - Index.ANALYZED—Use the analyzer to break the field’s value into a stream of separate tokens and make each token searchable. This option is useful for normal text fields (body, title, abstract, etc.).
  - Index.NOT_ANALYZED—Do index the field, but don’t analyze the String value. Instead, treat the Field’s entire value as a single token and make that token searchable. This option is useful for fields that you’d like to search on but that shouldn’t be broken up, such as URLs, file system paths, dates, personal names, Social Security numbers, and telephone numbers. This option is especially useful for enabling “exact match” searching. We indexed the id field in listings 2.1 and 2.3 using this option.
  - Index.ANALYZED_NO_NORMS—A variant of Index.ANALYZED that doesn’t store norms information in the index. Norms record index-time boost information in the index but can be memory consuming when you’re searching. Section 2.5.3 describes norms in detail.
  - Index.NOT_ANALYZED_NO_NORMS—Just like Index.NOT_ANALYZED, but also doesn’t store norms. This option is frequently used to save index space and memory usage during searching, because single-token fields don’t need the norms information unless they’re boosted.
  - Index.NO—Don’t make this field’s value available for searching.

Field options for storing fields

The options for stored fields (Field.Store.*) determine whether the field’s exact value should be stored away so that you can later retrieve it during searching:
 
  - Store.YES—Stores the value. When the value is stored, the original String in its entirety is recorded in the index and may be retrieved by an IndexReader. This option is useful for fields that you’d like to use when displaying the search results (such as a URL, title, or database primary key). Try not to store very large fields, if index size is a concern, as stored fields consume space in the index.
  - Store.NO—Doesn’t store the value. This option is often used along with Index.ANALYZED to index a large text field that doesn’t need to be retrieved in its original form, such as bodies of web pages, or any other type of text document.

Field options for term vectors

  - TermVector.YES—Records the unique terms that occurred, and their counts, in each document, but doesn’t store any positions or offsets information 
  - TermVector.WITH_POSITIONS—Records the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets
  - TermVector.WITH_OFFSETS—Records the unique terms and their counts, with the offsets (start and end character position) of each occurrence of every term, but no positions
  - TermVector.WITH_POSITIONS_OFFSETS—Stores unique terms and their counts, along with positions and offsets
  - TermVector.NO—Doesn’t store any term vector information
 
Note that you can’t index term vectors unless you’ve also turned on indexing for the field. Stated more directly: if Index.NO is specified for a field, you must also specify TermVector.NO.
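The rule in the note above can be written down as a small check. This is only a sketch in Python that models the option names from the excerpt as plain enums; it is not the Lucene API, and `check_field_options` is a hypothetical helper introduced for illustration.

```python
from enum import Enum


class Index(Enum):
    ANALYZED = "analyzed"
    NOT_ANALYZED = "not_analyzed"
    ANALYZED_NO_NORMS = "analyzed_no_norms"
    NOT_ANALYZED_NO_NORMS = "not_analyzed_no_norms"
    NO = "no"


class TermVector(Enum):
    YES = "yes"
    WITH_POSITIONS = "with_positions"
    WITH_OFFSETS = "with_offsets"
    WITH_POSITIONS_OFFSETS = "with_positions_offsets"
    NO = "no"


def check_field_options(index: Index, term_vector: TermVector) -> None:
    """Raise ValueError if term vectors are requested for an unindexed field."""
    if index is Index.NO and term_vector is not TermVector.NO:
        raise ValueError("Index.NO requires TermVector.NO")


# A normal text field with term vectors is fine.
check_field_options(Index.ANALYZED, TermVector.WITH_POSITIONS_OFFSETS)
# An unindexed field is fine as long as term vectors are off.
check_field_options(Index.NO, TermVector.NO)
```

Calling `check_field_options(Index.NO, TermVector.YES)` raises `ValueError`, which is the combination the note rules out.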


Elasticsearch topology configuration

Elastic/Elasticsearch 2013. 1. 3. 17:55

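These settings are found in config/elasticsearch.yml.
When I wrote the first installation post I had not tested much, and there were configurations I did not really understand. Now that I do, I am describing them again.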

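1. Search + indexing
node.master: true
node.data: true

2. Search
node.master: false
node.data: false

3. Indexing
node.master: false
node.data: true

4. Master + indexing delivery
node.master: true
node.data: false

Configurations 1, 2, and 3 are intuitive.

Configuration 1 performs both search and indexing on the node itself,
configuration 2 receives search requests, runs the search against the data nodes, and returns the response,
and configuration 3 is a data node that builds the index files directly on that server.

So what about configuration 4?
Because clustering is configured, this node is used only as a master server; in short, it is in charge of indexing requests and cluster management. (The comments in elasticsearch.yml call this node the "coordinator" of the cluster: it stores no data and keeps its resources free.)
In other words, it receives indexing requests and forwards the indexing work to the data nodes.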
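To keep the four combinations straight, here is a small Python sketch that maps the two flags to the role names used in this post and prints the matching elasticsearch.yml lines. The helper names `describe_node` and `to_yaml` are made up for this example, and the role labels are the ones from the list above, not official Elasticsearch terminology.

```python
# Role labels follow the numbered list in this post.
ROLES = {
    (True, True): "search + indexing",
    (False, False): "search",
    (False, True): "indexing",
    (True, False): "master + indexing delivery",
}


def describe_node(master: bool, data: bool) -> str:
    """Return the role label for a node.master / node.data combination."""
    return ROLES[(master, data)]


def to_yaml(master: bool, data: bool) -> str:
    """Render the two settings as they would appear in elasticsearch.yml."""
    return f"node.master: {str(master).lower()}\nnode.data: {str(data).lower()}"


for flags in ROLES:
    print(f"# {describe_node(*flags)}")
    print(to_yaml(*flags))
```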


Things to watch out for when applying Elasticsearch synonyms.

Elastic/Elasticsearch 2012. 12. 24. 11:07

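See the post below for more on synonyms.
Here I am sharing the mistakes I made while applying them.

1. In a cluster, synonym.txt must be created on every server.
: Obvious, but I completely forgot that I had set up a cluster, applied it to a single server, and kept wondering why it did not work.

2. Specifying the location of synonym.txt
: This is covered on elasticsearch.org, but I did not read the documentation properly and wasted time figuring out where to put the file.
: The path is resolved relative to the config folder of the installation. (analysis/synonym.txt)
: According to the source code, a full path works as well. (Reading the source is also a good way to resolve errors.)
: Environment.java (in the org.elasticsearch.env package.)

Also, I tested this using the Solr synonym format.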
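The path rule from item 2 can be sketched in a few lines of Python. This only mirrors the behaviour described in this post (relative paths are resolved against the config folder, full paths are used as given); it is not a port of Environment.java, and the function name and the `/opt/elasticsearch/config` directory are assumptions for the example.

```python
import posixpath


def resolve_synonyms_path(config_dir: str, synonyms_path: str) -> str:
    """Resolve synonyms_path the way the post describes it."""
    if posixpath.isabs(synonyms_path):
        # A full path is used as given.
        return synonyms_path
    # Anything else is taken relative to the config folder.
    return posixpath.normpath(posixpath.join(config_dir, synonyms_path))


# Hypothetical installation directory, for illustration only.
print(resolve_synonyms_path("/opt/elasticsearch/config", "analysis/synonym.txt"))
```

Whichever form is used, the resolved file has to exist on every node of the cluster.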


A short description of search techniques.

Elastic/Elasticsearch 2012. 12. 20. 10:16

I came across this by chance while reading about facet search.
Saving it here as a scrap. ^^;

http://nandaro.tistory.com/
The site above has the content below and many other good posts.

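(A) Improved search methods

     - Semantic search by topic rather than full-text search
        : applies ontology-based reasoning
     - Search results presented in groups according to the classification of the retrieved topics
        : results expressed as ontology instances, or as the occurrences (resources) of those instances
     - Suggested topics for topic search
        : suggested keywords for searching (topic autocompletion), or related suggestions that let the user navigate from the search results by association
     - Facet classification and facet search
        : classify or search using only the properties that otherwise dissimilar data have in common

(B) Navigation methods

     - Avoid one-way, strictly hierarchical navigation (navigate in both directions)
     - Link to related content that carries useful meaning
     - Connect content that has the character of a group to its related group
     - Simplify production: no thousands of hand-made links
     - Better link management: detect and exclude broken links

(C) Information integration

     - Build the topic map as a knowledge hub that integrates heterogeneous content presented in a single unified view (distributed knowledge can be integrated as well)
     - Content presentation centered on user use cases
       : present the content related to ontology instances from the user's point of view (application integration is also possible)
     - On a topic-map-based Semantic Web site, choose the titles or content of the integrated information appropriately from the user's point of view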


A collection of Elasticsearch query URI examples.

Elastic/Elasticsearch 2012. 12. 18. 22:39

- field search : http://localhost:9200/test/_search?q=msg:채팅&pretty=true

- multi field & sort & list search : http://localhost:9200/test/_search?q=msg:과장 AND rm_title:과장&sort=rm_ymdt:asc&from=0&size=10&pretty=true

- paging search & sort : http://localhost:9200/test/_search?source={"query":{"bool":{"must":[{"term":{"msg":"과장"}}],"must_not":[],"should":[]}},"from":0,"size":50,"sort":[{"rm_ymdt":"asc"}],"facets":{}}&pretty=true

- range search : http://localhost:9200/test/_search?source={"query":{"range":{"recv_ymdt":{"from":"20120820163946", "to":"20120911160444"}}}}&pretty=true

- Queries can also be generated with the structured query feature at http://localhost:9200/_plugin/head/.
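The URIs above contain spaces, Korean text, and raw JSON, which a browser quietly encodes for you. When calling from a script it is safer to build the query string explicitly. The following is a minimal Python sketch using only the standard library; the helper names are made up for this example, and the host, index, and field names are taken from the URIs above.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:9200/test/_search"


def q_search_url(q: str, **params) -> str:
    """Build a q= style URI search, percent-encoding every parameter."""
    return BASE + "?" + urlencode({"q": q, **params, "pretty": "true"})


def source_search_url(body: dict) -> str:
    """Build a source= style URI search from a query DSL dictionary."""
    source = json.dumps(body, ensure_ascii=False)
    return BASE + "?" + urlencode({"source": source, "pretty": "true"})


# Multi-field search with sorting and paging.
print(q_search_url("msg:과장 AND rm_title:과장", sort="rm_ymdt:asc", size=10, **{"from": 0}))

# Range search passed as JSON in the source parameter.
print(source_search_url(
    {"query": {"range": {"recv_ymdt": {"from": "20120820163946", "to": "20120911160444"}}}}
))
```

`from` is a Python keyword, so it is passed through `**{"from": 0}` rather than as a normal keyword argument.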


Using a synonym dictionary in Elasticsearch

Elastic/Elasticsearch 2012. 12. 18. 22:22

[title=Elasticsearch synonym settings]

- Must be configured when the index is created

- The base kr_analysis must already be applied

- Without it, Korean text is not processed

- Create synonym.txt in the appropriate location

- http://www.elasticsearch.org/guide/reference/index-modules/analysis/synonym-tokenfilter.html

[title=synonym.txt sample]

Solr synonyms

The following is a sample format of the file:

# blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS. These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

#Equivalent synonyms may be separated with commas and give
#no explicit mapping. In this case the mapping behavior will
#be taken from the expand parameter in the schema. This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos

# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod

#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz
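To see how the rules in the sample fit together, here is a simplified Python sketch that reads the Solr format into a dictionary. It works on whole comma-separated phrases and is not how the real synonym token filter is implemented (that operates on analyzed token streams); `parse_synonyms` is a name made up for this example.

```python
def parse_synonyms(text: str, expand: bool = True) -> dict:
    """Parse Solr-format synonym rules into {source phrase: [replacements]}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if "=>" in line:
            # Explicit mapping: every phrase on the left maps to all on the right.
            lhs, rhs = line.split("=>", 1)
            sources = [t.strip() for t in lhs.split(",") if t.strip()]
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
        else:
            # Equivalent synonyms: behaviour depends on the expand flag.
            sources = [t.strip() for t in line.split(",") if t.strip()]
            targets = sources if expand else sources[:1]
        for source in sources:
            merged = mapping.setdefault(source, [])
            for target in targets:
                if target not in merged:
                    merged.append(target)  # repeated entries are merged
    return mapping


print(parse_synonyms("foo => foo bar\nfoo => baz"))
print(parse_synonyms("ipod, i-pod, i pod", expand=False))
```

The first call shows the merge rule from the end of the sample, the second the expand==false case.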

[Sample index creation code - synonym applied]

curl -XPUT 'http://localhost:9200/test' -d '{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1,
    "index" : {
      "analysis" : {
        "analyzer" : {
          "kr_analyzer" : {
            "type" : "custom",
            "tokenizer" : "kr_tokenizer",
            "filter" : ["trim", "kr_filter", "kr_synonym"]
          }
        },
        "filter" : {
          "kr_synonym" : {
            "type" : "synonym",
            "synonyms_path" : "analysis/synonym.txt"
          }
        }
      }
    }
  }
}'

~~[title=Sample index creation code]~~
~~curl -XPUT 'http://10.101.254.223:9200/test' -d '{~~
~~"settings" : {~~
~~"number_of_shards" : 5,~~
~~"number_of_replicas" : 1~~
},
~~"index" : {~~
~~"analysis" : {~~
~~"analyzer" : {~~
~~"synonym" : {~~
~~"tokenizer" : "kr_analyzer",~~
~~"filter" : ["synonym"]~~
}
},
~~"filter" : {~~
~~"synonym" : {~~
~~"type" : "synonym",~~
~~"synonyms_path" : "/home/<account>/apps/elasticsearch/plugins/analysis-korean/analysis/synonym.txt"~~
}
},
~~"mappings" : {~~
~~"docs" : {~~
~~"properties" : {~~
~~...................................(see other documents for this part.)~~
}
}'


Elasticsearch URI reference links

Elastic/Elasticsearch 2012. 12. 17. 10:00

Changing where index files are stored
http://www.elasticsearch.org/guide/reference/setup/configuration.html

Search URI format (RESTful)
http://stackoverflow.com/questions/12195017/different-result-when-using-get-post-in-elastic-search

http://localhost:9200/_search&{"query":{"term":{"text":"john"}}}
http://localhost:9200/_search?source={"query":{"term":{"text":"john"}}}

Date range search format
http://stackoverflow.com/questions/11351296/elastic-search-date-range-query
{
    "query" : {
        "range" : {
            "PublishTime" : {
                "from" : "20111201T000000",
                "to" : "20111203T235959"
            }
        }
    }
}
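The date strings in the range query above are easy to get wrong by hand, so here is a small Python sketch that builds the same body from datetime values. `date_range_query` is a hypothetical helper; the PublishTime field and the from/to keys come from the example above, and the compact date format is assumed to match how the field is mapped in the index.

```python
import json
from datetime import datetime


def date_range_query(field: str, start: datetime, end: datetime) -> dict:
    """Build a range query with compact yyyyMMdd'T'HHmmss style bounds."""
    fmt = "%Y%m%dT%H%M%S"
    return {
        "query": {
            "range": {
                field: {
                    "from": start.strftime(fmt),
                    "to": end.strftime(fmt),
                }
            }
        }
    }


body = date_range_query(
    "PublishTime",
    datetime(2011, 12, 1, 0, 0, 0),
    datetime(2011, 12, 3, 23, 59, 59),
)
print(json.dumps(body, indent=4))
```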


Lucene indexing options

Elastic/Elasticsearch 2012. 12. 10. 10:50

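A short summary...

※ Store options
Defines whether the data is stored.
In the end, it comes down to whether or not the value will be displayed after a search.

Store.YES : stored
Store.NO : not stored
Store.COMPRESS : stored compressed (for large text or binary files)

※ Index options
Defines whether the field is indexed for searching.
The options below are from the 2.x line, so feel free to skip them; in 4.0 they all show up as deprecated.
Still, it is good to understand what they mean.

Index.NO : Not indexed (the field is not used for searching).
Index.TOKENIZED : Indexed and searchable; the value is tokenized by the analyzer before indexing.
Index.UN_TOKENIZED : Indexed and searchable, but the value is not run through the analyzer, so indexing is faster. (For numbers or values that need no analysis.)
Index.NO_NORMS : Indexed and searchable. The value is not analyzed and no field length normalization (norms) is stored, which speeds up indexing and saves index space and search-time memory.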

http://lucene.apache.org/core/4_0_0/core/index.html

Enum Constant and Description
`ANALYZED` Deprecated. Index the tokens produced by running the field's value through an Analyzer.
`ANALYZED_NO_NORMS` Deprecated. Expert: Index the tokens produced by running the field's value through an Analyzer, and also separately disable the storing of norms.
`NO` Deprecated. Do not index the field value.
`NOT_ANALYZED` Deprecated. Index the field's value without using an Analyzer, so it can be searched.
`NOT_ANALYZED_NO_NORMS` Deprecated. Expert: Index the field's value without an Analyzer, and also disable the indexing of norms.
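The 2.x names in my summary and the constants in the 4.0 javadoc describe the same options under different names. As far as I can tell the correspondence is the one below; treat it as a reading aid in Python, not as something taken from the Lucene source. `later_name` is a made-up helper.

```python
# 2.x name -> name used in the later javadoc (my reading of the rename).
RENAMED_INDEX_OPTIONS = {
    "NO": "NO",
    "TOKENIZED": "ANALYZED",
    "UN_TOKENIZED": "NOT_ANALYZED",
    "NO_NORMS": "NOT_ANALYZED_NO_NORMS",
}


def later_name(old_name: str) -> str:
    """Translate a 2.x Field.Index name; unknown names are returned unchanged."""
    return RENAMED_INDEX_OPTIONS.get(old_name, old_name)


for old in RENAMED_INDEX_OPTIONS:
    print(f"Index.{old} -> Index.{later_name(old)}")
```

ANALYZED_NO_NORMS has no counterpart in the 2.x list above, so it does not appear in the table.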


jjeong
