1. Download HelloLucene-master.zip from https://github.com/macluq/helloLucene.
2. Extract the archive.
3. Right-click -> Import -> Existing Maven Projects.
4. Run Maven install.
5. Right-click /src/main/java/HelloLucene/HelloLucene.java and choose Run As > Java Application.
The example uses the Lucene library to walk through the following steps, which helps in picking up the basic concepts:
1. create the index
2. query
3. search
4. display results
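The four steps can be illustrated without the Lucene dependency. The sketch below is a toy in-memory inverted index in plain Java, not Lucene's API; the class `ToyIndex`, its methods, and the sample titles are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy stand-in for the four HelloLucene steps; this is NOT the Lucene API.
class ToyIndex {
    private final List<String> titles = new ArrayList<>();                  // original values
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>(); // term -> doc ids

    // Step 1: create the index (lowercase, split on non-alphanumerics).
    void addDoc(String title) {
        int docId = titles.size();
        titles.add(title);
        for (String term : title.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Steps 2 and 3: normalize a single-term query, then look it up.
    List<String> search(String query) {
        String term = query.trim().toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, new TreeSet<>())) {
            hits.add(titles.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDoc("Lucene in Action");
        index.addDoc("Lucene for Dummies");
        index.addDoc("Managing Gigabytes");
        // Step 4: display results.
        for (String title : index.search("lucene")) {
            System.out.println(title);
        }
    }
}
```

A real Lucene index adds analyzers, scoring, and on-disk segments on top of this idea; the sketch only shows the term-to-document mapping that makes search fast.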
The options for indexing (Field.Index.*) control how the text in the field will be
made searchable via the inverted index. Here are the choices:
- Index.ANALYZED: Use the analyzer to break the field’s value into a stream of separate tokens and make each token searchable. This option is useful for normal text fields (body, title, abstract, etc.).
- Index.NOT_ANALYZED: Do index the field, but don’t analyze the String value. Instead, treat the Field’s entire value as a single token and make that token searchable. This option is useful for fields that you’d like to search on but that shouldn’t be broken up, such as URLs, file system paths, dates, personal names, Social Security numbers, and telephone numbers. This option is especially useful for enabling “exact match” searching. We indexed the id field in listings 2.1 and 2.3 using this option.
- Index.ANALYZED_NO_NORMS: A variant of Index.ANALYZED that doesn’t store norms information in the index. Norms record index-time boost information in the index but can be memory consuming when you’re searching. Section 2.5.3 describes norms in detail.
- Index.NOT_ANALYZED_NO_NORMS: Just like Index.NOT_ANALYZED, but also doesn’t store norms. This option is frequently used to save index space and memory usage during searching, because single-token fields don’t need the norms information unless they’re boosted.
- Index.NO: Don’t make this field’s value available for searching.
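The difference between analyzed and not-analyzed indexing can be sketched in plain Java. This is a toy illustration, not Lucene's API: the class `AnalyzeDemo` and the crude "analyzer" (lowercase, split on non-alphanumerics) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy illustration of analyzed vs. not-analyzed indexing; NOT the Lucene API.
class AnalyzeDemo {
    // Returns the tokens that would become searchable for a field value.
    static List<String> tokens(String value, boolean analyzed) {
        if (!analyzed) {
            // Like NOT_ANALYZED: the entire value is a single token.
            return Collections.singletonList(value);
        }
        // Like ANALYZED: a crude analyzer (lowercase, split on non-alphanumerics).
        List<String> out = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String url = "http://localhost:8983/solr/update";
        System.out.println(tokens(url, true));  // [http, localhost, 8983, solr, update]
        System.out.println(tokens(url, false)); // [http://localhost:8983/solr/update]
    }
}
```

With the analyzed form, a search for the full URL as one term finds nothing, which is why exact-match fields such as ids and URLs are usually left unanalyzed.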
Field options for storing fields
The options for stored fields (Field.Store.*) determine whether the field’s exact value should be stored away so that you can later retrieve it during searching:
- Store.YES: Stores the value. When the value is stored, the original String in its entirety is recorded in the index and may be retrieved by an IndexReader. This option is useful for fields that you’d like to use when displaying the search results (such as a URL, title, or database primary key). Try not to store very large fields if index size is a concern, as stored fields consume space in the index.
- Store.NO: Doesn’t store the value. This option is often used along with Index.ANALYZED to index a large text field that doesn’t need to be retrieved in its original form, such as bodies of web pages, or any other type of text document.
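To make the stored/unstored distinction concrete, here is a toy model in plain Java. It is not Lucene's API; `ToyDocument` and its methods are hypothetical, and every field is indexed with a crude lowercase-and-split analyzer so that only the storing behavior varies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model of stored vs. unstored fields; this is NOT the Lucene API.
class ToyDocument {
    private final Set<String> searchableTerms = new HashSet<>();
    private final Map<String, String> storedValues = new HashMap<>();

    // Always indexes the value; keeps the original string only if store is true.
    void addField(String name, String value, boolean store) {
        for (String term : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                searchableTerms.add(name + ":" + term);
            }
        }
        if (store) {
            storedValues.put(name, value);
        }
    }

    boolean matches(String field, String term) {
        return searchableTerms.contains(field + ":" + term.toLowerCase(Locale.ROOT));
    }

    // Returns the original value, or null if the field was not stored.
    String get(String field) {
        return storedValues.get(field);
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.addField("title", "Hello Lucene", true);               // like Store.YES
        doc.addField("body", "Lorem ipsum dolor sit amet", false); // like Store.NO
        System.out.println(doc.matches("body", "ipsum")); // true
        System.out.println(doc.get("body"));              // null
        System.out.println(doc.get("title"));             // Hello Lucene
    }
}
```

The body is still searchable even though its text cannot be retrieved, which matches the typical pairing of Index.ANALYZED with Store.NO for large text.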
Field options for term vectors
- TermVector.YES: Records the unique terms that occurred, and their counts, in each document, but doesn’t store any positions or offsets information.
- TermVector.WITH_POSITIONS: Records the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets.
- TermVector.WITH_OFFSETS: Records the unique terms and their counts, with the offsets (start and end character position) of each occurrence of every term, but no positions.
- TermVector.WITH_POSITIONS_OFFSETS: Stores unique terms and their counts, along with positions and offsets.
- TermVector.NO: Doesn’t store any term vector information.
Note that you can’t index term vectors unless you’ve also turned on indexing for the field. Stated more directly: if Index.NO is specified for a field, you must also specify TermVector.NO.
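The terms, counts, positions, and offsets mentioned in the term vector options can be computed by hand for a small string. The following is a plain-Java sketch, not Lucene's term vector API; `TermVectorDemo`, `Occurrence`, and the simple letters-and-digits tokenizer are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy term vector for one field value; this is NOT the Lucene API.
class TermVectorDemo {
    static final class Occurrence {
        final int position; // index of the token within the field
        final int start;    // start character offset (inclusive)
        final int end;      // end character offset (exclusive)

        Occurrence(int position, int start, int end) {
            this.position = position;
            this.start = start;
            this.end = end;
        }
    }

    // Maps each unique term to its occurrences; the list size is the term count.
    static Map<String, List<Occurrence>> termVector(String text) {
        Map<String, List<Occurrence>> tv = new LinkedHashMap<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        int position = 0;
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            tv.computeIfAbsent(term, k -> new ArrayList<>())
              .add(new Occurrence(position++, m.start(), m.end()));
        }
        return tv;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<Occurrence>> e : termVector("to be or not to be").entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey() + " count=" + e.getValue().size());
            for (Occurrence o : e.getValue()) {
                sb.append(" pos=").append(o.position)
                  .append(" [").append(o.start).append(',').append(o.end).append(')');
            }
            System.out.println(sb);
        }
    }
}
```

In these terms, TermVector.YES corresponds to keeping only the term and its count, while the WITH_POSITIONS and WITH_OFFSETS variants additionally keep `position` or `start`/`end`.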
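※ Store options: define whether the data is stored. In practice, the choice comes down to whether the value will be displayed on screen after a search.
- Store.YES: stored
- Store.NO: not stored
- Store.COMPRESS: stored compressed (for large text content or binary files)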
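※ Index options: define whether the field is indexed for searching. The list below is from the 2.x line, so feel free to skip it; in 4.0 these all show up as deprecated. It is still worth understanding what they mean.
- Index.NO: not indexed (the field is not used for searching).
- Index.TOKENIZED: indexed so that it is searchable; the value is tokenized by the analyzer before indexing.
- Index.UN_TOKENIZED: indexed so that it is searchable, but not analyzed, so indexing is faster (for numbers or values that need no analysis).
- Index.NO_NORMS: indexed so that it is searchable; used when indexing must be very fast. The value is not analyzed and field length normalization is not performed.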
"body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat" ,
First, let's look at post.jar. If you unpack post.jar, you will find SimplePostTool.class inside.
[SimplePostTool.java]
- This file has no dependencies on other classes in its package.
- You can take it and use it as-is.
- I run Solr on Tomcat, so I changed the URL set in the code to http://localhost:8080/solrdev/update.
- So where does the data to index come from?
- Usually the content is stored in a DB. Select the data from the DB and write it to a file in the format Solr expects. XML is widely used, so you can generate an XML file from the selected data.
- I simply created a Java project, changed the URL to index against, and repackaged SimplePostTool.java.
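Turning selected rows into Solr's XML update format (`<add><doc><field name="...">`) might look like the sketch below. It assumes the rows have already been read from the DB into maps; the class `SolrXmlBuilder` and the field names `id` and `title` are hypothetical and must match your own Solr schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds a Solr XML <add> message from rows; the rows stand in for a DB query result.
class SolrXmlBuilder {
    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One <doc> per row, one <field> per column.
    static String toAddXml(List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder("<add>\n");
        for (Map<String, String> row : rows) {
            sb.append("  <doc>\n");
            for (Map.Entry<String, String> f : row.entrySet()) {
                sb.append("    <field name=\"").append(escape(f.getKey())).append("\">")
                  .append(escape(f.getValue())).append("</field>\n");
            }
            sb.append("  </doc>\n");
        }
        return sb.append("</add>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>(); // hypothetical row: id, title
        row.put("id", "1");
        row.put("title", "Tom & Jerry <1>");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);
        System.out.print(toAddXml(rows));
    }
}
```

The resulting text can be saved to a file and passed to SimplePostTool, which posts it to the update URL.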
- When I ran the repackaged jar, it failed with an error about Main-Class.
- The fix is to create a MANIFEST file and add it to the jar. The important detail: the file must end with a newline.
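One way to avoid the missing-newline problem is to let the JDK write the manifest. The sketch below uses `java.util.jar.Manifest`; the helper class `ManifestDemo` is hypothetical, and the Main-Class value assumes the tool keeps its original package, `org.apache.solr.util`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Generates MANIFEST.MF content with a Main-Class entry.
class ManifestDemo {
    static String manifestFor(String mainClass) {
        Manifest mf = new Manifest();
        Attributes attrs = mf.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            mf.write(out); // the writer terminates every line, including the last
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed main class: the tool in its original package.
        System.out.print(manifestFor("org.apache.solr.util.SimplePostTool"));
    }
}
```

If you write the file by hand instead, the same rule applies: put a line break after the Main-Class line.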
package org.apache.solr.util;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ByteArrayInputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Set;
import java.util.HashSet;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.ProtocolException;
import java.net.URL;
/**
* A simple utility class for posting raw updates to a Solr server,
* has a main method so it can be run on the command line.
*
*/
public class SimplePostTool {
  public static final String DEFAULT_POST_URL = "http://localhost:8983/solr/update";
  public static final String VERSION_OF_THIS_TOOL = "1.4";
  private static final String DEFAULT_COMMIT = "yes";
  private static final String DEFAULT_OPTIMIZE = "no";
  private static final String DEFAULT_OUT = "no";
  public static final String DEFAULT_DATA_TYPE = "application/xml";
  private static final String DATA_MODE_FILES = "files";
  private static final String DATA_MODE_ARGS = "args";
  private static final String DATA_MODE_STDIN = "stdin";
  private static final String DEFAULT_DATA_MODE = DATA_MODE_FILES;
  private static final Set<String> DATA_MODES = new HashSet<String>();