Just following the Solr docs blindly.

ITWeb/Server Administration 2012. 4. 18. 15:11

The Solr installation docs are really well organized.
If you just follow the docs, anyone can install Solr easily and see it running properly, so give it a try.
For the installation links, see my previous post.

http://jjeong.tistory.com/639

Below is a copy of the docs exactly as I followed them.
Oh, and I tested with both Tomcat and the JDK installed manually.

[Solr Tutorial]

Overview

This document covers the basics of running Solr using an example schema, and some sample data.

Requirements

To follow along with this tutorial, you will need...

Java 1.5 or greater. You can get it from Oracle, OpenJDK, or IBM, among others.
Running java -version at the command line should report a version number of 1.5 or higher. GNU's GCJ is not supported and does not work with Solr.
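The version check can also be scripted. Below is a minimal Python sketch that assumes the classic `java version "1.x.y_z"` line printed by JDKs of this era (java writes it to stderr, so capture that stream). `parse_java_version` and `meets_requirement` are hypothetical helper names:

```python
import re


def parse_java_version(output):
    """Return (major, minor) from `java -version` text, e.g. (1, 6) for 1.6.0_10."""
    match = re.search(r'version "(\d+)\.(\d+)', output)
    if match is None:
        raise ValueError("no version string found in: " + output)
    return int(match.group(1)), int(match.group(2))


def meets_requirement(output, minimum=(1, 5)):
    # Tuples compare element by element, so (1, 6) >= (1, 5) holds.
    return parse_java_version(output) >= minimum


if __name__ == "__main__":
    print(meets_requirement('java version "1.6.0_10"'))  # True
```

This only checks the version number. It cannot tell you whether the JVM is GCJ, so check that separately.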
A Solr release.

Getting Started

Please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly point to your Solr server.

Begin by unzipping the Solr release and changing your working directory to be the "example" directory. (Note that the base directory name may vary with the version of Solr downloaded.) For example, with a shell in UNIX, Cygwin, or MacOS:

user:~/solr$ ls
solr-nightly.zip
user:~/solr$ unzip -q solr-nightly.zip
user:~/solr$ cd solr-nightly/example/

Solr can run in any Java Servlet Container of your choice, but to simplify this tutorial, the example directory includes a small installation of Jetty.

To launch Jetty with the Solr WAR, and the example configs, just run the start.jar ...

user:~/solr/example$ java -jar start.jar
2012-03-27 17:11:29.529:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2012-03-27 17:11:29.696:INFO::jetty-6.1-SNAPSHOT
...
2012-03-27 17:11:32.343:INFO::Started SocketConnector@0.0.0.0:8983

This will start up the Jetty application server on port 8983, and use your terminal to display the logging information from Solr.

You can see that Solr is running by loading http://localhost:8983/solr/admin/ in your web browser. This is the main starting point for administering Solr.

Indexing Data

Your Solr server is up and running, but it doesn't contain any data. You can modify a Solr index by POSTing XML Documents containing instructions to add (or update) documents, delete documents, commit pending adds and deletes, and optimize your index.

The exampledocs directory contains samples of the types of instructions Solr expects, as well as a Java utility for posting them from the command line (a post.sh shell script is also available, but for this tutorial we'll use the cross-platform Java client).

To try this, open a new terminal window, enter the exampledocs directory, and run "java -jar post.jar" on some of the XML files in that directory. By default, post.jar sends them to the Solr server at http://localhost:8983/solr/update:

user:~/solr/example/exampledocs$ java -jar post.jar solr.xml monitor.xml
SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file solr.xml
SimplePostTool: POSTing file monitor.xml
SimplePostTool: COMMITting Solr index changes..
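For reference, the files in exampledocs are Solr XML update messages: an `<add>` element wrapping `<doc>` elements made of named `<field>` values. Here is a minimal Python sketch that builds such a message with `xml.etree.ElementTree`, so characters like `&` are escaped automatically. The helper name and the sample document are hypothetical:

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of {field: value} dicts.

    A list value becomes repeated <field> elements with the same name,
    which is how multiValued fields are sent.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical document, for illustration only.
    print(build_add_message([{"id": "TEST1", "name": "A & B", "cat": ["x", "y"]}]))
```

You could save the output as an .xml file and post it with post.jar, just like the example files.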

You have now indexed two documents in Solr, and committed these changes. You can now search for "solr" using the "Make a Query" interface on the Admin screen, and you should get one result. Clicking the "Search" button should take you to the following URL...

http://localhost:8983/solr/select/?q=solr&start=0&rows=10&indent=on

You can index all of the sample data, using the following command (assuming your command line shell supports the *.xml notation):

user:~/solr/example/exampledocs$ java -jar post.jar *.xml
SimplePostTool: version 1.4
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file gb18030-example.xml
SimplePostTool: POSTing file hd.xml
SimplePostTool: POSTing file ipod_other.xml
SimplePostTool: POSTing file ipod_video.xml
SimplePostTool: POSTing file mem.xml
SimplePostTool: POSTing file money.xml
SimplePostTool: POSTing file monitor2.xml
SimplePostTool: POSTing file monitor.xml
SimplePostTool: POSTing file mp500.xml
SimplePostTool: POSTing file sd500.xml
SimplePostTool: POSTing file solr.xml
SimplePostTool: POSTing file utf8-example.xml
SimplePostTool: POSTing file vidcard.xml
SimplePostTool: COMMITting Solr index changes..

...and now you can search for all sorts of things using the default Solr Query Syntax (a superset of the Lucene query syntax)...

There are many other ways to import your data into Solr. You can:

Import records from a database using the Data Import Handler (DIH).
Load a CSV file (comma separated values), including those exported by Excel or MySQL.
POST JSON documents
Index binary documents such as Word and PDF with Solr Cell (ExtractingRequestHandler).
Use SolrJ for Java or other Solr clients to programmatically create documents to send to Solr.

Updating Data

You may have noticed that even though the file solr.xml has now been POSTed to the server twice, you still only get 1 result when searching for "solr". This is because the example schema.xml specifies a "uniqueKey" field called "id". Whenever you POST instructions to Solr to add a document with the same value for the uniqueKey as an existing document, it automatically replaces it for you. You can see that that has happened by looking at the values for numDocs and maxDoc in the "CORE"/searcher section of the statistics page...

http://localhost:8983/solr/admin/stats.jsp

numDocs represents the number of searchable documents in the index (and will be larger than the number of XML files since some files contained more than one <doc>). maxDoc may be larger as the maxDoc count includes logically deleted documents that have not yet been removed from the index. You can re-post the sample XML files over and over again as much as you want and numDocs will never increase, because the new documents will constantly be replacing the old.
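The relationship between numDocs and maxDoc can be pictured with a toy model. The Python sketch below is a conceptual illustration only, not how Lucene actually stores documents. Re-adding a document with an existing uniqueKey hides the old version, and that old version keeps counting toward maxDoc until segments are merged (for example, by an optimize):

```python
class ToyIndex:
    """Conceptual model of uniqueKey replacement (not Lucene's real storage)."""

    def __init__(self):
        self.docs = []   # every document version ever added, in order
        self.live = {}   # uniqueKey -> position of the current version in self.docs

    def add(self, doc):
        # Re-adding an existing id logically deletes the previous version;
        # the old entry stays in self.docs and still counts toward maxDoc.
        self.live[doc["id"]] = len(self.docs)
        self.docs.append(doc)

    def optimize(self):
        # Merging drops logically deleted versions, so maxDoc falls to numDocs.
        self.docs = [self.docs[i] for i in sorted(self.live.values())]
        self.live = {d["id"]: i for i, d in enumerate(self.docs)}

    @property
    def num_docs(self):
        return len(self.live)

    @property
    def max_doc(self):
        return len(self.docs)


if __name__ == "__main__":
    idx = ToyIndex()
    for _ in range(3):                   # post the same document three times
        idx.add({"id": "SOLR1000"})
    print(idx.num_docs, idx.max_doc)     # 1 3
    idx.optimize()
    print(idx.num_docs, idx.max_doc)     # 1 1
```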

Go ahead and edit the existing XML files to change some of the data, then re-run the java -jar post.jar command. You'll see your changes reflected in subsequent searches.

Deleting Data

You can delete data by POSTing a delete command to the update URL and specifying the value of the document's unique key field, or a query that matches multiple documents (be careful with that one!). Since these commands are smaller, we will specify them right on the command line rather than reference an XML file.

Execute the following command to delete a document:

java -Ddata=args -Dcommit=no -jar post.jar "<delete><id>SP2514N</id></delete>"

Now go to the statistics page, scroll down to the UPDATE_HANDLERS section, and verify that it shows "deletesById : 1".

If you search for id:SP2514N it will still be found, because index changes are not visible until changes are committed and a new searcher is opened. To cause this to happen, send a commit command to Solr (post.jar does this for you by default):

java -jar post.jar

Now re-execute the previous search and verify that no matching documents are found. Also revisit the statistics page and observe the changes in both the UPDATE_HANDLERS section and the CORE section.

Here is an example of using delete-by-query to delete anything with DDR in the name:

java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"
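These delete commands are plain XML, so you can avoid escaping mistakes by building them with an XML library. This matters when an id or query contains characters such as `<` or `&`. Below is a minimal Python sketch. The helper names are hypothetical, and each result is the kind of string you would pass to post.jar with `-Ddata=args`, as in the commands above:

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build <delete><id>...</id></delete> for one uniqueKey value."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build <delete><query>...</query></delete>. This can match many documents!"""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(delete_by_id("SP2514N"))      # <delete><id>SP2514N</id></delete>
    print(delete_by_query("name:DDR"))  # <delete><query>name:DDR</query></delete>
```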

Commit can be an expensive operation, so it's best to make many changes to an index in a batch and then send the commit command at the end. There is also an optimize command that does the same thing as commit and additionally merges all index segments into a single segment, which makes searching faster and removes any deleted documents. All of the update commands are documented on the Solr wiki.

To continue with the tutorial, re-add any documents you may have deleted by going to the exampledocs directory and executing

java -jar post.jar *.xml

Querying Data

Searches are done via HTTP GET on the select URL with the query string in the q parameter. You can pass a number of optional request parameters to the request handler to control what information is returned. For example, you can use the "fl" parameter to control what stored fields are returned, and if the relevancy score is returned:

q=video&fl=name,id (return only name and id fields)
q=video&fl=name,id,score (return relevancy score as well)
q=video&fl=*,score (return all stored fields, as well as relevancy score)
q=video&sort=price desc&fl=name,id,price (add sort specification: sort by price descending)
q=video&wt=json (return response in JSON format)
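You can also assemble these requests programmatically. Below is a minimal Python sketch that uses only `urllib.parse` and assumes the example server at http://localhost:8983/solr/select/. `select_url` is a hypothetical helper. `urlencode` percent-encodes the commas and turns the space in `price desc` into `+`, which the servlet container decodes back into a space:

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select/"  # the tutorial's example server


def select_url(q, fl=None, sort=None, wt=None, start=0, rows=10):
    """Build a /select URL; parameters left as None are omitted."""
    params = {"q": q, "fl": fl, "sort": sort, "wt": wt, "start": start, "rows": rows}
    return SELECT_URL + "?" + urlencode({k: v for k, v in params.items() if v is not None})


if __name__ == "__main__":
    print(select_url("video", fl="name,id,price", sort="price desc"))
```

Fetching the resulting URL (for example, with `urllib.request.urlopen`) requires the Solr server from this tutorial to be running.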

Solr provides a query form within the web admin interface that allows setting the various request parameters and is useful when testing or debugging queries.

Sorting

Solr provides a simple method to sort on one or more indexed fields. Use the "sort" parameter to specify "field direction" pairs (for example, price desc), separated by commas if there's more than one sort field.

"score" can also be used as a field name when specifying a sort (for example, score desc).

Complex functions may also be used to sort results:

q=video&sort=div(popularity,add(price,1)) desc

If no sort is specified, the default is score desc to return the matches having the highest relevancy.

Highlighting

Hit highlighting returns relevant snippets of each returned document and highlights terms from the query within those context snippets.

The following example searches for video card and requests highlighting on the name and features fields. This adds a highlighting section to the response, with the words to highlight surrounded by <em> (for emphasis) tags.

...&q=video card&fl=name,id&hl=true&hl.fl=name,features

More request parameters for controlling highlighting are described on the Solr wiki.

Faceted Search

Faceted search takes the documents matched by a query and generates counts for various properties or categories. Links are usually provided that allow users to "drill down" or refine their search results based on the returned categories.

The following example searches for all documents (*:*) and requests counts by the category field cat.

...&q=*:*&facet=true&facet.field=cat

Notice that although only the first 10 documents are returned in the results list, the facet counts generated are for the complete set of documents that match the query.
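If you request these facets with wt=json, Solr's default JSON output represents each facet field's counts as a flat list that alternates term and count. Here is a minimal Python sketch that pairs them up. The function name is hypothetical, and the sample counts are made up for illustration:

```python
def pair_facet_counts(flat):
    """Turn a flat [term, count, term, count, ...] list into (term, count) pairs."""
    if len(flat) % 2:
        raise ValueError("expected an even number of items")
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Hypothetical values shaped like facet_counts -> facet_fields -> cat.
    sample = ["electronics", 14, "memory", 3, "music", 1]
    print(pair_facet_counts(sample))  # [('electronics', 14), ('memory', 3), ('music', 1)]
```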

We can facet multiple ways at the same time. The following example adds a facet on the boolean inStock field:

...&q=*:*&facet=true&facet.field=cat&facet.field=inStock

Solr can also generate counts for arbitrary queries. The following example queries for ipod and shows prices below and above 100 by using range queries on the price field.

...&q=ipod&facet=true&facet.query=price:[0 TO 100]&facet.query=price:[100 TO *]

One can even facet by date ranges. This example requests counts for the manufacture date (manufacturedate_dt field) for each year between 2004 and 2010.

...&q=*:*&facet=true&facet.date=manufacturedate_dt&facet.date.start=2004-01-01T00:00:00Z&facet.date.end=2010-01-01T00:00:00Z&facet.date.gap=+1YEAR
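To see how many counts that request produces, note that a gap of +1YEAR from 2004-01-01 to 2010-01-01 yields six yearly buckets (2004 through 2009). The Python sketch below lists the bucket boundaries for whole-year ranges only. It ignores the other facet.date parameters that decide whether each endpoint is inclusive, and the helper name is hypothetical:

```python
from datetime import datetime


def yearly_buckets(start_year, end_year):
    """List (start, end) boundaries of each +1YEAR bucket between Jan 1 dates."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    bounds = [datetime(y, 1, 1) for y in range(start_year, end_year + 1)]
    # Consecutive boundaries form one bucket each: 2004-2005, ..., 2009-2010.
    return [(a.strftime(fmt), b.strftime(fmt)) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    buckets = yearly_buckets(2004, 2010)
    print(len(buckets))  # 6
    print(buckets[0])    # ('2004-01-01T00:00:00Z', '2005-01-01T00:00:00Z')
```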

More information on faceted search may be found on the faceting overview and faceting parameters pages.

Search UI

Solr includes an example search interface built with Velocity templating that demonstrates many features, including searching, faceting, highlighting, autocomplete, and geospatial searching.

Try it out at http://localhost:8983/solr/browse

Text Analysis

Text fields are typically indexed by breaking the text into words and applying various transformations such as lowercasing, removing plurals, or stemming to increase relevancy. The same text transformations are normally applied to any queries in order to match what is indexed.

The schema defines the fields in the index and what type of analysis is applied to them. The current schema your server is using may be accessed via the [SCHEMA] link on the admin page.

The best analysis components (tokenization and filtering) for your textual content depend heavily on language. As you can see in the above [SCHEMA] link, the fields in the example schema use a fieldType named text_general, which has defaults appropriate for all languages.

If you know your textual content is English, as is the case for the example documents in this tutorial, and you'd like to apply English-specific stemming and stop word removal, as well as split compound words, you can use the text_en_splitting fieldType instead. Go ahead and edit the schema.xml in the solr/example/solr/conf directory to use the text_en_splitting fieldType for the text and features fields like so:

   <field name="features" type="text_en_splitting" indexed="true" stored="true" multiValued="true"/>
   ...
   <field name="text" type="text_en_splitting" indexed="true" stored="false" multiValued="true"/>

Stop and restart Solr after making these changes, and then re-post all of the example documents using java -jar post.jar *.xml. Now queries like the ones listed below will demonstrate English-specific transformations:

A search for power-shot can match PowerShot, and adata can match A-DATA, by using the WordDelimiterFilter and LowerCaseFilter.
A search for features:recharging can match Rechargeable using the stemming features of PorterStemFilter.
A search for "1 gigabyte" can match 1GB, and the commonly misspelled pixima can match Pixma, using the SynonymFilter.
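To get a feel for why power-shot can match PowerShot, here is a very rough Python imitation of splitting plus catenation plus lowercasing. It is a simplified stand-in, not Lucene's WordDelimiterFilter. The real filter also splits on case changes and letter/digit transitions, records token positions, and behaves according to the options set in schema.xml. The function name is hypothetical:

```python
import re


def toy_word_delimiter(text):
    """Rough imitation of WordDelimiterFilter + LowerCaseFilter.

    Splits each whitespace-separated word on non-alphanumeric characters,
    emits the lowercased parts, and also emits the parts joined together.
    """
    tokens = []
    for word in text.split():
        parts = [p.lower() for p in re.split(r"[^0-9A-Za-z]+", word) if p]
        tokens.extend(parts)
        if len(parts) > 1:
            tokens.append("".join(parts))  # catenated form, e.g. "powershot"
    return tokens


if __name__ == "__main__":
    print(toy_word_delimiter("Power-Shot"))  # ['power', 'shot', 'powershot']
    print(toy_word_delimiter("PowerShot"))   # ['powershot']
```

Because both the query and the indexed text go through the same chain, the shared token powershot is what lets the two spellings match.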

A full description of the analysis components (Analyzers, Tokenizers, and TokenFilters) available for use is on the Solr wiki.

Analysis Debugging

There is a handy analysis debugging page where you can see how a text value is broken down into words, along with the resulting tokens after they pass through each filter in the chain.

The analysis page can show how "Canon Power-Shot SD500" is broken into the tokens that would be created using the text_en_splitting type. Each row of the table shows the resulting tokens after they have passed through the next TokenFilter in the analyzer. Notice how both powershot and power, shot are indexed. Tokens generated at the same position are shown in the same column, in this case shot and powershot. (Compare this output with the tokens produced using the text_general field type.)

Selecting verbose output will show more details, such as the name of each analyzer component in the chain, token positions, and the start and end positions of the token in the original text.

Selecting highlight matches when both index and query values are provided will take the resulting terms from the query value and highlight all matches in the index value analysis.

Other interesting examples:

English stemming and stop-words using the text_en field type
Half-width katakana normalization with bi-graming using the text_cjk field type
Japanese morphological decomposition with part-of-speech filtering using the text_ja field type
Arabic stop-words, normalization and stemming using the text_ar field type

Conclusion

Congratulations! You successfully ran a small Solr instance, added some documents, and made changes to the index and schema. You learned about queries, text analysis, and the Solr admin interface. You're ready to start using Solr on your own project! Continue on with the following steps:

Subscribe to the Solr mailing lists!
Make a copy of the Solr example directory as a template for your project.
Customize the schema and other config in solr/conf/ to meet your needs.

Solr has a ton of other features that we haven't touched on here, including distributed search to handle huge document collections, function queries, numeric field statistics, and search results clustering. Explore the Solr Wiki to find more details about Solr's many features.

Have Fun, and we'll see you on the Solr mailing lists!

[Installing Solr on Ubuntu Linux]

Following are instructions for installing the Solr search server on Ubuntu Linux. There are several manual steps in setting up Solr, and most of the other documents I came across on the internet are inadequate in some (or in many) ways, so I enlisted the help of colleagues and documented the steps start-to-finish here.

I found Solr not to my liking, encountering significant scaling issues while indexing beyond 4-5 million small documents and so I've abandoned this application in favor of more standard/robust solutions with a far larger community (e.g. mySQL) and more ubiquitous technology with long evolutionary histories (RDBMS) behind them. The problem of indexing XML documents is best solved by avoidance. Digitally born data should exist in normalized and relational states from the get-go.

These instructions have been tested with Hardy Heron 8.04, and will likely work with other recent versions of Ubuntu and Debian-based distros with little or no modification.

Before You Start
Solr can be set up in several ways; these instructions lead up to a Solr environment deployed in Tomcat, with separate development and production areas. Once you've done this a couple of times (or carefully read this document a few times), you could set up three environments, just one, or whatever layout suits your needs. There are hardcoded path dependencies that you need to be aware of.

(1) Download and install the latest JDK from Sun.

You'll want to get the latest Java JDK from Sun http://java.sun.com/javase/downloads/index.jsp and install it first. At the time these instructions were written, I had installed Sun's jdk1.6.0_10. I'm unsure if it's required, but I also made sure that "ant" was installed on my Ubuntu box (for ant, I simply used Ubuntu's handy package installer Synaptic).

I downloaded the Sun JDK to my user home directory and chmod +x'd the .bin executable. I sudo'd to root and executed the file. It made me scroll through the license agreement and then decompressed itself. I then mv'd it to /opt/jdk1.6.0_10.

Java needs at least two environment settings in order to be useful. You'll eventually need to set up CLASSPATH as well, but that's not essential for the instructions in this document. I made the following .bashrc additions to both my ordinary user account (/home/{username}/.bashrc), as well as for the root account (/root/.bashrc). Go into each .bashrc file and add the following (which may be slightly different if you chose a different location or have a different version of the JDK):

export PATH=/opt/jdk1.6.0_10/bin:$PATH
export JAVA_HOME=/opt/jdk1.6.0_10

Whenever you make changes to .bashrc you should issue a "source .bashrc" to instruct the shell to re-read the file (otherwise you'd have to logout, and then log back in). You should now be able to type "which java" and see something like this: /opt/jdk1.6.0_10/bin/java, depending on the version you downloaded.
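As an extra sanity check beyond "which java", you could verify that the two exports agree with each other. Below is a minimal Python sketch. `check_java_env` is a hypothetical helper, and it only checks what the .bashrc lines above set up: that JAVA_HOME is defined and that $JAVA_HOME/bin is on PATH:

```python
import os


def check_java_env(env):
    """Return a list of problems with JAVA_HOME/PATH in an environment mapping."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    java_bin = os.path.join(java_home, "bin")
    # PATH entries are separated by os.pathsep (":" on Linux).
    if java_bin not in env.get("PATH", "").split(os.pathsep):
        problems.append(java_bin + " is not on PATH")
    return problems


if __name__ == "__main__":
    print(check_java_env(os.environ) or "JAVA_HOME and PATH look consistent")
```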

(2) Download and install the latest Tomcat.

Rather than lean on the Tomcat 5.5 version which was part of the Ubuntu repositories at the time of this writing, I downloaded the latest Tomcat: http://tomcat.apache.org. I brought it down to my user directory, decompressing it via gunzip and "tar xvf". It creates a Tomcat directory, populated with everything it needs.

As you use Tomcat over the lifespan of your project/development you may want a more succinct name than something like "apache-tomcat-6.0.16" so I decided to rename (mv) this directory to simply "tomcat6". The instructions which follow in this document will use that abbreviated "tomcat6" convention.

I then did this:

sudo su
mv tomcat6 /usr/local/

You can move it somewhere else -- I picked this location because a colleague who led me through most of these steps put it in that location on his box and I decided to remain consistent with his setup. Maybe you want it in /usr/share/ or somewhere else. Before going further, you should test Tomcat. At this stage, I'm still sudo'd as root.

cd /usr/local/tomcat6/bin
./startup.sh

You should see a message like this:

Using CATALINA_BASE:   /usr/local/tomcat6
Using CATALINA_HOME:   /usr/local/tomcat6
Using CATALINA_TMPDIR: /usr/local/tomcat6/temp
Using JRE_HOME:       /opt/jdk1.6.0_10

(Note that JRE_HOME is the location of the Sun JDK installed in an earlier step. You really need this -- if Tomcat is aimed at a JRE that you don't want, or can't find it, you can't go any further.) Eventually you'll probably want to create a Tomcat specific user, and give it appropriate/minimal rights, instead of using root.

Go to your browser and type this:

http://localhost:8080/

Go to Tomcat servlet examples and click a couple of them, click a couple jsp examples also. They should execute without complaining. At this stage we've installed the latest JDK, the latest Tomcat, and things are talking to one another. If you're getting something wildly different, you can't go any further here. In order to complete this document, it should be "all systems go" at this point.

Before going further, you should shut Tomcat back down:

cd /usr/local/tomcat6/bin
./shutdown.sh

(3) Download and install Solr

I downloaded the latest Solr here: http://www.apache.org/dyn/closer.cgi/lucene/solr/. As with Tomcat, I issued gunzip and "tar xvf" to decompress it to my home user directory. It creates a directory called "apache-solr-1.2.0".

We need to manually create some directories within /usr/local/tomcat6. This setup will yield us two Solr locations within your Tomcat instance: one for development, another for production. There are other ways to set up Solr, but if this is your first attempt you may want to follow this convention. It's unclear why /Catalina and /Catalina/localhost aren't created automatically with a Tomcat install. Probably just to keep our salaries up. The /data/solr directory, as you can see, will have an identical structure below it for dev and prod. Each of those directories additionally has corresponding /conf and /data directories below it.

Make these directories:

/usr/local/tomcat6/conf/Catalina
/usr/local/tomcat6/conf/Catalina/localhost
/usr/local/tomcat6/data
/usr/local/tomcat6/data/solr
/usr/local/tomcat6/data/solr/dev
/usr/local/tomcat6/data/solr/dev/conf
/usr/local/tomcat6/data/solr/dev/data
/usr/local/tomcat6/data/solr/prod
/usr/local/tomcat6/data/solr/prod/conf
/usr/local/tomcat6/data/solr/prod/data
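If you'd rather not type mkdir ten times, the same layout can be created in one go. Below is a minimal Python sketch that assumes the /usr/local/tomcat6 location used in this guide. The function name is hypothetical, and it needs write access to that directory (the guide is still running as root at this point):

```python
from pathlib import Path

TOMCAT_HOME = Path("/usr/local/tomcat6")  # location used in this guide; adjust as needed


def make_solr_layout(tomcat_home):
    """Create Catalina/localhost plus dev/prod Solr homes, each with conf/ and data/."""
    leaves = [tomcat_home / "conf" / "Catalina" / "localhost"]
    for env in ("dev", "prod"):
        for sub in ("conf", "data"):
            leaves.append(tomcat_home / "data" / "solr" / env / sub)
    for path in leaves:
        # parents=True also creates the intermediate directories in the list above.
        path.mkdir(parents=True, exist_ok=True)
    return leaves


if __name__ == "__main__":
    make_solr_layout(TOMCAT_HOME)
```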

Now we should copy the solr "war" file into position for deployment. Go to the directory where you decompressed solr in an earlier step, and go into the dist subdirectory. For instance: apache-solr-1.2.0/dist.

cp apache-solr-1.2.0.war /usr/local/tomcat6/data/solr

Now, in /usr/local/tomcat6/conf/Catalina/localhost we need to create and save two files which will be read the next time you start Tomcat, and (hopefully) properly deploy Solr. Use a text editor of your choice and create these two files in the /Catalina/localhost subdirectory.

cd /usr/local/tomcat6/conf/Catalina/localhost

solrdev.xml

solrprod.xml
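The contents of these two files did not survive in this copy. A common approach, and the one described in the Solr wiki's Tomcat instructions, is a Context fragment whose docBase points at the WAR and which sets solr/home through a JNDI Environment entry. The Python sketch below writes files in that form, but treat it as an assumption and check it against your Tomcat and Solr versions. The paths follow this guide, and the helper names are hypothetical:

```python
from pathlib import Path
from xml.sax.saxutils import quoteattr

SOLR_ROOT = "/usr/local/tomcat6/data/solr"
WAR = SOLR_ROOT + "/apache-solr-1.2.0.war"


def context_xml(war_path, solr_home):
    """Tomcat context fragment: deploy the WAR and pass solr/home via JNDI (assumed form)."""
    return (
        f'<Context docBase={quoteattr(war_path)} crossContext="true">\n'
        '  <Environment name="solr/home" type="java.lang.String"\n'
        f'               value={quoteattr(solr_home)} override="true"/>\n'
        "</Context>\n"
    )


def write_contexts(localhost_dir):
    # Tomcat uses the file name as the context path: solrdev.xml -> /solrdev.
    for env in ("dev", "prod"):
        target = Path(localhost_dir) / f"solr{env}.xml"
        target.write_text(context_xml(WAR, f"{SOLR_ROOT}/{env}"))


if __name__ == "__main__":
    write_contexts("/usr/local/tomcat6/conf/Catalina/localhost")
```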

There are some sample configuration files which come with the Solr distribution you downloaded. Let's copy those into their proper position. Go to the working directory where you downloaded solr, and into the /example/solr/conf subdirectory: /apache-solr-1.2.0/example/solr/conf. You should see something like this:

admin-extra.html  schema.xml    solrconfig.xml  synonyms.txt
protwords.txt     scripts.conf  stopwords.txt   xslt

Copy everything here to your development solr configuration directory:

cp -R * /usr/local/tomcat6/data/solr/dev/conf

Do the same for your production location also:

cp -R * /usr/local/tomcat6/data/solr/prod/conf
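The two cp commands can also be done in a single step. Below is a minimal Python sketch using `shutil.copytree` (the `dirs_exist_ok` flag needs Python 3.8 or later). It is roughly equivalent to the cp commands, except that it also copies hidden files, which `cp -R *` skips. The function name is hypothetical:

```python
import shutil
from pathlib import Path


def copy_example_conf(example_conf, solr_root, envs=("dev", "prod")):
    """Copy the example conf/ tree into <solr_root>/<env>/conf for each environment."""
    for env in envs:
        # dirs_exist_ok lets us copy into the conf/ directories created earlier.
        shutil.copytree(example_conf, Path(solr_root) / env / "conf", dirs_exist_ok=True)


if __name__ == "__main__":
    copy_example_conf("apache-solr-1.2.0/example/solr/conf", "/usr/local/tomcat6/data/solr")
```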

Time to test. Everything should now be in place. Sacrifice a chicken and restart Tomcat: