[General Search][Repost] payload...
Elastic/Elasticsearch, 2013. 7. 15. 23:34
Source: http://hnagtech.wordpress.com/2013/04/19/using-payloads-with-solr-4-x/
There are already quite a few good blogs on what Lucene payloads are and how they can be used and developed, either through the Lucene API or with Solr. I personally feel the following two blogs are worth reading for a quick start.
With Solr 4.x, indexing fields with payloads is made all the easier by some readily available factory objects, and the sample "schema.xml" shipped with recent Apache Solr releases shows how to use them. The tricky part with Solr 4.x, however, is making payloads work at all: the information above isn't sufficient, thanks to the (ever-changing!) Lucene/Solr APIs in every new release. This is the gap this blog tries to fill.
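To make the indexing side concrete: if I recall the Solr 4 sample schema.xml correctly, its "payloads" field type chains a whitespace tokenizer with DelimitedPayloadTokenFilterFactory (encoder="float"), so input such as `solr|2.5 lucene|0.8` indexes the term `solr` with the payload 2.5. The stdlib-only sketch below imitates that idea under stated assumptions: the default `|` delimiter, a split at the first delimiter, and a float payload stored as four big-endian bytes (as far as I know, the layout used by Lucene's `PayloadHelper.encodeFloat`). The class and method names are hypothetical; this is not Solr's implementation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stdlib-only imitation of a delimited-payload analyzer chain:
// whitespace tokenization, then "term|float" split into a term and a payload.
class DelimitedPayloadSketch {

    // One token with its encoded payload (null when the token has no delimiter).
    static final class Token {
        final String term;
        final byte[] payload;
        Token(String term, byte[] payload) { this.term = term; this.payload = payload; }
    }

    // Encode a float as 4 big-endian bytes (assumed to mirror PayloadHelper.encodeFloat).
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    static List<Token> analyze(String text, char delimiter) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            int i = raw.indexOf(delimiter); // split at the first delimiter
            if (i < 0) {
                tokens.add(new Token(raw, null)); // no payload on this token
            } else {
                float score = Float.parseFloat(raw.substring(i + 1));
                tokens.add(new Token(raw.substring(0, i), encodeFloat(score)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : analyze("solr|2.5 lucene|0.8 search", '|')) {
            String p = t.payload == null ? "none"
                    : String.valueOf(ByteBuffer.wrap(t.payload).getFloat());
            System.out.println(t.term + " -> " + p);
        }
    }
}
```

Tokens without a delimiter simply carry no payload, which is why a scorePayload() implementation has to handle a null payload.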
There are two parts to the solution, and I will detail each in turn.
#1 Query parsing
Wrapping your specific query terms in a ‘PayloadTermQuery’ object inside your query parser’s parse() method won't work on its own. You also need to override SolrQueryParser.getFieldQuery(), as in the sample below, so that terms on payloaded fields are recognized.
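```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.MaxPayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.SyntaxError;

// Override inside your SolrQueryParser subclass
@Override
protected Query getFieldQuery(String field, String queryText, boolean quoted) throws SyntaxError {
    SchemaField sf = this.schema.getFieldOrNull(field);
    // A field whose type is named "payloads" is treated as payloaded
    if (sf != null && sf.getType().getTypeName().equalsIgnoreCase("payloads")) {
        Term t = new Term(field, queryText);
        // MaxPayloadFunction keeps the highest payload score; false = ignore the span score
        return new PayloadTermQuery(t, new MaxPayloadFunction(), false);
    }
    // All other fields use the default query construction
    return super.getFieldQuery(field, queryText, quoted);
}
```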
In the sample above, any field whose type is named ‘payloads’ is treated as a payloaded field (you could use a different type name), and queries on it are wrapped in a PayloadTermQuery instead. Only when this is done will your Similarity’s scorePayload() implementation be invoked.
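To see what MaxPayloadFunction and the `false` argument do to the final score, here is a stdlib-only simulation. It assumes Lucene 4.x semantics as I understand them: MaxPayloadFunction keeps the largest scorePayload() value among a term's occurrences in a document (falling back to 1 when no payload was seen), and with includeSpanScore set to `false` the document score is that payload score alone; with `true` it is multiplied by the span score. The class and method are hypothetical stand-ins, not Lucene classes.

```java
// Hypothetical stand-in for how a PayloadTermQuery with MaxPayloadFunction
// turns per-occurrence payload scores into one document score.
class MaxPayloadSketch {

    // payloadScores: the scorePayload() result for each occurrence of the term in one doc.
    static float docScore(float[] payloadScores, float spanScore, boolean includeSpanScore) {
        float payloadScore = 0f;
        int seen = 0;
        for (float s : payloadScores) {
            // Like MaxPayloadFunction.currentScore: first value, then the running max
            payloadScore = (seen == 0) ? s : Math.max(payloadScore, s);
            seen++;
        }
        // Like MaxPayloadFunction.docScore: neutral 1 when no payloads were seen
        float payloadPart = seen > 0 ? payloadScore : 1f;
        return includeSpanScore ? spanScore * payloadPart : payloadPart;
    }

    public static void main(String[] args) {
        float[] occurrences = {0.8f, 2.5f, 1.2f};
        System.out.println(docScore(occurrences, 0.5f, false)); // payload score only
        System.out.println(docScore(occurrences, 0.5f, true));  // span score * payload score
    }
}
```

Passing `false`, as the sample parser does, means the stored payload alone decides the ranking for that term.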
The advice to override ‘getFieldQuery()’ is, of course, on the Solr wiki's Payloads page, but it is buried there, and a normal Google search doesn't return that page (try it!).
#2 Scoring using payloads
Speaking of scorePayload(), the method’s new signature in Lucene 4.1 is considerably more confusing than what was available before.
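```java
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.util.BytesRef;

// Override inside your DefaultSimilarity subclass
@Override
public float scorePayload(int doc, int start, int end, BytesRef payload) {
    if (payload != null) {
        // Decode from payload.offset, not 0: 'bytes' may be a shared buffer
        return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
    }
    // No payload stored at this position: neutral score
    return 1.0F;
}
```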
The payload arrives as a ‘BytesRef’ instance (rather than a byte array, as in earlier Lucene versions), and the developer is left to work out how to get the payload score out of that object. Developers may be tempted to use the ‘utf8ToString()’ method, but beware: that isn't the solution, since the payload is binary, not text. Note instead that BytesRef’s public member variable ‘bytes’, a byte array, is what carries the score, starting at the position given by ‘offset’. IMHO, the previous ‘byte[]’ argument was much safer and more readable.
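The sketch below uses a hypothetical MiniBytesRef class that mirrors only BytesRef's three public fields (`bytes`, `offset`, `length`) and decodes a float payload the way `PayloadHelper.decodeFloat(payload.bytes, payload.offset)` does, assuming the payload is four big-endian float bytes starting at `offset`. The `length < 4` guard is my own defensive addition, not part of the original method. It also shows that `java.nio.ByteBuffer` can read the same value with an absolute `getFloat(offset)`.

```java
import java.nio.ByteBuffer;

class PayloadDecodeSketch {

    // Hypothetical mirror of BytesRef's public fields: the payload occupies
    // bytes[offset .. offset + length - 1], not necessarily the whole array.
    static final class MiniBytesRef {
        final byte[] bytes;
        final int offset;
        final int length;
        MiniBytesRef(byte[] bytes, int offset, int length) {
            this.bytes = bytes; this.offset = offset; this.length = length;
        }
    }

    // Same bit layout as PayloadHelper.decodeFloat: big-endian int -> float bits.
    static float decodeFloat(byte[] bytes, int offset) {
        int bits = ((bytes[offset] & 0xFF) << 24)
                 | ((bytes[offset + 1] & 0xFF) << 16)
                 | ((bytes[offset + 2] & 0xFF) << 8)
                 |  (bytes[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    // A scorePayload()-style body written against the mirror class.
    static float scorePayload(MiniBytesRef payload) {
        if (payload == null || payload.length < 4) {
            return 1.0f; // no usable float payload: neutral score
        }
        return decodeFloat(payload.bytes, payload.offset);
    }

    public static void main(String[] args) {
        // 2 padding bytes, then the float 3.0 encoded big-endian
        byte[] buf = ByteBuffer.allocate(6).putShort((short) 0).putFloat(3.0f).array();
        MiniBytesRef ref = new MiniBytesRef(buf, 2, 4);
        System.out.println(scorePayload(ref));                          // manual decode
        System.out.println(ByteBuffer.wrap(buf).getFloat(ref.offset));  // same via ByteBuffer
    }
}
```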
#3 Adding payloaded documents to the index
In an earlier version of this article, I wrote in this section that if payloaded documents are indexed as a collection using ‘add()’ or ‘addBeans()’, only the first document’s payload value is used, and that same value is taken as the score for the other documents in the collection. I therefore suggested adding documents one by one and committing each time, as below.
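```java
// Earlier workaround (not needed, see below): add and commit one document at a time.
// 'server' is presumably a SolrJ SolrServer; D is your annotated bean type.
for (D doc : docsIterator) {
    server.addBean(doc);
    server.commit();
}
```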
Unfortunately, this was a misunderstanding shared by a few Lucene developers like me, and I have seen it discussed in some forums as well.
So, I have re-edited this section for the better!
Adding payloaded documents in bulk works fine, but you must be careful to pass ‘payload.offset’ when decoding the payload in scorePayload() (as in section #2). Only then is the current document’s payload value read correctly; decoding from index 0 of the ‘bytes’ array can instead pick up another position’s payload stored earlier in the same buffer.
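The symptom described in the earlier version of this section can be reproduced in plain Java. The sketch assumes, purely for illustration, that several documents' float payloads are packed back to back into one shared byte array and that each document's BytesRef differs only in its `offset`. Decoding at index 0 then returns the first document's value for every document, while decoding at `offset` returns the right one. The buffer layout and all names are hypothetical; this is not Lucene's actual postings format.

```java
import java.nio.ByteBuffer;

class SharedBufferSketch {

    // Packs one float payload per document into a single shared array (4 bytes each).
    static byte[] pack(float... payloads) {
        ByteBuffer buf = ByteBuffer.allocate(4 * payloads.length);
        for (float p : payloads) {
            buf.putFloat(p); // big-endian, like PayloadHelper.encodeFloat
        }
        return buf.array();
    }

    // Buggy variant: ignores the offset, so it always reads the first payload.
    static float decodeIgnoringOffset(byte[] shared, int offset) {
        return ByteBuffer.wrap(shared).getFloat(0);
    }

    // Correct variant: reads the 4 bytes that start at this document's offset.
    static float decodeAtOffset(byte[] shared, int offset) {
        return ByteBuffer.wrap(shared).getFloat(offset);
    }

    public static void main(String[] args) {
        byte[] shared = pack(2.5f, 0.8f, 1.2f); // docs 0, 1, 2
        for (int doc = 0; doc < 3; doc++) {
            int offset = 4 * doc; // hypothetical per-document BytesRef.offset
            System.out.println("doc " + doc
                    + ": ignoring offset = " + decodeIgnoringOffset(shared, offset)
                    + ", using offset = " + decodeAtOffset(shared, offset));
        }
    }
}
```

With the offset ignored, every document reports 2.5, which matches the "first document's value for all" behavior that was wrongly blamed on bulk indexing.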
As mentioned in section #2, the new signature of scorePayload() isn't easy to understand: BytesRef lacks proper getter methods, which leaves developers prone to mistakes like this one. This will remain the case until the method signature or the BytesRef API is amended.