jjeong :: [Elasticsearch] This Week in Elasticsearch and Apache Lucene

[Elasticsearch] This Week in Elasticsearch and Apache Lucene - 2016-04-11

Elastic/Elasticsearch 2016. 4. 12. 09:59

I kept telling myself I should read this, and I am only now getting around to it.

What caught my eye:

The `match`, `match_phrase`, and `match_phrase_prefix` queries are now separate queries, not just types of the `match` query.

The task manager response now tells you which tasks can be cancelled, and supports a `_cat/tasks` API.

Elasticsearch will no longer accept unquoted field names in JSON.

Now that we have removed the percolator API, we should also remove the percolator type and use percolator fields instead.

These queries used to be separate, were then merged, and now seem to be getting split apart again.
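To make the difference concrete, here is a small sketch that only builds request bodies as strings. The class and method names are hypothetical, the field name and text are made up, and no escaping is done, so treat it as an illustration of the two JSON shapes rather than a client.

```java
// Hypothetical helper: builds query bodies as plain strings, nothing is sent anywhere.
class MatchQueryBodies {

    // Older shape: a single `match` query whose behaviour is picked with a `type` parameter.
    static String matchWithType(String field, String text, String type) {
        return "{\"match\":{\"" + field + "\":{\"query\":\"" + text
                + "\",\"type\":\"" + type + "\"}}}";
    }

    // Separate-query shape: the query name itself (match, match_phrase,
    // match_phrase_prefix) selects the behaviour.
    static String separateQuery(String queryName, String field, String text) {
        return "{\"" + queryName + "\":{\"" + field + "\":\"" + text + "\"}}";
    }
}
```

For example, `MatchQueryBodies.separateQuery("match_phrase", "title", "quick fox")` yields a body whose top-level key is `match_phrase`, while `matchWithType("title", "quick fox", "phrase")` nests the same intent under `match`.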

I should test the task cancellation feature.
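As a starting point for that test, here is a rough sketch that only constructs the HTTP requests with the JDK's `java.net.http` (Java 11+). The base URL, the class name and the sample task id are assumptions, and the `_tasks/{id}/_cancel` path follows the task API as it was later documented for 5.x, so it may differ on the master branch of the time.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds requests only, it does not send them.
class TaskRequests {
    // Assumption: a single local node listening on the default HTTP port.
    static final String BASE = "http://localhost:9200";

    // Tabular task listing through the cat API.
    static HttpRequest catTasks() {
        return HttpRequest.newBuilder(URI.create(BASE + "/_cat/tasks?v")).GET().build();
    }

    // JSON task listing; the response is where the "can be cancelled" flag would appear.
    static HttpRequest listTasks() {
        return HttpRequest.newBuilder(URI.create(BASE + "/_tasks")).GET().build();
    }

    // Cancellation request for one task id (the id format here is an assumption).
    static HttpRequest cancelTask(String taskId) {
        return HttpRequest.newBuilder(URI.create(BASE + "/_tasks/" + taskId + "/_cancel"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

Each request can then be passed to `HttpClient.newHttpClient().send(...)` against a test cluster.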

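From now on we need to be careful when writing field names. I suppose things have become a bit stricter. ^^

- The setting in the code below changed from true to false. (Was this feature causing performance problems or other functional errors?)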

jsonFactory.configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true);
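To show what "unquoted field names" means in practice, here is a small standard-library sketch that flags object keys that are not wrapped in double quotes. It is a hypothetical helper written for this post, not how Jackson or Elasticsearch implement the check, and it assumes the input is otherwise well-formed JSON.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Jackson or Elasticsearch.
class UnquotedFieldNameCheck {

    // Returns true if some object key in the input does not start with a double quote.
    static boolean hasUnquotedFieldName(String json) {
        Deque<Character> open = new ArrayDeque<>(); // stack of currently open '{' and '['
        boolean expectKey = false;                  // true right after '{' or after ',' inside an object
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (c == '"') {
                // Skip the whole string literal, stepping over backslash escapes.
                i++;
                while (i < json.length() && json.charAt(i) != '"') {
                    if (json.charAt(i) == '\\') {
                        i++;
                    }
                    i++;
                }
                expectKey = false;
            } else if (c == '{') {
                open.push(c);
                expectKey = true;
            } else if (c == '[') {
                open.push(c);
                expectKey = false;
            } else if (c == '}' || c == ']') {
                if (!open.isEmpty()) {
                    open.pop();
                }
                expectKey = false;
            } else if (c == ',') {
                expectKey = !open.isEmpty() && open.peek() == '{';
            } else if (expectKey) {
                return true; // a key started with something other than a double quote
            }
        }
        return false;
    }
}
```

With this, `{user: "kim"}` is flagged while `{"user": "kim"}` is not, which is the kind of input the stricter setting would start rejecting.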

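The percolator functionality has moved into fields. I should check how this works as well.

Looking at the filed issue, haha, it seems more intuitive and a bit easier to use.

Most of what landed in core 2.x will probably be applied to v5.0.0 as well.

As for Lucene, 6.0.0 was going through its release vote and has already been released on April 8. Most of the other items are about geo points and location.

From the Lucene 6.0.0 release announcement: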

Java 8 is the minimum Java version required.
Dimensional points, replacing legacy numeric fields, provide fast and space-efficient support for both single- and multi-dimension range and shape filtering. This includes numeric (int, float, long, double), InetAddress, BigInteger and binary range filtering, as well as geo-spatial shape search over indexed 2D LatLonPoints. See this blog post for details. Dependent classes and modules (e.g., MemoryIndex, Spatial Strategies, Join module) have been refactored to use new point types.
Lucene classification module now works on Lucene Documents using a KNearestNeighborClassifier or SimpleNaiveBayesClassifier.
The spatial module no longer depends on third-party libraries. Previous spatial classes have been moved to a new spatial-extras module.
Spatial4j has been updated to a new 0.6 version hosted by locationtech.
TermsQuery performance boost by a more aggressive default query caching policy.
IndexSearcher's default Similarity is now changed to BM25Similarity.
Easier method of defining custom CharTokenizer instances.
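Since the default similarity changes to BM25, here is a minimal sketch of the textbook BM25 formula for a single term, using the `k1 = 1.2` and `b = 0.75` defaults of `BM25Similarity`. The class name is hypothetical, and this is not Lucene's actual code, which among other things works with lossily encoded document lengths.

```java
// Hypothetical sketch of single-term BM25 scoring; not Lucene's implementation.
class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation
    static final double B = 0.75;  // strength of document-length normalization

    // Inverse document frequency: rarer terms get a larger weight.
    static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Term-frequency part: grows with freq but saturates, and is
    // reduced for documents longer than the average length.
    static double tfNorm(double freq, double docLen, double avgDocLen) {
        return freq * (K1 + 1) / (freq + K1 * (1 - B + B * docLen / avgDocLen));
    }

    static double score(double freq, double docLen, double avgDocLen,
                        long docCount, long docFreq) {
        return idf(docCount, docFreq) * tfNorm(freq, docLen, avgDocLen);
    }
}
```

The practical difference from plain TF-IDF is the saturation: repeating a term many times in one document adds less and less to the score, and long documents are penalized relative to the average length.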

Original link:

https://www.elastic.co/blog/this-week-in-elasticsearch-and-apache-lucene-2016-04-11

Elasticsearch Core

Changes in 2.x:

Extended Stats could return the wrong result when some indices are missing a field.
Adding an object field with the same name as an existing field should fail.
Shadow replicas should be considered as having size zero.
CORS was broken for preflight requests.
Windows users can configure the Windows service name, description, and user.
Network addresses are now consistently displayed as the ip:port, instead of the hostname.

Changes in master:

Network partitions will no longer cause loss of in-flight documents, and we have the test to prove it.
The `match`, `match_phrase`, and `match_phrase_prefix` queries are now separate queries, not just types of the `match` query.
The task manager response now tells you which tasks can be cancelled, and supports a `_cat/tasks` API.
Elasticsearch will no longer accept unquoted field names in JSON.
Elasticsearch now uses mmapfs for Lucene directories instead of a hybrid of niofs/mmapfs.
ParseField is now used to parse query names, which comes with deprecation logging for free.
Geo-points support ignore_malformed correctly again.
Moving averages threw an NPE when no window was specified.
MappedFieldType should be responsible for knowing which formatter to apply, rather than the agg framework.
The allocation-explain API now includes the configured allocation_delay and remaining_delays times.
Hot threads now fail hard if the JVM doesn't support them.
Queries now have a registry, and queries have gradually been migrated to use it.
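On the mmapfs item in the list above: the two directory styles differ mainly in how bytes are read from a file, either with positional reads on a `FileChannel` (the NIO style) or through a memory-mapped buffer (the mmap style). The sketch below shows both using only the JDK. The class and method names are hypothetical and this is not Lucene's directory code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of the two read styles; not Lucene code.
class ReadStyles {

    // NIO style: an explicit positional read copies the byte into a heap buffer.
    static byte readWithNio(Path file, long position) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(1);
            ch.read(buf, position);
            return buf.get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // mmap style: the file is mapped into memory and read like an array.
    static byte readWithMmap(Path file, long position) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return map.get((int) position);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a tiny temp file and checks that both styles read the same byte.
    static boolean demo() {
        try {
            Path tmp = Files.createTempFile("read-styles", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {10, 20, 30, 40});
            return readWithNio(tmp, 2) == 30 && readWithMmap(tmp, 2) == 30;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```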

Ongoing changes:

Bulk request sizes will be subject to a circuit breaker.
Deleted index tombstones are complicated.
ObjectParser should allow constructor args.
Should we enable http compression by default?
Numeric and date fields in 5.0 should use the new Lucene points API.
Now that we have removed the percolator API, we should also remove the percolator type and use percolator fields instead.
Improvements to how we score the _all field based on per-field boosts.
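For the bulk request item in the list above, the general idea of a size-based circuit breaker can be sketched as a counter of in-flight bytes with a hard limit. The class below is hypothetical and much simpler than whatever Elasticsearch ends up implementing. It only shows the reserve, reject and release pattern.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a byte-counting breaker; not Elasticsearch's implementation.
class InFlightBytesBreaker {
    private final long limitBytes;
    private final AtomicLong used = new AtomicLong();

    InFlightBytesBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Reserves room for a request, or rejects it if the limit would be exceeded.
    void reserve(long bytes) {
        long newUsed = used.addAndGet(bytes);
        if (newUsed > limitBytes) {
            used.addAndGet(-bytes); // undo the reservation before rejecting
            throw new IllegalStateException(
                    "request of " + bytes + " bytes would exceed limit of " + limitBytes);
        }
    }

    // Gives the bytes back once the request has been fully processed.
    void release(long bytes) {
        used.addAndGet(-bytes);
    }

    long used() {
        return used.get();
    }
}
```

A caller would `reserve` the request size before handling a bulk request and `release` it in a `finally` block, so a burst of large requests is rejected early instead of exhausting the heap.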

Apache Lucene

The 6.0.0 release vote has passed and the bits were set free a few hours ago! Thank you Nick Knize for taking on the challenging role of release manager!
Many geo3d improvements this week:
- Polygon queries now accept Polygon... inputs, including random nested test polygons, matching our geo2d implementations and respecting the order of polygon vertices
- Geo3d seems to sometimes incorrectly think a polygon is concave when it's really convex
- Adjacent polygon points can now be coplanar
- The unique GeoPath support, which matches all points within X distance of a specified path (think road trip, looking for sushi nearby), now has a simple factory API as well
- Tests were not adequately testing the new simple factory methods for common shapes
- Geo3d now uses a similar encode/decode quantization approach as LatLonPoint
- After lively discussions, geo3d APIs no longer publicly expose classes and methods that could safely be private. APIs should start life private until proven worthy of being public!
Many geo2d improvements as well:
- LatLonPoint Polygon queries are faster using a cool pixelating grid approach, and we can do the same for GeoPointField
- We must improve debuggability of our geo test failures with nice 3D earth models like this example
- Here's a lively discussion about the pros and cons of having our geo tests quantize data only once
- Quantization issues are tricky, and geo2d queries were quantizing the edges of box queries incorrectly, resulting in false positive hits
- We have improved the geo2d tests to never allow "tolerance" on the returned results
- We have moved common geo encoding APIs to core so they can be shared across implementations
- Better random latitude/longitude generation for tests has exposed a tie-break bug in distance sorting, edge case bugs in box query, test bugs and polygon bugs
- Rectangle and Polygon classes have graduated into Lucene's core, to enable sharing across our numerous geo implementations
- A new encoding for GeoPointField will be consistent with LatLonPoint, and use all 64 available bits to minimize quantization error
- GeoPointField gets an efficient distance sort
- Randomized tests tried to create a too-big GeoPointDistanceQuery
- We will move BaseGeoPointTestCase from the spatial module to test-framework allowing us to remove the dependency of the sandbox module on spatial
- SloppyMath.haversin can now move to GeoUtils
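Several of the geo items above revolve around quantization, that is, storing a latitude or longitude as a fixed-width integer and accepting a small rounding error. Here is a minimal sketch of a 32-bit encoding in that spirit. The class name and constants are my own assumptions and this is not Lucene's actual encoding code.

```java
// Hypothetical 32-bit lat/lon quantization sketch; not Lucene's implementation.
class LatLonQuantization {
    // Number of integer steps per degree when spreading the range over 2^32 values.
    private static final double LAT_SCALE = (1L << 32) / 180.0;
    private static final double LON_SCALE = (1L << 32) / 360.0;

    static int encodeLatitude(double lat) {
        if (lat == 90.0) {
            lat = Math.nextDown(lat); // the maximum value would overflow the int range
        }
        return (int) Math.floor(lat * LAT_SCALE);
    }

    static double decodeLatitude(int encoded) {
        return encoded / LAT_SCALE;
    }

    static int encodeLongitude(double lon) {
        if (lon == 180.0) {
            lon = Math.nextDown(lon);
        }
        return (int) Math.floor(lon * LON_SCALE);
    }

    static double decodeLongitude(int encoded) {
        return encoded / LON_SCALE;
    }
}
```

Decoding an encoded value does not give back the original number exactly. The result can be off by up to one step, roughly 4e-8 degrees for latitude here, which is why box edges and test tolerances need care.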
The classification module now computes the f1-measure
A previously commented out test assertion comes half way back to life
Our "getting started with Lucene" docs were a bit buggy, but now fixed thanks to a user asking about it
We've upgraded our randomizedtesting dependency to 2.3.4, so we get better messages when there is a static leak in our tests
Points were missing from the codecs package documentation
The DataSplitter in Lucene's classification module should pay attention to classes when splitting
800+ new top-level-domains have been created since we last fixed StandardTokenizer to detect them, but we may need to wait for a JFlex release
Highlighting fails to find terms inside the child query of a BlockJoinQuery
Lucene doesn't have direct support for boolean subset matching, but a number of possible workarounds may work
Math.toRadians is changing its results slightly between Java 1.8 and 1.9
NRTCachingDirectory.listAll sometimes throws IllegalStateException
A scary random test failure is hopefully caused by bad hardware or buggy JVM
TestCoreParser gets some small improvements
A possibly new JVM bug causes JVM crash when decoding postings
JapaneseTokenizer should do a better job validating custom user-provided dictionaries
Another iteration for codec level encryption; this patch uses a new initialization vector for each data block, and seems not to impact search performance
Our release scripts still struggle with the switch from Subversion to git
Sometimes, BooleanQuery's explain method can lie about its score
Another user falls into the unfortunately common trap of thinking Lucene's stored fields store all information about a field

License: Attribution, NonCommercial, NoDerivatives
