[Elasticsearch] This Week in Elasticsearch and Apache Lucene - 2016-06-27
Elastic/Elasticsearch 2016. 6. 28. 09:53
A few items caught my eye, so I am scrapping them here.
[Original]
https://www.elastic.co/blog/this-week-in-elasticsearch-and-apache-lucene-2016-06-27
[Key points]
- low-level Java REST client has landed.
Rather than building on a separate HTTP client, it looks like we can simply use the one Elasticsearch provides.
- index.store.preload
It looks like this replaces the warmer feature.
- no longer turns red when creating an index
The status used to flash red for a moment now and then, so this should cut down on false alarms.
- default similarity is now BM25
So the default is moving from TF/IDF to BM25.
- wait for status yellow
Yellow also shows up occasionally, so I will need to re-check how we handle cluster status going forward.
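To get a feel for what the switch from TF/IDF to BM25 means for scoring, here is a minimal sketch of the per-term BM25 formula. It assumes the IDF form and the defaults (`k1 = 1.2`, `b = 0.75`) used by Lucene's `BM25Similarity`. The function names are made up for illustration, and the real implementation stores document lengths in a lossy encoded form, so actual scores will differ slightly.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF term: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Score contribution of one query term for one document.

    freq        -- occurrences of the term in the document
    doc_len     -- length of the document field, in terms
    avg_doc_len -- average field length across the index
    doc_count   -- number of documents that have the field
    doc_freq    -- number of documents containing the term
    """
    # Length normalisation: longer-than-average documents are penalised.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: the factor approaches (k1 + 1) as freq grows.
    return bm25_idf(doc_count, doc_freq) * freq * (k1 + 1.0) / (freq + norm)


if __name__ == "__main__":
    for freq in (1, 5, 50):
        print(freq, bm25_term_score(freq, 100, 100, 1000, 10))
```

The practical difference from classic TF/IDF is the saturation: repeating a term many times keeps raising the score, but by less and less each time.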
Elasticsearch Core
Changes in 2.x:
- The .scripts index now obeys the number_of_shards setting.
- Deprecation logging for `_timestamp` and `_ttl`.
- Failed synced flushes were reporting an incorrect number of failures.
- The index-exists request shouldn't fail if the index is being recovered.
- A valid translog file can be deleted incorrectly after a disk full exception and multiple attempts to recover.
Changes in master:
- The low-level Java REST client has landed. It is functionally equivalent to the REST clients available in other languages.
- The `index.store.preload` setting can preload the specified Lucene files (eg doc values, norms) into MMAP before a segment comes online. This completes the replacement of warmers.
- The cluster health no longer turns red when creating an index, unless there is a problem assigning shards.
- The default similarity is now BM25.
- The `_timestamp` and `_ttl` fields will not be supported on indices created in 5.x.
- The `fields` parameter has been removed in favour of `stored_fields`, `docvalue_fields` and (for `text` fields only) `fielddata_fields`.
- Some percolator queries don't need in-memory validation to ensure that they match.
- Painless now has capturing lambdas, supports adding static methods like `each` to whitelisted classes, has syntax for initialising arrays, lists and maps,
- Nested inner hits no longer return _index, _type, and _id, and parent/child inner hits don't return _index.
- `string` fields weren't upgraded to `text`/`keyword` if `include_in_all` was specified.
- Getting a task with wait_for_completion will return the task result.
- Nodes info returns the calculated size of the total indexing buffer.
- Analysis factories are now MultiTermAware, which will help to remove the lowercase_expanded_terms from the query string query, and to support keyword analyzers on the `keyword` field.
- JNA is now a required dependency.
- Guice has been removed from the script service,
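As a rough illustration of the `index.store.preload` item above, the sketch below builds a create-index request body that asks for norms and doc values to be preloaded. The helper name `preload_index_body` is hypothetical, and the extensions `nvd` (norms) and `dvd` (doc values) are assumed from Lucene's file naming; check the documentation for your version before relying on them.

```python
import json


def preload_index_body(extensions=("nvd", "dvd")):
    """Build a create-index body that preloads the given Lucene file types.

    `extensions` are Lucene file extensions; the defaults here are an
    assumption: "nvd" for norms and "dvd" for doc values.
    """
    return {"settings": {"index.store.preload": list(extensions)}}


if __name__ == "__main__":
    # This body would be sent when creating the index, e.g. PUT /my-index
    # (the index name is just a placeholder).
    print(json.dumps(preload_index_body(), indent=2))
```

Preloading trades slower segment opening and more page-cache pressure for faster first searches, so it probably only makes sense for the file types that queries actually touch.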
Ongoing changes:
- Sequence number checkpoints are persisted to disk when a segment is flushed.
- Reindex-from-remote now uses the Java REST client.
- Ensure that primary handover while indexing does not cause a deadlock.
- The index file which lists the snapshots in a repository should be written atomically.
- The `discovery-azure` plugin doesn't work with the security manager.
- It shouldn't be necessary to wait for status yellow before working with a newly created index.
- Add helpers to make JSON easier to render in Mustache.
- The SynonymQuery should be used for alternative terms, instead of the Bool query.
- More time zone edge case bug fixes.
- Changes to shard store fetching are required in order to allow for inline rerouting during node join.
- Analysis components should implement AnalysisPlugin instead of calling registerTokenizer, allowing Guice to be removed from Hunspell.
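Until waiting for yellow becomes unnecessary, the usual workaround is to call the cluster health API with `wait_for_status=yellow` after creating an index. The sketch below only builds that request URL; the helper name `health_wait_url`, the host, and the index name are placeholders, not anything from the post.

```python
from urllib.parse import quote, urlencode

VALID_STATUSES = ("green", "yellow", "red")


def health_wait_url(base_url, index, status="yellow", timeout="30s"):
    """Build a cluster health URL that blocks until `index` reaches `status`.

    The request returns once the status is reached or `timeout` expires,
    whichever comes first.
    """
    if status not in VALID_STATUSES:
        raise ValueError("status must be one of %s" % (VALID_STATUSES,))
    query = urlencode({"wait_for_status": status, "timeout": timeout})
    return "%s/_cluster/health/%s?%s" % (
        base_url.rstrip("/"), quote(index, safe=""), query)


if __name__ == "__main__":
    # A GET on this URL could be issued right after creating the index.
    print(health_wait_url("http://localhost:9200", "my-index"))
```

When the timeout expires the response still comes back, with a `timed_out` flag set, so a caller should check that flag rather than assume the index is ready.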
Apache Lucene
- 5.5.2 RC2 release vote is underway
- A tricky randomized `explain` test failure turns out to be a test bug in a recently added test case
- `Math.toRadians` and `Math.toDegrees` are now banned, since their implementation changes slightly across Java versions, impacting our geo tests
- `RandomAccessFilterStrategy` comes back to life for faster filter intersection in some cases
- Multi term queries that match no terms rewrite to `MatchNoDocsQuery` instead of an empty `BooleanQuery`, making it much simpler to add a helpful reason to `MatchNoDocsQuery`
- The new Ukrainian lemmatizer uses `MorfologikFilter` with a custom dictionary for efficient dictionary-based Ukrainian analysis
- Lucene's confusing and bushy `IndexReader` hierarchy strikes again
- `RAMDirectory` now also enforces write-once files, and `MockDirectoryWrapper` now tries harder to corrupt unsync'd index files on close
- `GeoPoint` gets some code cleanups
- Eclipse now also fails on unused imports
- Auto-prefix terms have been removed since dimensional points is better
- `CompressionTools` has been removed
- `ForbiddenAPIs` is upgraded to version 2.2
- It's important to fsync files after copying them via Lucene's `Directory`!
- A tricky test failure was holding up the 5.5.2 release process
- Some minor code improvements to `SearchGroup`
- Can we improve the default behavior of query parsers and multi-term queries?
- A test bug in `MoreLikeThisTest` still remains tricky to fix
- `MoreLikeThis` should not invoke `toString` on a `Field` object
- `ScandinavianFoldingFilterFactory` and `ScandinavianNormalizationFilterFactory` are safe for multi-term queries
- In the possibly not-rare case where many documents share the same point value, we can better compress the `docIDs`
- The ancient query norm and coord blocks progress and should be removed
- Should we add a lightweight Ukrainian stemmer?
- Updating doc values and then using delete-by-query with a doc values query doesn't always work, but fixing it is likely not feasible