indexing - Lucene acting up on OrientDB when confronted with fuzzy queries
I have indexed a property on OrientDB using Lucene's KeywordAnalyzer:
create index snippet.ssdeep on snippet (ssdeep) fulltext engine lucene metadata {"analyzer":"org.apache.lucene.analysis.core.KeywordAnalyzer"}
The field contains the simhashes I have indexed as a test.
Now, when I search using Lucene, I get a response for exact queries, but not for fuzzy queries (despite escaping the query text).
For instance, given the field value "192:d4e1gdzyduzrw9afcb+a66ancczmx9n2p:2e1gw18a66ac/yp", the following query yields 1 record:
select from snippet where ssdeep lucene "192\\:d4e1gdzyduzrw9afcb\\+a66ancczmx9n2p\\:2e1gw18a66ac\\/yp"
while this query yields no records:
select from snippet where ssdeep lucene "192\\:d4e1gdzyduzrw9afcb\\+a66ancczmx9n2p\\:2e1gw18a66ac\\/yp~0.9"
I wonder what is preventing Lucene from finding approximate results. More particularly, is Lucene (or KeywordAnalyzer) not apt at fuzzy searching such strings, or is the interface between Lucene and OrientDB the cause?
I.e., I have other full-text Lucene indexes on the same database that work, but their fields contain ordinary text and are analyzed using the simple or standard analyzers. This is the one field I need a full-text index on that fails to work.
The problem is letter case. StandardAnalyzer, SimpleAnalyzer, and EnglishAnalyzer all lowercase text before indexing terms. KeywordAnalyzer doesn't.
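Since wildcard, fuzzy, and other expanded, multi-term queries aren't analyzed, the QueryParser, by default, lowercases these types of query. With KeywordAnalyzer the indexed term keeps its original case, so the lowercased fuzzy term no longer lines up with it.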
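To make the case mismatch concrete, here is a minimal sketch using only the Java standard library (no Lucene classes). It assumes a made-up mixed-case value, `192:D4e1GdZyDuZr`, standing in for a hash stored verbatim by KeywordAnalyzer, and compares it with the lowercased form a fuzzy term would take. The class and method names are hypothetical. Lucene's FuzzyQuery tolerates at most two edits, so a term with more than two uppercase letters ends up out of reach once the query side is lowercased.

```java
import java.util.Locale;

// Stdlib-only illustration of the case mismatch; this is not Lucene code.
final class CaseMismatchDemo {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions, and substitutions each cost 1).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Hypothetical mixed-case value, kept as-is the way KeywordAnalyzer would index it.
        String indexedTerm = "192:D4e1GdZyDuZr";
        // The same text after the query parser lowercases the fuzzy term.
        String fuzzyTerm = indexedTerm.toLowerCase(Locale.ROOT);
        System.out.println("edits needed: " + editDistance(indexedTerm, fuzzyTerm));
    }
}
```

Every uppercase letter in the stored value costs one edit against the lowercased query term, which is why an exact (analyzed, case-preserving) query can hit while the fuzzy variant of the same text finds nothing.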
I don't know what OrientDB exposes of Lucene to allow you to do this effectively, but the two best solutions in Lucene are:
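1. Disable the QueryParser's lowercasing of these types of queries:

    queryParser.setLowercaseExpandedTerms(false);

2. Use a custom analyzer that combines KeywordTokenizer with LowerCaseFilter:

    // Package locations as of Lucene 5.x/6.x.
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.KeywordTokenizer;
    import org.apache.lucene.analysis.core.LowerCaseFilter;

    public class LowercaseKeywordAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            // Emit the whole field value as a single token, then lowercase it.
            Tokenizer source = new KeywordTokenizer();
            TokenStream filter = new LowerCaseFilter(source);
            return new TokenStreamComponents(source, filter);
        }
    }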
I know neither if nor how these are exposed in OrientDB, but hopefully this points you in the right direction.