Elasticsearch "pattern_replace", replacing whitespaces while analyzing -


Basically, I want to remove whitespaces and tokenize the whole string as a single token. (I use ngram on top of it later on.)

These are my index settings:

"settings": {  "index": {   "analysis": {     "filter": {       "whitespace_remove": {         "type": "pattern_replace",         "pattern": " ",         "replacement": ""       }     },     "analyzer": {       "meliuz_analyzer": {         "filter": [           "lowercase",           "whitespace_remove"         ],         "type": "custom",         "tokenizer": "standard"       }     }   } } 

instead of "pattern": " ", tried "pattern": "\\u0020" , \\s , too.

But when I analyze the text "beleza na web", it still creates three separate tokens: "beleza", "na" and "web", instead of one single "belezanaweb".

The analyzer analyzes a string by tokenizing it first and then applying a series of token filters. Since you have specified the standard tokenizer, the input is split by the standard tokenizer, which produces the tokens "beleza", "na" and "web" separately. The pattern replace filter is then applied to each of those tokens individually, so there is no whitespace left for it to remove.
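You can see this per-token behaviour with the _analyze API (a minimal sketch; the exact request syntax differs slightly between Elasticsearch versions, this is the JSON body form):

POST _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "beleza na web"
}

The response lists the tokens "beleza", "na" and "web"; none of them contains a space, so the whitespace_remove filter has nothing to replace.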

Use the keyword tokenizer instead of the standard tokenizer; the rest of the mapping is fine. You can change the mapping as below:

"settings": {  "index": {   "analysis": {     "filter": {       "whitespace_remove": {         "type": "pattern_replace",         "pattern": " ",         "replacement": ""       }     },     "analyzer": {       "meliuz_analyzer": {         "filter": [           "lowercase",           "whitespace_remove",           "ngram"         ],         "type": "custom",         "tokenizer": "keyword"       }     }   } } 
