Elasticsearch "pattern_replace", replacing whitespaces while analyzing


Basically, I want to remove whitespaces and tokenize the whole string as a single token. (I use ngram on top of it later on.)

These are the index settings:

"settings": {
  "index": {
    "analysis": {
      "filter": {
        "whitespace_remove": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": ""
        }
      },
      "analyzer": {
        "meliuz_analyzer": {
          "filter": [
            "lowercase",
            "whitespace_remove"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}

Instead of "pattern": " ", I also tried "pattern": "\\u0020" and "\\s".

But when I analyze the text "beleza na web", it still creates 3 separate tokens: "beleza", "na", and "web", instead of a single token "belezanaweb".

The analyzer analyzes a string by tokenizing it first and then applying the series of token filters. You have specified the tokenizer as standard, which means the input is first split into separate tokens by the standard tokenizer (which breaks on whitespace, among other things). The pattern replace filter is then applied to each token individually, so by the time it runs there is no whitespace left to replace.
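The ordering above can be illustrated with a toy Python model of a custom analyzer (the tokenizer and filter functions here are simplified stand-ins for the real Elasticsearch implementations, not their actual code):

```python
import re

def analyze(text, tokenizer, filters):
    """Toy model of an Elasticsearch custom analyzer:
    tokenize first, then run each token filter over every token."""
    tokens = tokenizer(text)
    for f in filters:
        tokens = [f(t) for t in tokens]
    return [t for t in tokens if t]

# Simplified stand-ins for the built-in tokenizers:
standard_tokenizer = lambda s: re.split(r"\s+", s.strip())
keyword_tokenizer = lambda s: [s]  # emits the whole input as one token

lowercase = str.lower
whitespace_remove = lambda t: re.sub(" ", "", t)  # the pattern_replace filter

# With the standard tokenizer, whitespace is already gone before the
# filter runs, so pattern_replace has nothing to do:
print(analyze("Beleza na Web", standard_tokenizer, [lowercase, whitespace_remove]))
# → ['beleza', 'na', 'web']

# With the keyword tokenizer, the filter sees the entire string:
print(analyze("Beleza na Web", keyword_tokenizer, [lowercase, whitespace_remove]))
# → ['belezanaweb']
```

This is why changing the pattern made no difference: the filter was correct, but it ran too late in the chain to ever see a space.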

Use the keyword tokenizer instead of the standard tokenizer. The rest of your mapping is fine. You can change the mapping as below:

"settings": {
  "index": {
    "analysis": {
      "filter": {
        "whitespace_remove": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": ""
        }
      },
      "analyzer": {
        "meliuz_analyzer": {
          "filter": [
            "lowercase",
            "whitespace_remove",
            "ngram"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    }
  }
}
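Since the question mentions applying ngram on top of the single token, here is a small sketch of what the ngram filter then produces from "belezanaweb" (the min/max gram lengths are assumptions for illustration; in Elasticsearch you would set them on a custom ngram filter definition):

```python
def ngrams(token, min_gram, max_gram):
    """Emit all substrings of length min_gram..max_gram, in the
    spirit of the ngram token filter (simplified sketch)."""
    return [token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)]

# Trigrams of the single whitespace-free token:
print(ngrams("belezanaweb", 3, 3)[:4])
# → ['bel', 'ele', 'lez', 'eza']
```

Because the input is now one token, grams such as "ana" and "awe" span the original word boundaries, which is exactly what you get by removing the whitespace before ngramming.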
