Elasticsearch "pattern_replace", replacing whitespaces while analyzing -


Basically, I want to remove whitespaces and tokenize the whole string as a single token. (I use ngram on top of it later on.)

These are my index settings:

"settings": {  "index": {   "analysis": {     "filter": {       "whitespace_remove": {         "type": "pattern_replace",         "pattern": " ",         "replacement": ""       }     },     "analyzer": {       "meliuz_analyzer": {         "filter": [           "lowercase",           "whitespace_remove"         ],         "type": "custom",         "tokenizer": "standard"       }     }   } } 

instead of "pattern": " ", tried "pattern": "\\u0020" , \\s , too.

But when I analyze the text "beleza na web", it still creates three separate tokens: "beleza", "na" and "web", instead of one single "belezanaweb".

The analyzer analyzes a string by tokenizing it first and then applying a series of token filters. Since you have specified the standard tokenizer, the input is split by the standard tokenizer, which produces the tokens "beleza", "na" and "web" separately. The pattern replace filter is then applied to each of those tokens individually, so there is no whitespace left for it to remove.
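You can see this per-token behaviour with the _analyze API (a minimal sketch; the exact request syntax differs slightly between Elasticsearch versions, this is the JSON body form):

POST _analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "beleza na web"
}

The response lists the tokens "beleza", "na" and "web"; none of them contains a space, so the whitespace_remove filter has nothing to replace.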

Use the keyword tokenizer instead of the standard tokenizer; the rest of the mapping is fine. You can change the mapping as below:

"settings": {  "index": {   "analysis": {     "filter": {       "whitespace_remove": {         "type": "pattern_replace",         "pattern": " ",         "replacement": ""       }     },     "analyzer": {       "meliuz_analyzer": {         "filter": [           "lowercase",           "whitespace_remove",           "ngram"         ],         "type": "custom",         "tokenizer": "keyword"       }     }   } } 
