Elasticsearch "pattern_replace": replacing whitespace while analyzing


I basically want to remove whitespace and tokenize the whole string as a single token. (I will use an ngram filter on top of that later on.)

These are my index settings:

"settings": {  "index": {   "analysis": {     "filter": {       "whitespace_remove": {         "type": "pattern_replace",         "pattern": " ",         "replacement": ""       }     },     "analyzer": {       "meliuz_analyzer": {         "filter": [           "lowercase",           "whitespace_remove"         ],         "type": "custom",         "tokenizer": "standard"       }     }   } } 

Instead of "pattern": " ", I also tried "pattern": "\\u0020" and "pattern": "\\s".

But when I analyze the text "beleza na web", it still creates three separate tokens, "beleza", "na" and "web", instead of the single token "belezanaweb".

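An analyzer processes a string by tokenizing it first and then applying a series of token filters. You have specified the standard tokenizer, which means the input has already been split into separate tokens by the time the filters run. The pattern_replace filter is then applied to each of those tokens individually, and none of them contains any whitespace to remove.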
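To make the ordering concrete, here is a rough simulation in plain Python. It does not call Elasticsearch; the function names are made up for illustration, and `standard_tokenize` is only a crude stand-in (a split on non-word characters) for what the real standard tokenizer does.

```python
import re

def standard_tokenize(text):
    # Crude approximation of the standard tokenizer: split on non-word runs.
    return [t for t in re.split(r"\W+", text) if t]

def keyword_tokenize(text):
    # The keyword tokenizer emits the entire input as a single token.
    return [text]

def lowercase(tokens):
    return [t.lower() for t in tokens]

def pattern_replace(tokens, pattern=" ", replacement=""):
    # A token filter sees one token at a time, never the original string.
    return [re.sub(pattern, replacement, t) for t in tokens]

def analyze(text, tokenizer):
    # Tokenize first, then run the filters in order.
    return pattern_replace(lowercase(tokenizer(text)))

print(analyze("beleza na web", standard_tokenize))  # ['beleza', 'na', 'web']
print(analyze("beleza na web", keyword_tokenize))   # ['belezanaweb']
```

With the first tokenizer the spaces are gone before the filter ever runs, so there is nothing left for it to replace.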

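Use the keyword tokenizer instead of the standard tokenizer. It emits the whole input as a single token, so the spaces are still there when the filter runs. The rest of your settings are fine. You can change them as below: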

"settings": {  "index": {   "analysis": {     "filter": {       "whitespace_remove": {         "type": "pattern_replace",         "pattern": " ",         "replacement": ""       }     },     "analyzer": {       "meliuz_analyzer": {         "filter": [           "lowercase",           "whitespace_remove",           "ngram"         ],         "type": "custom",         "tokenizer": "keyword"       }     }   } } 
