SSIS - Algorithm used by Fuzzy Grouping


The company I work for purchased data cleansing and matching software to cleanse and match information every night. It takes fifteen hours to run.

I have discovered the Fuzzy Grouping and Fuzzy Lookup components in SSIS, which are extremely fast by comparison in my experience. I have some questions:

What algorithms do these components use? I have read articles that suggest they use Soundex, variations of Soundex, q-grams, Levenshtein distance, or a combination of the four. Is there any documentation that explicitly specifies which algorithm they use?

This page from Microsoft Research describes these at a high level: http://research.microsoft.com/en-us/projects/datacleaning/

I think the second-to-last link there has the full description: http://research.microsoft.com/pubs/75996/bm_sigmod03.pdf

It's way over my head, but it reads as though they rolled their own algorithm.
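
For orientation, here is a minimal Python sketch of two of the techniques named in the question, Levenshtein distance and q-gram overlap. It is not the SSIS implementation, which is not public as code. The function names, the `#` padding character, and the default `q=3` are assumptions made for illustration only.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def qgrams(s, q=3):
    """Set of overlapping length-q substrings of s, lowercased and
    padded with '#' so that the start and end of the string count too."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a, b, q=3):
    """Jaccard similarity of the two q-gram sets, from 0.0 to 1.0."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0  # two empty inputs are treated as identical
    return len(ga & gb) / len(ga | gb)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(qgram_similarity("Microsoft Corp", "Microsft Corporation"))
```

Edit distance compares two strings directly, so matching every row against every other row gets expensive quickly. Q-gram sets can be indexed, which allows a lookup to fetch only candidate rows that share some q-grams with the input before applying a costlier comparison. Whether that accounts for the speed seen in SSIS is a guess on my part.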

