SSIS - Algorithm used by Fuzzy Grouping
The company I work for purchased data cleansing and matching software to cleanse and match our information every night. It takes fifteen hours to run.
I have discovered the Fuzzy Grouping/Fuzzy Lookup components in SSIS, and in my experience they are extremely fast by comparison. I have a few questions:
What algorithms do these components use? I have read articles suggesting they use Soundex, variations of Soundex, q-grams, Levenshtein distance, or a combination of the four. Is there any documentation that explicitly specifies which algorithm they use?
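For readers unfamiliar with the terms, here are minimal sketches of two of the techniques named above: Levenshtein distance and q-gram similarity. They are generic textbook versions written for illustration, not the code SSIS actually runs. The `#` padding character, the default `q = 3`, and Jaccard overlap as the set-similarity measure are my own assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 3) -> set[str]:
    """Set of overlapping q-character substrings. The string is lower-cased and
    padded with '#' so the start and end of a word form their own grams."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Jaccard overlap of the two q-gram sets: 1.0 means identical sets."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)
```

For example, `levenshtein("kitten", "sitting")` is 3, and `qgram_similarity("Jon", "John", q=2)` is 0.5 because the two names share 3 of their 6 distinct padded bigrams. Edit distance is precise but costs O(len(a) × len(b)) per pair. Q-gram sets are cheap to index, which is one reason q-grams are often used to shortlist candidates before a more expensive comparison.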
This page from Microsoft Research describes them at a high level: http://research.microsoft.com/en-us/projects/datacleaning/
I think the second-to-last link there has the full description: http://research.microsoft.com/pubs/75996/bm_sigmod03.pdf
It's way over my head, but it reads like they rolled their own algorithm.
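To give a rough feel for how the building blocks above can be combined at the word level, here is a toy token-based matcher. It is a simplified sketch built on my own assumptions, not the similarity function defined in the linked paper and not what SSIS ships. Each token is weighted by inverse document frequency (IDF) over the reference rows, so rare tokens count more than common ones. Each input token is then matched to its closest reference token by normalized edit distance. `idf_weights`, `weighted_token_similarity`, the `+1` smoothing, and the `default_weight` for unseen tokens are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def idf_weights(reference_rows: list[str]) -> dict[str, float]:
    """IDF of each lower-cased token across the reference rows, smoothed with +1
    (an assumption) so a token appearing in every row keeps a non-zero weight."""
    n = len(reference_rows)
    df = Counter(tok for row in reference_rows for tok in set(row.lower().split()))
    return {tok: math.log(n / count) + 1.0 for tok, count in df.items()}


def weighted_token_similarity(query: str, candidate: str,
                              idf: dict[str, float], default_weight: float) -> float:
    """Fraction of the query's total token weight covered by the candidate.

    Each query token is paired with its closest candidate token and contributes
    weight * (1 - edit_distance / length_of_longer_token). Tokens missing from
    the IDF table (e.g. typos) get default_weight, which must be positive.
    """
    q_tokens = query.lower().split()
    c_tokens = candidate.lower().split()
    if not q_tokens:
        return 1.0 if not c_tokens else 0.0
    total = covered = 0.0
    for tok in q_tokens:
        weight = idf.get(tok, default_weight)
        total += weight
        if c_tokens:
            best = max(1 - levenshtein(tok, c) / max(len(tok), len(c)) for c in c_tokens)
            covered += weight * best
    return covered / total
```

Usage: with `reference = ["Microsoft Corporation", "Boeing Company", "Microsoft Research"]`, `idf = idf_weights(reference)`, and `default = math.log(len(reference)) + 1.0`, the best match for the misspelled query `"Mircosoft Research"` under `weighted_token_similarity` is `"Microsoft Research"`. The score is deliberately one-sided: extra tokens in the candidate are not penalized. A production matcher would also need an index to avoid comparing every input row against every reference row.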