sql server - T-SQL Get Rows With Similar Company Name Using Levenshtein -


i'm using levenshtein function t-sql works (i'm not worried performance). want write query returns rows levenshtein distance less x (where x might 5 example) using company name field comparison.

i've tried following, returns thousands of duplicate rows.

select * contacts c1, contacts c2 dbo.ufncomparestring(c1.company, c2.company) < 5 , c1.id <> c2.id 

i show list this:

1   apple experts 20  apple experts inc. 240 h&k paving 21  h , k paving 98  hk paving 189 h.k. paving 5   j.m. lawn care 105 jm lawn care 

is possible this? doing wrong?

edit

i ended query looks this. found there "invalid" entries causing problems having:

select c1.contactid, c1.company, c1.lastname, c1.firstname,    c2.contactid, c2.company, c2.lastname, c2.firstname contacts c1, contacts c2 cast(c1.contactid int) < cast(c2.contactid int)    , c1.company not null     , replace(c1.company, ' ', '') <> ''    , c2.company not null    , replace(c2.company, ' ', '') <> ''    , len(c1.company) > 6    , len(c2.company) > 6    , dbo.ufncomparestring(c1.company, c2.company) < 5 

note query pretty slow running (on 12,000 records) , have different query more effective. goal find duplicate companies had been entered using different company names , query returned many false positives. query used, it's complicated show here , outside scope of question.

to reduce duplicates, use instead:

select * contacts c1, contacts c2 dbo.ufncomparestring(c1.company, c2.company) < 5 , c1.id < c2.id 

it returns unique pairs of contacts, distance less 5.


Comments

Popular posts from this blog

asp.net mvc - SSO between MVCForum and Umbraco7 -

Python Tkinter keyboard using bind -

ubuntu - Selenium Node Not Connecting to Hub, Not Opening Port -