mysql - Search in N to N relation in SQL database -
this database model:

what need:
i need input several terms , search documents (document.text) has of these terms.
example data:
documents:
id:1 text:dog cat train id:2 text:dog cat train car id:3 text:dog cat id:4 text:dog terms:
id:1 term:dog id:2 term:cat id:3 term:train id:4 term:car example:
i want search document has terms: dog cat train. result document 1 , document 2 not document 3 because not have train , not document 4 because not have cat or train.
my first attempt query this:
select document.text document join document_has_term on document.iddocument=document_has_term.document_iddocument join term on term.idterm=document_has_term.term_idterm term="kindness" , term="horrible" this query not select posts reflects want.
supposing every document can have term once, if run
select document_iddocument document_has_term join term on (term_idterm = idterm) term in ('cat', 'dog', 'train'); you have 3 rows each of 3 terms matched, 2 rows if 2 terms match, , on.
so
select document_iddocument document_has_term join term on (term_idterm = idterm) term in ('cat', 'dog', 'train') group document_iddocument having count(document_iddocument) = 3; will output document ids 3 matches.
this query not need access document @ stage.
you can use sub-select document's iddocument in list of ids:
select document.text document iddocument in ( above select ); more complex query
if want run more complex searches, maybe ought text search mysql , use fulltext capabilities.
otherwise, need build query starting "outer" language specify things like
cat , not dog which not sql, , translate sql query.
an efficient approach try , determine, complex query such "cat not dog", component limiting. in example if have 2000 records, cat present in 1 hundred records, , dog present in fifty, need consider: - queries searching presence of term efficient. - queries searching absence of term expensive.
and first run query cat, remove items do contain dog.
this approach quite complex.
another possibility, not recommended large databases, scan whole document_has_term table , status of documents:
select document_iddocument, sum(if(term = 'cat', 1, 0)) has_0, sum(if(term = 'dog', 1, 0)) has_1 document_has_term left join term on (term_idterm = idterm , term.term in ('cat', 'dog')) group document_iddocument; this query build in external language , made of 3 parts: template
select document_iddocument, <other_fields> document_has_term left join term on (term_idterm = idterm , term.term in <term_list> group document_iddocument; which fixed; list of fields (one each term); list of terms. longer query, longer lists, , cost increases linearly.
now have translate "textual query" series of "it's there/it isn't":
cat , not dog becomes
(has_0) , not (has_1) actually can integrate having clause, , build query this:
select document.* document iddocument in ( select document_iddocument document_has_term left join term on (term_idterm = idterm , term.term in ('cat', 'dog') -- list of terms used ) group document_iddocument having (sum(if(term = 'cat', 1, 0))!=0) -- term "cat" , not -- "textual query" (sum(if(term = 'dog', 1, 0))!=0) -- term "dog" ); here can experiment little fiddle.
as long use sqlish syntax textual query, if careful against sql injections, bob uncle. if aren't careful sanitizing input (only allowing valid terms , keywords 'and', 'or', 'not' , parentheses, , using prepared queries ? placeholders terms), bobby may daddy...
Comments
Post a Comment