mysql - Search in N to N relation in SQL database -


this database model:

enter image description here

what need:

i need input several terms , search documents (document.text) has of these terms.

example data:

documents:

id:1  text:dog cat train  id:2  text:dog cat train car  id:3  text:dog cat  id:4  text:dog 

terms:

id:1 term:dog  id:2 term:cat  id:3 term:train  id:4 term:car 

example:

i want search document has terms: dog cat train. result document 1 , document 2 not document 3 because not have train , not document 4 because not have cat or train.

my first attempt query this:

select document.text document join document_has_term on         document.iddocument=document_has_term.document_iddocument join term on  term.idterm=document_has_term.term_idterm term="kindness" ,  term="horrible" 

this query not select posts reflects want.

supposing every document can have term once, if run

select document_iddocument     document_has_term     join term on (term_idterm = idterm)     term in ('cat', 'dog', 'train'); 

you have 3 rows each of 3 terms matched, 2 rows if 2 terms match, , on.

so

select document_iddocument     document_has_term     join term on (term_idterm = idterm)     term in ('cat', 'dog', 'train') group document_iddocument     having count(document_iddocument) = 3; 

will output document ids 3 matches.

this query not need access document @ stage.

you can use sub-select document's iddocument in list of ids:

select document.text document iddocument in ( above select ); 

more complex query

if want run more complex searches, maybe ought text search mysql , use fulltext capabilities.

otherwise, need build query starting "outer" language specify things like

cat , not dog 

which not sql, , translate sql query.

an efficient approach try , determine, complex query such "cat not dog", component limiting. in example if have 2000 records, cat present in 1 hundred records, , dog present in fifty, need consider: - queries searching presence of term efficient. - queries searching absence of term expensive.

and first run query cat, remove items do contain dog.

this approach quite complex.

another possibility, not recommended large databases, scan whole document_has_term table , status of documents:

select document_iddocument,     sum(if(term = 'cat', 1, 0)) has_0,     sum(if(term = 'dog', 1, 0)) has_1     document_has_term     left join term on (term_idterm = idterm , term.term in          ('cat', 'dog')) group document_iddocument; 

this query build in external language , made of 3 parts: template

select document_iddocument,     <other_fields>     document_has_term     left join term on (term_idterm = idterm , term.term in          <term_list> group document_iddocument; 

which fixed; list of fields (one each term); list of terms. longer query, longer lists, , cost increases linearly.

now have translate "textual query" series of "it's there/it isn't":

cat , not dog 

becomes

(has_0) , not (has_1) 

actually can integrate having clause, , build query this:

select document.* document iddocument in (  select document_iddocument     document_has_term     left join term on (term_idterm = idterm , term.term in      ('cat', 'dog') -- list of terms used     ) group document_iddocument      having     (sum(if(term = 'cat', 1, 0))!=0) -- term "cat"     , not                          -- "textual query"     (sum(if(term = 'dog', 1, 0))!=0) -- term "dog" ); 

here can experiment little fiddle.

as long use sqlish syntax textual query, if careful against sql injections, bob uncle. if aren't careful sanitizing input (only allowing valid terms , keywords 'and', 'or', 'not' , parentheses, , using prepared queries ? placeholders terms), bobby may daddy...


Comments

Popular posts from this blog

shopping cart - Page redirect not working PHP -

php - How to modify a menu to show sub-menus -

python - Installing PyDev in eclipse is failed -