Neo4j: performance issue finding all paths between two nodes with Cypher
I'm using the themoviedb database downloaded here. It has ~60k nodes and ~100k relationships, and I need to find all paths of a given length k between two nodes a and b, identified by their name property. Let's say I need to find the paths of length 2 between Keanu Reeves and Laurence Fishburne. I used the following Cypher query:

    MATCH (k)-[e*2..2]-(l) WHERE k.name = "Keanu Reeves" AND l.name = "Laurence Fishburne" RETURN k, e, l

and it took 40 seconds.
I decided to try a different approach and used the following query instead:

    MATCH (k)--(m)--(l) WHERE k.name = "Keanu Reeves" AND l.name = "Laurence Fishburne" RETURN k, m, l

and it took 252 milliseconds!
Those two queries gave the same results and had the same meaning, yet the first one took roughly 160x longer (40 seconds vs. 252 milliseconds). How is that possible?
I need to conduct some tests in which I have to find all paths up to a given maximum (but not minimum) length between two given nodes. This gives me problems because I cannot use the second approach described above (it only works for a fixed path length), and the first one is way too slow.
I cannot use allShortestPaths because it doesn't return paths longer than the shortest one.
It's driving me crazy... Any idea how to solve it?
Edit
Another example of how big the issue is: finding the paths of length 4 between Robert Downey Jr. and Harrison Ford. Method #2: ~500 milliseconds. Method #1: >360 seconds (after 6 minutes I brutally unplugged the PC's power adaptor).
The reason the first query is taking so long is that it is not using the indexes at all; it is scanning the entire database.
If you change the query to include the Actor label in the path matching, it will improve query performance:

    MATCH (k:Actor)-[e*2..2]-(l:Actor) WHERE k.name = "Keanu Reeves" AND l.name = "Laurence Fishburne" RETURN k, e, l

If you reveal the indexes by executing the :schema command in the browser, you will see which indexes are in place. You can see that the first one is on :Actor(name); within the Actor label, the name property is indexed.
    Indexes
      ON :Actor(name) ONLINE
      ON :Director(name) ONLINE
      ON :Movie(title) ONLINE
      ON :Person(name) ONLINE
      ON :User(login) ONLINE (for uniqueness constraint)
    Constraints
      ON (user:User) ASSERT user.login IS UNIQUE

If you profile the original query

    PROFILE MATCH (k)-[e*2..2]-(l) WHERE k.name = "Keanu Reeves" AND l.name = "Laurence Fishburne" RETURN k, e, l

and then profile the one with the :Actor label added, it becomes abundantly clear why the two perform differently:

    PROFILE MATCH (k:Actor)-[e*2..2]-(l:Actor) WHERE k.name = "Keanu Reeves" AND l.name = "Laurence Fishburne" RETURN k, e, l

I forgot to add that you should also profile the second (faster) query:

    PROFILE MATCH (k)--(m)--(l) WHERE k.name = "Keanu Reeves" AND l.name = "Laurence Fishburne" RETURN k, m, l

You will see that the query plans are different. I think adding the asterisk to the relationship sends the database engine down a different optimization path.
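For the maximum-length case described in the question, a minimal sketch along the same lines might look like the query below. It assumes the label is spelled :Actor and that names are stored with their usual capitalisation. The upper bound of 4 and the aliases p and hops are only illustrative choices, not something taken from the dataset.

```cypher
// Look up both endpoints by label and property so the :Actor(name) index can be used
MATCH (k:Actor {name: "Keanu Reeves"}), (l:Actor {name: "Laurence Fishburne"})
// Then expand every path of 1 to 4 hops between the two anchored nodes
MATCH p = (k)-[*1..4]-(l)
RETURN p, length(p) AS hops
ORDER BY hops
```

Prefixing this with PROFILE should show whether both endpoints are actually resolved through the index before the variable-length expansion starts. Runtime will still grow quickly with the upper bound, so it is probably worth raising it one step at a time.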
Good luck!