csv - Should I use Neo4j's Import Tool or Load Command to Insert Several Million Rows?


I have several CSV files that range from 25 to 100 MB in size. I have created constraints, created indices, am using periodic commit, and have increased the allocated memory in neo4j-wrapper.conf and neo4j.properties.

neo4j.properties:

neostore.nodestore.db.mapped_memory=50m
neostore.relationshipstore.db.mapped_memory=500m
neostore.propertystore.db.mapped_memory=100m
neostore.propertystore.db.strings.mapped_memory=100m
neostore.propertystore.db.arrays.mapped_memory=0m

neo4j-wrapper.conf changes:

wrapper.java.initmemory=5000
wrapper.java.maxmemory=5000

However, the load is still taking a long time, and I am considering using the recently released import tool (http://neo4j.com/docs/milestone/import-tool.html). Before I switch to it, I was wondering whether there is anything else I could be doing to improve the speed of my imports.

I begin by creating several constraints to make sure that the IDs I'm using are unique:

CREATE CONSTRAINT ON (c:country) ASSERT c.name IS UNIQUE;
// and constraints for the other name identifiers as well

I use periodic commit:

USING PERIODIC COMMIT 10000

I then load in the CSV, ignoring rows where several fields are missing:

LOAD CSV WITH HEADERS FROM "file:/path/to/file/myfile.csv" AS line
WITH line
WHERE line.countryname IS NOT NULL
  AND line.cityname IS NOT NULL
  AND line.neighborhoodname IS NOT NULL

I then create the necessary nodes from this data:

WITH line
MERGE (country:country {name: line.countryname})
MERGE (city:city {name: line.cityname})
MERGE (neighborhood:neighborhood {
     name: line.neighborhoodname,
     size: toInt(line.neighborhoodsize),
     nickname: coalesce(line.neighborhoodnn, ""),
     ... 50 other features
    })
MERGE (city)-[:in]->(country)
CREATE (neighborhood)-[:in]->(city)
// Note that each neighborhood appears only once

Does it make sense to use CREATE UNIQUE rather than applying MERGE to any country reference? Would this speed it up?

A ~250,000-line CSV file took over 12 hours to complete, which seemed excessively slow. What else can I be doing to speed this up? Or does it just make sense to use the annoying-looking import tool?

A couple of things. Firstly, I would suggest reading Mark Needham's "Avoiding the Eager" blog post:

http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/

Basically, what it says is that you should add PROFILE to the start of each of your queries to see if any of them use the Eager operator. If they do, this can really cost you performance-wise, and you should probably split your query up into separate MERGEs.

Secondly, your neighborhood MERGE contains a lot of properties, so each time it's trying to match on every single one of those properties before deciding whether it should create the node or not. I'd suggest something like:
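As a rough sketch of what splitting the query up could look like, the country and city parts of the load might be run as three separate passes over the same file, each with its own periodic commit. This assumes the labels, property names, and file path from the question, and that the uniqueness constraints on name are already in place so each MERGE and MATCH is an index lookup. It is an illustration of the idea, not a tested replacement for the original query.

```cypher
// Pass 1: create country nodes only
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/path/to/file/myfile.csv" AS line
WITH line WHERE line.countryname IS NOT NULL
MERGE (:country {name: line.countryname});

// Pass 2: create city nodes only
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/path/to/file/myfile.csv" AS line
WITH line WHERE line.cityname IS NOT NULL
MERGE (:city {name: line.cityname});

// Pass 3: connect existing cities to existing countries
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/path/to/file/myfile.csv" AS line
WITH line WHERE line.countryname IS NOT NULL AND line.cityname IS NOT NULL
MATCH (city:city {name: line.cityname})
MATCH (country:country {name: line.countryname})
MERGE (city)-[:in]->(country);
```

Each pass touches one kind of node or relationship, which makes it easier to check each statement's plan on its own for an Eager operator.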

MERGE (neighborhood:neighborhood {name: line.neighborhoodname})
ON CREATE SET
     neighborhood.size = toInt(line.neighborhoodsize),
     neighborhood.nickname = coalesce(line.neighborhoodnn, ""),
     ... 50 other features
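Put together as its own pass, the neighborhood load might look roughly like the sketch below. It assumes the city nodes already exist, that there is a uniqueness constraint on the neighborhood name (the question mentions constraints on the other name identifiers), and it sets only the two properties shown in the question; the remaining ones are left out here because they are not listed.

```cypher
// Neighborhood pass: match on the unique name only, set the rest on creation
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/path/to/file/myfile.csv" AS line
WITH line
WHERE line.cityname IS NOT NULL
  AND line.neighborhoodname IS NOT NULL
MATCH (city:city {name: line.cityname})
MERGE (neighborhood:neighborhood {name: line.neighborhoodname})
ON CREATE SET
     neighborhood.size = toInt(line.neighborhoodsize),
     neighborhood.nickname = coalesce(line.neighborhoodnn, "")
     // remaining properties from the question would be set here as well
MERGE (neighborhood)-[:in]->(city);
```

The relationship uses MERGE rather than CREATE so that rerunning the pass does not duplicate it; if each neighborhood really appears only once and the load is run once, CREATE as in the question would also work.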
