Neo4j Database – Part 05 – Data Loading Options

In the last blog I described the Cypher object notation used by the query language. In this blog I will describe how to manually create nodes and relationships and then how to do this more efficiently by importing data.

Creating Nodes and Relationships by Hand
Unique Constraints
Merge Command
Bulk Loading Data Options

Creating Nodes and Relationships by Hand

In the last blog I introduced the CREATE command, issuing one statement at a time. We can also create multiple nodes and relationships in a single statement by separating the patterns with commas. For relationships, we use variables to refer to the two nodes being connected.

CREATE (…), (…), (…), …

CREATE (a), (b), (a)-[…]->(b), (a)-[…]->(b), …

We need to be careful when creating relationships. If the CREATE pattern spells out both nodes in full, Cypher creates two new nodes every time, even when you really meant to match two existing nodes and only add the relationship between them.

Running the following two statements gives us duplicate nodes: two nodes named fred and two named barney.

CREATE (n:Person {name: "fred"})-[:KNOWS]->(m:Person {name: "barney"})

CREATE (n:Person {name: "fred"})-[:FRIENDS]->(m:Person {name: "barney"})
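
To see the duplication for yourself, a query along the following lines counts Person nodes per name. This is a minimal sketch that assumes the two statements above were run against an otherwise empty database with no constraints in place.

```cypher
// Count Person nodes per name and keep only names that occur more than once.
MATCH (p:Person)
WITH p.name AS name, count(*) AS copies
WHERE copies > 1
RETURN name, copies
```

Under that assumption, both fred and barney should come back with two copies each.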

One way to avoid duplicate nodes is to create the nodes first, and then MATCH them in a separate statement to create the relationships.

CREATE (n:Person {name: "fred"}), (m:Person {name: "barney"})

MATCH (n:Person {name: "fred"}), (m:Person {name: "barney"})
CREATE (n)-[:KNOWS]->(m), (n)-[:FRIENDS]->(m)

Even better, when the nodes do not exist yet, we can do it all in one CREATE statement.

CREATE (n:Person {name: "fred"}), (m:Person {name: "barney"}), (n)-[:KNOWS]->(m), (n)-[:FRIENDS]->(m)

Unique Constraints

Unique constraints allow us to prevent duplicate nodes based on a node's label and one or more properties. The following constraint is based on the name property for the label Person.

CREATE CONSTRAINT ON (n:Person) ASSERT n.name IS UNIQUE

You can view the constraints and indexes that are in place using the :SCHEMA command. Creating a unique constraint automatically creates an index on the same property.
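
The ON … ASSERT form above is the syntax of the Neo4j 3.x releases this series was written against. As far as I am aware, Neo4j 4.4 introduced a FOR … REQUIRE form and Neo4j 5 dropped the older one, so on a recent server the equivalent would look something like the sketch below. The constraint name person_name_unique is my own choice for illustration.

```cypher
// Neo4j 4.4+ syntax; the name person_name_unique is hypothetical.
CREATE CONSTRAINT person_name_unique IF NOT EXISTS
FOR (n:Person) REQUIRE n.name IS UNIQUE;

// List the constraints that are in place (Neo4j 4.2+).
SHOW CONSTRAINTS;
```

Check the documentation for your server version before relying on either form.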

Let’s retry the commands from above that previously created duplicate nodes.

CREATE (n:Person {name: "fred"})-[:KNOWS]->(m:Person {name: "barney"})

CREATE (n:Person {name: "fred"})-[:FRIENDS]->(m:Person {name: "barney"})

Notice that this time the second command fails: it tries to create another Person node named fred, which violates the unique constraint.

Merge Command

Other query languages have the notion of an upsert (update or insert): if something does not exist, create it; otherwise, update it. To know whether something already exists, we need to pass a primary key to the statement.

Cypher has the MERGE command, which behaves slightly differently because there is no primary key. MERGE is like a combination of MATCH and CREATE, and matching is done on the full pattern. So, in the following, the second MERGE does not match the node created by the first, because that node has no dob property.

MERGE (a:Person {name: "Fred"})

MERGE (a:Person {name: "Fred", dob: 20100101})

A unique constraint prevents the duplicate node, but the second statement then fails with an error instead. Rather than dwell on how not to do it, here is how to do it. MERGE has the ON CREATE and ON MATCH keywords, which let us specify what to do when the node is first created and what to do when it already exists. MERGE matches nodes based on the pattern specified, so if you restrict the pattern to a property that has a unique constraint, that property behaves like a primary key.

MERGE (a:Person {name: 'fred'})
ON CREATE SET a.dob = 20100101, a.points = 0
ON MATCH SET a.points = a.points + 10
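
MERGE also works on relationships. The following is a sketch of how the fred and barney example from earlier could be written so that it is safe to run repeatedly: each node is merged on its name alone, and then the relationship is merged between the two bound variables. It assumes the unique constraint on the name property of Person is in place, which is what makes name behave like a key. The coalesce call is my addition, guarding against a node that was created earlier without a points property, since adding 10 to a missing (null) value yields null.

```cypher
// Merge each node on the key property only, then set the other properties.
MERGE (n:Person {name: "fred"})
  ON CREATE SET n.points = 0
  ON MATCH SET n.points = coalesce(n.points, 0) + 10
MERGE (m:Person {name: "barney"})
// Merge the relationship between the two bound nodes;
// it is created only if it does not already exist.
MERGE (n)-[:KNOWS]->(m)
```

Keeping the node pattern down to the key property, and moving everything else into ON CREATE and ON MATCH, avoids the full-pattern mismatch described above.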

Bulk Loading Data Options

Neo4j offers us multiple ways of bulk loading data.

LOAD CSV Command
Built-in Cypher command for loading CSV data.
Command: LOAD CSV
See https://neo4j.com/docs/developer-manual/3.3/cypher/clauses/load-csv/

APOC LOAD Commands
Awesome Procedures on Cypher (APOC): a custom extension that adds additional procedures and functions to Cypher. It needs to be installed.
At the time of writing there are 188 functions and 253 procedures.
Specifically for loading data we have: apoc.load.csv, apoc.load.driver, apoc.load.jdbc, apoc.load.jdbcParams, apoc.load.jdbcUpdate, apoc.load.json, apoc.load.jsonArray, apoc.load.jsonParams, apoc.load.ldap, apoc.load.xml, apoc.load.xmlSimple
See https://neo4j-contrib.github.io/neo4j-apoc-procedures/

Command line batch import tool
Command line tool for loading CSV data.
Command: neo4j-admin.bat import
See https://neo4j.com/docs/operations-manual/current/tools/import/

ETL Tool
Neo4j ETL tool for importing data from relational databases into Neo4j.
Source: https://github.com/neo4j-contrib/neo4j-etl
Features:
  Manage multiple RDBMS connections
  Automatically extract database metadata from relational database
  Derive graph model
  Visually edit labels, relationship-types, property-names and types
  Visualize current model as a graph
  Persist mapping as JSON
  Dump relevant CSV from relational databases
  Run import via neo4j-import, bolt-connector, cypher-shell, neo4j-shell
  Bundles MySQL and PostgreSQL drivers; allows a custom JDBC driver with Neo4j Enterprise
See https://neo4j-contrib.github.io/neo4j-etl/#neo4j-etl-cli
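
As an illustration of the first option, LOAD CSV combines naturally with MERGE. The sketch below assumes a hypothetical file people.csv in the server's import directory, with a header row containing name and dob columns; neither the file nor its columns come from this series.

```cypher
// people.csv is a hypothetical file with header columns: name,dob
LOAD CSV WITH HEADERS FROM "file:///people.csv" AS row
MERGE (p:Person {name: row.name})
  // CSV values arrive as strings, so convert dob before storing it.
  ON CREATE SET p.dob = toInteger(row.dob)
```

Because it merges on name, running the load a second time should not create duplicate Person nodes.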

In the next blog I will start creating a data model and discuss design decisions.

Denham Coder
