Neo4j Database – Part 05 – Data Loading Options

In the last blog I described the Cypher object notation used by the query language. In this blog I will describe how to manually create nodes and relationships and then how to do this more efficiently by importing data.

Contents

  • Creating Nodes and Relationships by Hand
  • Unique Constraints
  • Merge Command
  • Bulk Loading Data Options

Creating Nodes and Relationships by Hand

In the last blog I introduced the CREATE command, issuing one statement at a time. We can also create multiple nodes and relationships in a single statement by separating the patterns with commas. For relationships, we use variables to refer to the two nodes being connected.

CREATE (…), (…), (…), …

CREATE (a), (b), (a)-[…]->(b), (a)-[…]->(b), …

We need to be careful here, especially when creating relationships: a CREATE that spells out both end nodes will create two additional nodes, even when the intent was to match two existing nodes and create only the relationship between them.

If we run the following two statements, we get duplicate nodes.

CREATE (n:Person {name: "fred"})-[:KNOWS]->(m:Person {name: "barney"})

CREATE (n:Person {name: "fred"})-[:FRIENDS]->(m:Person {name: "barney"})
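To see the duplication for yourself, a quick count per name helps. This is a minimal sketch that assumes the two statements above were run against an otherwise empty database; the column aliases are my own.

```cypher
// Count how many Person nodes share each name.
// Any name with more than one node is a duplicate.
MATCH (p:Person)
RETURN p.name AS name, count(*) AS nodes
ORDER BY name
```

Under that assumption, both fred and barney should come back with a count of 2.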

One way to avoid this is to create the nodes first and then match them in order to create the relationships.

CREATE (n:Person {name: "fred"}), (m:Person {name: "barney"})

MATCH (n:Person {name: "fred"}), (m:Person {name: "barney"}) CREATE (n)-[:KNOWS]->(m), (n)-[:FRIENDS]->(m)

Even better, we can do it in one statement.

CREATE (n:Person {name: "fred"}), (m:Person {name: "barney"}), (n)-[:KNOWS]->(m), (n)-[:FRIENDS]->(m)

Unique Constraints

Unique constraints allow us to prevent duplicate nodes based on a node's label and one or more properties. The following constraint is based on the name property for the label Person.

CREATE CONSTRAINT ON (n:Person) ASSERT n.name IS UNIQUE

You can view what constraints and indexes are in place using the :SCHEMA command. When creating a unique constraint, an index is created automatically.
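The constraint syntax above is the Neo4j 3.x form (the documentation linked in this post is for 3.3). As far as I know, newer releases (4.4 and 5.x) use FOR ... REQUIRE instead, and Neo4j 5 drops the ON ... ASSERT form. A hedged sketch of the newer equivalent follows; the constraint name person_name_unique is a made-up example.

```cypher
// Newer syntax (Neo4j 4.4+): a named constraint that is safe to re-run.
CREATE CONSTRAINT person_name_unique IF NOT EXISTS
FOR (n:Person) REQUIRE n.name IS UNIQUE;

// List the constraints that exist (Neo4j 4.2+), similar to :SCHEMA.
SHOW CONSTRAINTS;
```

Note that creating a unique constraint fails if the existing data already violates it, so duplicate Person nodes left over from earlier experiments need to be removed first, for example with MATCH (n:Person) DETACH DELETE n.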

Let’s try our commands from above which previously created duplicate nodes.

CREATE (n:Person {name: "fred"})-[:KNOWS]->(m:Person {name: "barney"})

CREATE (n:Person {name: "fred"})-[:FRIENDS]->(m:Person {name: "barney"})

Notice that this time the second command fails, because a Person node named fred already exists.

Merge Command

In other query languages we have the notion of an upsert (update or insert): if something does not exist, create it; otherwise, update it. To know whether something already exists, we need to pass a primary key to the statement.

Cypher has the MERGE command, which behaves slightly differently because there is no primary key. MERGE is like a combination of MATCH and CREATE, and matching is done on the full pattern. So the second statement below will not match the node created by the first; without a constraint, it creates a second Person node instead.

MERGE (a:Person {name: "Fred"})

MERGE (a:Person {name: "Fred", dob: 20100101})

A unique constraint prevents the duplicate, but the second MERGE then fails with an error instead. The better approach is to merge on the identifying property only and set everything else with the ON CREATE and ON MATCH keywords, which let us determine what to do when the node is first created and what to do on subsequent updates. MERGE will match any nodes based on the pattern specified, so if you create a unique constraint on the property used in that pattern, it behaves like a primary key.

MERGE (a:Person {name: "fred"}) ON CREATE SET a.dob = 20100101, a.points = 0 ON MATCH SET a.points = a.points + 10
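The same approach carries over to relationships. Here is a sketch that reuses the fred and barney example from earlier: merging each node on its identifying property first, and then merging the relationship, means the statement can be re-run without creating duplicates. The coalesce call is my own addition to cover nodes that were created without a points property, since null + 10 evaluates to null in Cypher.

```cypher
// Find or create each end node by its identifying property.
MERGE (n:Person {name: "fred"})
  ON CREATE SET n.points = 0
  ON MATCH SET n.points = coalesce(n.points, 0) + 10
MERGE (m:Person {name: "barney"})
// Find or create the relationship between the two bound nodes.
MERGE (n)-[:KNOWS]->(m)
```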

Bulk Loading Data Options

Neo4j offers us multiple ways of bulk loading data.

LOAD CSV Command

Built-in Cypher command for loading CSV data

LOAD CSV

See https://neo4j.com/docs/developer-manual/3.3/cypher/clauses/load-csv/
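A minimal sketch of LOAD CSV in use, assuming a hypothetical file people.csv with a header row containing name and dob columns, placed in the database's import directory:

```cypher
// Read the file row by row; each row is a map keyed by the header names.
LOAD CSV WITH HEADERS FROM "file:///people.csv" AS row
// Merge on the identifying property so reloading the file does not duplicate nodes.
MERGE (p:Person {name: row.name})
// CSV values arrive as strings, so convert the date to an integer.
  ON CREATE SET p.dob = toInteger(row.dob)
```

Combining LOAD CSV with MERGE rather than CREATE keeps the load repeatable, at the cost of a lookup per row, which is where the unique constraint and its index help.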

APOC LOAD Commands

Awesome Procedures on Cypher (APOC)

Custom extension to add additional procedures and functions for Cypher

Needs to be installed

At the time of writing there are 188 functions and 253 procedures.

Specifically for loading data we have:

apoc.load.csv, apoc.load.driver, apoc.load.jdbc, apoc.load.jdbcParams, apoc.load.jdbcUpdate, apoc.load.json, apoc.load.jsonArray, apoc.load.jsonParams, apoc.load.ldap, apoc.load.xml, apoc.load.xmlSimple

See https://neo4j-contrib.github.io/neo4j-apoc-procedures/
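As a hedged example of one of these procedures, apoc.load.json reads a JSON document and yields its content as maps named value (one per object when the file holds an array or a stream of objects). The file people.json and its name and dob fields are assumptions for illustration, and loading from file:/// URLs typically requires apoc.import.file.enabled=true in the configuration.

```cypher
// Each JSON object in the file is returned as a map called value.
CALL apoc.load.json("file:///people.json") YIELD value
// Merge on the identifying property, then fill in the rest on first creation.
MERGE (p:Person {name: value.name})
  ON CREATE SET p.dob = value.dob
```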

Command line batch import tool

Command line tool for loading CSV data

neo4j-admin.bat import

See https://neo4j.com/docs/operations-manual/current/tools/import/

ETL Tool

Neo4j ETL tool for importing data from relational databases into Neo4j.

Source: https://github.com/neo4j-contrib/neo4j-etl

  • Manage multiple RDBMS connections
  • Automatically extract database metadata from the relational database
  • Derive a graph model
  • Visually edit labels, relationship types, property names and types
  • Visualize the current model as a graph
  • Persist the mapping as JSON
  • Dump the relevant CSV from the relational database
  • Run the import via neo4j-import, bolt-connector, cypher-shell or neo4j-shell
  • Bundles MySQL and PostgreSQL drivers; allows a custom JDBC driver with Neo4j Enterprise

See https://neo4j-contrib.github.io/neo4j-etl/#neo4j-etl-cli

 

In the next blog I will start creating a data model and discuss design decisions.

