In the last blog I introduced some Neo4j basics for managing a database and interacting with it. In this blog I will begin to uncover the objects that make up a Neo4j database. Understanding them is very important when it comes to defining the correct data model.
You will be pleased to know that a Neo4j database is made up of the following four distinct concepts:
- Node
- Label
- Properties
- Relationships
Node
Nodes are the main component, where the majority of the data resides. If you think about an RDBMS table, for example, a node would be the equivalent of a row.
Imagine we want to model our family.
Here we have 4 nodes that represent the four family members. From Neo4j’s perspective, these are 4 distinct nodes that represent 4 different entities; in the real world, however, we naturally group things into sets. Take a look at the 4 nodes below.
Here we have another 4 nodes that are the same shape and colour but that we know represent different things. What the nodes in each example have in common is that the first 4 are people and the second 4 are hobbies.
Label
Labels are a way of categorizing nodes into sets. Below we have classified the nodes with either the Person or Hobby label.
You may be wondering what advantage that gives us. It allows us to query Neo4j and say “get me all the Person nodes and then …” or “get me all the Hobby nodes and then …”. Without a label our queries would begin with “get all nodes and then …”. Labels are indexed, so finding the nodes that carry a given label is very fast.
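As a rough sketch of what those two kinds of query look like in Cypher (the language itself is covered in the next blog), the statements below create one Person and one Hobby node and then read them back. The name property and its values are placeholders chosen for illustration, and the sketch assumes an empty scratch database.

```cypher
// Create one node per label (illustrative data only).
CREATE (:Person {name: "Dad"}), (:Hobby {name: "Karate"});

// "Get me all the Person nodes and then ...": only Person nodes are considered.
MATCH (p:Person) RETURN p;

// "Get all nodes and then ...": with no label, every node is a candidate.
MATCH (n) RETURN n;
```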
Neo4j allows us to attach multiple labels to a single node. This gives us more query flexibility to answer specific questions. Imagine we have thousands of families in our database. We might want to say “get me all the Dads and then …” or “get me all the male family members and then …”.
Deciding when to attach multiple labels can actually be more difficult than you realise. It has to do with properties, which I will describe next.
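A minimal sketch of multiple labels follows. The Dad and Male labels are hypothetical names picked to match the two questions above; they are not built into Neo4j.

```cypher
// Find or create the Person node, then attach two more labels to it.
MERGE (dad:Person {name: "Dad"})
SET dad:Dad:Male;

// "Get me all the Dads and then ..."
MATCH (d:Dad) RETURN d;

// "Get me all the male family members and then ..."
MATCH (m:Person:Male) RETURN m;
```

A pattern such as (m:Person:Male) only matches nodes that carry both labels.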
Properties
I mentioned that a node is like a row in an RDBMS table, and rows contain one or more pieces of data. In Neo4j we store the data in properties that are associated with a node. Each property consists of a name/value pair. The name is a string and the value can be one of the following types:
- String
- Integer
- Float
- Boolean
This is not a huge list and you may feel it is very limited. Neo4j can also store a list of values as a property, provided every element has the same simple type, and more recent releases add temporal and spatial types. What it cannot store as a property is a nested structure such as a JSON document or a dictionary. You can serialise one into a string and it will work, but Cypher then sees only opaque text, so the client application has to do the processing.
In the example above we have created 4 properties: Name, Age, Date of birth and an ID. Remember that Neo4j does not enforce a schema by default, so we do not have to have the same properties on each node.
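The sketch below shows two Person nodes with different sets of properties. All values are invented for illustration, the date of birth is stored as a string to stay within the four types listed above, and an empty scratch database is assumed.

```cypher
// A node with all four properties.
CREATE (:Person {id: 1, name: "Dad", age: 42, dateOfBirth: "1980-01-15"});

// A node of the same label with only two of them; nothing forces the rest.
CREATE (:Person {id: 2, name: "Mum"});

// Reading a property that a node does not have yields null, not an error.
MATCH (p:Person) RETURN p.name, p.age;
```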
Neo4j allows us to create indexes on properties, either on a single property or as a composite index on two or more properties. Like any database, NoSQL or RDBMS, Neo4j uses indexes to speed up reads at the cost of extra work on every create, update and delete.
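As a hedged example, this is roughly how a single-property and a composite index are declared. The syntax shown assumes Neo4j 4.x or later (older releases use a different form), and the index names are made up for this sketch.

```cypher
// Single-property index on Person.name.
CREATE INDEX person_name FOR (p:Person) ON (p.name);

// Composite index on Person.name and Person.age together.
CREATE INDEX person_name_age FOR (p:Person) ON (p.name, p.age);

// A lookup like this one can now be served from the person_name index.
MATCH (p:Person) WHERE p.name = "Dad" RETURN p;
```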
Neo4j also allows us to create a unique constraint on a property, which guarantees that its values are unique across all nodes with a specific label. It does not mean that all nodes must have this property; it just means that if a node includes the property, its value must be unique. When the constraint is defined, an index is automatically added to that property. You can also create a composite unique constraint that spans two or more properties, which also creates the corresponding index.
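A small sketch of a unique constraint on the ID property follows. It assumes Neo4j 4.4 or later, where the statement is written with FOR ... REQUIRE; earlier releases spell it with ON ... ASSERT. The constraint name and the data are illustrative, and an empty scratch database is assumed.

```cypher
// Every Person that has an id must have a different one.
CREATE CONSTRAINT person_id_unique FOR (p:Person) REQUIRE p.id IS UNIQUE;

CREATE (:Person {id: 1, name: "Dad"});

// A Person without an id is still allowed.
CREATE (:Person {name: "Mum"});

// Reusing id 1 would be rejected with a constraint violation,
// so this statement is left commented out.
// CREATE (:Person {id: 1, name: "Son"});
```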
I mentioned in the Label section that knowing when to create a label and when to create a property instead is determined through query patterns and general experience. A label represents an entity type, as with Person and Hobby above. What if we wanted to record whether a person is married? There is a Boolean property type, so we could store a property Married = true/false. Alternatively, we could add a label called Married to every node where the person is married. Remember, all labels are indexed, so we don’t have a say in that, whereas property indexes are optional. So this is a performance and maintenance consideration. Adding too many labels may induce lots of label matching in order to retrieve the correct nodes. There is also a third way of storing this information, which brings us on to relationships.
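To make the two options concrete, here is a sketch of both side by side. The married property and the Married label are the hypothetical names from the paragraph above; in practice you would pick one approach, not both.

```cypher
// Option 1: a Boolean property. It is only indexed if we create an index.
MERGE (dad:Person {name: "Dad"})
SET dad.married = true;

MATCH (p:Person) WHERE p.married = true RETURN p;

// Option 2: a label. Labels are always indexed.
MERGE (dad:Person {name: "Dad"})
SET dad:Married;

MATCH (p:Person:Married) RETURN p;
```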
Relationships
Graph databases are all about nodes and relationships. In graph theory, relationships are referred to as edges. A relationship connects two nodes; unlike relationships in an RDBMS, however, in Neo4j they are first-class citizens. They have a direction, can carry properties, and can be recursive, with a node related to itself. Properties follow exactly the same rules as properties on nodes.
Let’s add a LIKES relationship from Dad to Karate.
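In Cypher, adding that relationship might look like the sketch below. MERGE is used so the statement finds the two nodes if they already exist and creates them otherwise; the since property is a made-up example of a property on a relationship.

```cypher
// Find or create both nodes, then connect them from Dad to Karate.
MERGE (dad:Person {name: "Dad"})
MERGE (karate:Hobby {name: "Karate"})
MERGE (dad)-[r:LIKES]->(karate)
SET r.since = 2015;

// Read it back, following the relationship in the direction it was created.
MATCH (:Person {name: "Dad"})-[r:LIKES]->(h:Hobby)
RETURN h.name, r.since;
```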
One of the aspects that causes some confusion is the relationship direction. The display above shows an arrow pointing from Dad towards Karate. That makes sense: ‘Dad likes Karate’ is meaningful, whereas ‘Karate likes Dad’ is not. What about the relationship below?
Dad is married to Mum, but Mum is also married to Dad, so shouldn’t we have a bi-directional arrow? Neo4j does not allow us to create a bi-directional relationship. So that’s fine, you may say, let’s just create one in each direction.
It turns out that this is not good practice. Although every relationship is stored with a direction, Neo4j can traverse it equally well in either direction, and Cypher queries allow us to ignore the direction when that makes sense. So the best practice is to model a bi-directional relationship once, with an arbitrary direction, and then structure your query to ignore the direction.
I do not want to get into the Cypher query language in this blog; I just want to show the query differences, which I will explain in the next blog. Look at the direction of the arrow in the picture above (left to right).
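A sketch of that modelling choice: the marriage is stored once, here from Dad to Mum, and no second relationship is created in the opposite direction. The direction is arbitrary and chosen only for this illustration.

```cypher
// Find or create both people, then store the marriage a single time.
MERGE (dad:Person {name: "Dad"})
MERGE (mum:Person {name: "Mum"})
MERGE (dad)-[:MARRIED]->(mum);
```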
Look at the direction of the arrow in the query below: left to right. Dad and Mum are found.
MATCH (a:Person)-[:MARRIED]->(b:Person) WHERE a.name = "Dad" RETURN a,b
Look at the direction of the arrow below: right to left. There is no relationship from Mum to Dad, so no data is returned.
MATCH (a:Person)<-[:MARRIED]-(b:Person) WHERE a.name = "Dad" RETURN a,b
Look at the two queries below. There is no arrowhead, so either direction is fine. Both queries return Dad and Mum.
MATCH (a:Person)-[:MARRIED]-(b:Person) WHERE a.name = "Dad" RETURN a,b
MATCH (a:Person)-[:MARRIED]-(b:Person) WHERE a.name = "Mum" RETURN a,b
In the next blog I will go deeper and explain the Cypher language notation so that we can begin using Neo4j.