Neo4j Database – Part 01 – Foundation

This is a blog series on using the Neo4j NoSQL database.

Introduction

I have been around databases for 30 years, starting out with IDMS on ICL mainframes. That progressed into Sybase and Microsoft SQL Server. I cut my teeth on relational databases; however, one size no longer fits all. Those previous eras can be referred to as the first and second database revolutions, targeting mainframes, client/server and the beginnings of the web.

The third database revolution is NoSQL. With the growth of the cloud, social media, big data, mobile and IoT, a new wave of technologies needed to be designed to cope with the different demands placed on them.

NoSQL databases can be categorized as follows:

  • Document Orientated
  • Key Value Store
  • Wide Column
  • Graph

NoSQL is often easier to talk about by comparing it against a relational database. The list below is by no means complete; it covers just some of the common differences.

SQL | NoSQL
Predefined schema | No schema enforcement (all data has a schema)
Tabular | Various data structures
ACID compliance | BASE compliance, although Neo4j is ACID
Scales vertically | Scales horizontally
Full transaction support | Partial transaction support
SQL query language | SQL-like language or custom language

There are other NoSQL types, but these are the most popular. The DB-Engines Ranking website shows a list of all the databases, based on data extracted from search engines. It uses that data to determine how often searches for a particular database engine are performed, and thereby assesses how popular it is. At the time of writing the most popular graph database is Neo4j. In case you are wondering, the name was originally Neo, and the 4j stood for “for Java”, since Java is the language it is written in. Although it can be accessed from a multitude of other languages, the name stuck and is now the actual product name.

Neo4j

What exactly is a graph database? It is built on a concept called graph theory, which models the relationships between pairs of objects. It really outshines relational and other NoSQL databases when you have a lot of highly related data. It easily answers questions that would require a multitude of hierarchical joins and self-joins in a relational database. The following table lists the features of Neo4j.

Platform

  • It is open source, with the source on GitHub
  • Written in Java for all platforms

Version

  • 3.3.4 (At the time of writing)

Data Durability

  • Fully ACID compliant

Indexes

  • Yes

Query Language

  • Cypher

Compelling Features

  • Easy to bulk load data in
  • Full REST API to interface with any language
  • It does NOT require complex Joins
  • Has Indexes and Constraints
  • It represents semi-structured data very easily
  • High availability and disaster recovery (HADR).
  • Scale-out but no Sharding.
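
To make the points about semi-structured data and the absence of joins a little more concrete, here is a minimal sketch of the property graph idea in plain Python. It is not the Neo4j API: the Node and Relationship classes, the relate and neighbours helpers and the sample data are all hypothetical names invented for illustration. The idea it tries to show is that nodes carry a label and whatever properties they happen to have, relationships are typed and directed, and following a relationship is a direct lookup from the node rather than a join between tables.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A labelled node with free-form properties (hypothetical illustration class)."""
    label: str
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)  # relationships that start at this node


@dataclass(eq=False)
class Relationship:
    """A typed, directed relationship that can carry its own properties."""
    rel_type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def relate(start, rel_type, end, **properties):
    """Create a relationship and store it on its start node."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    return rel


def neighbours(node, rel_type):
    """Follow outgoing relationships of one type; no join or lookup table is involved."""
    return [rel.end for rel in node.outgoing if rel.rel_type == rel_type]


# Semi-structured data: Alice has an age, Bob does not, and that is fine.
alice = Node("Person", {"name": "Alice", "age": 34})
bob = Node("Person", {"name": "Bob"})
martial_arts = Node("Interest", {"name": "Martial Arts"})

relate(alice, "FRIEND_OF", bob, since=2015)
relate(bob, "LIKES", martial_arts)

friend_names = [n.properties["name"] for n in neighbours(alice, "FRIEND_OF")]
print(friend_names)
```

In a real graph database the storage, indexing and querying are of course far more involved; the sketch only mirrors the shape of the data model.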

 

When would you use Neo4j?

The simple answer is any data that is highly related and requires nested traversals. You can view the Use Cases website for a list of the common applications. I always think of an application like Facebook: “How would I find the friends of a friend who are potential friends of mine, who like Martial Arts and have other friends with the same like?” It sounds a bit like gobbledygook, but imagine trying to answer that question in SQL. A simpler, more widely understood requirement would be a sat nav: “What is the fastest way from A to B via C, avoiding motorways?” You get the picture.
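
To show what those two questions look like as graph traversals, here is a rough sketch in plain Python. It does not use Neo4j at all; in Neo4j itself you would express this kind of question in its query language rather than with hand-written loops. The sample people, interests, roads and journey times are made up, and the function names (suggest_friends, fastest, fastest_via) are hypothetical. The route part uses Dijkstra's algorithm as a stand-in, which is an assumption on my part about how a sat nav style question could be answered.

```python
import heapq

# Hypothetical sample data: who is friends with whom, and who likes what.
FRIENDS = {
    "me": {"ann", "raj"},
    "ann": {"me", "tom", "zoe"},
    "raj": {"me", "zoe", "kim"},
    "tom": {"ann", "lee"},
    "zoe": {"ann", "raj"},
    "kim": {"raj"},
    "lee": {"tom"},
}
LIKES = {
    "tom": {"Martial Arts"},
    "lee": {"Martial Arts"},
    "zoe": {"Martial Arts"},
    "kim": {"Chess"},
}


def suggest_friends(person, interest):
    """Friends of friends who like `interest` and have a friend who likes it too."""
    direct = FRIENDS[person]
    suggestions = set()
    for friend in direct:
        for candidate in FRIENDS[friend]:
            if candidate == person or candidate in direct:
                continue  # skip myself and people I already know
            if interest not in LIKES.get(candidate, set()):
                continue
            # The candidate needs at least one friend who shares the interest.
            if any(interest in LIKES.get(other, set()) for other in FRIENDS[candidate]):
                suggestions.add(candidate)
    return sorted(suggestions)


# Hypothetical road network: (from, to, minutes, road type), treated as two-way.
ROADS = [
    ("A", "C", 10, "motorway"),
    ("A", "D", 7, "local"),
    ("D", "C", 8, "local"),
    ("C", "B", 5, "motorway"),
    ("C", "E", 6, "local"),
    ("E", "B", 4, "local"),
]


def fastest(start, goal, avoid=()):
    """Dijkstra's algorithm over ROADS, ignoring any road type listed in `avoid`."""
    graph = {}
    for a, b, minutes, road_type in ROADS:
        if road_type in avoid:
            continue
        graph.setdefault(a, []).append((b, minutes))
        graph.setdefault(b, []).append((a, minutes))
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, place, path = heapq.heappop(queue)
        if place == goal:
            return cost, path
        if place in visited:
            continue
        visited.add(place)
        for nxt, minutes in graph.get(place, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + minutes, nxt, path + [nxt]))
    return None  # no route exists


def fastest_via(start, via, goal, avoid=()):
    """Fastest start -> via -> goal route, built from two shortest-path searches."""
    first = fastest(start, via, avoid)
    second = fastest(via, goal, avoid)
    if first is None or second is None:
        return None
    return first[0] + second[0], first[1] + second[1][1:]


print(suggest_friends("me", "Martial Arts"))
print(fastest_via("A", "C", "B", avoid=("motorway",)))
```

Even on this toy data the friend suggestion takes three nested hops; the equivalent in SQL would be several self-joins on a friendship table, which is the pain point described above.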

Installing Neo4j

Neo4j runs on all platforms. It is available on cloud platforms, in Docker containers or installed as a standalone system. At the time of writing there are two editions available: Community and Enterprise. There is also a Desktop installation that installs the Enterprise Edition with a developer licence and includes all the tools and features needed to try it out.

The rest of this series uses the Desktop installation, which can be found here.

In the next blog I will discuss some of the basics.