Redis Database – Part 05 – Handling NULLs and Data Denormalization

In the last blog I covered how we create our first data model and some queries. The code can be found on Github here. In this blog I want to discuss handling NULL input data and data denormalization.

NULL Data

NOSQL databases do NOT have the concept of NULL. Since there is no schema enforcement, if the imported data is NULL, it is just excluded from the database. This means the onus is put on the client to test for missing data.

If you look at the customer data we import you will see for columns like Region have NULL data.

So when you import the data, unless you know the data very well, you need to check for NULL and exclude the field for that record. Look at this code snippet from 01_Import_Customers.py

   rowdata = {}
   rowdata["ID"] = row[0]
   rowdata["Company"] = row[1]
   rowdata["Contact"] = row[2]
   rowdata["ContactTitle"] = row[3]
   rowdata["Address"] = row[4]
   rowdata["City"] = row[5]
   if row[6] != "NULL":
      rowdata["Region"] = row[6]
   if row[7] != "NULL":
      rowdata["PostalCode"] = row[7]
   rowdata["Country"] = row[8]
   rowdata["Phone"] = row[9]
   if row[10] != "NULL":
      rowdata["Fax"] = row[10]

Here I do an explicit NULL check before adding the field.

Likewise when querying the data, in some situations you will need to check for missing fields and decide how to present the data. In the file 06_Query_Customers_Missing_Fields.py I check for None and return an empty string.

   print("{:5s} {:40s} {:25s} {:20s} {:20s}".format(
      _cust[0].decode('utf-8'), 
      _cust[1].decode('utf-8'),
      "" if _cust[2] is None else _cust[2].decode('utf-8'),
      _cust[3].decode('utf-8'),
      _cust[4].decode('utf-8')
      ))

Here is the output.

Denormalizing Data

Developers who create relational database models strive for Third Normal Form. This ensures no piece of data is stored more than once and relies heavily on joining tables. Denormalization only occurs when certain performance constraints or limits are met and there is no other alternative. In NoSQL databases denormalization is actively encouraged and is where allot of the performance is obtained. Just be aware that flexibility like this places more demand on client applications to maintain consistency when performing CRUD operations.

A good example is the output generated from running 05_Query_Customers_Sorted_By_Country.py

Notice the Country field has hardcoded names and not ID’s to another table as you would expect in a relational database. This sort of denormalization is a one to one mapping. When importing your data, if the source is a relational database, your query would include the relevant joins to create the denormalized data set.

In the next blog we will cover query filtering.