From SQL Server 2016, Microsoft provided support for storing and manipulating JSON within a SQL Server database. The aim of this blog is not to teach you how to use JSON but rather a use case for when JSON is a better alternative.
For this example I am going to show you how I implemented an Auditing table that stored JSON. Auditing tables can be used for storing the status of application process executions and typically you will store supplementary information that is different for each process. For example; you may have a daily data feed and want to store the number of rows before the feed started, the number of rows updated or inserted and the number of rows that failed validation.
An Example without using JSON
I have two application processes called Feed1 and Feed2.
Feed1 stores three specific row counts called Staged, Processed and Duplicate.
Feed2 stores two specific row counts called RegionCount, AreaCount.
I create the following audit table. I will explain the columns when necessary below.
CREATE TABLE [Audit] ( [AuditKey] INT IDENTITY (1, 1) NOT NULL, [Process] NVARCHAR (50) NOT NULL, [Version] NVARCHAR (16) NOT NULL, [Status] NVARCHAR (16) NOT NULL, [StartTime] DATETIME NOT NULL, [EndTime] DATETIME NULL, [Staged] INT NULL, [Processed] INT NULL, [Duplicate] INT NULL, [RegionCount] INT NULL, [AreaCount] INT NULL, [Message] NVARCHAR(MAX) NULL, CONSTRAINT [Audit_Status_CHK] CHECK (Status IN ('Started','Completed','Failed')), CONSTRAINT [Audit_PK] PRIMARY KEY CLUSTERED ([AuditKey] ASC) );
When a process starts it inserts a row into the table to say the process has Started and then updates the row when the process has Completed or Failed. I will create a stored procedure called sp_Audit_Initialize to initialise a row and return the unique audit key for use later.
CREATE PROC sp_Audit_Initialize @Process NVARCHAR(50), @Version NVARCHAR(16), @AuditKey INT OUTPUT AS INSERT Audit (Process, Version, Status, StartTime) VALUES(@Process, @Version, N'Started', GETDATE()); SELECT @AuditKey = CAST(SCOPE_IDENTITY() AS INT);
Next we need a stored procedure called sp_Audit_Update which is called when the process finishes.
CREATE PROC sp_Audit_Update @AuditKey INT, @Status NVARCHAR(16), @Staged INT = NULL, @Processed INT = NULL, @Duplicate INT = NULL, @RegionCount INT = NULL, @AreaCount INT = NULL, @Message NVARCHAR(512) = NULL AS UPDATE Audit SET Status = @Status, EndTime = GETDATE(), Staged = @Staged, Processed = @Processed, Duplicate = @Duplicate, RegionCount = @RegionCount, AreaCount = @AreaCount, Message = ISNULL(@Message,'None') WHERE AuditKey = @AuditKey;
Let’ create some test data. First of all I will call sp_Audit_Initialize for Feed1 and display the rows in the table.
DECLARE @Key INT; EXEC sp_Audit_Initialize @Process = 'Feed1', @Version = 'v2.1', @AuditKey = @Key OUTPUT;
Feed1 shows it is currently running (Started) or has crashed and not updated the row. The Version column can be used to log the version of the Feed1 process to ensure the correct version is running. Let’s update the row to show success. I am hardcoding the AuditKey for demonstration purposes however; this would be stored in a variable and referenced by the code.
EXEC sp_Audit_Update @AuditKey = 1, @Staged = 25000, @Processed = 24000, @Duplicate = 10000, @Status = 'Completed'
The Feed1 was successful (Completed) and the various row counts were added to the columns that are specific to Feed1. Let’s add a failure.
DECLARE @Key INT; EXEC sp_Audit_Initialize @Process = 'Feed1', @Version = 'v2.1', @AuditKey = @Key OUTPUT; EXEC sp_Audit_Update @AuditKey = @Key, @Staged = 0, @Status = 'Failed', @Message = "Zero rows in the staging table"
Feed1 failed because there were no rows in the staging table as indicated by the Message column. Let’s include an entry from the Feed2 process.
DECLARE @Key INT; EXEC sp_Audit_Initialize @Process = 'Feed2', @Version = 'v3.5', @AuditKey = @Key OUTPUT; EXEC sp_Audit_Update @AuditKey = @Key, @RegionCount = 27, @AreaCount = 53, @Status = 'Completed';
As you can see Feed2’s entry is similar Feed1. with its specific row columns updated. Whilst this audit table does meet our requirements, are you happy with this design.
What is Wrong with this Design?
I think we can all see the problem with having an audit table that needs to store process specific data. The issues are as follows:
- Imagine if we introduced 10’s of processes, how many additional columns would we need.
- The column names only make sense to the process and not the user looking at the table, so what should we realistically call them.
- When new processes are added the Audit table and sp_Audit_Update stored procedure will need to be updated.
An Example using JSON
Let’s start by creating our Audit table again. You will notice that the process specific columns are gone and are replaced by the Info column that will contain our JSON data. There is also on CONSTRAINT on that column to ensure only valid JSON is inserted.
CREATE TABLE [Audit] ( [AuditKey] INT IDENTITY (1, 1) NOT NULL, [Process] NVARCHAR (50) NOT NULL, [Version] NVARCHAR (16) NOT NULL, [Status] NVARCHAR (16) NOT NULL, [StartTime] DATETIME NOT NULL, [EndTime] DATETIME NULL, [Info] NVARCHAR(MAX) NULL, CONSTRAINT [Audit_Status_CHK] CHECK (Status IN ('Started','Completed','Failed')), CONSTRAINT [Audit_Info_CHK] CHECK (ISJSON(Info)=1), CONSTRAINT [Audit_PK] PRIMARY KEY CLUSTERED ([AuditKey] ASC) );
The sp_Audit_Initialize stored procedure is the same as above.
CREATE PROC sp_Audit_Initialize @Process NVARCHAR(50), @Version NVARCHAR(16), @AuditKey INT OUTPUT AS INSERT Audit (Process, Version, Status, StartTime) VALUES(@Process, @Version, N'Started', GETDATE()); SELECT @AuditKey = CAST(SCOPE_IDENTITY() AS INT);
This is where we now take a component based approach to developing stored procedures. Much like we do in Object Orientated coding, I will create an audit update stored procedure for each process. By creating one for each process, we are not updating existing code. Where possible we always want to be added new code and leaving existing code alone. Some of you may be thinking, why not pass in the JSON to the stored procedure and that way, we only need one stored procedure for every process. The JSON is an implementation detail and the process calling the stored procedure should not need to know that. The last thing you want to do is put the burden of passing correct and valid JSON on the user calling your code.
I create two stored procedures called sp_Audit_Update_Feed1 and sp_Audit_Update_Feed2.
CREATE PROC sp_Audit_Update_Feed1 @AuditKey INT, @Status NVARCHAR(16), @Staged INT = NULL, @Processed INT = NULL, @Duplicate INT = NULL, @Message NVARCHAR(512) = NULL AS DECLARE @JSON NVARCHAR(MAX); SET @JSON = N'{"Staged":"' + CAST(ISNULL(@Staged,'-1') AS NVARCHAR) + '", "Processed":"' + CAST(ISNULL(@Processed,'-1') AS NVARCHAR) + '", "Duplicate":"' + CAST(ISNULL(@Duplicate,'-1') AS NVARCHAR) + '", "Message":"' + ISNULL(@Message,'None') + '"}'; UPDATE Audit SET Status = @Status, EndTime = GETDATE(), Info = @JSON WHERE AuditKey = @AuditKey; CREATE PROC sp_Audit_Update_Feed2 @AuditKey INT, @Status NVARCHAR(16), @RegionCount INT = NULL, @AreaCount INT = NULL, @Message NVARCHAR(512) = NULL AS DECLARE @JSON NVARCHAR(MAX); SET @JSON = N'{"RegionCount":"' + CAST(ISNULL(@RegionCount,'-1') AS NVARCHAR) + '", "AreaCount":"' + CAST(ISNULL(@AreaCount,'-1') AS NVARCHAR) + '", "Message":"' + ISNULL(@Message,'None') + '"}'; UPDATE Audit SET Status = @Status, EndTime = GETDATE(), Info = @JSON WHERE AuditKey = @AuditKey;
I am going to insert the same test data as we did above and compare the difference.
DECLARE @Key INT; EXEC sp_Audit_Initialize @Process = 'Feed1', @Version = 'v2.1', @AuditKey = @Key OUTPUT; EXEC sp_Audit_Update_Feed1 @AuditKey = 1, @Status = 'Completed', @Staged = 25000, @Processed = 24000, @Duplicate = 10000 EXEC sp_Audit_Initialize @Process = 'Feed1', @Version = 'v2.1', @AuditKey = @Key OUTPUT; EXEC sp_Audit_Update_Feed1 @AuditKey = @Key, @Status = 'Failed', @Staged = 0, @Message = "Zero rows in the staging table" EXEC sp_Audit_Initialize @Process = 'Feed2', @Version = 'v3.5', @AuditKey = @Key OUTPUT; EXEC sp_Audit_Update_Feed2 @AuditKey = @Key, @Status = 'Completed', @RegionCount = 27, @AreaCount = 53;
As you can see this is a whole lot nicer. It is scalable and does not require anything to be updated if a new process is added.
One question we now have is “how to we query the data in the JSON column”? We do this by creating a view for each process. vAudit_Feed1 and vAudit_Feed2.
CREATE VIEW vAudit_Feed1 AS SELECT AuditKey, Process, Version, Status, StartTime, EndTime, DATEDIFF(ss, StartTime, EndTime) AS [Duration(Sec)], JSON_VALUE(Info, '$.Staged') AS [Staged], JSON_VALUE(Info, '$.Processed') AS [Processed], JSON_VALUE(Info, '$.Duplicate') AS [Duplicate], JSON_VALUE(Info, '$.Message') AS [Message] FROM Audit WHERE Process = 'Feed1'; CREATE VIEW vAudit_Feed2 AS SELECT AuditKey, Process, Version, Status, StartTime, EndTime, DATEDIFF(ss, StartTime, EndTime) AS [Duration(Sec)], JSON_VALUE(Info, '$.RegionCount') AS [RegionCount], JSON_VALUE(Info, '$.AreaCount') AS [AreaCount], JSON_VALUE(Info, '$.Message') AS [Message] FROM Audit WHERE Process = 'Feed2';
Let’s do a select * on the views. As you can see we have returned a result set making our whole use of JSON transparent to the application.
I hope you found this useful. Look out for more blogs on the use of JSON inside SQL Server.