In the world of databases, the topic of transactions comes up whenever we talk about writing data. With relational databases, the acronym ACID is implied. In the world of NoSQL we also have BASE and CAP. So what exactly does all this mean? Let’s discuss these three acronyms.
Before I do, let’s think about what we expect from a website. When you access a website (say Amazon) which ultimately shows you data from a database, you will want:
- The site to be available. No “website down” message.
- The products shown for sale to actually be for sale.
- To be charged what was shown before you clicked “go for it”.
- To be sent what you asked for.
- And many more.
That is a lot of constraints put onto Amazon, and meeting them comes at a price. That price is a slower website experience for you, which may ultimately push you to one of its competitors. It turns out that we may be able to operate on a more flexible model that on average meets your requirements and has ways of dealing with the situations where it does not. A nice quote I once heard was “availability is revenue”, and that fits well when discussing this topic.
ACID
When relational databases took off, ACID was all we had and was the standard to follow. Everything was in a big box, and when we needed to scale, we just put more hardware in the box. Scaling outside the box while maintaining ACID proved extremely difficult.
Atomic
Either a transaction completes as a whole or it fails as a whole. If one element within the transaction fails, they all fail.
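To make the all-or-nothing idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the transfer function are made up for illustration; the point is only that when the second update fails, the first one is undone as well.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # The CHECK constraint rejects this step if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates apply
failed = transfer(conn, "alice", "bob", 500)  # second update fails, first is undone
print(ok, failed, balances(conn))
```

After the failed transfer, bob's balance does not include the 500 that the first update briefly gave him, which is exactly what atomicity promises.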
Consistency
Before the transaction started, the database was in a valid state. When it completes, the database must also be in a valid state. So what does that mean? It means all constraints (primary key, foreign key, check, etc.), triggers and cascading actions have completed and are valid. If a user decides to delete all the data in a database, the result is still a consistent, valid state, so consistency does not protect against data loss.
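As a rough illustration, the sketch below uses sqlite3 again with a hypothetical customers and orders schema. Writes that would break a foreign key or check constraint are rejected, while deleting every order is perfectly valid as far as the database is concerned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " quantity INTEGER NOT NULL CHECK (quantity > 0))"
)
conn.execute("INSERT INTO customers VALUES (1)")
conn.commit()


def try_insert(customer_id, quantity):
    """Attempt one order; report whether the database accepted it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)",
                (customer_id, quantity),
            )
        return "committed"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


first = try_insert(1, 2)          # valid order
bad_customer = try_insert(99, 1)  # no such customer: foreign key violation
bad_quantity = try_insert(1, 0)   # quantity must be positive: check violation
orders_before = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Deleting everything breaks no constraint, so the database allows it.
with conn:
    conn.execute("DELETE FROM orders")
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first, bad_customer, bad_quantity, orders_before, remaining)
```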
Isolation
In its strictest form, guarantees that if two transactions are executed concurrently and each updates the same data, the outcome is the same as if the two transactions had run sequentially. In the RDBMS world isolation comes in several levels and is configurable. For example, it may or may not be OK to read a piece of updated data mid-transaction.
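Here is a small sketch of one isolation behaviour, using two sqlite3 connections to the same temporary file. The stock table and the connection names are made up. With SQLite's default settings the second connection does not see the first connection's update until it is committed; other database engines expose this as a configurable level, so treat this as one example rather than the rule.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shop.db")
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)

    writer.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
    writer.execute("INSERT INTO stock VALUES ('book', 10)")
    writer.commit()

    def read_qty(conn):
        # fetchall() finishes the statement so no read lock lingers afterwards.
        rows = conn.execute("SELECT qty FROM stock WHERE item = 'book'").fetchall()
        return rows[0][0]

    # The writer starts a transaction and changes the row, but does not commit yet.
    writer.execute("UPDATE stock SET qty = 9 WHERE item = 'book'")
    seen_mid_transaction = read_qty(reader)  # reader still sees the old value

    writer.commit()
    seen_after_commit = read_qty(reader)     # now the change is visible

    writer.close()
    reader.close()

print(seen_mid_transaction, seen_after_commit)
```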
Durability
Once a transaction is committed, its changes are guaranteed to persist even if the server crashes or is restarted. A transaction that was rolled back, or never committed, leaves no trace, so the system remains in a consistent state.
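A simple way to see this is to write to a database file, close the connection and open it again. In the sketch below, closing the connection is only a weak stand-in for a server restart, and the orders table is hypothetical. The committed row survives and the uncommitted one does not.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "orders.db")

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute("INSERT INTO orders (item) VALUES ('committed order')")
    conn.commit()

    # This insert is never committed before the "restart".
    conn.execute("INSERT INTO orders (item) VALUES ('uncommitted order')")
    conn.close()  # the open transaction is discarded

    # Reopen the file, as a restarted server would.
    conn = sqlite3.connect(path)
    surviving = [row[0] for row in conn.execute("SELECT item FROM orders ORDER BY id")]
    conn.close()

print(surviving)
```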
BASE
Distributed shared data systems are systems where the same piece of data lives in two or more places, and the BASE model describes how available and how consistent that piece of data needs to be. Every business requirement is different, and you will need to dial each one up or down to meet yours.
Basically Available
Guarantees the availability of data in the CAP sense, but the data could be in an inconsistent state. We sacrifice consistency for availability. “Availability is revenue”.
Soft State
The state of the system could change over time or during a transaction. The data might not be correct right now but it will be correct soon.
Eventual Consistency
The system will eventually be consistent once new writes stop arriving, but there is a window of inconsistency in the meantime (however short “short” turns out to be).
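The window of inconsistency can be sketched with a toy in-memory model. Everything here (the EventuallyConsistentStore class, the replica names, the replicate step) is invented for illustration and is not how any particular NoSQL product works. A write lands on one replica straight away and reaches the others only when replication runs.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.pending = deque()  # (target replica, key, value) not yet delivered

    def write(self, replica_name, key, value):
        # Acknowledge the write as soon as one replica has it.
        self.replicas[replica_name][key] = value
        for name in self.replicas:
            if name != replica_name:
                self.pending.append((name, key, value))

    def read(self, replica_name, key):
        return self.replicas[replica_name].get(key)

    def replicate(self):
        # Deliver every queued update; afterwards all replicas agree.
        while self.pending:
            name, key, value = self.pending.popleft()
            self.replicas[name][key] = value


store = EventuallyConsistentStore(["us", "eu"])
store.write("us", "price", 20)

stale = store.read("eu", "price")  # the eu replica has not heard about it yet
store.replicate()
fresh = store.read("eu", "price")  # now both replicas agree
print(stale, fresh)
```

Between the write and the replicate call the system is in its soft state: available, but not yet consistent.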
CAP
This is an acronym created by Eric Brewer around 2000, and you may hear it referred to as Brewer’s Theorem. It was devised for distributed shared data systems, so databases sitting on a single server need not apply. The systems it describes tend to work on an optimistic model, which does not enforce consistency upfront. The theorem is typically drawn as a pyramid, as shown below.
The important point to know is that, according to Brewer, you can only have two of the three properties at once, which is why you see vendors marked between A and C, P and C, and so on.
Consistency
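Consistency here does not mean the same thing as the C in ACID. In CAP it means every node returns the same, most recent value for a piece of data, as if there were only a single copy. Systems that give this up fall back on the “eventually it will be consistent” model.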
Availability
Every request sent to a working node gets a response. For many businesses it is more important for a system to be available than for it to be consistent. We do not want to see “Website is temporarily unavailable”.
Partition Tolerant
If a network failure occurs and half the servers cannot see the other half, what should happen? It turns out that in a NoSQL world you never sacrifice partition tolerance, which is another way of saying that all NoSQL systems must handle this type of failure. So the question is “do NoSQL systems go AP or CP?” CA is left to RDBMSs on a single server, where no network partition can occur.
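The AP versus CP choice can be sketched with a toy two-node cluster. The Cluster class, its mode flag and the Unavailable exception are all made up for illustration, and real systems are far more nuanced (for example, a CP system may keep serving the majority side of a partition). Under a partition, the CP cluster refuses the write and stays consistent, while the AP cluster accepts it and lets the nodes disagree.

```python
class Unavailable(Exception):
    """Raised when a CP cluster refuses a request during a partition."""


class Cluster:
    """Toy two-node cluster; `mode` is "CP" or "AP"."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"n1": None, "n2": None}
        self.partitioned = False

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: every node receives the write.
            for name in self.values:
                self.values[name] = value
        elif self.mode == "AP":
            # Stay available: accept the write locally and let the nodes diverge.
            self.values[node] = value
        else:
            # Stay consistent: refuse rather than let the nodes disagree.
            raise Unavailable(f"{node} cannot reach its peer")

    def read(self, node):
        return self.values[node]


def run(mode):
    """Write once, cut the network, then try a second write on node n1."""
    cluster = Cluster(mode)
    cluster.write("n1", "v1")
    cluster.partitioned = True
    try:
        cluster.write("n1", "v2")
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, cluster.read("n1"), cluster.read("n2")


cp_result = run("CP")  # write refused, both nodes still agree
ap_result = run("AP")  # write accepted, nodes now disagree
print(cp_result, ap_result)
```

An AP system then needs some way to reconcile the two nodes once the partition heals, which is where eventual consistency comes back in.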
There is a good article titled CAP: Twelve Years Later which is worth a read.