So, what are ACID Properties?

So, what are ACID Properties?

No not chemistry, this is about database systems.

ยท

6 min read

Like many other weird acronyms (DRY, KISS, SOLID...) in computer science, ACID refers to a group of 4 guiding principles - Atomicity, Consistency, Isolation & Durability that ensure database transactions are processed reliably in a database system.

ACID properties were one of the most asked interview topics for backend development interviews I appeared for this year. So here is a brief blog about the various principles.

Before we understand the ACID properties in-depth, we should know what are transactions in a database system.

Transactions ๐Ÿฆ

A transaction is a collection of queries which is treated as one unit of work. It is not possible to do everything in one single query. You might need multiple queries to achieve what you logically want in the application.

For example, you might want to design a system where you send โ‚น1000 to your friend's bank account from your account. This monetary transaction can be broken down into 2 queries:

  • An UPDATE query to deduct โ‚น1000 from your account

  • Another UPDATE query to add โ‚น1000 to your friend's account

Transaction Lifecycle

A transaction begins with the BEGIN keyword. The changes made by the transaction are not saved in the database until we COMMIT them to the database. If we want to undo the changes made by a transaction, we do a ROLLBACK.

Here is an example transaction:

begin;
update accounts set amount = amount - 1000 where acc_id = 1;
update accounts set amount = amount + 1000 where acc_id = 2;
select * from accounts;
commit;

Now that we are aware of what a transaction is, we can continue with the ACID properties.

Atomicity โš›๏ธ

Atomicity states that all queries in a transaction must succeed. Even if one single query fails, all the previous successful queries should rollback. Let's take an example based on the previous queries.

Consider a situation where during the transaction, we executed the first query (deducting the amount from the account) but due to some reason (like power cut, OS crash etc.) we were not able to complete the second query.

This would result in the database being in an unstable state (and โ‚น1000 missing from the universe). To avoid this, the database should do a ROLLBACK as soon as it comes back online to ensure that the data is in a consistent state.

This leads us to the next property, consistency.

Consistency ๐ŸŸฐ

Consistency ensures that data which is stored and which is read in the database is correct throughout its lifecycle. It does not guarantee the correctness of the transaction in all ways an application programmer might expect (that should be taken by the application). Instead, consistency guarantees that programming errors cannot result in the violation of any defined database constraints.

Isolation ๐Ÿ—‚๏ธ

In a database system, multiple connections are trying to read & modify the data at the same instance. Isolation defines whether inflight (ongoing) transactions will be able to see changes made by other transactions.

The SQL standard defines 3 read phenomena which may arise when many people read/write to the same set of rows at the same time.

Dirty Reads

Reading something that some other transaction wrote but did not commit. The changes made by the other commit might be rolled-back, written to the database or even the database could crash. So there is a possibility you could be reading stuff that was never saved to the database!

Non Repeatable Reads

A non-repeatable read is when reading the same row twice within the same transaction leads to different results. This happens because another transaction UPDATES the value of the row while the first one was still reading it.

Phantom Reads

A phantom read is a special case of non-repeatable reads. In this case, another transaction adds/deletes a row which matches the WHERE clause of our query. As a result, the repeated query returns different rows.

For example, consider this scenario. You have a table with sales reports for each day of this month. You were running transaction A which was calculating the sum of sales and average sales for this month. As transaction A executed the sum query, another transaction B added a few more sales data to today's row.

Now, when transaction A will calculate the average sales, it will not be in sync with the sum. As a result, the same transaction reads two different values for the same row in two similar queries.

Isolation Levels ๐Ÿช

To help overcome these read phenomena, isolation levels were made.

Dirty ReadsNon-repeatable ReadsPhantom Reads
Read Uncommittedโœ…โœ…โœ…
Read CommittedโŒโœ…โœ…
Repeatable ReadsโŒโŒโœ…
SerializableโŒโŒโŒ

โœ…: Possible

โŒ: Not Allowed

Read Uncommitted: This has no isolation. Any change from outside is visible in the transaction, committed or not.

Read Committed: Each query in a transaction only sees committed changes by other transactions.

Repeatable Reads: The transaction will make sure that a row which was previously read remains unchanged during the transaction.

Serializable: In serializable, transactions are run one after another.

Now we are left with the last property, durability.

Durability ๐Ÿ›ก๏ธ

Durability is the process of persisting changes that a client makes to the database in non-volatile memory. Even if the database crashed after a power loss after a commit has been done, the changes should be visible on the next restart.

It is not as easy as it sounds because of two reasons:

  1. Saving to disk is slow

  2. A write request is sent to the OS cache and not to the disk. The OS wants to batch these writes and then flush them to the disk at once to reduce IO. This means in case of an OS failure, there are chances of losing data because the data was only written to the cache and not to the disk.

These are the methods used to ensure durability:

Write Ahead Log โŒ›๏ธ

We maintain a log of all activities which can be used to rebuild in case of a crash. Writing a lot of data (index data, rows, columns) to the database can be expensive. So we compress this data into WAL log segments.

If the WALs were directly written to the OS Cache, recovery after a machine cache would be impossible/difficult. This is why WAL uses the FSYNC command which writes immediately to the disk but is very slow.

Asynchronous Snapshots ๐Ÿ“ธ

As we write, we keep everything in memory and that is asynchronously written to disk.

Append-Only Files ๐Ÿ“

Maintain a file which only saves the changes in data.

Conclusion ๐Ÿ™๐Ÿฝ

This was a comprehensive blog on how database systems handle and manage data and transactions so that we as developers do not have to worry much about its integrity. I have attached some links which you can refer to if you want to know more about these fundamental principles.

Thanks to Esha Baweja for desiging the cover image โœจ

ย