In a multi-user database transactions run concurrently. If their access to data is not managed properly it can lead to incorrect data values and database inconsistency. The three most common problems are:
1. Lost updates, which can occur when a transaction updates a value and writes to the database, but a second transaction updates the same data before the first update has been committed, so the first update is over written.
2. Uncommitted data errors are when a transaction updates a value but hasn’t yet committed it and a second transaction performs an operation on the first transaction’s updated value, if the first transaction is roll backed, the second transaction’s update doesn’t yield a correct value.
3. Inconsistent retrievals occur when a transaction is performing an aggregate operation, like a total or average on a data set, while another transaction is updating values in the same dataset. This can result in errors when the aggregate function is using some of the pre-update values and some already updated values.
Concurrency control ensures the serializability of transactions and preserves the ‘ACID’* database transaction properties. It does this by using tools like locks, which control which transactions have access to a piece of data, and time stamps which help determine the order of operations.
*Atomicity – All transactions should be a single logically unit of work.
All operations in a transaction must be carried out or the entire transaction should be rolled back (or undone).
Consistency – The database should be consistent with all the data constraints, before and after transactions are performed.
Isolation – Transactions should be kept separate and prevented from updating the same data at once.
Durability – Once a transaction is committed to the database the change should be permanent and safe from any sort of crash or system failure.
A DBMS scheduler is responsible for concurrency control. It determines the order in which the operations of concurrent transactions are carried out. It interleaves operations using algorithms and protocols which guarantee serializability, the state of the database after the transactions are executed should be the same as if they were executed one after the other.
The scheduler can use locks, time stamps and optimistic techniques to arrange the order of operations and prevent conflicts.
Locks can be put on data (by database, table , page, row, or field) to prevent it being updated by more than one transaction at a time. If two operations are read only they may be allowed to share a lock.
A transaction must acquire all the data locks it needs to carry out its operations before it actually makes any changes to the database. If a transaction is waiting too long for a lock it can time out and be rescheduled, to help prevent deadlocks.
Time stamps are assigned to each transaction. They can be used to determine what happens when two transactions are trying to acquire access to the same piece of data simultaneously. Depending on the time stamp a transaction may be scheduled to wait for the other to finish, or it may be rolled back and rescheduled with the same time stamp.
If the optimistic method is being used, all concurrent transaction are carried out without restrictions. The data to be updated is copied into a temporary file and updated. After any changes have been validated it can be written to the database permanent disk.
After a database experiences any sort of crash or failure, the database must go through a recovery process. The database uses transaction logs and checkpoints to restore the database to its pre-crash state. The recovery process is slightly different for the deferred write and write through techniques .
When using the deferred write technique, changes are recorded in a log before a transaction commit. Only after all updates are made is the data committed. The recovery process with this technique is:
1. Identify the last checkpoint after the last successful save to disk.
2. Any transactions committed before this checkpoint can stay as they are since they were saved to disk before the crash.
3. Any transaction committed after the checkpoint can be validated by the DBMS by comparing the database values to the transaction’s after values recorded in the log.
4. For any transaction that was rolled back after the checkpoint, no action needs to be taken since they didn’t alter the database yet.
With the write through technique changes are written to the database immediately and before a transaction is committed. The log entry is written before the change is made and the log is synced with the database. Step 1 – 3 of the recovery process with this technique is the same as with deferred writing. Step four differs, since any transaction that hadn’t been committed before the last checkpoint must be rolled back and any database values that were altered need to be returned to their pre-transaction state using the before values recorded in the log.