To delete or not to delete - practical data archive in database design
One of the common rules when designing a database structure is the structure's ability to keep as much information as possible. In other words, it is not recommended to ever delete any piece of data. Even though they are not visible to the end-user.
For example, your database is required to keep a list of the registered users and to have the ability to remove a user from the database.
This post introduces some common approaches on this purpose. At the end of the post, the best trade-off method will be presented. Being written with sql and non-sql database style thinking in mind, this post is applicable for both types of database design.
Typically there are 3 ways to handle data deletion.
Method 1: Hard deletion
This method is simply removing the record associated with the being deleted row from the table.
Pros:
- Most simple implementation
Cons:
- Lose information of the deleted row
- No way to restore the deleted information if the deleted record has not been backed up. Even if the data is backed up, restoring them in run-time from the backup is non-trivial.
Method 2: Soft deletion
Make new column to the tables which have deletion functionality to mark the record for deletion.
It is highly recommended (check comment of this SO question) to make a timestamp deleted_at
column instead of a boolean flag is_deleted
.
Pros:
- Very easy to display/manipulate (e.g. restore) deleted data at run-time
Cons:
- Have to add
WHERE
clause in every query command. - Error-prone in count clause because of forgetting the
WHERE
clause (SO answer) - Conflicts occur if there is a unique key constraint in the table (e.g. username field). Because the deleted record is not really removed from the table, its key still exists and conflicts with others.There is a workaround by changing all unique key constraint into compounded keys of (the unique key, deleted_timestamp) or making
DELETED
column be the id of the deleted row. - (SQL only) The foreign-key constraint is not checked by the database in removal.
Method 3: hard archive
Clone the table structure to a new table, namely archived_user
from the original user
table.
Pros:
- (SQL only) The foreign-key constraint can be checked by the database. Make use of many useful database features.
- Easy for data restore, deleted data display/manipulation (e.g. trash bin functionality)
Cons:
- Have to maintain the cloned table's structure
- Overall database structure becomes complicated if there are multiple tables have deletion feature.
My best practice (method 4): Soft Archive
Create a unique table with the following columns
- (Optional) Primary
id
key original_id
keeps the id of the deleted record. If there is a table needs the deletion function and its primary key is not INT, thisoriginal_id
key should be in string type (VARCHAR
if you want to make an index on this column, SQL case). For faster-deleted data retrieve, I recommend adding an index on this column.table_name
is the name of the original table.payload
stores JSON-stringified data of the deleted record.- (Optional)
created_at
to store deletion timestamp
This way of database structure has the best trade-off among the three above-mentioned methods.
- Keep track of all deleted data. It is possible to retrieve deleted data in a way that is not as straightforward as the hard archive (method 3) or soft deletion (method 2). But it is much easier than restoring from backup data in the hard deletion (method 1).
- Have only one place to archive records from any table, regardless of the deleted record's table structure. No need to replicate table structure.
- No worry of unique index in the original table
- (SQL only) No worry of checking the foreign index in the original table
- No more WHERE clause in any query to check for deletion
In this post, I have introduced most of the practical methods to implement database structure for the deletion feature in both SQL and non-SQL database type.
If readers have a better idea or any additional information (pros, cons, ...), please do not feel hesitate to leave a comment.
Thank you.