POSA

To delete or not to delete - practical data archive in database design

Sang

Sep 27, 2019 • 3 min read

One of the common rules when designing a database structure is the structure's ability to keep as much information as possible. In other words, it is not recommended to ever delete any piece of data. Even though they are not visible to the end-user.

For example, your database is required to keep a list of the registered users and to have the ability to remove a user from the database.

This post introduces some common approaches on this purpose. At the end of the post, the best trade-off method will be presented. Being written with sql and non-sql database style thinking in mind, this post is applicable for both types of database design.

‌ ‌

Typically there are 3 ways to handle data deletion.

Method 1: Hard deletion

This method is simply removing the record associated with the being deleted row from the table.

Pros:

Most simple implementation

Cons:

Lose information of the deleted row
No way to restore the deleted information if the deleted record has not been backed up. Even if the data is backed up, restoring them in run-time from the backup is non-trivial.

Method 2: Soft deletion

Make new column to the tables which have deletion functionality to mark the record for deletion.

It is highly recommended (check comment of this SO question) to make a timestamp deleted_at column instead of a boolean flag is_deleted.

Pros:

Very easy to display/manipulate (e.g. restore) deleted data at run-time

Cons:

Have to add WHERE clause in every query command.
Error-prone in count clause because of forgetting the WHERE clause (SO answer)
Conflicts occur if there is a unique key constraint in the table (e.g. username field). Because the deleted record is not really removed from the table, its key still exists and conflicts with others.‌‌There is a workaround by changing all unique key constraint into compounded keys of (the unique key, deleted_timestamp) or making DELETED column be the id of the deleted row.
(SQL only) The foreign-key constraint is not checked by the database in removal.

Method 3: hard archive

Clone the table structure to a new table, namely archived_user from the original user table.

Pros:

(SQL only) The foreign-key constraint can be checked by the database. Make use of many useful database features.
Easy for data restore, deleted data display/manipulation (e.g. trash bin functionality)

Cons:

Have to maintain the cloned table's structure
Overall database structure becomes complicated if there are multiple tables have deletion feature.

‌ ‌

My best practice (method 4): Soft Archive

Create a unique table with the following columns

(Optional) Primary id key
original_id keeps the id of the deleted record. If there is a table needs the deletion function and its primary key is not INT, this original_id key should be in string type (VARCHAR if you want to make an index on this column, SQL case). For faster-deleted data retrieve, I recommend adding an index on this column.
table_name is the name of the original table.
payload stores JSON-stringified data of the deleted record.
(Optional) created_at to store deletion timestamp

This way of database structure has the best trade-off among the three above-mentioned methods.

Keep track of all deleted data. It is possible to retrieve deleted data in a way that is not as straightforward as the hard archive (method 3) or soft deletion (method 2). But it is much easier than restoring from backup data in the hard deletion (method 1).
Have only one place to archive records from any table, regardless of the deleted record's table structure. No need to replicate table structure.
No worry of unique index in the original table
(SQL only) No worry of checking the foreign index in the original table
No more WHERE clause in any query to check for deletion

‌ ‌

In this post, I have introduced most of the practical methods to implement database structure for the deletion feature in both SQL and non-SQL database type.

If readers have a better idea or any additional information (pros, cons, ...), please do not feel hesitate to leave a comment.

Thank you.

Method 1: Hard deletion

Method 2: Soft deletion

Method 3: hard archive

My best practice (method 4): Soft Archive

Sign up for more like this.