One of common rules when designing a database structure is the structure ability to keep as much information as possible. In other words, it is not recommended to ever delete any piece of data. Even though they are not visible to end-user.
For example, your database is required to keep list of registered user and to have ability to remove an user from the database.
This post introduces some common approaches on this purpose. At the end of the post, the best trade-off method will be presented. Being written with sql and non-sql database style thinking in mind, this post is applicable for both types of database design.
Typically there are 3 ways to handle data deletion.
Method 1: Hard deletion
This method is simply removing the record associated with the being deleted row from the table.
- Most simple implementation
- Lose information of the deleted row
- No way to restore the deleted information if the deleted record has not been backed up. Even if the data is backed up, restoring them in run-time from the backup is non-trivial.
Method 2: Soft deletion
Make new column to the tables which have deletion functionality to mark the record for deletion.
It is highly recommended (check comment of this SO question) to make a timestamp
deleted_at column instead of a boolean flag
- Very easy to display/manipulate (e.g. restore) deleted data at run-time
- Have to add
WHEREclause in every query command.
- Error-prone in count clause because of forgetting the
WHEREclause (SO answer)
- Conflicts occur if there is unique key constraint in the table (for e.g. username field). Because deleted record is not really removed from the table, its key still exists and conflicts with others.There is a work around by changing all unique key constraint into compounded keys of (the unique key, deleted_timestamp) or making
DELETEDcolumn be the id of the deleted row.
- (SQL only) Foreign-key constraint is not checked by database in removal.
Method 3: hard archive
Clone the table structure to a new table, namely
archived_user from the original
- (SQL only) Foreign-key constraint can be checked by database. Make use of many useful database features.
- Easy for data restore, deleted data display/manipulation (e.g. trash bin functionality)
- Have to maintain the cloned table's structure
- Overall database structure becomes complicated if there are multiple tables have deletion feature.
My best practice (method 4): Soft Archive
Create an unique table with following columns
- (Optional) Primary
original_idkeeps the id of deleted record. If there is a table needs the deletion function and its primary key is not INT, this
original_idkey should be in string type (
VARCHARif you want to make index on this column, SQL case). For faster deleted data retrieve, I recommend to add an index on this column.
table_nameis the name of the original table.
payloadstores JSON-stringified data of the deleted record.
created_atto store deletion timestamp
This way of database structure has the best trade off among three above-mentioned methods.
- Keep track of all deleted data. It is possible to retrieve deleted data in a way which is not as straight forward as the hard archive (method 3) or soft deletion (method 2). But it is much more easier than restoring from backup data in the hard deletion (method 1).
- Have only one place to archive records from any table, regardless of the deleted record's table structure. No need replicate table structure.
- No worry of unique index in the original table
- (SQL only) No worry of checking foreign index in the original table
- No more WHERE clause in any query to check for deletion
In this post, I have introduced most of the practical methods to implement database structure for the deletion feature in both SQL and non-SQL database type.
If readers have better idea or any additional information (pros, cons, ...), please do not feel hesitate to lave a comment.