In an eventual consistent system like Cassandra, information about deleted keys should be stored to avoid reading the deleted data. When a row or column is deleted, this information is stored as tombstone.  Tombstones are stored until gc grace period associated with column family not reached. Only major compaction removes tombstones that are older than gc grace period. Tombstones are spread to all replicas when repair is performed. It is important to run repair regularly to eliminate resurrecting deleted data.

A Row tombstone is created when a row is deleted, a Column tombstone is created when a specified column is deleted from a row and a Range tombstone is created when user deletes a super column. Tombstone includes application's deletion timestamp (MarkedForDeleteAt) and local deletion timestamp (LocalDeletionTime). Range tombstone also includes start and end column names.


It is a timestamp in milliseconds specified by Application in delete request. For a live column or row it is stored as max long value 0x8000000000000000.


It is a timestamp in seconds set by Cassandra server when it receives a request to delete a column or super column. The value is set as current time in seconds. For a live column or row it is stored as max integer value 0x7FFFFFFF..

Tombstone thresholds

tombstone_warn_threshold (Default value 1000) is used to log a warning message when  scanned tombstone count is greater than this limit but the query execution will continue and tombstone_failure_threshold (Default value 100000) is used to abort slice query if scanned tombstone count exceeds this limit. An error message (TombstoneOverwhelmingException) is also logged.

Tombstones can impact performance of slice queries especially for wider rows. When Cassandra executes a column slice query it needs to read columns from all the SSTables that include the given row and filter out tombstones. And all these tombstones need to be kept in memory till the row fragments from all SSTables are merged which increase heap space usage.