Categories
Database

How to Quickly Delete Data from Large Tables in SQL Server?

Home » Database » How to Quickly Delete Data from Large Tables in SQL Server?

How can we quickly delete data from large tables in SQL Server? Before answering this question, we must clarify the context and specific needs, as different scenarios require different approaches.

Scenario 1: Deleting All Data from the Entire Table

If the goal is to clear or delete all data from an entire table, this scenario is quite simple. Using TRUNCATE TABLE is definitely the fastest method. In contrast, using DELETE would be a terrible strategy.

Scenario 2: Deleting a Portion of Data from a Large Table

For the first scenario, it’s very simple. However, in many real-world business scenarios, the TRUNCATE method cannot be used. The actual situation might involve deleting only a portion of the data from the table or deleting data after archiving it. Suppose we have a table named TEST and need to delete some data from the TEST table. First, we need to summarize the amount of data in the table and the amount of data to be deleted. Specifically, we should adopt the following methods:

1. Check the amount of data in the table and the amount of data to be deleted.

2. Then, calculate the proportion of data to be deleted.

To delete more than 60% of SQL Server data

1. Create a new table `TEST_TMP`.

2. Transfer the data to be retained into `TEST_TMP`.

3. Rename the original table `TEST` to `TEST_OLD`, and rename `TEST_TMP` to `TEST`.

4. Check related triggers and constraints, and rename the triggers or constraints accordingly.

5. After verifying that the operations are correct, either truncate the original table (`TEST_OLD`) and then drop it, or keep it for a period of time as a precaution.

Note: The 60% threshold is based on experience. There are simple tests, but there is no precise and scientific probability validation.

To delete less than 60% of SQL Server data

First, delete or disable unrelated indexes (unrelated indexes refer to those not used in the execution plan, specifically those useless for the current DELETE statement). Since DELETE operations are DML operations and large tables generally have large indexes, a significant amount of DELETE operations will require index maintenance, resulting in a lot of additional I/O operations.

Delete in small batches (batch deletion is much faster than deleting all at once). Do not delete a large amount of data at once. Deleting a large number of records at once will lead to extensive lock granularity and long lock duration, potentially causing blockages and severely affecting business operations. It will also make the database transaction logs very large, extend execution time significantly, and result in poor performance.

When performing batch deletions, how many records should be deleted at a time for optimal SQL efficiency? There is no strict rule for this. Personal tests have shown no particular pattern when deleting 10,000 or 100,000 records at once (some patterns observed may vary in different cases and environments, and sometimes they are just based on experience). Generally, using 10,000 is common. In practical operations, it is recommended to conduct a few experiments to compare and choose a suitable value.

Cases of properly deleting SQL Server data

Case 1:

DECLARE @delete_rows INT;
DECLARE @delete_sum_rows INT =0;
DECLARE @row_count INT=100000
 
WHILE 1 = 1
    BEGIN
        DELETE TOP ( @row_count )
        FROM    dbo.[EmployeeDayData]
        WHERE    WorkDate < CONVERT(DATETIME, '2012-01-01 00:00:00',120);
            
        SELECT  @delete_rows = @@ROWCOUNT;
            
            SET @delete_sum_rows +=@delete_rows
            IF @delete_rows = 0
            BREAK;
        END;
SELECT @delete_sum_rows;

Case 2:

DECLARE @r INT;
DECLARE @Delete_ROWS  BIGINT;
 
SET @r = 1;
SET @Delete_ROWS =0
WHILE @r > 0
BEGIN
    BEGIN TRANSACTION;
        DELETE TOP (10000) -- this will change
           YourSQLDba..YdYarnMatch
           WHERE Remark='Waiting' and Operation_Date<CONVERT(datetime, '2019-05-30',120);
 
          SET @r = @@ROWCOUNT;
          
          SET @Delete_ROWS += @r;
 
    COMMIT TRANSACTION;
    
    PRINT(@Delete_ROWS);
 
END

This table has the following two indexes:

USE [YourSQLDba]
GO
 
 
IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[YdYarnMatch]') AND name = N'IX_YdYarnMatch_N2')
DROP INDEX [IX_YdYarnMatch_N2] ON [dbo].[YdYarnMatch] WITH ( ONLINE = OFF )
GO
 
USE [YourSQLDba]
GO
 
 
CREATE NONCLUSTERED INDEX [IX_YdYarnMatch_N2] ON [dbo].[YdYarnMatch] 
(
    [Job_No] ASC,
    [GK_No] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
 
USE [YourSQLDba]
GO
 
IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[YdYarnMatch]') AND name = N'IX_YdYarnMatch_N1')
DROP INDEX [IX_YdYarnMatch_N1] ON [dbo].[YdYarnMatch] WITH ( ONLINE = OFF )
GO
 
USE [YourSQLDba]
GO
 
CREATE NONCLUSTERED INDEX [IX_YdYarnMatch_N1] ON [dbo].[YdYarnMatch] 
(
    [Operation_Date] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO

Key Point: Practice has shown that creating a new index can prevent a full table scan in the execution plan during batch deletion and significantly speed up the deletion process. I personally tested and verified this case. After adding an appropriate index, the DELETE statement performed an Index Seek, and the deletion efficiency was indeed greatly improved.

Delete index IX_YdYarnMatch_N2, keep index IX_YdYarnMatch_N1, but found that the SQL execution plan performs a full table scan, resulting in very slow deletion when executing the SQL; after deleting index IX_YdYarnMatch_N1 and recreating it, the execution plan performs an Index Seek, significantly improving deletion efficiency.

CREATE NONCLUSTERED INDEX [IX_YdYarnMatch_N1] ON [dbo].[YdYarnMatch]
(
    [Operation_Date] ASC ,
    Remark
   
)

Note: The index names are the same here, but they correspond to different fields.

So the correct approach is:

1. First, delete or disable irrelevant indexes (indexes that are not useful for the current DELETE statement). Generate the corresponding SQL before deletion to recreate indexes after data deletion is completed. Note that this operation should not impact the application during the operation phase. Otherwise, reconsider the approach.

2. Check and test the current SQL execution plan to determine if appropriate indexes can be created to speed up the DELETE operation, as shown in the example above.

3. Delete records in batches using a loop.

4. In Oracle databases, some table settings can reduce the amount of log generated for corresponding DML operations. However, SQL Server lacks these functionalities, so it’s important to promptly monitor or adjust transaction log backup situations.

If we can set the database recovery mode to SIMPLE, it can reduce additional IO overhead caused by log backups. However, many production environments cannot switch the user database recovery mode.

Sum Up

In fact, after all that has been said, the method for quickly deleting indexes from large tables in SQL Server is simply to change the single bulk delete into batched deletions, and commit them incrementally. All other methods are just supplementary approaches.

By Jaxon Tisdale

I am Jaxon Tisdale. I will share you with my experience in Network, AWS, and databases.

Leave a Reply

Your email address will not be published. Required fields are marked *