The problem
I attempted to deploy a trigger change on a load-bearing table in a busy system. I was repeatedly blocked and, while waiting for the SCH-S lock, I ended up blocking other queries.
I group blockers into two categories based on duration:
- Long
- You have enough time to detect the block and query
WhoIsActiveor another DMV utility to identify the blocker. - Can you
killit? It might be critical for business or have a long rollback time.
- You have enough time to detect the block and query
- Short
- These blockers come and go so quickly you might not even notice them.
- You're usually blocked due to lock partitioning and your session running on a scheduler with a higher ID. (as I cover in Async stats update causing blocking)
Ideally, deployments should happen when there's no active locking on the object. This reminded me of WAIT_AT_LOW_PRIORITY feature of the online index rebuilds.
It would be great if this option were available for all schema changes, but since it isn't, I decided to build my own version.
Building blocks
To build this utility, we'll need:
- A time-based loop (infinite loops can go wrong)
- Check for incompatible locks
- The actual code to be deployed
- Most of the code deployments are required to be in a separate batch (in SSMS the batch separator is
GOby default)
- Most of the code deployments are required to be in a separate batch (in SSMS the batch separator is
- End gracefully if you cannot get the lock in the defined time
- Any other safety measures
The main challenge was how to put the deployment script in a separate batch. The first method that came to mind was using dynamic SQL, but that would mean using dynamic SQL (and all its pitfalls like double apostrophes, formatting issues, lack of syntax highlighting, etc.). That's why I chose the rarely used GOTO.
GO-bordered deploy slot. If your change spans multiple batches, you will need to wrap each batch in its own copy of the loop.
The script
SET DEADLOCK_PRIORITY LOW
SET XACT_ABORT ON
SET LOCK_TIMEOUT 2000 /* milliseconds */
SET NOEXEC OFF /* resets the noexec on that's set below */
DECLARE @startTime DATETIME2(0) = SYSDATETIME()
DECLARE @max_duration INT = 5 /* minutes */
/* <-- INSERT THE FULL schema.objectname */
DECLARE @objectId BIGINT = (SELECT OBJECT_ID(''))
/* Assuming the same DB context, but feel free to modify */
DECLARE @dbId INT = (SELECT DB_ID())
RAISERROR ('Starting the wait loop', 0, 1) WITH NOWAIT
WHILE DATEADD(MINUTE, @max_duration, @startTime) > SYSDATETIME()
BEGIN
IF NOT EXISTS ( /* Check if an incompatible lock exists */
SELECT TOP 1
1
FROM sys.dm_tran_locks AS dtl
WHERE
dtl.resource_database_id = @dbId
AND dtl.resource_associated_entity_id = @objectId
AND dtl.request_status = N'GRANT'
AND dtl.request_mode = 'SCH-S'
)
BEGIN
GOTO gotta_go_fast
END
WAITFOR DELAY '00:00:00.200' /* Format: hh:mm:ss.mss */
END
/* after the timeloop finishes without success, prevent execution of further code*/
RAISERROR ('Time loop finished without success', 0, 1) WITH NOWAIT
SET NOEXEC ON
gotta_go_fast:
/*
###############################################################################
Insert the deploy code in the batch block below (bordered by GO separators)
###############################################################################
*/
GO
GO
/*
###############################################################################
The end of the deploy batch block
###############################################################################
*/
Explanation
- I start with some safety measures in a form of deadlock priority and lock timeouts
- Variable values are in the subquery format so you can highlight and check the values before running
- Inside the time loop I have the
SCH-Slock check- The
TOP 1 1is not necessary for theEXISTSbut in case you want to test the subquery with values instead of variables
- The
- If no lock is detected, jump to the
gotta_go_fast:label which is right before the batch separatorGO- You can test around which
WAITFORdelay works best for you. The goal is to limit the number of loops so it's not too resource intensive
- You can test around which
- If you cannot find a window of opportunity within the
@max_durationminutes we'll setNOEXEC ONto prevent any further code from runningSET NOEXEC OFFis at the start of the code to reset to the original state
- Do not forget the trailing
GOto properly terminate the batch
SCH-S in the microseconds between the check passing and your ALTER acquiring Sch-M. SET LOCK_TIMEOUT 2000 means the deploy fails fast rather than blocking indefinitely, but it does not eliminate the window. It reduces it.
Additional help
The lock partitioning and schedulers play an important role when it comes to acquiring locks as mentioned in my other blog post.
It helps if we run our deploy at low priority script on the highest available scheduler. That way no other process can "jump the queue" and block our process.
To check current and max scheduler we can use this snippet
SELECT
dot.scheduler_id as currentSchedulerId
, dosi.scheduler_count as totalSchedulerCount
, dot.session_id
FROM sys.dm_os_tasks AS dot
CROSS JOIN sys.dm_os_sys_info as dosi
WHERE dot.session_id = @@spid
If we get a low scheduler, we can just try to disconnect/reconnect to get a higher one. You can keep two sessions open and "reroll" the lower one until you get there.
A more aggressive version of the script
If some blocking is acceptable, replace the no-lock check with a lock threshold check and change the lock timeout to something longer. Unlike ABORT_AFTER_WAIT = BLOCKERS, neither version kills any sessions - the "aggressive" here means proceeding when the active lock count is low enough, not force-evicting blockers.
SET LOCK_TIMEOUT 8000
…
IF /* test for a lock count */
(
SELECT COUNT(1)
FROM sys.dm_tran_locks AS dtl
WHERE
dtl.resource_database_id = @dbId
AND dtl.resource_associated_entity_id = @objectId
AND dtl.request_status = N'GRANT'
AND dtl.request_mode = 'SCH-S'
) <= 5
…
Thank you for reading