Finding nested references
I originally planned this post just as an answer to DBA Stack Overflow question: How can I get the list of tables in all the stored procedure? After preparing a lab environment, I think it deserves its own blog post.
The problem ๐︎
The original question lists these requirements:
- Information about all the tables referenced by a stored procedure
- Also, list tables from cross-DB query
- Return DB, Schema and Table names
The currently highest-rated answer uses sys.dm_sql_referenced_entities which is a bit of an overkill for given requirements. This DMF returns column granularity so to just get the list of tables you have to aggregate.
I think a better answer would use sys.sql_expression_dependencies (which is a subset of sys.dm_sql_referenced_entities anyway
) that would give just the tables.
But I thought - what about views or functions or nested procedures? I’d argue those are part of the main procedure as well and they can refer to tables that are otherwise hidden.
It could also be turtles procedures all the way down. That means I’d have to write a recursive query or a loop. It could even have *shudders* cyclic references.
We’ll need a lab environment to test this.
The lab ๐︎
I’ve used AI Claude to help me generate several objects so I can test my reference script. I haven’t bothered much with formatting on this one.
- Two databases (to test cross-db query)
- Scalar function, view and inline table-valued function (for nested references)
- The main procedure with several statements
- Nested procedure with cyclic reference
Here it is:
CREATE DATABASE Sideline
GO
CREATE DATABASE Main
GO
USE Sideline
GO
CREATE TABLE dbo.Employee
(
EmployeeID int PRIMARY KEY
, FirstName nvarchar(50)
, LastName nvarchar(50)
, Department nvarchar(50)
)
GO
CREATE TABLE dbo.Salary
(
SalaryID int PRIMARY KEY
, EmployeeID int
, Amount decimal(10, 2)
, EffectiveDate date
)
GO
CREATE VIEW dbo.EmployeeSalaryView
AS
SELECT
e.EmployeeID, e.FirstName, e.LastName, s.Amount
FROM
dbo.Employee AS e
JOIN dbo.Salary AS s
ON e.EmployeeID = s.EmployeeID
GO
USE Main
GO
CREATE TABLE dbo.Project
(
ProjectID int PRIMARY KEY
, ProjectName nvarchar(100)
, StartDate date
, EndDate date
)
GO
CREATE TABLE dbo.Task
(
TaskID int PRIMARY KEY
, ProjectID int
, TaskName nvarchar(100)
, AssignedTo int
, Status nvarchar(20)
)
GO
CREATE TABLE dbo.TimeEntry
(
TimeEntryID int PRIMARY KEY
, TaskID int
, EmployeeID int
, Hours decimal(5, 2)
, EntryDate date
)
GO
CREATE TABLE dbo.ProjectMilestone
(
MilestoneID int PRIMARY KEY
, ProjectID int
, MilestoneName nvarchar(100)
, TargetDate date
, CompletionDate date NULL
, Status nvarchar(20)
)
GO
CREATE VIEW dbo.ProjectTaskView
AS
SELECT
p.ProjectID, p.ProjectName, t.TaskID, t.TaskName, t.Status
FROM
dbo.Project AS p
JOIN dbo.Task AS t
ON p.ProjectID = t.ProjectID
GO
CREATE FUNCTION dbo.GetProjectTasks (@ProjectID int)
RETURNS table
AS
RETURN
(
SELECT
Task.TaskID, Task.TaskName, Task.Status
FROM dbo.Task
WHERE Task.ProjectID = @ProjectID
)
GO
CREATE FUNCTION dbo.GetProjectStatus (@ProjectID int)
RETURNS nvarchar(20)
AS
BEGIN
DECLARE @Status nvarchar(20)
SELECT
@Status = CASE
WHEN Project.EndDate < GETDATE ()
THEN 'Completed'
WHEN Project.StartDate > GETDATE ()
THEN 'Not Started'
ELSE 'In Progress'
END
FROM dbo.Project
WHERE Project.ProjectID = @ProjectID
RETURN @Status
END
GO
/* Cannot create circular reference in single step, I'll alter it after the main proc is created */
CREATE PROCEDURE dbo.NestedProc
AS
BEGIN
SELECT
te.TimeEntryID, te.TaskID, te.EmployeeID, te.Hours, te.EntryDate
FROM dbo.TimeEntry AS te
END
GO
CREATE PROCEDURE dbo.MainProc
AS
BEGIN
CREATE TABLE #NewProjects (ProjectID int)
INSERT INTO #NewProjects (ProjectID)
VALUES
(1)
, (2)
-- Statement 1: Join with temp table
SELECT
np.ProjectID, pm.MilestoneID, pm.ProjectID, pm.MilestoneName, pm.TargetDate, pm.CompletionDate, pm.Status
FROM
#NewProjects AS np
JOIN dbo.ProjectMilestone AS pm
ON np.ProjectID = pm.ProjectID
-- Statement 2: Referencing a view, scalar and itv functions
SELECT
pv.ProjectID
, pv.ProjectName
, dbo.GetProjectStatus (pv.ProjectID) AS ProjectStatus
, t.TaskName
, t.Status
FROM
dbo.Project AS p
JOIN dbo.ProjectTaskView AS pv
ON pv.ProjectID = p.ProjectID
CROSS APPLY dbo.GetProjectTasks (pv.ProjectID) AS t
-- Statement 3: Cross-DB references (table and view)
SELECT
t.TaskID, t.TaskName, t.Status, esv.EmployeeID, esv.FirstName, esv.LastName, esv.Amount AS Salary
FROM
Main.dbo.Task AS t
JOIN Sideline.dbo.EmployeeSalaryView AS esv
ON t.AssignedTo = esv.EmployeeID
JOIN Sideline.dbo.Salary AS s
ON esv.EmployeeID = s.EmployeeID
WHERE t.Status = 'In Progress';
-- Statement 4: Call nested procedure with cyclic reference
EXEC dbo.NestedProc
END
GO
/* Add the cyclic reference */
CREATE OR ALTER PROCEDURE dbo.NestedProc
AS
BEGIN
SELECT
te.TimeEntryID, te.TaskID, te.EmployeeID, te.Hours, te.EntryDate
FROM dbo.TimeEntry AS te
-- Circular call to MainProc
EXEC dbo.MainProc
END
GO
Tests ๐︎
Stack Overflow answer ๐︎
Let’s the the original SO answer first. I will be checking the dbo.MainProc
’s dependencies for all the tests.
SELECT DISTINCT
referenced_schema_name,
referenced_entity_name
FROM sys.dm_sql_referenced_entities('dbo.MainProc', 'OBJECT')
ORDER BY referenced_entity_name /* added for comparison with other results */
It doesn’t show that the Salary table and view are from a different database but nothing that can’t be fixed with an additional column.
I’ll compare the raw output with sys.sql_expression_dependencies
sql_expression_dependencies ๐︎
SELECT
dsre.referenced_database_name
, dsre.referenced_schema_name
, dsre.referenced_entity_name
, dsre.referenced_minor_name
, dsre.referenced_id
, dsre.referenced_minor_id
FROM sys.dm_sql_referenced_entities('dbo.MainProc', 'OBJECT') AS dsre
ORDER BY
referenced_entity_name
, dsre.referenced_minor_id
SELECT
sed.referenced_database_name
, sed.referenced_schema_name
, sed.referenced_entity_name
, sed.referenced_id
, sed.referenced_minor_id
FROM sys.sql_expression_dependencies AS sed
WHERE sed.referencing_id = OBJECT_ID('dbo.MainProc')
ORDER BY
sed.referenced_entity_name
, sed.referenced_minor_id
- (1) the first result set has 23 rows due to column information and has to be aggregated to get distinct tablesโฆ
- (2) or filter only entries where
referenced_minor_id = 0
- those refer to the object level dependency - (3) the
sys.sql_expression_dependencies
is missing theobject_id
for any cross-db references
I think both of these DMOs are useful and it’s good to know their differences.
The recursive solution ๐︎
This is my script. It uses a recursive CTE to get all the nested references.
I’m using a cycle detection logic that I saw Itzik Ben-Gan (b) use somewhere.
DROP TABLE IF EXISTS #References
CREATE TABLE #References
(
ObjectId int NOT NULL
, DbId int NOT NULL
, DbName nvarchar(128) NOT NULL
, SchemaName nvarchar(128) NOT NULL
, ObjectName nvarchar(128) NOT NULL
, FullName AS CONCAT(SchemaName, '.', ObjectName) PERSISTED
, objectType nvarchar(60) NOT NULL
, NestingLevel int NOT NULL
, NestingPath varchar(MAX) NOT NULL
, SortPath varbinary(MAX) NOT NULL
/* add indexes as needed */
)
; WITH allReferences
AS
(
SELECT
dsre.referenced_id AS Id
, ca.DbId
--, dsre.referenced_database_name
--, dsre.referenced_schema_name
--, dsre.referenced_entity_name
, 0 AS NestingLevel
, CAST
(
CONCAT
(
'.'
, CAST(ca.DbId AS VARCHAR(MAX))
, '_'
, CAST(dsre.referenced_id AS VARCHAR(MAX))
, '.'
)
AS VARCHAR(max)
) AS NestingPath /* format: .dbId_ObjectId. */
, CAST(CAST(1 AS BINARY(2)) AS VARBINARY(MAX)) AS SortPath
, 0 AS CycleDetection
FROM
sys.dm_sql_referenced_entities('dbo.MainProc', 'object') AS dsre
CROSS APPLY (VALUES (ISNULL(DB_ID(dsre.referenced_database_name), DB_ID()))) AS ca(DbId)
WHERE
dsre.referenced_minor_id = 0 /* Removing column reference granularity */
UNION ALL
SELECT
dsre.referenced_id
, ca.DbId
--, dsre.referenced_database_name
--, dsre.referenced_schema_name
--, dsre.referenced_entity_name
, ar.NestingLevel + 1
, CAST
(
CONCAT
(
ar.NestingPath /* append object to existing path */
, CAST(ca.DbId AS VARCHAR(MAX))
, '_'
, CAST(dsre.referenced_id AS VARCHAR(MAX))
, '.'
)
AS VARCHAR(max)
) AS NestingPath /* format: .dbId_ObjectId. */
, ar.SortPath + CAST(ROW_NUMBER() OVER
(
PARTITION BY ca.dbId, dsre.referenced_id
ORDER BY (SELECT @@SPID)) AS BINARY(2)
) AS SortPath
, IIF
(
ar.NestingPath LIKE CONCAT
(
'%.'
, CAST(ca.DbId AS VARCHAR(MAX))
, '_'
, CAST(dsre.referenced_id AS VARCHAR(MAX))
, '.%'
)
, 1 /* if nestingPath already contains the same subpath it's a cycle */
, 0
) AS CycleDetection
FROM allReferences AS ar
CROSS APPLY sys.dm_sql_referenced_entities(CONCAT(OBJECT_SCHEMA_NAME(ar.Id), '.', OBJECT_NAME(ar.Id)), 'Object') AS dsre
CROSS APPLY (VALUES (ISNULL(DB_ID(dsre.referenced_database_name), DB_ID()))) AS ca(DbId)
WHERE
ar.CycleDetection = 0
AND dsre.referenced_id IS NOT NULL
AND dsre.referenced_minor_id = 0 /* Removing column reference granularity */
)
INSERT INTO #References WITH (TABLOCKX)
(
ObjectId
, DbId
, DbName
, SchemaName
, ObjectName
, objectType
, NestingLevel
, NestingPath
, SortPath
)
SELECT
ar.Id AS ObjectId
, ar.DbId
, DB_NAME(ar.DbId) AS DbName
, OBJECT_SCHEMA_NAME(ar.Id, ar.DbId) AS SchemaName
, OBJECT_NAME(ar.Id, ar.DbId) AS ObjectName
, IIF
(
o.object_id IS NOT NULL /* same DB reference */
, o.type_desc
, 'N/A'
) AS objectType
, ar.NestingLevel
, ar.NestingPath
, ar.SortPath
FROM allReferences AS ar
LEFT JOIN sys.objects AS o
ON ar.Id = o.object_id
AND ar.DbId = DB_ID()
WHERE
ar.CycleDetection = 0
SELECT
*
FROM #References AS r
This uses the sys.dm_sql_referenced_entities
because I don’t want to lose the cross-db object_id
.
The cross-DB view dbo.EmployeeSalaryView
is not expanded though because the DMOs are database-scoped. I didn’t want to complicate this even further with a dynamic SQL (or linked servers for that matter).
I’m also getting duplicates (due to nested references) and that’s why I’m inserting it into temp table #References
so you can slice and dice the data in any way you want it (That’s the way you need it).
For example:
SELECT
r.DbName
, r.FullName
, COUNT(1) AS Cnt
FROM #References AS r
GROUP BY
r.DbName
, r.FullName
ORDER BY
Cnt DESC
Thank you for reading!