Scary Scalar Functions - Part Two: Performance

Page content

Scary Scalar Functions series overview πŸ”—︎

  1. Part One: Parallelism
  2. Part Two: Performance
  3. Part Three: The Cure
  4. Part Four: Your Environment

Changelog πŸ”—︎

Foreword πŸ”—︎

In the second part of this series, we’ll look at how Scalar functions (or UDFs) affect performance.
If you want to follow along, start with the code from part 1.

Set up monitoring πŸ”—︎

To gather the performance metrics, we’ll set up additional monitoring tools. Namely:

  1. Query Store (QS)
  2. Extended Events (XE)
  3. SET STATISTICS IO, TIME ON
  4. Plan explorer

If you don’t have Plan explorer, I highly recommend it.

To start, let’s enable the QS first.

ALTER DATABASE [ScalarFunction]
SET QUERY_STORE = ON (QUERY_CAPTURE_MODE = ALL)

If you need to clear the QS between the runs, use this command.

ALTER DATABASE [ScalarFunction] SET QUERY_STORE CLEAR

Next, we’ll set up the Extended Event session.

CREATE EVENT SESSION ScalarPerformance ON SERVER
ADD EVENT sqlserver.module_end
(
    SET
        collect_statement = 1
    ACTION
    (
        sqlserver.session_id
    )
    WHERE
        sqlserver.client_app_name = N'MonitorXE'
        AND object_type = 'FN'
)
, ADD EVENT sqlserver.sql_statement_completed
(
    ACTION
    (
        sqlserver.query_hash_signed
        , sqlserver.session_id
    )
    WHERE
        sqlserver.client_app_name = N'MonitorXE'
        AND sqlserver.query_hash_signed <> 0
)

This will only track events from a client application name that equals MonitorXE. To do that, I will open a new SSMS connection and add this text to the Options\Additional Connection Parameters tab.

Application Name=MonitorXE;

Like so:

Connection string in change connection window

You can check that it worked with this snippet: SELECT APP_NAME() AS AppName.

Performance tests πŸ”—︎

I’ll cover several test scenarios and analyze the performance using different monitoring tools.
The results will be for the second executions of the queries, so we have compiled and cached plans and all pages in the buffer pool.

DoNothing in a Select statement πŸ”—︎

Queries under test πŸ”—︎

SELECT TOP (10000)
    n.Id
FROM dbo.Nums AS n
ORDER BY n.Id
GO
SELECT TOP (10000)
    dbo.DoNothing(n.Id) AS Id
FROM dbo.Nums AS n
ORDER BY n.Id

Query Store πŸ”—︎

Get the performance data with this query.

SELECT
    CAST(qsq.query_hash AS bigint) AS query_hash_signed
    , LEFT(qsqt.query_sql_text, 100) AS textSample
    , qsrs.last_duration
    , qsrs.last_cpu_time
    , qsrs.last_logical_io_reads
    , qsrs.last_rowcount
    , qsrs.count_executions
FROM sys.query_store_query AS qsq
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan AS qsp
    ON qsp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats AS qsrs
    ON qsrs.plan_id = qsp.plan_id
Where qsqt.query_sql_text LIKE N'SELECT TOP%'

Performance data from Query Store showing scalar function is worse

The duration and CPU time are in microseconds.
We can see that the Scalar function that does nothing is over 20 times worse than its counterpart.

Extended Events πŸ”—︎

Performance data from Extended Events showing Scalar function is worse

I’ve grouped the output by event name and aggregated the duration. The module_end event is fired for each execution of the Scalar function. That means for every row (10k). It doesn’t collect the CPU time.

The data is very similar to Query Store’s (I’ve collected it from separate runs). First, though, I’d like to point out some interesting info.

  • The row_count column for the query with the UDF shows double the values. I’m assuming it’s for each row returned by both the query and the function.
  • Also, the aggregated duration of the function calls is only 23 760, while the query duration is 67 134. That looks like the overhead of the UDF execution was more significant than its duration.

SET STATISTICS IO, TIME ON πŸ”—︎

(10000 rows affected)
Table 'Nums'. Scan count 1, logical reads 275, <truncated for brevity>

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 2 ms.

(10000 rows affected)
Table 'Nums'. Scan count 1, logical reads 275, <truncated for brevity>

 SQL Server Execution Times:
   CPU time = 60 ms,  elapsed time = 66 ms.

We can see the difference here as well.

Actual execution plan πŸ”—︎

The actual execution plan also shows the UDF duration in the newer schemas (starting with the sql2019 xml schema).

Shows time statistics for single query execution.

CpuTime: CPU time in milliseconds
ElapsedTime: elapsed time in milliseconds
UdfCpuTime: Cpu time of UDF in milliseconds
UdfElapsedTime: Elapsed time of UDF in milliseconds

<QueryTimeStats CpuTime="21" ElapsedTime="51" UdfCpuTime="16" UdfElapsedTime="16" />

Plan explorer πŸ”—︎

Performance data from Plan Explorer showing Scalar function is worse

I like how the tool highlights the UDF problem so you can see it straight away. However, one double-edged sword is that it also tries to collect the execution plans (or at least sample) of the UDFs.

I’ve run into situations when my UDFs had multiple statements, and it ran over many rows, so that the Plan Explorer collection couldn’t keep up or ran out of memory.

But when it does, it gives you great insights, like these UDF warnings:

Plan warnings about UDFs in the Plan Explorer

I’ve re-run the tests for different row counts, and the differences are staggering.

Plan warnings about UDFs in the Plan Explorer

DoLookup in a Select statement πŸ”—︎

I’ll create a new function that actually does something for a change.
Let’s prepare some supporting tables first.

CREATE TABLE dbo.LookupTable
(
    Id int NOT NULL PRIMARY KEY
    , LookupValue varchar(50) NOT NULL
    , Filler char(100) NOT NULL
)

INSERT INTO dbo.LookupTable WITH (TABLOCK)
(Id, LookupValue, Filler)
SELECT TOP (26)
    n.Id, CHAR(64 + n.Id), ''
FROM dbo.Nums AS n
ORDER BY n.Id

It’s a dummy lookup table for the test.
What’s important is that a seek against this table uses 2 logical reads.

(1 row affected)
Table 'LookupTable'. Scan count 0, logical reads 2, <truncated for brevity>

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

And that’s what we’ll do in the Scalar function

CREATE OR ALTER FUNCTION dbo.DoLookup(@Id int)
RETURNS varchar(50)
AS
BEGIN
    DECLARE @ReturnId varchar(50)
    SET @ReturnId =
    (
        SELECT
            lt.LookupValue
        FROM dbo.LookupTable AS lt
        WHERE lt.Id = @Id
    )

    RETURN @ReturnId
END

For the performance test, I’ll use these queries.

SELECT TOP (10000)
    n.Id
    , lt.LookupValue
FROM dbo.Nums AS n
LEFT JOIN dbo.LookupTable AS lt
    ON n.Id % 27 = lt.Id
ORDER BY n.Id
OPTION (MAXDOP 1)
GO
SELECT TOP (10000)
    n.Id
    , dbo.DoLookup(n.Id % 27)
FROM dbo.Nums AS n
ORDER BY n.Id
OPTION (MAXDOP 1)

⚠️ It’s not a good idea to JOIN on a calculation with a modulo % operator, but I didn’t want to introduce a more complex table or change the one from Part 1. It won’t affect the testing.

Hidden reads πŸ”—︎

Let’s check first with the SET STATISTICS IO, TIME ON.

(10000 rows affected)
Table 'Nums'. Scan count 1, logical reads 296, <truncated for brevity>
Table 'LookupTable'. Scan count 0, logical reads 20000, <truncated for brevity>

 SQL Server Execution Times:
   CPU time = 10 ms,  elapsed time = 11 ms.

(10000 rows affected)
Table 'Nums'. Scan count 1, logical reads 275, <truncated for brevity>

 SQL Server Execution Times:
   CPU time = 368 ms,  elapsed time = 401 ms.

The first query has extra 20k reads against a LookupTable, but a duration of only 11 ms.
The second query shows only the 275 reads from the Nums table.

Nowhere does it show the LookupTable, and that’s why the duration of 401 ms seems inexplicably high.

We must use the Query Store, XE or Plan Explorer to see those reads.

Query Store Query Store showing the hidden reads

Plan Explorer Plan Explorer showing the hidden reads

The performance gap grows even further when the Scalar function repeats work for each row.

Performance data from Plan Explorer showing Scalar function is worse

DoNothing in a Check constraint πŸ”—︎

I’ll create two tables that I TRUNCATE between the test runs.

CREATE TABLE dbo.PositiveId
(
    Id int PRIMARY KEY
    , CONSTRAINT CK_PositiveId CHECK (Id > 0)
)
GO
CREATE TABLE dbo.PositiveIdScalar
(
    Id int PRIMARY KEY
    , CONSTRAINT CK_PositiveIdScalar CHECK (dbo.DoNothing(Id) > 0)
)

The test query

INSERT INTO dbo.PositiveId (Id)
SELECT TOP (10000)
    n.Id
FROM dbo.Nums AS n
ORDER BY n.Id
GO
INSERT INTO dbo.PositiveIdScalar (Id)
SELECT TOP (10000)
    n.Id
FROM dbo.Nums AS n
ORDER BY n.Id

And the results

Performance data from Plan Explorer showing Scalar function is worse

DoNothing in a Computed column πŸ”—︎

Unless the Computed column is persisted (not a default behaviour), there is no change in performance.

That’s because only a definition is saved, not the actual value. Persisting a Computed column requires a deterministic function, so I’ll create a new version with a slight variation.

Otherwise, we would get this error:

Msg 4936, Level 16, State 1, Line 428
Computed column 'ColName' in table 'TableName' cannot be persisted because the column is non-deterministic.
CREATE OR ALTER FUNCTION dbo.DoNothingDeterministic(@Id int)
RETURNS int
WITH SCHEMABINDING -- enables determinism
AS
BEGIN
    RETURN @Id
END
GO

Now we can create the tables

CREATE TABLE IdCalculation
(
    Id int PRIMARY KEY
    , IdCalculation AS Id
)
GO
CREATE TABLE IdCalculationScalarPersisted
(
    Id int PRIMARY KEY
    , IdCalculation AS dbo.DoNothingDeterministic(Id) PERSISTED
)

And the testing queries

INSERT INTO dbo.IdCalculation (Id)
SELECT TOP (10000)
    n.Id
FROM dbo.Nums AS n
ORDER BY n.Id
GO
INSERT INTO dbo.IdCalculationScalarPersisted (Id)
SELECT TOP (10000)
    n.Id
FROM dbo.Nums AS n
ORDER BY n.Id

And finally, the performance results.

Performance data from Plan Explorer showing Scalar function is worse

Recap πŸ”—︎

The performance of Scalar functions is horrendous even when it does nothing.
It worsens when the UDF has multiple statements, more complex logic, reads, etc.

Furthermore, I consider that Scalar UDFs must be destroyed.
Scalar UDF delenda est.
β€” Me

The next article in the series will be about methods to solve the problems related to Scalar functions.