Generate XML documents efficiently

Foreword 🔗︎

“It depends” is the DBA’s most frequent answer and for a good reason. Most of the time giving good advice really depends on many variables. A good DBA must follow up by explaining “why” it depends.

Another often-repeated mantra is “There are no silver bullets” everything is a trade-off, for example, parallelism might decrease query duration but increases CPU usage.

Well, this blog post (and hopefully series) will try to cover a few of those very rare silver bullets. Now I’m cheating a bit in my definition, these tips will be along the lines “did you know this existed?” and the alternative is writing the query in a sub-optimal way. Let’s get to it then!

TYPE directive in FOR XML 🔗︎

XML is a very polarizing topic in the SQL community, some developers might be lucky and never run into it. But love it or hate it, it is there and will be for a while so we might as well use it in the most efficient manner.

Let’s say our goal is to generate an XML document from a query and pass it to an application or Service Broker, etc. We’re gonna use the FOR XML query. I’m testing this in a fresh SQL Server 2019 instance in the master database.

I’m gonna generate a small XML document into an xml variable both with and without the TYPE directive. I’ll repeat the experiment but save it into the varchar(MAX) and nvarchar(MAX) variables.

/* xml variable, no TYPE directive */
DECLARE @xmlNoType xml
SET @xmlNoType =
(
    SELECT *
    FROM sys.all_objects
    FOR XML PATH(''), ROOT ('Document')
)
SELECT (DATALENGTH(@xmlNoType) / 1024.0) / 1024. AS SizeMB
GO
/* xml variable, TYPE directive */
DECLARE @xmlType xml
SET @xmlType =
(
    SELECT *
    FROM sys.all_objects
    FOR XML PATH(''), ROOT ('Document'), TYPE
)
--SELECT (DATALENGTH(@xmlType) / 1024.0) / 1024. AS SizeMB
GO
/* varchar(MAX) variable, no TYPE directive */
DECLARE @varchar varchar(MAX)
SET @varchar =
(
    SELECT *
    FROM sys.all_objects
    FOR XML PATH(''), ROOT ('Document')
    /*
        Cannot use TYPE:
        Implicit conversion from data type xml to varchar(max) is not allowed. Use the CONVERT function to run this query.
    */
)
SELECT (DATALENGTH(@varchar) / 1024.0) / 1024. AS SizeMB
GO
/* nvarchar(MAX) variable, no TYPE directive */
DECLARE @nvarchar nvarchar(MAX)
SET @nvarchar =
(
    SELECT *
    FROM sys.all_objects
    FOR XML PATH(''), ROOT ('Document')
    /*
        Cannot use TYPE:
        Implicit conversion from data type xml to nvarchar(max) is not allowed. Use the CONVERT function to run this query.
    */
)
SELECT (DATALENGTH(@nvarchar) / 1024.0) / 1024. AS SizeMB
GO
/*
    XML size            0.59 MB
    varchar(max) size    0.90 MB
    nvarchar(max) size    1.80 MB
*/

Results on my machine are the following – first the size. Unsurprisingly both with and without TYPE directive, the XML size is the same – roughly 0.59 MB varchar(MAX) variable is 0.9 MB and nvarchar(MAX) is the double 1.8 MB.

Let’s look at the CPU and elapsed time.

Small XML size. Xml using TYPE has the lowest CPU and Duration

We can see that using the TYPE in our query saved some 6k reads, also the CPU and duration is half of the query without the TYPE. The non-xml variables are not doing bad performance wise, but also more reads and larger data type size.

Let’s compare the actual execution plan:

Execution plan showing warning of type conversion

In the query without the TYPE there is an implicit conversion. The TYPE directive is saying that I want to generate an XML document.
Without that, I’m generating an nvarchar(MAX) and then implicitly casting it to a XML.

DECLARE @whatIsThis SQL_VARIANT
SET @whatIsThis =
(
    SELECT *
    FROM sys.all_objects
    FOR XML PATH(''), ROOT ('Document')
)

That returns an error message:

Msg 206, Level 16, State 2, Line 5
Operand type clash: nvarchar(max) is incompatible with sql_variant

I’ve repeated the tests but doing a cartesian product with several values to enlarge the XML. These are the results for 6 MB and 21 MB XML respectively. The varchar(max) size was 20 MB and 70 MB (nvarchar is doubled).

Medium XML size. Xml using TYPE has the lowest CPU and Duration


Large XML size. Xml using TYPE has the lowest CPU and Duration

We can see that with an increasing document size, the benefits are even greater for such a little change. Thus earning its place in the Silver bullets series. I’ve run into processes that generated XMLs that were over a hundred MBs large and the savings were vast.