A famous podcast host recorded something he shouldn't have. The video was published, then deleted. But storage archive logs don't forget - and backups don't always sync their deletions.
The problem
This is the abridged case description. Emphasis mine:
You know about Scott Hanselman's incredible video podcasts, right? He usually drops them every week, and they rack up thousands of views. But something mysterious went down with his 900th episode. Rumor has it that Prof. Smoke spilled the beans on some top-secret functionality during the interview. He had a change of heart and insisted the video be scrapped, so it never saw the light of day.
I managed to sneak my way into the public archive logs. These logs have all the juicy details - creation and replication timestamps, access records, and even deletion operations. My hope is that the deletions weren't fully synced into the archive, leaving behind some remnants.
And the main question:
What is the link to the secret interview?
So we need to find a backup URL for a deleted video. Each emphasized phrase in the description maps to a query technique:
- every week → periodic release schedule we can validate with time series analysis
- thousands of views → high read counts that stand out from typical storage blobs
- deletions weren't fully synced → partial deletes where the backup survived
The investigation
Parsing structure from chaos
We get a single table: StorageArchiveLogs. Two columns - Timestamp and EventText. That's it. 17.5 million rows of free-text log entries spanning June 14 to July 15, 2023 - with no structure whatsoever.
StorageArchiveLogs
| take 5

All five samples are read transactions. Given 17.5 million rows, reads probably aren't all there is. Let me filter them out:
StorageArchiveLogs
| where EventText !startswith "Read"
| take 5

Deletes and creates. Now let's get the full breakdown with a sample from each type:
StorageArchiveLogs
| where Timestamp >= datetime(2023-06-14) and Timestamp < datetime(2023-07-16)
| parse EventText with TransactionType " blob transaction:" *
| summarize
Count=count()
, Sample=take_any(EventText)
by TransactionType

Three types, each with a distinct pattern visible in the samples:
- Read:
'{URL}' read access ({N} reads) were detected on the origin - Delete:
'{URL}' backup is completely removed - Create:
'{URL}' backup is created on {BackupURL}
The data is overwhelmingly reads: 17.4 million read events versus about 34,000 deletes and 24,000 creates.
Pre-parsing: parse once, query fast
Every insight we need is buried in that EventText string, and with 17.5 million rows, re-parsing it in every query would be painfully slow. Let's parse once into a proper table:
.set-or-replace ParsedArchiveLogs <|
StorageArchiveLogs
| where Timestamp >= datetime(2023-06-14) and Timestamp < datetime(2023-07-16)
| parse EventText with TransactionType " blob transaction: '" BlobURI "'" *
| extend Host = tostring(parse_url(BlobURI).Host)
| parse EventText with * "(" Reads:long " reads)" *
| parse EventText with * "backup is created on " BackupURL
| project Timestamp, TransactionType, BlobURI, Host, Reads, BackupURL
One parse pass. Now we have indexed columns - TransactionType, BlobURI, Host, Reads, and BackupURL - instead of 17.5 million strings to re-parse on every query.
Finding the podcast
The case description gives us two strong filters: the podcast drops episodes every week and gets thousands of views. In an ocean of random blob storage, that's a distinctive signature.
Let's find which storage hosts get high read counts:
ParsedArchiveLogs
| where TransactionType == 'Read'
| summarize TotalReads=sum(Reads) by BlobURI, Host
| summarize
AvgReadsPerBlob=avg(TotalReads)
, BlobCount=dcount(BlobURI)
by Host
| where AvgReadsPerBlob > 1000
| top 10 by AvgReadsPerBlob desc

Most popular hosts have just 1-2 blobs - one-off viral videos, not a recurring podcast. But okeexeghsqwmda.blob.core.windows.net stands out: 5 blobs with an average of 2,532 reads each. A weekly podcast over 31 days of data would produce roughly 4-5 episodes. That fits.
But we don't want to eyeball it. Let's use time series analysis to be sure.
Anomaly detection: the missing episode
If the deleted episode was published and then scrapped, the podcast's regular viewership pattern would have a gap - a week where the expected spike in views never materialized. That's a negative anomaly.
We build an hourly time series of reads per popular host, then run series_decompose_anomalies looking for negative flags:
let PopularHosts =
ParsedArchiveLogs
| where TransactionType == 'Read'
| summarize TotalReads=sum(Reads) by BlobURI, Host
| summarize AvgReads=avg(TotalReads) by Host
| where AvgReads > 1000
| project Host;
let ReadsSeriesOfPopularHosts =
ParsedArchiveLogs
| where TransactionType == 'Read'
| where Host in (PopularHosts)
| summarize Reads=sum(Reads)
by Host, Timestamp=bin(Timestamp, 1h)
| make-series Reads=sum(Reads) default=0
on Timestamp step 1h
by Host;
ReadsSeriesOfPopularHosts
| extend (flag, score, baseline) = series_decompose_anomalies(Reads)
| mv-expand flag to typeof(real), score to typeof(double)
| where flag < 0
| summarize
NegAnomalyCount=count()
, MaxScore=max(score)
by Host
| order by NegAnomalyCount desc
| take 10

okeexeghsqwmda leads with 19 negative anomaly points. But several other hosts also show negative anomalies. We need another filter.
Confirming weekly seasonality
Here's where series_periods_validate comes in. A weekly podcast has a 168-hour cycle (7 days x 24 hours). We can ask KQL to score how strongly each host's read pattern matches that exact periodicity.
Here's the full runnable query, combining everything from the anomaly section with the periodicity check:
let PopularHosts =
ParsedArchiveLogs
| where TransactionType == 'Read'
| summarize TotalReads=sum(Reads) by BlobURI, Host
| summarize AvgReads=avg(TotalReads) by Host
| where AvgReads > 1000
| project Host;
let ReadsSeriesOfPopularHosts =
ParsedArchiveLogs
| where TransactionType == 'Read'
| where Host in (PopularHosts)
| summarize Reads=sum(Reads)
by Host, Timestamp=bin(Timestamp, 1h)
| make-series Reads=sum(Reads) default=0
on Timestamp step 1h
by Host;
let SuspectedHosts =
ReadsSeriesOfPopularHosts
| extend (flag, score, baseline) = series_decompose_anomalies(Reads)
| mv-expand flag to typeof(real), score to typeof(double)
| where flag < 0
| summarize score=max(score) by Host
| project Host;
ReadsSeriesOfPopularHosts
| where Host in (SuspectedHosts)
| extend (periods, scores) = series_periods_validate(Reads, 168.0)
| extend Period = todouble(periods[0])
| extend PeriodScore = todouble(scores[0])
| project Host, Period, PeriodScore

A periodicity score of 0.82 for okeexeghsqwmda. Every other suspected host scores near zero.
That's the same host name at the top of three independent analyses - most blobs among popular hosts, most negative anomalies, strongest weekly signal. Three different techniques, same answer. That's our podcast.
series_decompose_anomalies(); if it rings all the time, go for series_periods_validate()." We used both here - anomaly detection to find the dip, periodicity validation to confirm the weekly pattern. The combination is powerful: anomalies alone give you false positives, periodicity alone doesn't catch the missing episode. Together, they pinpoint exactly one host.
Finding the backup
Now that we know the host, was anything deleted?
StorageArchiveLogs
| where EventText startswith "Delete"
| where EventText has "okeexeghsqwmda"
| project EventText

One delete: jqfovf.mp4. And the raw text says "backup is partially removed" - not completely. The case description warned that "deletions weren't fully synced." This is exactly that.
Now let's get the backup URL:
ParsedArchiveLogs
| where Host == 'okeexeghsqwmda.blob.core.windows.net'
| where BlobURI has "jqfovf.mp4"
| where TransactionType == 'Create'
| project BackupURL

The answer: https://2023storagebackup.blob.core.windows.net/okeexeghsqwmda/vyskl/jqfovf.mp4
The investigation funnel:
- 17.5M log entries of free text, thousands of hosts
- Pre-parse into columnar format - one pass, fast queries from here on
- Filter to popular hosts (avg reads > 1000) - ~20 remain
- Negative anomaly detection - 6 candidates
- Weekly periodicity validation - 1 host confirmed
- Find backups, cross-reference with deletes - 1 video
Prof. Smoke's secret interview lives on. Another badge earned.
Thank you for reading