KDA: Echoes of Deception - Case 3

Twenty stolen cars, twenty swapped plates. The obvious move - join on VIN - is exactly what the thieves counted on us trying. The cars' last known locations before the swap tell a different story.

The problem

This is the abridged case description. Emphasis mine:

There has been a sudden increase in unsolved cases of stolen cars all across our city, and the police need our help again to crack the case.

We've been given access to a massive dataset of car traffic for over a week, as well as a set of cars that have been stolen. It's possible that the car's identification plates were replaced during the robbery, which makes this case even more challenging.

We need you to put on your detective hat and analyze the data to find any patterns or clues that could lead us to the location of these stolen cars. It is very likely that all the stolen cars are being stored in the same location.

And the main question:

Where are the stolen cars being kept?

More specifically: Avenue and Street numbers.

The investigation

We have two tables - StolenCars and CarsTraffic. StolenCars is just a list of 20 VINs, so let's take a look at CarsTraffic.

CarsTraffic
| take 50

CarsTraffic table showing Timestamp, VIN, Ave, and Street columns - the four fields available to work with

The traffic data covers June 10-16, 2023 - just under a week.

So it looks to be VINs and their respective location at a point in time. My first instinct was to join directly on VIN - but swapped plates mean the stolen car shows up under a completely different VIN after the robbery. The VIN is useless as a stable link across the swap. The location is not.

If a stolen car had its plates changed at location X, then the car that drove away from X shortly afterward is the same vehicle under a new identity. The key insight: find where each stolen VIN was last seen, then find what new VIN appeared at that exact spot within a narrow time window.

The solution

Find the last location (Ave and Street) and time of each stolen VIN
Find new possible VINs - join on the last location within a certain timeframe - let's say 10 minutes
Find last location of the new possible VINs
Aggregate by location and see if any of them has around 20 cars

Note The 10-minute window is a judgment call. Too short and you might miss a slow plate swap; too long and you start pulling in unrelated cars that happened to pass the same junction. Ten minutes felt like a reasonable first guess for a street-level operation.

let LastLocation = 
(
    CarsTraffic
    | where Timestamp >= datetime(2023-06-10) and Timestamp < datetime(2023-06-17)
    | lookup kind = inner StolenCars on VIN
    | summarize arg_max(Timestamp, *) by VIN
    | project-rename
        VinChangeTime = Timestamp
    | distinct Ave, Street, VinChangeTime
);
let NewPossibleVINs = 
(
    CarsTraffic
    | where Timestamp >= datetime(2023-06-10) and Timestamp < datetime(2023-06-17)
    | lookup kind=inner LastLocation on Ave, Street
    // VIN change within 10 minutes
    | where Timestamp >= VinChangeTime and Timestamp < VinChangeTime + 10m
    | distinct VIN
);
NewPossibleVINs
| join kind = inner (CarsTraffic | where Timestamp >= datetime(2023-06-10) and Timestamp < datetime(2023-06-17)) on VIN
| summarize arg_max(Timestamp, *) by VIN
| summarize count() by Ave, Street
| where count_ >= 20
| top 5 by count_ desc

Final result: three locations with 20+ cars. Ave 156, Street 81 has 21 - matching the count of stolen cars

Only three locations came back with 20+ cars, so top 5 returned all three. The one with 21 is the answer: the case said all stolen cars are likely in one place, and 21 matches 20 stolen cars almost exactly. The other two clusters - 70 and 56 cars - are just busy intersections. Way too many to be a stash.

Let's input it as an answer and grab another batch.

Thank you for reading