SQL Wordle Series - Part Two: Playing

Page content

SQL Wordle Series πŸ”—︎

  1. Part One: Cheating
  2. Part Two: Playing

Foreword πŸ”—︎

Continuing the series - you can’t cheat at a game you cannot play. This time I’ve focused on the scoring algorithm. I have a hunch scoring will be useful for finding the optimal strategy.

When I was done, I thought it would be fun to add some input and scoring validation to simulate the original game.

⚠️ Disclaimer: This solution builds on the scripts created in the first part of this series.

Strategy πŸ”—︎

From the start, I want to use a set-based solution. I want to split the words table and have each letter in a separate row. That way I can JOIN or INTERSECT them and generally have more options.

DROP TABLE IF EXISTS dbo.WordLetter
CREATE TABLE dbo.WordLetter
(
    WordId smallint NOT NULL
    , Word char(5) NOT NULL /* so I don't have to join */
    , Letter char(1) NOT NULL
    , Position tinyint NOT NULL
    , CONSTRAINT PK_WordLetter PRIMARY KEY CLUSTERED (Letter, Position, WordId)
    , INDEX IX_WordLetter_Position NONCLUSTERED (Position)
)

/* Unpivot the table */
INSERT INTO dbo.WordLetter WITH (TABLOCKX)
(WordId, Word, Letter, Position)
SELECT
    aw.Id
    , aw.Word
    , SUBSTRING(aw.Word, cj.Position, 1)
    , cj.Position
FROM dbo.AllWords AS aw
CROSS JOIN (VALUES (1), (2), (3), (4), (5)) AS cj(Position)

Duplicate letters πŸ”—︎

One problem I ran into in my earlier solutions was duplicate letters. I’ve tested and debugged on this example. The solution is REBUS and my guess is… GUESS.

Wordle guess and scoring of duplicate letters

Only one S is consumed and that’s the one in 5th position because it’s a perfect match. No other S remains in the solution and that’s why it’s blacked out.

Since I’m only interested in the best match. I can join the two words on their shared letters and calculate the pairings distance like so.

SELECT
    g.Word
    , g.Letter
    , g.Position
    , s.Word
    , s.Letter
    , s.Position
    , ABS(CAST(g.Position AS smallint) - s.Position) AS distance
FROM dbo.WordLetter AS g /* guess */
JOIN dbo.WordLetter AS s /* solution */
    ON g.Letter = s.Letter
WHERE
    g.Word = 'guess'
    AND  s.Word = 'rebus'
ORDER BY g.Position

Duplicate letter scoring in SSMS Grid result

Everything with distance = 0 is a perfect match. Distance > 1 is a partial match and everything else (not in the result) is not a match. I can partition by letter, order by distance and grab only the first row.

The final scoring algorithm wrapped in a function

CREATE OR ALTER FUNCTION dbo.ScoreWordle
(
    @guessId smallint
    , @solutionId smallint
)
RETURNS TABLE
AS
RETURN
    WITH score AS
    (
        SELECT
            g.WordId AS guessWordId
            , s.WordId AS SolutionWordId
            , g.Letter
            , g.Position AS guessPosition
            , s.Position AS solutionPosition
            , ca.dist
            , ROW_NUMBER() OVER (PARTITION BY s.WordId, s.Letter, s.Position ORDER BY ca.dist) AS rn
        FROM
            dbo.WordLetter AS g
            JOIN dbo.WordLetter AS s
                ON g.Letter = s.Letter
            CROSS APPLY
            (VALUES (ABS(CAST(g.Position AS smallint) - s.Position))) AS ca(dist)
        WHERE
            g.WordId = @guessId
            AND s.WordId = @solutionId
    )
    SELECT
        CONCAT
        (
              MAX(CASE WHEN s.guessPosition = 1 THEN ca.score ELSE 'B' END)
            , MAX(CASE WHEN s.guessPosition = 2 THEN ca.score ELSE 'B' END)
            , MAX(CASE WHEN s.guessPosition = 3 THEN ca.score ELSE 'B' END)
            , MAX(CASE WHEN s.guessPosition = 4 THEN ca.score ELSE 'B' END)
            , MAX(CASE WHEN s.guessPosition = 5 THEN ca.score ELSE 'B' END)
        ) AS score
    FROM dbo.WordLetter AS wl
    LEFT JOIN score AS s
        ON wl.WordId = s.guessWordId
        AND s.rn = 1
        AND s.guessPosition = wl.Position
    CROSS APPLY
    (
        VALUES
        (
            CASE
                WHEN s.dist = 0 THEN 'G' /* Green = match letter and position */
                WHEN s.dist > 0 THEN 'O' /* Orange = correct letter, wrong position */
                ELSE 'B'                 /* Black = does not contain the letter */
            END
        )
    ) ca (score)
    WHERE wl.WordId = @guessId
    GROUP BY wl.WordId

The complete game πŸ”—︎

Again, the whole code is also in my GitHub repo. I’m creating several helper objects:

  • GameHistory - table to hold the attempts and scoring
  • PlayWordle - a stored procedure that validates the input and provides the scoring for our guesses
  • ResetGame - another stored procedure to reset a single or all games

Input validation

  • The guessed word must be on the list of available words
  • Either a date or a Wordle number must be provided to match the relevant solution
    • When neither is provided - today’s date is picked
  • The correct solution hasn’t been guessed yet
  • Max number of attempts hasn’t been reached yet
DROP TABLE IF EXISTS dbo.GameHistory
CREATE TABLE dbo.GameHistory
(
    SolutionId smallint NOT NULL
    , GuessAttempt tinyint NOT NULL
    , GuessId smallint NOT NULL
    , Score char(5) NOT NULL
    , CONSTRAINT PK_GameHistory PRIMARY KEY CLUSTERED (SolutionId, GuessAttempt)
)
CREATE OR ALTER PROCEDURE dbo.PlayWordle
(
    @guess char(5)
    , @date date = NULL
    , @wordleNum smallint = NULL    /* has precedence over @date */
)
AS
BEGIN
    SET XACT_ABORT ON
    BEGIN TRY
        /* Input handling */

        DECLARE
            @guessId smallint
            , @solutionId smallint
            , @lastAttemptNum tinyint
            , @score char(5)

        /* Guessed word is valid */
        SET @guessId = (SELECT Id FROM dbo.AllWords WHERE Word = @guess)
        IF (@guessId IS NULL)
            RAISERROR('Guessed word "%s" is invalid', 11, 1, @guess)

        /* Solution Id is provided either directly or through a date */
        SET @solutionId = @wordleNum
        IF @solutionId IS NULL
        BEGIN
            SET @date = ISNULL(@date, GETDATE())

            IF (@date < '2021-06-19 00:00:00.000' OR @date > '2027-10-21 00:00:00.000')
                RAISERROR('Date must be between 2021-06-19 and 2027-10-21', 11, 1)
            SET @solutionId = (SELECT s.Id FROM dbo.Solution AS s WHERE s.SolveDate = @date)
        END

        /* check if game is already finished */
        IF EXISTS
        (
            SELECT 1/0 FROM dbo.GameHistory
            WHERE SolutionId = @solutionId
            AND GameHistory.Score = 'GGGGG'
        )
        BEGIN
            SELECT
                'Please reset using "EXEC dbo.ResetGame @wordleNum = ' + CAST(@solutionId AS varchar(5)) + '".' AS [You already won]
                , gh.GuessAttempt
                , aw.Word
                , gh.Score
            FROM dbo.GameHistory AS gh
            JOIN dbo.AllWords AS aw
                ON gh.GuessId = aw.Id
            WHERE gh.SolutionId = @solutionId
            ORDER BY gh.GuessAttempt

            RETURN
        END

        /* Max number of attempts has not been reached yet */
        SET @lastAttemptNum = (SELECT MAX(GuessAttempt) FROM dbo.GameHistory WHERE SolutionId = @solutionId)
        IF (@lastAttemptNum >= 6)
            RAISERROR('Solution #%i already has 6 attempts. Please reset using "EXEC dbo.ResetGame @wordleNum = %i".',11, 1, @solutionId, @solutionId)

        /* Calculate current guess score */
        SELECT @score = sw.score FROM dbo.ScoreWordle(@guessId, @solutionId) AS sw

        /* Persist the guess attempt */
        INSERT INTO dbo.GameHistory
        (SolutionId, GuessAttempt, GuessId, Score)
        SELECT
            @solutionId, ISNULL(@lastAttemptNum, 0) + 1, @guessId, @score

        SELECT
            gh.GuessAttempt
            , aw.Word
            , gh.Score
        FROM dbo.GameHistory AS gh
        JOIN dbo.AllWords AS aw
            ON gh.GuessId = aw.Id
        WHERE gh.SolutionId = @solutionId
        ORDER BY gh.GuessAttempt

        /* Check for solution or end game condition */
        IF @score = 'GGGGG'
            SELECT CASE @lastAttemptNum + 1
                WHEN 1 THEN 'Genius'
                WHEN 2 THEN 'Magnificent'
                WHEN 3 THEN 'Impressive'
                WHEN 4 THEN 'Splendid'
                WHEN 5 THEN 'Great'
                WHEN 6 THEN 'Phew'
            ELSE
                'You won'
            END AS Victory

        IF (@score <> 'GGGGG' AND @lastAttemptNum = 5)
            SELECT s.Word AS Solution
            FROM dbo.Solution AS s
            WHERE s.Id = @solutionId

    END TRY
    BEGIN CATCH
        ;throw
    END CATCH
END
CREATE OR ALTER PROCEDURE dbo.ResetGame
(
    @wordleNum smallint
    , @deleteAll bit = 0
)
AS
BEGIN
    IF @deleteAll = 1
        TRUNCATE TABLE dbo.GameHistory
    ELSE
        DELETE FROM dbo.GameHistory
        WHERE SolutionId = @wordleNum
END

And you can play like this:

EXEC dbo.ResetGame @wordleNum = 196

EXEC dbo.PlayWordle
    @guess = 'guess'
  , @wordleNum = 196

Enjoy the game!

Looking forward πŸ”—︎

I think there will be a longer break before the next article in the series. Partly due to preparations for sqlbits conference, and due to the problem’s complexity.

I’m not sure how to calculate the optimal guesses yet, and I’m open to suggestions. Thank you for reading.