SQL Wordle Series - Part Two: Playing

Page content

SQL Wordle Series ๐Ÿ”—︎

  1. Part One: Cheating
  2. Part Two: Playing

Foreword ๐Ÿ”—︎

Continuing the series - you can’t cheat at a game you cannot play. This time I’ve focused on the scoring algorithm. I have a hunch scoring will be useful for finding the optimal strategy.

When I was done, I thought it would be fun to add some input and scoring validation to simulate the original game.

โš ๏ธ Disclaimer: This solution builds on the scripts created in the first part of this series.

Strategy ๐Ÿ”—︎

From the start, I want to use a set-based solution. I want to split the words table and have each letter in a separate row. That way I can JOIN or INTERSECT them and generally have more options.

DROP TABLE IF EXISTS dbo.WordLetter
CREATE TABLE dbo.WordLetter
(
	WordId smallint NOT NULL
	, Word char(5) NOT NULL /* so I don't have to join */
	, Letter char(1) NOT NULL
	, Position tinyint NOT NULL
	, CONSTRAINT PK_WordLetter PRIMARY KEY CLUSTERED (Letter, Position, WordId)
	, INDEX IX_WordLetter_Position NONCLUSTERED (Position)
)

/* Unpivot the table */
INSERT INTO dbo.WordLetter WITH (TABLOCKX)
(WordId, Word, Letter, Position)
SELECT
	aw.Id
	, aw.Word
	, SUBSTRING(aw.Word, cj.Position, 1)
	, cj.Position
FROM dbo.AllWords AS aw
CROSS JOIN (VALUES (1), (2), (3), (4), (5)) AS cj(Position)

Duplicate letters ๐Ÿ”—︎

One problem I ran into in my earlier solutions was duplicate letters. I’ve tested and debugged on this example. The solution is REBUS and my guess isโ€ฆ GUESS.

Wordle guess and scoring of duplicate letters

Only one S is consumed and that’s the one in 5th position because it’s a perfect match. No other S remains in the solution and that’s why it’s blacked out.

Since I’m only interested in the best match. I can join the two words on their shared letters and calculate the pairings distance like so.

SELECT
	g.Word
	, g.Letter
	, g.Position
	, s.Word
	, s.Letter
	, s.Position
	, ABS(CAST(g.Position AS smallint) - s.Position) AS distance
FROM dbo.WordLetter AS g /* guess */
JOIN dbo.WordLetter AS s /* solution */
	ON g.Letter = s.Letter
WHERE
	g.Word = 'guess'
	AND  s.Word = 'rebus'
ORDER BY g.Position

Duplicate letter scoring in SSMS Grid result

Everything with distance = 0 is a perfect match. Distance > 1 is a partial match and everything else (not in the result) is not a match. I can partition by letter, order by distance and grab only the first row.

The final scoring algorithm wrapped in a function

CREATE OR ALTER FUNCTION dbo.ScoreWordle
(
	@guessId smallint
	, @solutionId smallint
)
RETURNS TABLE
AS
RETURN
	WITH score AS
	(
		SELECT
			g.WordId AS guessWordId
			, s.WordId AS SolutionWordId
			, g.Letter
			, g.Position AS guessPosition
			, s.Position AS solutionPosition
			, ca.dist
			, ROW_NUMBER() OVER (PARTITION BY s.WordId, s.Letter, s.Position ORDER BY ca.dist) AS rn
		FROM
			dbo.WordLetter AS g
			JOIN dbo.WordLetter AS s
				ON g.Letter = s.Letter
			CROSS APPLY
			(VALUES (ABS(CAST(g.Position AS smallint) - s.Position))) AS ca(dist)
		WHERE
			g.WordId = @guessId
			AND s.WordId = @solutionId
	)
	SELECT
		CONCAT
		(
			  MAX(CASE WHEN s.guessPosition = 1 THEN ca.score ELSE 'B' END)
			, MAX(CASE WHEN s.guessPosition = 2 THEN ca.score ELSE 'B' END)
			, MAX(CASE WHEN s.guessPosition = 3 THEN ca.score ELSE 'B' END)
			, MAX(CASE WHEN s.guessPosition = 4 THEN ca.score ELSE 'B' END)
			, MAX(CASE WHEN s.guessPosition = 5 THEN ca.score ELSE 'B' END)
		) AS score
	FROM dbo.WordLetter AS wl
	LEFT JOIN score AS s
		ON wl.WordId = s.guessWordId
		AND s.rn = 1
		AND s.guessPosition = wl.Position
	CROSS APPLY
	(
		VALUES
		(
			CASE
				WHEN s.dist = 0 THEN 'G' /* Green = match letter and position */
				WHEN s.dist > 0 THEN 'O' /* Orange = correct letter, wrong position */
				ELSE 'B' /* Black = does not contain the letter */
			END
		)
	) ca (score)
	WHERE wl.WordId = @guessId
	GROUP BY wl.WordId

The complete game ๐Ÿ”—︎

Again, the whole code is also in my GitHub repo. I’m creating several helper objects:

  • GameHistory - table to hold the attempts and scoring
  • PlayWordle - a stored procedure that validates the input and provides the scoring for our guesses
  • ResetGame - another stored procedure to reset a single or all games

Input validation

  • The guessed word must be on the list of available words
  • Either a date or a Wordle number must be provided to match the relevant solution
    • When neither is provided - today’s date is picked
  • The correct solution hasn’t been guessed yet
  • Max number of attempts hasn’t been reached yet
DROP TABLE IF EXISTS dbo.GameHistory
CREATE TABLE dbo.GameHistory
(
	SolutionId smallint NOT NULL
	, GuessAttempt tinyint NOT NULL
	, GuessId smallint NOT NULL
	, Score char(5) NOT NULL
	, CONSTRAINT PK_GameHistory PRIMARY KEY CLUSTERED (SolutionId, GuessAttempt)
)
CREATE OR ALTER PROCEDURE dbo.PlayWordle
(
	@guess char(5)
	, @date date = NULL
	, @wordleNum smallint = NULL	/* has precedence over @date */
)
AS
BEGIN
	SET XACT_ABORT ON
	BEGIN TRY
		/* Input handling */
		
		DECLARE 
			@guessId smallint
			, @solutionId smallint
			, @lastAttemptNum tinyint
			, @score char(5)

		/* Guessed word is valid */
		SET @guessId = (SELECT Id FROM dbo.AllWords WHERE Word = @guess)
		IF (@guessId IS NULL)
			RAISERROR('Guessed word "%s" is invalid', 11, 1, @guess)

		/* Solution Id is provided either directly or through a date */
		SET @solutionId = @wordleNum
		IF @solutionId IS NULL 
		BEGIN 
			SET @date = ISNULL(@date, GETDATE())

			IF (@date < '2021-06-19 00:00:00.000' OR @date > '2027-10-21 00:00:00.000')
				RAISERROR('Date must be between 2021-06-19 and 2027-10-21', 11, 1)
			SET @solutionId = (SELECT s.Id FROM dbo.Solution AS s WHERE s.SolveDate = @date)
		END 

		/* check if game is already finished */
		IF EXISTS 
		(
			SELECT 1/0 FROM dbo.GameHistory 
			WHERE SolutionId = @solutionId 
			AND GameHistory.Score = 'GGGGG'
		)
		BEGIN 
			SELECT 
				'Please reset using "EXEC dbo.ResetGame @wordleNum = ' + CAST(@solutionId AS varchar(5)) + '".' AS [You already won]
				, gh.GuessAttempt
				, aw.Word
				, gh.Score
			FROM dbo.GameHistory AS gh
			JOIN dbo.AllWords AS aw
				ON gh.GuessId = aw.Id
			WHERE gh.SolutionId = @solutionId
			ORDER BY gh.GuessAttempt

			RETURN
		END 

		/* Max number of attempts has not been reached yet */
		SET @lastAttemptNum = (SELECT MAX(GuessAttempt) FROM dbo.GameHistory WHERE SolutionId = @solutionId)
		IF (@lastAttemptNum >= 6)
			RAISERROR('Solution #%i already has 6 attempts. Please reset using "EXEC dbo.ResetGame @wordleNum = %i".',11, 1, @solutionId, @solutionId)

		/* Calculate current guess score */
		SELECT @score = sw.score FROM dbo.ScoreWordle(@guessId, @solutionId) AS sw

		/* Persist the guess attempt */
		INSERT INTO dbo.GameHistory 
		(SolutionId, GuessAttempt, GuessId, Score)
		SELECT
			@solutionId, ISNULL(@lastAttemptNum, 0) + 1, @guessId, @score

		SELECT 
			gh.GuessAttempt
			, aw.Word
			, gh.Score
		FROM dbo.GameHistory AS gh
		JOIN dbo.AllWords AS aw
			ON gh.GuessId = aw.Id
		WHERE gh.SolutionId = @solutionId
		ORDER BY gh.GuessAttempt

		/* Check for solution or end game condition */
		IF @score = 'GGGGG'
			SELECT CASE @lastAttemptNum + 1
				WHEN 1 THEN 'Genius'
				WHEN 2 THEN 'Magnificent'
				WHEN 3 THEN 'Impressive'
				WHEN 4 THEN 'Splendid'
				WHEN 5 THEN 'Great'
				WHEN 6 THEN 'Phew'
			ELSE 
				'You won'
			END AS Victory

		IF (@score <> 'GGGGG' AND @lastAttemptNum = 5)
			SELECT s.Word AS Solution
			FROM dbo.Solution AS s
			WHERE s.Id = @solutionId
	
	END TRY
	BEGIN CATCH
		;throw
	END CATCH
END
CREATE OR ALTER PROCEDURE dbo.ResetGame
(
	@wordleNum smallint
	, @deleteAll bit = 0
)
AS
BEGIN
	IF @deleteAll = 1
		TRUNCATE TABLE dbo.GameHistory
	ELSE
		DELETE FROM dbo.GameHistory
		WHERE SolutionId = @wordleNum
END

And you can play like this:

EXEC dbo.ResetGame @wordleNum = 196

EXEC dbo.PlayWordle
	@guess = 'guess'
  , @wordleNum = 196

Enjoy the game!

Looking forward ๐Ÿ”—︎

I think there will be a longer break before the next article in the series. Partly due to preparations for sqlbits conference, and due to the problem’s complexity.

I’m not sure how to calculate the optimal guesses yet, and I’m open to suggestions. Thank you for reading.