Good news everyone!
Yesterday I finished rebuilding the Eternity II tile set, and today I determined the source of the anomalies in the growth of generated boards.
Rebuilding the tile set was mostly a matter of sitting down with the physical tiles and transcribing them into a CSV file. I won’t bore you with those details.
More interesting is the programming error that was corrupting my data. Let’s start with this handy SQL statement:
SELECT max(id) AS id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
LIMIT 1
What do you think this query accomplishes? You can probably determine the intent. I was trying to get the largest board id at or below some limit, along with the tiles data stored in that same row. My runner randomly generates the limit value, hopefully giving a good mix of rows from different parts of the computation.
However, something interesting happens with this query. It returns the correct id value, but the tiles value comes from an arbitrary row: whatever row the database engine happens to read first that meets the WHERE clause. Because max(id) is an aggregate and tiles is not, an engine that accepts this query collapses all matching rows into one and is free to pick tiles from any of them. This means I was processing the same board configuration multiple times while skipping the ones the returned ids actually identified. The result was both a lot of wasted calculation and a lot of missed configurations.
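To make the mismatch concrete, here is a minimal sketch that models the behavior in plain Python rather than against a real database, since whether the mismatch shows up at all depends on the engine. The rows, the tiles strings, and the buggy_pick helper are all made up for illustration, and "the engine reads the lowest id first" is an assumption, not something I verified.

```python
# Hypothetical rows: (id, status, tiles). The tiles strings are placeholders.
rows = [
    (1, "NEW", "board-A"),
    (2, "NEW", "board-B"),
    (3, "DONE", "board-C"),
    (4, "NEW", "board-D"),
]


def buggy_pick(rows, limit):
    """Model the mismatched result: the max id, but tiles from another row."""
    matching = [r for r in rows if r[1] == "NEW" and r[0] <= limit]
    if not matching:
        return None
    max_id = max(r[0] for r in matching)
    # Assumption: the engine takes tiles from whichever row it read first.
    arbitrary_tiles = matching[0][2]
    return max_id, arbitrary_tiles


# The id says "row 4", but the tiles belong to row 1.
print(buggy_pick(rows, 4))
```

With these rows, every limit from 1 to 4 hands back the tiles of row 1 under a different id, which is the "same board many times, other boards never" pattern described above.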
Oof, no fun at all. A restart was really required.
After some digging and experimentation, the corrected query looks like this:
SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
ORDER BY id DESC LIMIT 1
Moving the “max” calculation out of the SELECT list and into the ORDER BY clause means the id and tiles always come from the same row: the one with the highest qualifying id.
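Here is a small sketch of the corrected query in action, using Python's built-in sqlite3 module with an in-memory database. The choice of SQLite, the column types, the sample rows, and the next_board helper are assumptions for illustration; only the query text itself comes from above.

```python
import random
import sqlite3

# In-memory database with an assumed schema for the boards table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE boards (id INTEGER PRIMARY KEY, status TEXT, tiles TEXT)"
)
conn.executemany(
    "INSERT INTO boards (id, status, tiles) VALUES (?, ?, ?)",
    [
        (1, "NEW", "board-A"),
        (2, "NEW", "board-B"),
        (3, "DONE", "board-C"),
        (4, "NEW", "board-D"),
    ],
)


def next_board(conn, limit):
    """Return (id, tiles) for the highest NEW board at or below limit."""
    return conn.execute(
        "SELECT id, tiles FROM boards "
        "WHERE status = 'NEW' AND id <= ? "
        "ORDER BY id DESC LIMIT 1",
        (limit,),
    ).fetchone()


# The runner picks a random limit; id and tiles now come from one row.
limit = random.randint(1, 4)
print(limit, next_board(conn, limit))
```

A limit of 3 skips the DONE board and returns row 2 with its own tiles, and a limit below every id returns None, so the caller needs to handle an empty result.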
Now that I have corrected the error, on to processing board configurations!