Eternity II Processing Update – Jan 2020

Another half, another Eternity II update. This half was a good half for the runner, the cluster infra was doing a fairly good job of staying up and processing on the Eternity II job. You can see that in the nice upward curve that started in September.

The trend for percentage of boards investigated continues to be a downward one. This is appears even more evident in the graph, now with 27% processed, but that is only an increase in 1%.

Depth is crawling, with the halfway point remaining at about 24 tiles placed. On October 19th we found a board with 156 tiles placed. It was again rapidly marked as Invalid, but we still strive to get deeper into the board space.

On December 15 we finished the last 20 depth board. 20 only had 413 boards left to be searched in July, so it took quite a long time for so few boards, the focus was really on higher level boards. But, for perspective, 21 has over 10,000,000 boards left to investigate, so we may not see the end of that board in the next six months unless I tweak the algorithm to be more breadth first than depth first. But since the search space still seems to be so large I have little hope of completing it by doing a full search, I’d rather keep it more oriented towards depth first.

Ricochet Robots

Ricochet Robots is a path finding board game where you place robots on a board and then try to find an optimal path to get those robots to different targets on the board. I play this game frequently with a group of friends and typically we can get robots to move there in a 10 moves or less, but sometimes we get truly stumped with scenarios that require 20 or more moves. We are pretty sure there’s usually a more optimal solution in these cases, but without finding it, who knows? So to answer that, I decided I’m going to write a solver, with a UI and everything. Maybe I can get one of those gaming display table things to play on instead of the physical board.

The game, Ricochet Robots.

This is another relatively hard problem to solve, in fact, there’s a research paper that established that the game is NP-hard! However, the parallels between this and the Eternity II puzzle are striking. A simple solver will just walk through the different states that can be reached, but that may never resolve due to the complexity of the problem. However, I really want to be able to resolve the question in real time while my friends and I are playing, we’ll see how it goes!

Eternity II Processing Update – July 2019

Checking in with another update on the progress of the Eternity II solver. Progress has been pretty good for the past six months, though we’re starting to hit more bottlenecks, probably as we’re getting more data in the database. Currently my cluster is not processing as I do some maintenance on it, which is taking a lot longer than I expected.

I got a new machine with a Threadripper as a main device, and so decided to throw it at this job instead of just the cluster. That results in a significant uptick in processing due to the raw compute of the Threadripper (woo hoo!). As you can see we’re over 150 million boards, with about 28% of the boards that have been found already processed. This is a decrease in the percentage of boards processed from 36% back in January. Seems we’re still generating boards faster than we can investigate them.

For depth, we’re seeing a similar trend, but we are getting a little deeper into the search graph. About half of the boards are at depth 24 or better, an improvement of 1 since January. In fact, on June 6 we finished processing the last of the board with 19 tiles placed, so all active boards have at least 20 tiles, pretty good progress from the bottom. As for progress at the top, that was an early find. On January 27th a board was found with 154 pieces placed. An improvement by 4, but, I was hoping to see more improvement over the latter five months.

Eternity II Processing Update – Jan 2019

Now that I’ve got the Eternity II solver running on the cluster in a sustained way and I don’t intend to make any major changes to it, it’s a good time to see how it’s doing! Though it was exciting to get the clue pieces, unfortunately it meant I had to restart the run, which got setup in November, which means I’ve run for just over a month. That said, the progress is pretty good!

We’re at nearly 20 million boards generated, with about 5 million of those boards tested against. Done boards are boards that produced new results to look at, while Invalid boards are boards that were shown to result in 0 possible configurations after examining it. It’d be nice if that Invalid count was a little higher as each invalid board prunes off large swaths of the search space, but can’t expect too much. This is a hard problem after all.

I’m also happy with how deep we’ve managed to get. The board selection is random, but definitely favors deeper boards (boards with more pieces placed) than shallow boards. On December 26th it found a board with a depth of 150, but it was rather quickly marked as Invalid since no other pieces could be found. Over half of the boards have 23 pieces placed or less, so there’s still a lot of work to be done!

Found the Clue Pieces!

The Eternity II puzzle had secondary puzzles, called Clue Puzzles, that were shipped as well. Unfortunately the sites that support those Clue Puzzles went out of business, so I can no longer submit an answer for one, even if I could find one to buy. However, that doesn’t mean the Internet doesn’t still have the information available, and while scouring the Internet I managed to find someone that had the answers and was willing to share it with me. This is great news!

To express how great this news is, I ran some numbers. Roughly speaking, the number of possible tile placements for the Eternity II puzzle without the hints is:

652,462,655,969,994,559,741,103,489,614,177,039,095,430,462,067,869,985,328,314,734,123,815,335,654,941,630,696,773,132,693,354,219,109,906,288,103,573,501,531,150,317,158,003,508,504,294,576,898,531,816,917,242,939,827,675,255,753,862,937,689,127,386,664,152,157,193,215,761,270,180,463,221,783,405,432,213,823,989,543,602,829,637,372,441,786,368,232,967,396,104,308,262,351,073,321,219,956,381,058,597,901,130,302,400,003,222,913,210,852,993,216,323,986,907,423,192,907,547,317,559,294,078,158,801,148,729,797,579,289,872,095,986,642,075,633,102,503,292,713,540,279,428,817,217,361,773,920,428,869,170,826,800,909,973,643,591,680,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000

That’s A LOT of zeroes. So many zeroes that I’d potentially have to run through to find the solution. This is the problem with a brute force attempt at solving this problem, but it’s the best solution I have until I can find a better one.

Now, if I place the hint pieces, that reduces the count to: 16,471,454,954,102,004,915,949,763,791,844,744,430,585,229,828,359,845,826,807,586,676,468,871,926,460,454,921,662,799,498,975,527,462,257,803,719,082,043,944,234,353,763,593,218,787,245,960,053,433,530,332,704,640,618,075,985,077,669,619,720,398,781,085,382,876,707,264,497,078,741,193,820,374,052,661,483,240,397,076,615,679,691,078,850,369,645,098,379,778,994,469,343,042,697,011,720,829,508,275,095,401,159,313,602,972,897,407,959,503,917,009,744,002,977,670,417,479,366,472,013,021,278,665,875,338,440,986,474,644,383,629,748,795,479,278,395,699,139,113,766,925,316,308,824,626,251,133,079,516,264,401,967,390,287,387,269,358,864,142,491,487,460,862,832,061,116,222,847,000,843,986,110,919,266,611,017,828,131,398,382,602,948,421,100,528,168,747,679,034,091,634,765,198,599,351,059,491,424,212,826,980,352,000,000,000,000,000

That’s still a lot of zeroes, but notably it’s 34 zeroes less, or a reduction of a factor of 1 decillion.

Also unfortunately, this requires me to restart my run, but I think that this is an okay trade-off considering the number of combinations I’m eliminating overall.

The Importance of Analyzing and Optimizing your SQL Queries

I’m sure many of you have heard the importance of optimizing your SQL queries for the optimizer. You may have doubted this, maybe you thought your query was too simple. Well, this post is going to show you exactly why query optimization is so important.

I’ll start with the graph:

For statistical analysis my solver periodically gathers statistics about the state of the database. The graph above shows the number of boards by status. “New” boards have not been processed yet, while “Done” boards have.

As you can see over the initial few days it generated about 2.8 million boards. After I optimized only one of my queries and in the past 18 hours have generated over 18 million boards.

I’m sure you remember the query we fixed from last week, that’s the query I optimized. Let’s look at it again:

SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
ORDER BY id DESC LIMIT 1

Looking at the status of the server showed the database engine was pegging the CPU. Let’s figure out why that is!

MariaDB has a handy tool, SHOW PROCESSLIST to view what the server is processing at the moment. While running with the old query, that query showed up every time I tried it, so I decided to investigate the query. The ANALYZE tool was made to help investigate query performance:

ANALYZE
SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= 14537478
ORDER BY id DESC LIMIT 1

FieldValue
id1
select_typeSIMPLE
tableboards
typeref
possible_keysPRIMARY, status, id_status
keystatus
key_len1
refconst
rows11,369,300
r_rows8,014,010.00
filtered100.00
r_filtered0.00
ExtraUsing where

The analyze output indicates even though we’re using the id_status key values in our where clause, it still uses the key created on status. This is problematic, because it means that even though we should be able to use the key built exactly to optimize this query, we ended up reading over 8 million rows to get the table result. That’s a huge cost to grab a single row.

So why are we doing this? Turns out, MariaDB doesn’t support DESC order keys. So that ORDER BY id DESC is causing us to not use the Primary Key, and we end up going to the status key, and then with those results we search for the id closest to the target.

That means to get the benefit of the Primary Key we need to invert the query:

SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id >= ?
ORDER BY id ASC LIMIT 1

Here’s the analyze result for that:

FieldValue
id1
select_typeSIMPLE
tableboards
typeref
possible_keysPRIMARY, status, id_status
keystatus
key_len1
refconst
rows11,590,310
r_rows1
filtered100.00
r_filtered100.00
ExtraUsing index condition; Using where

So we’re reading only 1 row, which is good, but we’re still using the status key. And looking at the behavior of the server, we’re still pegging the CPU. Something is still wrong, the database engine isn’t using the right index. Well, MariaDB gives us a way to give the database engine a hint. The result is this query:

SELECT id, tiles
FROM boards
USE INDEX (id_status)
WHERE status = 'NEW' AND id >= 14537478
ORDER BY id ASC LIMIT 1

With this analyze:

FieldValue
id1
select_typeSIMPLE
tableboards
typerange
possible_keysid_status
keyid_status
key_len9
ref(NULL)
rows11,626,588
r_rows1
filtered100.00
r_filtered100.00
ExtraUsing index condition

Finally we’re using the right index, and that means some other interesting things happen. For instance, we stop using the WHERE clause and are really just using the index. The query is much faster, even the ANALYZE finished immediately while the prior ANALYZE instances took over a minute. Now after running that query, the server’s CPU load has fallen to below 10%. This means we should be able to scale out the runner a bit better, at least, until we start hitting our next bottleneck.

Tile Set Re-completed, Error Found

Good news everyone!

Yesterday I finished rebuilding the Eternity II tile set, and today I determined the source of the anomalies in the growth of generated boards.

Rebuilding the tile set just took a bunch of time to sit down and go through the physical tile pieces and convert them to a CSV file. I won’t bore you with those details.

More interesting is the programming error. Let’s see what was corrupting my data! Let’s start with this handy SQL statement:

SELECT max(id) AS id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
LIMIT 1

What do you think this query accomplishes? You can probably determine the intent. I was trying to get the maximum tile id below some limit, and some data stored with it. My runner randomly generates the limit value, hopefully giving a good mix of rows from different parts of the computation.

However, there is an interesting thing that happens with this query. It returns the correct id value, but it returns the data portion from an arbitrary row. It returned whatever the database engine happens to read first that meets the WHERE clause. This means I was processing the same board configuration multiple times, but skipping the ones identified by the id. This would cause both a lot of wasted calculations, and a lot of missed configurations.

Oof, no fun at all. A restart was really required.

After some digging and experimentation, the corrected query looks like this:

SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
ORDER BY id DESC LIMIT 1

Moving the “max” calculation from the fields section of the query into the ORDER BY clause means the query will return the paired row id and data together.

Now that I have corrected the error, on to processing board configurations!

Data Corruption – Resetting Eternity II Progress

Disaster has struck!

I was investigating anomalies in the growth of generated boards and counted the occurrences of each side in my tile configuration. This allowed me to determine at what scale the growth should be at each level.

Unfortunately, doing so revealed my tile set is corrupt! Two of the sides occur an odd number of times in my data set!

Because this is a tile matching problem, the number of occurrences of each side must be even. This is apparent because each side must match with another side of the same type, requiring two same sides for each edge. Long story short, if any side appears an odd number of times, I can never complete the puzzle.

This corrupts my entire generated board state so far and requires me to reset and with a regenerated tile set. Boo!

I think something else is wrong with the generation algorithm and need to investigate that. For now, I’m just going to go through the process of regenerating my tile set.

Efficiently Representing Board State

“Our calculations are that if you used the world’s most powerful computer and let it run from now until the projected end of the universe, it might not stumble across one of the solutions.”
– Lord Monckton, One of the Designers of the Eternity II puzzle

An algorithm for generating partial boards for Eternity II is reasonably straightforward. As is an algorithm for checking if the board you have is a solution to the puzzle. However, the scale of the problem means it’s not feasible to set up an algorithm that generates boards and checks correctness until you find one. It would simply take too long and if the computer you’re running it on crashes, you’d have to restart. Not a good place to be.

This means, we need to store state as we go, nearly continuously. This way we lose as little progress as possible if something breaks along the way.

The algorithm I’m using takes the current board state and determines the “optimal” place to set the next batch of tiles. Once that location has been determined it generates the board states representing placing all possible tiles in that location and stores them. Then we grab a new board state and try again. So for the sake of this design, each “step” is placing one new tile on the board. I’ll go over this algorithm in more detail in a later post. For now, let’s talk about storage.

To represent board state, we need to represent the placement and orientation of all tiles on the 16×16 grid. For simplicity we’ll represent the tiles with numbers and the coordinate system of the grid can be the 1st quadrant of a xy-plane. So that means to store the board state itself we’ll need to store a set of tuples of tile ids, x, y coordinates, and orientation. So a set of tiles would be stored like this:

(37, 5, 6, 0)
(48, 5, 7, 180)

I used the number of degrees we rotate the tile around the unit circle to represent the orientation. If we have a full solution to the puzzle, it would involve a set of 256 of these tuples.

Now, with this being all we need let’s get this as small as we can.

The tile id is a number between 1 and 256. All the tiles came pre-numbered, so the numbering system is already given. And fortunately 256 is 28, so we know we can use just 8 bits to represent it. Unfortunately storing 256 itself actually requires the 9th bit, due to how unsigned numbers work in binary. It’d be great to not waste that extra bit, so we need a solution. One solution is to subtract one from each number, but an even better solution a friend told me a long time ago is to just convert 256 itself to 0. This means most of our tiles are exactly as they are represented in the physical world, except for 256, which is 0.

So now we can neatly get the tile id into an 8-bit field, or 1 byte.

The x and y coordinates are on a 16×16 grid. 16 is 24, so we can use the same approach we used for tile ids, and get each dimension into 4 bits. Given two dimensions that’s another 8 bits.

So now we’re at a 16-bit field, or 2 bytes.

That leaves orientation. Above I represented orientation as the number of degrees rotated around the unit circle. There are 360 degrees around a circle which, while not a power of two, will fit in 9 bits. So we could continue to store this and use another 9 bits, or 3 bytes and 1 bit. However, we can’t really use all 360 degrees, these tiles are all squares, so we can only use increments of 90 degrees. That means we only need to consider 4 orientations, 0°, 90°, 180°, and 270°. This can be represented in 2 bits.

So the total storage needed to store a single tile position is 2 bytes and 2 bits.

But then we hit reality, computer systems rarely like to store things that are less than a byte, so really we end up with 3 bytes, and 6 bits of space (or 25%!) wasted:

That’s a lot of wasted storage! Time for some bit packing! It’s possible to pack in data from the next tile into that wasted space to avoid reclaim it. We start by putting the first six bits of the tile id into the wasted space, and then filling the next byte with the remaining two bits, and the x coordinate, etc. Finally, repeat until we reach a byte boundary, ending up with a structure that looks like this:

So we can store 4 tiles every 11 bytes with zero wasted storage. Great!

If you want to try to do this, be very careful with your bit-shift operators! Most of the bit-shift operators will shift the sign bit, and certain languages (like Java) make assumptions around the unit type when you are reading a byte field. Those caused some headaches for me while implementing this, but now I have a completed, and thoroughly tested, solution. On to generating new board states!

Going after Eternity II

The Eternity II Puzzle is a 256 piece edge-matching puzzle. To solve the puzzle, you must lie out all 256 tiles onto a 16×16 grid with all the edges matching. A single seed tile is positioned in the middle of the board. Tomy UK Ltd. released Eternity II in 2007 with a $2,000,000 prize for solving it. The designers tried to generate the hardest to solve puzzle with this specification. I picked up a copy in 2008 and did some work trying to solve it back then, but dropped it as other priorities came up.

Back when I was working on this initially there were a variety of people trying to solve it with stateless DFS, or randomly positioning all tiles and trying to shuffle the board to maximize the number of edges that matched. I tried a variety of these solutions and some of my own invention. One method I tried was a construction solution where I build all possible sets of two tiles, combined those to all possible sets of four, etc, resulting in storing all combinations of these sizes: 2×1, 2×2, 4×2, 4×4, 8×4, 8×8, 16×8, 16×16. The idea was after only eight steps I could solve the whole board. That rapidly proved a bad idea as the number of valid combinations at just the lower stages were so numerous they dwarfed my storage capacity. I had little hope of things improving in the higher stages. I ran a simulation running a DFS to generate possible 8×8 squares and counting (without storage) and got to the hundreds of millions before I stopped. There was simply too much data generated for me to store. To make matters worse, I knew a vast majority of these solutions weren’t conducive to the actual solution, as this approach ignored the seed tile. So I didn’t want to store a bunch of data that wasn’t useful in generating a solution. That wouldn’t get us anywhere.

A number of things have changed since 2009. In 2010 they ended the prize portion of the puzzle, and they’ve shut everything about it down. However, there are a number of positive developments for this puzzle. For instance, the price of storage has fallen dramatically. I could conceivably store all those 8×8 squares if I wanted, I would have the capacity for it (not that I want to waste space that way). Computing power is also easier to come by. In 2009 I had a desktop and a laptop, now I own a small cluster of compute power at my disposal, and can easily expand that compute power with the cloud. I’ve also grown as a developer and have more experience with distributed solutions. Given all of this, I will tackle this problem again. I don’t know if I’ll actually solve it, the solution space is enormous, and my solution may never stumble upon the answer. But I still think it will be a fun way to learn more about distributed systems, and there are several interesting problems that I can tackle within this context.

It should be fun.