optimization – KSRandom's Tech Ramblings

I’m sure many of you have heard about the importance of writing your SQL queries with the query optimizer in mind. You may have doubted this; maybe you thought your query was too simple. Well, this post is going to show you exactly why query optimization is so important.

I’ll start with the graph:

For statistical analysis my solver periodically gathers statistics about the state of the database. The graph above shows the number of boards by status. “New” boards have not been processed yet, while “Done” boards have.

As you can see, over the initial few days it generated about 2.8 million boards. After I optimized just one of my queries, it has generated over 18 million boards in the past 18 hours.

I’m sure you remember the query we fixed last week; that’s the one I optimized. Let’s look at it again:

SELECT id, tiles FROM boards WHERE status = 'NEW' AND id <= ? ORDER BY id DESC LIMIT 1

Looking at the status of the server showed the database engine was pegging the CPU. Let’s figure out why that is!

MariaDB has a handy tool, SHOW PROCESSLIST, for viewing what the server is processing at the moment. While running with the old query, that query showed up every time I checked, so I decided to investigate it. The ANALYZE statement was made to help investigate query performance:

ANALYZE SELECT id, tiles FROM boards WHERE status = 'NEW' AND id <= 14537478 ORDER BY id DESC LIMIT 1

Field	Value
id	1
select_type	SIMPLE
table	boards
type	ref
possible_keys	PRIMARY, status, id_status
key	status
key_len	1
ref	const
rows	11,369,300
r_rows	8,014,010.00
filtered	100.00
r_filtered	0.00
Extra	Using where

The ANALYZE output indicates that even though our WHERE clause uses the columns in the id_status key, the engine still picks the key created on status. This is problematic: instead of using the key built exactly to optimize this query, we ended up reading over 8 million rows (the r_rows value) to produce the result. That’s a huge cost to grab a single row.

So why are we doing this? Turns out, MariaDB doesn’t support DESC order keys. So that ORDER BY id DESC is causing us to not use the Primary Key, and we end up going to the status key, and then with those results we search for the id closest to the target.

That means to get the benefit of the Primary Key we need to invert the query:

SELECT id, tiles FROM boards WHERE status = 'NEW' AND id >= ? ORDER BY id ASC LIMIT 1

Here’s the analyze result for that:

Field	Value
id	1
select_type	SIMPLE
table	boards
type	ref
possible_keys	PRIMARY, status, id_status
key	status
key_len	1
ref	const
rows	11,590,310
r_rows	1
filtered	100.00
r_filtered	100.00
Extra	Using index condition; Using where

So we’re reading only 1 row, which is good, but we’re still using the status key. And looking at the behavior of the server, we’re still pegging the CPU. Something is still wrong: the database engine isn’t using the right index. Fortunately, MariaDB lets us give the database engine a hint. The result is this query:

SELECT id, tiles FROM boards USE INDEX (id_status) WHERE status = 'NEW' AND id >= 14537478 ORDER BY id ASC LIMIT 1

With this analyze:

Field	Value
id	1
select_type	SIMPLE
table	boards
type	range
possible_keys	id_status
key	id_status
key_len	9
ref	(NULL)
rows	11,626,588
r_rows	1
filtered	100.00
r_filtered	100.00
Extra	Using index condition

Finally we’re using the right index, and that means some other interesting things happen. For instance, the Extra column no longer shows “Using where”, so we are really just using the index. The query is much faster: even the ANALYZE finished immediately, while the prior ANALYZE runs took over a minute. After switching to that query, the server’s CPU load has fallen to below 10%. This means we should be able to scale out the runner a bit better, at least until we start hitting our next bottleneck.
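For readers without a MariaDB server at hand, the effect of an index hint can be tried with SQLite through Python’s built-in sqlite3 module. This is only a rough sketch on a swapped-in engine: SQLite spells the hint INDEXED BY rather than USE INDEX, its planner behaves differently from MariaDB’s, and the schema below (column types, and id_status being an index on status followed by id) is an assumption, since the real table definition isn’t shown here.

```python
import sqlite3

# Assumed schema: the real boards table and its index definitions are not
# shown in the post, so column types and index column order are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE boards (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        tiles BLOB
    );
    CREATE INDEX status ON boards (status);
    CREATE INDEX id_status ON boards (status, id);
    """
)
conn.executemany(
    "INSERT INTO boards (id, status, tiles) VALUES (?, ?, ?)",
    [(i, "NEW" if i % 3 else "DONE", b"") for i in range(1, 1001)],
)

# INDEXED BY is SQLite's closest counterpart to MariaDB's USE INDEX hint.
# Unlike USE INDEX, the query fails outright if the named index is unusable.
HINTED_QUERY = (
    "SELECT id, tiles FROM boards INDEXED BY id_status "
    "WHERE status = 'NEW' AND id >= ? ORDER BY id ASC LIMIT 1"
)


def query_plan(sql, params):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


plan = query_plan(HINTED_QUERY, (500,))
first_row = conn.execute(HINTED_QUERY, (500,)).fetchone()
print(plan)       # the plan text names the index SQLite actually used
print(first_row)  # the first NEW board at or above id 500
```

The exact wording of the plan varies between SQLite versions, so the useful thing to look for is simply which index name appears in it.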

“Our calculations are that if you used the world’s most powerful computer and let it run from now until the projected end of the universe, it might not stumble across one of the solutions.”
– Lord Monckton, One of the Designers of the Eternity II puzzle

An algorithm for generating partial boards for Eternity II is reasonably straightforward, as is an algorithm for checking whether the board you have is a solution to the puzzle. However, the scale of the problem means it’s not feasible to simply generate boards and check correctness until you find one. It would take too long, and if the computer you’re running it on crashes, you’d have to start over. Not a good place to be.

This means we need to store state as we go, nearly continuously. This way we lose as little progress as possible if something breaks along the way.

The algorithm I’m using takes the current board state and determines the “optimal” place to set the next batch of tiles. Once that location has been determined it generates the board states representing placing all possible tiles in that location and stores them. Then we grab a new board state and try again. So for the sake of this design, each “step” is placing one new tile on the board. I’ll go over this algorithm in more detail in a later post. For now, let’s talk about storage.

To represent board state, we need to represent the placement and orientation of all tiles on the 16×16 grid. For simplicity we’ll represent the tiles with numbers, and the coordinate system of the grid can be the 1st quadrant of an xy-plane. That means to store the board state itself we’ll need to store a set of tuples of tile id, x and y coordinates, and orientation. So a set of tiles would be stored like this:

(37, 5, 6, 0)
(48, 5, 7, 180)

I used the number of degrees we rotate the tile around the unit circle to represent the orientation. If we have a full solution to the puzzle, it would involve a set of 256 of these tuples.

Now that we know everything we need to store, let’s get it as small as we can.

The tile id is a number between 1 and 256. All the tiles came pre-numbered, so the numbering system is already given. Fortunately 256 is 2⁸, so there are exactly as many ids as 8 bits can represent. Unfortunately, storing the value 256 itself requires a 9th bit, because an unsigned 8-bit field only holds 0 through 255. It’d be great to not waste that extra bit, so we need a solution. One option is to subtract one from each number, but an even better one, which a friend told me about a long time ago, is to just convert 256 itself to 0. This means most of our tiles are stored exactly as they are numbered in the physical world, except for 256, which is stored as 0.

So now we can neatly get the tile id into an 8-bit field, or 1 byte.
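The 256-to-0 trick can be sketched in a few lines of Python. The function names here are made up for illustration and are not taken from the solver itself.

```python
def encode_tile_id(tile_id: int) -> int:
    """Map a physical tile number (1..256) to an 8-bit value; 256 becomes 0."""
    if not 1 <= tile_id <= 256:
        raise ValueError(f"tile id out of range: {tile_id}")
    # Masking keeps 1..255 unchanged and wraps 256 (binary 1_0000_0000) to 0.
    return tile_id & 0xFF


def decode_tile_id(stored: int) -> int:
    """Inverse mapping: a stored 0 stands for tile 256."""
    if not 0 <= stored <= 0xFF:
        raise ValueError(f"stored value does not fit in a byte: {stored}")
    return stored or 256
```

Because 0 is not a valid physical tile number, the stored value 0 is free to stand in for 256 without any ambiguity.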

The x and y coordinates are on a 16×16 grid. 16 is 2⁴, so we can use the same approach we used for tile ids, and get each dimension into 4 bits. Given two dimensions that’s another 8 bits.

So now we’re at a 16-bit field, or 2 bytes.
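As a small sketch of the coordinate byte: this assumes coordinates run from 1 to 16 (so 16 is stored as 0, mirroring the tile id trick) and that x goes in the high four bits. Both choices, and the function names, are assumptions made for illustration.

```python
def pack_xy(x: int, y: int) -> int:
    """Pack two grid coordinates (1..16 each) into a single byte."""
    for value in (x, y):
        if not 1 <= value <= 16:
            raise ValueError(f"coordinate out of range: {value}")
    # x occupies the high nibble, y the low nibble; 16 wraps to 0 in each.
    return ((x & 0xF) << 4) | (y & 0xF)


def unpack_xy(packed: int) -> tuple:
    """Recover (x, y) from a byte produced by pack_xy."""
    x = (packed >> 4) & 0xF
    y = packed & 0xF
    return (x or 16, y or 16)
```

For the example placement at (5, 6), pack_xy returns 0x56, which makes the packed byte easy to read in a hex dump.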

That leaves orientation. Above I represented orientation as the number of degrees rotated around the unit circle. There are 360 degrees around a circle which, while not a power of two, will fit in 9 bits. So we could store the degrees directly and use another 9 bits, for a total of 25 bits, or 3 bytes and 1 bit. However, we can’t really use all 360 degrees: these tiles are all squares, so only increments of 90 degrees make sense. That means we only need to consider 4 orientations: 0°, 90°, 180°, and 270°. This can be represented in 2 bits.
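One way to get from degrees to a 2-bit code is to count quarter turns. The mapping below (0 for 0°, 1 for 90°, and so on) is an assumed convention for illustration; any fixed one-to-one mapping onto the values 0 through 3 would do.

```python
VALID_DEGREES = (0, 90, 180, 270)


def encode_orientation(degrees: int) -> int:
    """Turn a rotation in degrees into a 2-bit quarter-turn count (0..3)."""
    if degrees not in VALID_DEGREES:
        raise ValueError(f"not a multiple of 90 in 0..270: {degrees}")
    return degrees // 90


def decode_orientation(code: int) -> int:
    """Turn a 2-bit quarter-turn count back into degrees."""
    if not 0 <= code <= 3:
        raise ValueError(f"orientation code out of range: {code}")
    return code * 90
```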

So the total storage needed to store a single tile position is 2 bytes and 2 bits.

But then we hit reality: computer systems rarely store anything smaller than a byte, so really we end up with 3 bytes, with 6 bits of space (or 25%!) wasted.
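To make the waste concrete, here is a sketch that packs the example placement (48, 5, 7, 180) into 18 bits and then pads it out to whole bytes. The field order (tile id, x, y, orientation, most significant first) and the name pack_tile are assumptions for illustration.

```python
TILE_BITS = 8 + 4 + 4 + 2  # tile id + x + y + orientation = 18 bits


def pack_tile(tile_id: int, x: int, y: int, degrees: int) -> int:
    """Pack one placement into an 18-bit integer (256 and 16 wrap to 0)."""
    return (
        ((tile_id & 0xFF) << 10)
        | ((x & 0xF) << 6)
        | ((y & 0xF) << 2)
        | (degrees // 90)
    )


value = pack_tile(48, 5, 7, 180)

# Stored on its own, the tile has to be padded up to a byte boundary.
padded_bits = -(-TILE_BITS // 8) * 8          # round 18 up to 24
as_bytes = (value << (padded_bits - TILE_BITS)).to_bytes(padded_bits // 8, "big")
wasted_bits = padded_bits - TILE_BITS

print(as_bytes.hex())               # 305780
print(wasted_bits, "of", padded_bits, "bits wasted")
```

The trailing six zero bits of the last byte (0x80) are exactly the padding that the next step sets out to reclaim.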

That’s a lot of wasted storage! Time for some bit packing! It’s possible to pack data from the next tile into that wasted space to reclaim it. We start by putting the first six bits of the next tile’s id into the wasted space, then fill the next byte with the remaining two bits of the id, the x coordinate, and so on. Finally, repeat until we reach a byte boundary.

So we can store 4 tiles every 9 bytes (4 × 18 bits = 72 bits) with zero wasted storage. Great!
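A minimal sketch of the packing scheme follows, assuming each tile is laid out as id, x, y, orientation from the most significant bit down, with tiles written back to back. At 18 bits per tile, four tiles come to 72 bits, which is exactly 9 bytes. The names pack_tiles and unpack_tiles are hypothetical, and the sketch leans on Python’s arbitrary-size integers; a Java version would more likely shift bits through a fixed-size buffer.

```python
from typing import Iterable, List, Tuple

Tile = Tuple[int, int, int, int]  # (tile id 1..256, x 1..16, y 1..16, degrees)
TILE_BITS = 18
TILE_MASK = (1 << TILE_BITS) - 1


def pack_tiles(tiles: Iterable[Tile]) -> bytes:
    """Concatenate 18-bit tile fields, padding only the final byte."""
    acc = 0
    nbits = 0
    for tile_id, x, y, degrees in tiles:
        field = (
            ((tile_id & 0xFF) << 10)
            | ((x & 0xF) << 6)
            | ((y & 0xF) << 2)
            | (degrees // 90)
        )
        acc = (acc << TILE_BITS) | field
        nbits += TILE_BITS
    pad = -nbits % 8  # zero bits needed to reach the next byte boundary
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_tiles(data: bytes, count: int) -> List[Tile]:
    """Read back `count` tiles from bytes produced by pack_tiles."""
    total_bits = len(data) * 8
    if count * TILE_BITS > total_bits:
        raise ValueError("not enough data for that many tiles")
    acc = int.from_bytes(data, "big")
    tiles = []
    for i in range(count):
        field = (acc >> (total_bits - (i + 1) * TILE_BITS)) & TILE_MASK
        tile_id = ((field >> 10) & 0xFF) or 256
        x = ((field >> 6) & 0xF) or 16
        y = ((field >> 2) & 0xF) or 16
        tiles.append((tile_id, x, y, (field & 0x3) * 90))
    return tiles


example = [(37, 5, 6, 0), (48, 5, 7, 180), (256, 16, 16, 270), (1, 1, 1, 90)]
packed = pack_tiles(example)
print(len(packed))  # 9
```

The tile count has to be stored or known separately, because a run of padding bits cannot be told apart from real data on its own.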

If you want to try this, be very careful with your bit-shift operators! Right shifts often propagate the sign bit, and certain languages (like Java) treat a byte as a signed value and widen it to an int when you read a byte field. Those caused some headaches for me while implementing this, but now I have a completed, and thoroughly tested, solution. On to generating new board states!
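The sign-bit pitfall can be reproduced in Python, whose >> on a negative integer is an arithmetic shift just like Java’s >> operator. The sketch below mimics reading a byte as a signed value, the way Java’s byte type does; it illustrates the problem and is not the solver’s actual code.

```python
def as_signed_byte(raw: int) -> int:
    """Interpret an 8-bit pattern (0..255) the way a signed byte type would."""
    return int.from_bytes(bytes([raw]), "big", signed=True)


def top_six_bits_naive(signed_byte: int) -> int:
    """Shift without masking: the sign bit is smeared into the result."""
    return signed_byte >> 2


def top_six_bits_masked(signed_byte: int) -> int:
    """Mask back to 0..255 first, then shift, to get the intended bits."""
    return (signed_byte & 0xFF) >> 2


stored = as_signed_byte(0xC8)       # bit pattern 1100_1000 reads as -56
print(top_six_bits_naive(stored))   # -14
print(top_six_bits_masked(stored))  # 50, i.e. binary 110010
```

In Java the same fix is usually written as (b & 0xFF) before shifting, or by using the unsigned shift operator >>> on an already masked value.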


The Importance of Analyzing and Optimizing your SQL Queries

Efficiently Representing Board State