The Importance of Analyzing and Optimizing your SQL Queries

I’m sure many of you have heard the importance of optimizing your SQL queries for the optimizer. You may have doubted this, maybe you thought your query was too simple. Well, this post is going to show you exactly why query optimization is so important.

I’ll start with the graph:

For statistical analysis my solver periodically gathers statistics about the state of the database. The graph above shows the number of boards by status. “New” boards have not been processed yet, while “Done” boards have.

As you can see over the initial few days it generated about 2.8 million boards. After I optimized only one of my queries and in the past 18 hours have generated over 18 million boards.

I’m sure you remember the query we fixed from last week, that’s the query I optimized. Let’s look at it again:

SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
ORDER BY id DESC LIMIT 1

Looking at the status of the server showed the database engine was pegging the CPU. Let’s figure out why that is!

MariaDB has a handy tool, SHOW PROCESSLIST to view what the server is processing at the moment. While running with the old query, that query showed up every time I tried it, so I decided to investigate the query. The ANALYZE tool was made to help investigate query performance:

ANALYZE
SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= 14537478
ORDER BY id DESC LIMIT 1

FieldValue
id1
select_typeSIMPLE
tableboards
typeref
possible_keysPRIMARY, status, id_status
keystatus
key_len1
refconst
rows11,369,300
r_rows8,014,010.00
filtered100.00
r_filtered0.00
ExtraUsing where

The analyze output indicates even though we’re using the id_status key values in our where clause, it still uses the key created on status. This is problematic, because it means that even though we should be able to use the key built exactly to optimize this query, we ended up reading over 8 million rows to get the table result. That’s a huge cost to grab a single row.

So why are we doing this? Turns out, MariaDB doesn’t support DESC order keys. So that ORDER BY id DESC is causing us to not use the Primary Key, and we end up going to the status key, and then with those results we search for the id closest to the target.

That means to get the benefit of the Primary Key we need to invert the query:

SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id >= ?
ORDER BY id ASC LIMIT 1

Here’s the analyze result for that:

FieldValue
id1
select_typeSIMPLE
tableboards
typeref
possible_keysPRIMARY, status, id_status
keystatus
key_len1
refconst
rows11,590,310
r_rows1
filtered100.00
r_filtered100.00
ExtraUsing index condition; Using where

So we’re reading only 1 row, which is good, but we’re still using the status key. And looking at the behavior of the server, we’re still pegging the CPU. Something is still wrong, the database engine isn’t using the right index. Well, MariaDB gives us a way to give the database engine a hint. The result is this query:

SELECT id, tiles
FROM boards
USE INDEX (id_status)
WHERE status = 'NEW' AND id >= 14537478
ORDER BY id ASC LIMIT 1

With this analyze:

FieldValue
id1
select_typeSIMPLE
tableboards
typerange
possible_keysid_status
keyid_status
key_len9
ref(NULL)
rows11,626,588
r_rows1
filtered100.00
r_filtered100.00
ExtraUsing index condition

Finally we’re using the right index, and that means some other interesting things happen. For instance, we stop using the WHERE clause and are really just using the index. The query is much faster, even the ANALYZE finished immediately while the prior ANALYZE instances took over a minute. Now after running that query, the server’s CPU load has fallen to below 10%. This means we should be able to scale out the runner a bit better, at least, until we start hitting our next bottleneck.