I’m sure many of you have heard the importance of optimizing your SQL queries for the optimizer. You may have doubted this, maybe you thought your query was too simple. Well, this post is going to show you exactly why query optimization is so important.
I’ll start with the graph:
For statistical analysis my solver periodically gathers statistics about the state of the database. The graph above shows the number of boards by status. “New” boards have not been processed yet, while “Done” boards have.
As you can see over the initial few days it generated about 2.8 million boards. After I optimized only one of my queries and in the past 18 hours have generated over 18 million boards.
I’m sure you remember the query we fixed from last week, that’s the query I optimized. Let’s look at it again:
SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= ?
ORDER BY id DESC LIMIT 1
Looking at the status of the server showed the database engine was pegging the CPU. Let’s figure out why that is!
MariaDB has a handy tool, SHOW PROCESSLIST
to view what the server is processing at the moment. While running with the old query, that query showed up every time I tried it, so I decided to investigate the query. The ANALYZE
tool was made to help investigate query performance:
ANALYZE
SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id <= 14537478
ORDER BY id DESC LIMIT 1
Field | Value |
---|---|
id | 1 |
select_type | SIMPLE |
table | boards |
type | ref |
possible_keys | PRIMARY, status, id_status |
key | status |
key_len | 1 |
ref | const |
rows | 11,369,300 |
r_rows | 8,014,010.00 |
filtered | 100.00 |
r_filtered | 0.00 |
Extra | Using where |
The analyze output indicates even though we’re using the id_status
key values in our where clause, it still uses the key created on status
. This is problematic, because it means that even though we should be able to use the key built exactly to optimize this query, we ended up reading over 8 million rows to get the table result. That’s a huge cost to grab a single row.
So why are we doing this? Turns out, MariaDB doesn’t support DESC order keys. So that ORDER BY id DESC
is causing us to not use the Primary Key, and we end up going to the status
key, and then with those results we search for the id closest to the target.
That means to get the benefit of the Primary Key we need to invert the query:
SELECT id, tiles
FROM boards
WHERE status = 'NEW' AND id >= ?
ORDER BY id ASC LIMIT 1
Here’s the analyze result for that:
Field | Value |
---|---|
id | 1 |
select_type | SIMPLE |
table | boards |
type | ref |
possible_keys | PRIMARY, status, id_status |
key | status |
key_len | 1 |
ref | const |
rows | 11,590,310 |
r_rows | 1 |
filtered | 100.00 |
r_filtered | 100.00 |
Extra | Using index condition; Using where |
So we’re reading only 1 row, which is good, but we’re still using the status key. And looking at the behavior of the server, we’re still pegging the CPU. Something is still wrong, the database engine isn’t using the right index. Well, MariaDB gives us a way to give the database engine a hint. The result is this query:
SELECT id, tiles
FROM boards
USE INDEX (id_status)
WHERE status = 'NEW' AND id >= 14537478
ORDER BY id ASC LIMIT 1
With this analyze:
Field | Value |
---|---|
id | 1 |
select_type | SIMPLE |
table | boards |
type | range |
possible_keys | id_status |
key | id_status |
key_len | 9 |
ref | (NULL) |
rows | 11,626,588 |
r_rows | 1 |
filtered | 100.00 |
r_filtered | 100.00 |
Extra | Using index condition |
Finally we’re using the right index, and that means some other interesting things happen. For instance, we stop using the WHERE
clause and are really just using the index. The query is much faster, even the ANALYZE
finished immediately while the prior ANALYZE
instances took over a minute. Now after running that query, the server’s CPU load has fallen to below 10%. This means we should be able to scale out the runner a bit better, at least, until we start hitting our next bottleneck.