SQL Query Backtrace

Quick and easy way to track the code where SQL queries are executed from

April 24, 2021

I am revisiting my old blog drafts, and I found this story. It is an interesting topic to write about, although nowadays in 2021 it might be moot.

The Problem

First, some background. Many years ago, way back in 2010, at work we hit a particular bottleneck in our platform and we needed to investigate what is going on. There was this flood of repeated and duplicate sql queries and we needed to find what and why is going on. At the time most of the queries were composed by our ORM, so it is close to impossible to match a query to where it originates in the code. Well, it was not impossible, but it really took time to figure out what query comes from what code.

Here lies the problem: we can intercept the completed query before or after it has been sent to the SQL server, but we need to trace the steps taken to reach this moment. Multiply this 100x times as we re fighting to reduce the number of queries and you get the type of problem we were in when trying to backtrace a query.

Desperate Measures

Initially I tried some stupid tricks. One was to adding meaningless columns to select queries from the ORM and then look for those columns in the query log, and if there is a match then this is where this query came from. Here's some pseudo code to visualize this:

$active_record->add_col('origin', '"AB123"');

And then the query became

SELECT id, ...., "AB123" as `origin` ...

You've probably guessed already that this a bad idea for a myriad of reasons, like

we are now introducing this extra column in the result set, this will be present in every record returned with the same hard-coded value
you have to add this ad-hoc to every piece of code where a query is being composed
the extra column will only work for SELECT queries
etc.

I had some discussions to do something similar for the WHERE statement of the query as it will not pollute the resultset and it will work for all queries, something like:

$active_record->where(' OR (0 and "AB123")');

This part of the query will never be used as it is equivalent to "... OR 0", and the query will look like this:

UPDATE ... WHERE ... OR (0 and "AB123")

This still leaves the problem that we need to do this for every query we send. Plus, let's be honest, it's nasty to change the query like that. Ideally the query must be preserved as it was originally composed.

The Solution

Let's recap and see what the requirements are for this to work:

find where a query has been triggered
do not change the query
must work with all queries: SELECT, UPDATE, ALTER, whatever

As it usually happens, the solution was quite simple, and you have probably guessed it by now. Let me try to explain my thinking process, and see if you would come to the same conclusion.

First, the way not to change a query is to just leave it as it is and just use comments. Pretty obvious, I know.

Most ORMs have code that scans the first word of the query to determine its type and use that to decide what to return. For example SELECTs will return resultsets, UPDATEs and DELETEs will return number of affected rows, and INSERTs will return the ID of the newly inserted record. This means that the comments must not be on top of the query, but at the end - otherwise the ORM will have to do some extra work to clean up the comments to get the first word of the query

 -- bad place to add a comment if you are the ORM trying to figure out what this is
SELECT id, name FROM sites;

SELECT id, price FROM products;
 -- good place to add a comment as the ORM will not care

So far so good: the comments will make this work with all queries without changing them. Now how about finding where a query has been called from? I bet by now you are already screaming:

"Backtraces, you idiot! Append a bloody backtrace to the query!"

Yes, that's right - backtraces. And again, it is pretty obvious, I know.

Now, what are the options here?

There is everyone's favorite debug_backtrace(), but you have to process the output and only collect what is needed. This is what we started with, but let's see what other options are there.

You can also use debug_print_backtrace() and output buffering to capture the content

ob_start();
debug_print_backtrace();
$backtrace = ob_get_contents();
ob_end_clean();

My personal preference however is Exception::getTraceAsString(), as it is really quick and easy. I know that by using debug_backtrace() you can be more precise and exclude some of the stack frames, but the exception trace already is pretty compact so I don't mind the extra line there for the method from where this is being called.

$backtrace = (new \Exception)->getTraceAsString();

The last part is to turn the backtrace into a SQL comment, and ... that's it.

/// ... somewhere deep inside the ORMs in the method that sends the query

$backtrace = (new \Exception)->getTraceAsString();
$backtrace = str_replace("\n", "\n -- ", $backtrace);
$query .= $backtrace;

And the query becomes something like this:

DELETE FROM oc_customer_online WHERE ...
 -- #0 /home/centuryp/public_html/vqmod/vqcache/vq2-system_library_db.php(20): DBMySQLi->query('DELETE FROM `oc...')
 -- #1 /home/centuryp/public_html/catalog/model/tool/online.php(4): DB->query('DELETE FROM `oc...')
 -- #2 /home/centuryp/public_html/vqmod/vqcache/vq2-catalog_controller_common_footer.php(318): ModelToolOnline->whosonline('113.212.126.108', NULL, 'http://centuryp...', '')
 -- #3 /home/centuryp/public_html/vqmod/vqcache/vq2-system_engine_controller.php(43): ControllerCommonFooter->index(Array)
 -- #4 /home/centuryp/public_html/vqmod/vqcache/vq2-system_engine_controller.php(169): Controller->getChild('common/footer')
 -- #5 /home/centuryp/public_html/catalog/controller/error/not_found.php(69): Con in /home/centuryp/public_html/system/database/mysqli.php on line 41

* Note that this is just an example and I am using an exception backtrace I found here, as I do not want to share any backtraces from our own platform.

I think it goes without saying that you should not do this on production/live sites, but only locally and on dev installs. If you are logging queries on the SQL server side, you should be sending the query with the appended backtrace comment. If you however are just logging queries on the client side in your app, you can just append the backtrace comment to the logged query after it has been executed and save yourself the inflated footprint from the comment.