I’ve been a database administrator off and on for nearly seven years. In that time, I’ve had a to do a lot of queries that were complex to answer simple questions. One of the things you start to notice is that many queries actually reflections of programming logic. As an example, say we have several tables that we need to pull data from, but we don’t want the query to be expensive (take a lot of time and resources). We try the standard of setting up sub-queries through joins and find the WHERE logic creates a great deal of selectivity in the query. It’s great we got our result set, but the query took two to five minutes with a ton of disk I/O and CPU. That’s bad.
In programming, you learn to set things up with classes that have the contained methods you need making the main class extremely small. For you procedural guys out there, its the same as creating functions in your program and having only a few calls to the functions to run the program. It’s cleaner and tends to run faster (in my opinion). The same approach can be true of database queries.
I was recently given an SSRS report and told to automate it and adjust it for the new file format the vendor wanted. When the report ran, it took ten minutes to complete and had several Query Hints to limit processor utilization. The report did have a couple of temporary tables, but the majority of the data was filtered through several nested joins and a VERY long list in the WHERE and ORDER BY clause. This was no good.
Although I can’t put the original query here (It would be pages of reading) because it is proprietary for my work, sufficed to say, the query was 350 lines long. This is what I did to change it. First, I pulled the sub-queries away from the JOIN list and made them temporary tables. I then created a series of indexes and statistics on the new temporary tables. You cannot do this within a sub-query and it takes longer for the data to compile. I also added a Common Table Expression (CTE) before the main select query for history information. This made the main query look much cleaner and smaller. The beauty of the CTE is that it can be read like a table, expanded in the query itself, or to setup for a cursor (although I personally hate cursors).
Think of each temporary table and CTE (you can have more than one), as a function to procedural programming or a class to object oriented programming. When you are ready for your main program to run, you are just calling the code above it. Although, I can’t put the actual query here, I will show a skeletal structure of what it would look like instead:
CREATE TABLE #fingfangfoom (
testid INT IDENTITY,
SELECT empid, empname, city
# CREATE TABLE #Customerfoo (
SELECT custid, custname, address1, city, zip
CREATE TABLE #Orders (
WITH cte_history (custname, custid, transactionid, productid, productname, orderdate)
SELECT ca.custid, ca.custname, tr.transactionid, pr.productid, pr.productname, tr.orderdate
FROM CustomerArchive ca
RIGHT JOIN Transactions tr
ON ca.custid = tr.custid
LEFT JOIN ProductArchive pr
ON tr.productid = pr.productid
WHERE tr.quantity > 1
SELECT empname, custid, productid, O.orderdate, cf.address1, cf.city, cf.zip
FROM cte_history ch
INNER JOIN #Orders O
ON ch.custname = O.shipname
LEFT JOIN #Customerfoo cf
ON ch.custid = cf.personid
LEFT JOIN #fingfangfoom fff
ON fff.empname = ch.custname
This is a very simple example whereby we are pulling all employees that are customers and showing all transactions with the store by those employees. Another words, how loyal are the employees. However, we could further refine the query by showing a transaction count greater than five. Note that this table creates a History CTE from the archive tables and joins them to the current product tables. It’s a rough query as there will be plenty of NULL values in there.
The nice thing about the above query is that we can take the #Orders table and add an index to it in order to speed up the results. Moreover, now that the front work was done by the CTE, the actual cost on the main select statement is low. From my first example at the top of the post, I was able to get the ten minute report down to one minute thirty-two seconds. If you build SSIS style jobs like I do, the amount of time each job takes can add up.
What we want to get at here is try to do as much as you can up front before you actually do the query (especially if it is a long one). It will save you time, make it more readable, and make it much easier for someone to figure out if they have to look at it years later.