January 2018 – What William Said

This week I ran across a new-to-me option in some T-SQL code I was reviewing: OPTION (RECOMPILE). It turns out that this option can improve performance by forcing SQL Server to recompile the query rather than reusing the plan it already generated.

Wait, what? Don’t we save time by saving a query plan and reusing it?

Generally, yes. When you first run a stored procedure, the SQL engine will figure out a plan for that procedure and cache the plan for reuse, thus saving the time of recomputing it each time the procedure is run. The problem comes in when a plan is generated that makes sense for the current database state and parameters, but doesn’t make sense when those change. For example, suppose you have a table with ten million rows; different approaches are required if you want to retrieve ten of those rows vs seven million of them.

Generating the query plan

When the stored procedure is executed, SQL Server looks at the distribution statistics it has about the data in the tables the procedure references and uses this data to guess at the best way to run the query. Assuming that the statistics are up to date, it can generally make a fairly good guess; for example, it may choose an index seek if it expects to return a small number of rows, but a table scan if it expects that a significant percentage of the table will be returned.
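To see what the optimizer has to work with, you can look at the statistics directly. This is a minimal sketch, assuming a hypothetical table dbo.Orders with an index named IX_Orders_CustomerId; both names are made up for illustration.

```sql
-- Assumption: dbo.Orders and IX_Orders_CustomerId are hypothetical names.

-- Refresh the statistics on the table so the optimizer's estimates
-- reflect the current data.
UPDATE STATISTICS dbo.Orders;

-- Show the histogram SQL Server keeps for the index; the optimizer uses
-- it to estimate how many rows a given CustomerId value will match.
DBCC SHOW_STATISTICS ('dbo.Orders', IX_Orders_CustomerId) WITH HISTOGRAM;
```

If the histogram shows a handful of rows for one value and millions for another, that is a hint that a single plan may not suit every call.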

Parameter sniffing
Sometimes, the output of the procedure depends heavily on the value of a parameter; one value might result in returning one row, while another returns a million rows. When you run the procedure, SQL Server sniffs the parameters and uses those to choose the query plan. If a query is called with literal values, you get one plan for each distinct call, but if it is parameterized, the same plan is used the next time the procedure is called even if the value of the parameter is very different. If the procedure behaves similarly as the parameter varies, this works fine, and you save the cost of creating a new plan each time. If it doesn’t, however, you can end up with a plan that is orders of magnitude slower than optimal.
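Here is a rough sketch of how this can play out. The table, index, and procedure names (dbo.Orders, IX_Orders_CustomerId, dbo.GetOrdersByCustomer) and the customer ids are hypothetical, chosen only to illustrate the idea.

```sql
-- Assumption: all object names and values below are hypothetical.
CREATE TABLE dbo.Orders (
    OrderId    INT IDENTITY(1,1) PRIMARY KEY,
    CustomerId INT  NOT NULL,
    OrderDate  DATE NOT NULL
);
CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);
GO

CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId INT
AS
BEGIN
    SELECT OrderId, CustomerId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
END;
GO

-- First call: the plan is compiled using @CustomerId = 42 and cached.
EXEC dbo.GetOrdersByCustomer @CustomerId = 42;

-- Second call: the cached plan is reused, even if customer 7 has far
-- more (or far fewer) orders than customer 42.
EXEC dbo.GetOrdersByCustomer @CustomerId = 7;
```

Whether the second call is slow depends entirely on how differently the two values are distributed in the data.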

Recompiling
Telling SQL Server to recompile forces it to create a new query plan rather than reusing the old one. When the old plan was “good enough”, this is a bad thing – generating the plan is expensive. In some cases, though, the performance improvement from using a better plan easily outweighs the cost of recomputing that plan.

One way to do this is to alter the stored procedure to include the WITH RECOMPILE option; this tells SQL Server not to cache a plan, so the query plan is regenerated each time. Alternatively, we can add OPTION (RECOMPILE) at the statement level, in which case only the plan for that particular statement, rather than for the entire procedure, will be regenerated.
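The two approaches might look something like this. It is only a sketch, assuming a hypothetical dbo.Orders table with a CustomerId column; the procedure names are likewise invented for illustration.

```sql
-- Assumption: dbo.Orders and both procedure names are hypothetical.

-- Procedure level: no plan is cached, so the whole procedure is
-- recompiled on every call.
CREATE PROCEDURE dbo.GetOrdersByCustomer_ProcRecompile
    @CustomerId INT
WITH RECOMPILE
AS
BEGIN
    SELECT OrderId, CustomerId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
END;
GO

-- Statement level: only this statement gets a fresh plan on each run;
-- any other statements in the procedure keep their cached plans.
CREATE PROCEDURE dbo.GetOrdersByCustomer_StmtRecompile
    @CustomerId INT
AS
BEGIN
    SELECT OrderId, CustomerId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
    OPTION (RECOMPILE);
END;
GO
```

In a procedure with many statements where only one is sensitive to the parameter value, the statement-level hint keeps the recompile cost confined to that one statement.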

Being bossy with SQL Server
Forcing a recompile isn’t the only way to take control away from SQL Server. Another option is to use the OPTIMIZE FOR hint, when you know that the plan for a particular value will work well for a wide range of possible parameter values; in this case, you use domain knowledge to improve on what SQL Server could manage with statistics alone. You can also use OPTIMIZE FOR UNKNOWN to keep SQL Server from optimizing based on the particular value that was passed in.
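As a sketch of what these hints look like, again assuming a hypothetical dbo.Orders table and invented procedure names; the value 42 stands in for whatever "representative" value your domain knowledge suggests.

```sql
-- Assumption: dbo.Orders, the procedure names, and the value 42 are
-- hypothetical.

-- Always build the plan as if @CustomerId were 42, regardless of the
-- value actually passed in.
CREATE PROCEDURE dbo.GetOrdersByCustomer_OptimizeForValue
    @CustomerId INT
AS
BEGIN
    SELECT OrderId, CustomerId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
    OPTION (OPTIMIZE FOR (@CustomerId = 42));
END;
GO

-- Ignore the sniffed value and build the plan from overall statistics
-- for the column instead.
CREATE PROCEDURE dbo.GetOrdersByCustomer_OptimizeForUnknown
    @CustomerId INT
AS
BEGIN
    SELECT OrderId, CustomerId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
    OPTION (OPTIMIZE FOR UNKNOWN);
END;
GO
```

Unlike the recompile options, both of these still cache and reuse a single plan; they only change which value that plan is built around.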

Remember the disadvantages of taking control away: you lose the performance benefits of reusing a query plan, and if you make a bad choice as to what value to optimize for (or the data changes to make it a poor choice), you could see much worse performance than if you just leave SQL Server alone to do its own thing. As usual, you want to tell SQL Server how you want it to process something (rather than just what you want it to do) only when you have a good reason for doing so.

I’ve never been a believer in New Year’s resolutions; there’s a reason it’s a cliche to start a new exercise program in January and abandon it by February. Resolutions tend to be things that people would like to do…and by the end of the year, they become things that people would have liked to have done.

At the same time, goals can be worthwhile – especially if they’re SMART goals. A SMART goal is one that is specific, measurable, actionable, realistic, and time-bound.

SMART Goals
Specific: It is clear exactly what the goal is.
Measurable: It is clear whether the goal has been achieved.
Actionable: It’s clear what actions need to be taken to achieve the goal. “I will be a millionaire in a year” isn’t actionable; “I will spend at least two hours per week writing” is.
Realistic: The goal is something you can realistically achieve within the given time bound.
Time-bound: There is a deadline for when the goal will be accomplished.

For 2018, I’ve decided to set New Year’s commitments. Why commitments instead of resolutions or goals? Because I’m committing that these are things I will get done this year (and I’ve actually set up a penalty if I fail to accomplish any of them). In 2018, I am committing that:

I will publish at least one book – this will probably be either my computer science textbook or my first novel.
I will apply to speak at at least two conferences – probably the two I spoke at in 2017 and any others that look good.
I will update my blog at least once per month.

None of these goals are based on things outside of my control; I’m not resolving to make six figures from my book (although it would be nice!), have my talks accepted, or write an outstandingly brilliant post; I’m simply committing to doing the work, and posting a public record of that commitment. When I’m running on way too little sleep because the baby was screaming all night long, I won’t have the option to say “oh, I guess I have a good excuse for not meeting my goals this month” – I have this commitment to hold myself accountable.

Are you setting goals for 2018?


Improving query performance in SQL Server with recompile

Goals for 2018