Sometimes, the query optimizer makes mistakes. Instead of selecting the most optimal plan, a sub-optimal plan is chosen, and it is your task as a DBA to help SQL choose the right plan. In this case, the optimizer overestimated the selectivity for a WHERE clause, and my solution to this problem was to trick the optimizer into using a clustered index seek instead of a nonclustered index.

This is going to be a short post, since I can only provide partial code. The offending query joined 5 tables, each with millions of records. It had the following WHERE clause:

1 2 |
WHERE [tableA].[columnA] LIKE 'R%NL' AND [tableB].[date] > @date |

The query would run for minutes. Analyzing the execution plan showed that the problem was the estimation of the number of records. Based on its statistics, SQL estimated that the first WHERE clause was the most selective. Therefore, SQL started with an index on this table (the top right operator in the query plan). In reality, however, the second WHERE clause was much more selective.

Looking at the statistics, it was pretty obvious why SQL would think that the date clause was not very selective. This particular table contains years of data, but the query would only need records from the last few hours. The statistics on the table are not that granular (for more info on the granularity of statistics, see this Technet article).

At first glance, it was not so obvious why SQL would think that the first clause (LIKE ‘R%NL’) would be selective. The statistics clearly show that most values start with ‘R%’. This is, however, where SQL makes a mistake. It assumes that the combination of the two filters would be a lot more selective. Updating the statistics doesn’t solve this problem.

Having found the underlying problem, my first attempt at a solution was to split the first clause:

1 2 3 |
WHERE [tableA].[columnA] LIKE 'R%' AND [tableA].[columnA] LIKE '%NL' AND [tableB].[date] > @date |

That didn’t solve the problem. SQL still estimated that the combination of the filters would be selective enough to start with tableA. The solution I used was to make the clause non-SARGable, by performing an operation on the column:

1 2 3 |
WHERE [tableA].[columnA] LIKE 'R%' AND RIGHT([tableA].[columnA],2) = 'NL' AND [tableB].[date] > @date |

Normally, this is something you should definitely avoid. A non-SARGable filter eliminates the optimizers ability to estimate whether it could use an index to satisfy this query. In this case, it eliminated the optimizers abililty to make the *wrong *estimation! So sometimes, two wrongs make a right!