SQL functions

It happens all the time, but a client phoned me with an urgent SQL query problem.

Their query was taking way too long to run. Sure, its got millions of records to run through, but it only need to retrieve a small result set, and the WHERE clause was seeking on an indexed fields. Or where they?

A quick look at the query:

Wait… before you read that, the only thing you need to know, if you are putting your index fields inside a function… you are slowing it down. Rather limit the results in a inner query, and filter those further with your functions.

The reason: The query optimiser cannot tell what the outcome of parsing the value through the function will be. So it needs to build a temp table with the parsed values (or some other tedious operation that goes through every single record, instead of selecting only a sub set or rows).

So, functions in a WHERE clause are a bad idea, unless the record set being drawn from is already small.

The code is below. And for a more technical background: see this article http://www.databasejournal.com/features/mssql/article.php/3845381/T-SQL-Best-Practices–Dont-Use-Scalar-Value-Functions-in-Column-List-or-WHERE-Clauses.htm

Bad Query:

DECLARE @fdate as datetime
DECLARE @tdate as datetime
SET @fdate = '2009-10-01'
SET @tdate = '2009-10-31'
SELECT cat.Descrip,

SUM(((invl.Quantity*invl.SellingPrice)-((invl.Discount/100)* (invl.Quantity*invl.SellingPrice)))*invl.tax) As Total

FROM Invoice inv
JOIN InvoiceLine invl ON inv.InvoiceID=invl.InvoiceID
JOIN Products prod  ON invl.ProductID=prod.ProductID
JOIN CategoryForSales cat ON invl.salescat=cat.CategorySalesID
WHERE
  inv.cid=@cid
 AND (convert(varchar(10),inv.InvoiceDate,111)
  between  convert(varchar(10),@fdate,111) AND  convert(varchar(10),@tdate,111))
GROUP BY cat.Descrip

Obviously the user is trying limit the time of the transaction out of the Where clause.
It would have been better in this case to simply set @fdate to the date at 12am, and @tdate set to 23:59.59.998 or 12am of the next day.

DECLARE @fdate as datetime
DECLARE @tdate as datetime
SET @fdate = '2009-10-01 00:00:00.000'
SET @tdate = '2009-10-31 23:59:59.998'
-- OR SET @tdate = '2009-11-01 00:00:00.000'
SELECT cat.Descrip,

SUM(((invl.Quantity*invl.SellingPrice)-((invl.Discount/100)* (invl.Quantity*invl.SellingPrice)))*invl.tax) As Total

FROM Invoice inv
JOIN InvoiceLine invl ON inv.InvoiceID=invl.InvoiceID
JOIN Products prod  ON invl.ProductID=prod.ProductID
JOIN CategoryForSales cat ON invl.salescat=cat.CategorySalesID
WHERE
  inv.cid=@cid  AND inv.InvoiceDate between  @fdate AND  @tdate
  --Hey look how much simpler the where clause looks now... and its that much simpler for the query optimiser too!
GROUP BY cat.Descrip

Comment (2)

  • Mark Diamond| January 7, 2012

    Hi,

    I like this post, more people should be aware of “SARGability”.

    However, I don’t believe that the first and second query are alike in terms of expected result sets…
    The first query will only return records up until and including (because the BETWEEN operator is being used and is inclusive) 2009-10-31 00:00:000, thereby virtually forfeiting all records for that day. The second query however, will return all results for that day, though I find it interesting that …998ms was used and not 997ms; because the precision tolerance of the DATETIME type is 997 – I believe this has been seen to in the DATETIME2 type. Yes, I am being pedantic 🙂

    Also, for the sake of simplicity, I would have considered using lt and gt operators in a syntax that looks something like “WHERE DTCol >= @DTStart AND DTCol < @DTEnd + 1" where my variables would be time zero'd.

    Thank you!

    Mark D.

  • Matthew| January 9, 2012

    Hi Mark,

    Dead right. The customer had their WHERE clause wrong to begin with as well (they excluded a day in error).

    For the 997, they still wanted .998 and 999, and .000 to be included.

    I guess for the final consideration of using lt and gt, I am 100% not sure if this speeds the query up or not, but I guess if not, it would be down programmer preference.

    My understanding is, the “between” syntax extrapolates into the lt and gt.