Tag Archives: “swiss army knife”

The Potential of the LINQ. Why I really Wish I Could Love LINQ to Entities.

Executive Summary

Generalized search (or catch all, or Swiss army knife) stored procedures are one of the cases where I personally feel that T-SQL does not offer a fantastic solution – my feeling is that we usually need to choose between maintainability (LINQ is likely to seem more maintainable than the “sea of red” that the stored procedure approach becomes) and performance (when parameter sniffing becomes an issue, LINQ will suffer). The careful use of LINQ to Entities would be able to help us if it were not for the fact that there is no way to supply a query hint for those cases where parameter sniffing becomes an issue. The only way that I’ve discovered to work around this shortcoming is to use plan guides which are not very maintainable IMO. That’ a long winded way of saying that dynamic SQL or dynamic string execution in T-SQL may currently be the least bad solution.

The Problem

Whether a database stored purchase order information, technical support tickets, employee data, descriptions of Doctor Who episodes, or something else entirely in this day it is likely that a software frontend will eventually sit between the database and the data consumer. Frequently the software will include some kind of search screen with about a gazillion optional fields where the user can potentially enter data. For example, if we are working on a system for creating and tracking tech support tickets then the user might expect to see a screen someplace in our product that lets them filter by things like creation date, assigned technician, customer, system location, completion date, etc.

Typically the expectation is that this search functionality will be implemented by calling a single stored procedure to perform this search regardless of what combination of parameters is supplied. If the development team has not been down this road before or warned about the pitfalls, typically the optional parameters will be handled by using the “@parm IS NULL OR @parm=val” approach. For example a greatly simplified procedure for our ticketing system may look like this

CREATE PROCEDURE dbo.SearchAllTheThings
  @StartDateCreation DATE = NULL,
  @EndDateCreation DATE = NULL,
  @TechnicianID TINYINT = NULL
AS
	SELECT TicketID, CustomerID, CompletionDate, TicketText
	FROM dbo.Ticket
	WHERE (@StartDateCreation IS NULL OR Ticket.CreationDate >= @StartDateCreation)
	AND (@EndDateCreation IS NULL OR Ticket.CreationDate <= @EndDateCreation)
	AND (@TechnicianID IS NULL OR EXISTS(SELECT * FROM dbo.TicketTechnician
		WHERE TicketTechnician.TicketID = Ticket.TicketID AND TicketTechnician.TechID = @TechnicianID))
GO

and if we ask for the actual plan when calling it like this (requesting all tickets created on October 1, IE all tickets created between October 1 and October 1).

exec dbo.SearchAllTheThings '20131001', '20131001'

we may see a plan like this

What’s up with that nested loop and the concatenation operator? Since we did not specify a parameter for the technician there is no reason for us to look at anything but the ticket table. The clustered index scan should have us covered – well, in the absence of a better index.

The reason, of course, is that SQL Server will reuse the plan for future calls. Even though the plan is optimized for the current set of parameters, it needs to produce correct results for all possible combinations of parameters. The only way this is possible is for the plan to include these extra operators.

Fixing With Dynamic SQL

Typically a good solution for this issue is to rewrite the stored procedure so that it builds an appropriate query on the fly, depending on which parameters are actually used, and then to execute using either exec or sp_executesql depending on whether compilation overhead or parameter sniffing are of greater concern. So in this case we could rewrite the stored procedure as

CREATE PROCEDURE dbo.ImprovedSearchAllTheThings
  @StartDateCreation DATE = NULL,
  @EndDateCreation DATE = NULL,
  @TechnicianID TINYINT = NULL
AS
	DECLARE @query NVARCHAR(300);
	DECLARE @parms NVARCHAR(300);

	SET @parms = N'@StartDateCreation DATE, @EndDateCreation DATE, @TechnicianID TINYINT';

	SET @query = N'SELECT TicketID, CustomerID, CompletionDate, TicketText FROM dbo.Ticket WHERE 1=1';
	IF @StartDateCreation IS NULL
	BEGIN
		SET @query = @query + N' AND Ticket.CreationDate >= @StartDateCreation';
	END
	IF @EndDateCreation IS NULL
	BEGIN
		SET @query = @query + N' AND Ticket.CreationDate <= @EndDateCreation';
	END
	IF @TechnicianID IS NULL
	BEGIN
		SET @query = @query + N' AND EXISTS(SELECT * FROM dbo.TicketTechnician WHERE TicketTechnician.TicketID = Ticket.TicketID AND TicketTechnician.TechID = @TechnicianID)';
	END

	EXEC dbo.sp_executesql @query, @parms, @StartDateCreation, @EndDateCreation, @TechnicianID
GO

When we request the actual execution plan while calling

exec dbo.ImprovedSearchAllTheThings '20131001', '20131001'

And now the plan looks a little cleaner

2013_12_isnull_improved_plan

A Face Only a DBA Could Love

As a rule, my experience has been that developers hate any form of dynamic SQL. At a visceral level, I believe it feels as if dynamic SQL flies in the face of at least 15 years of database programming best practice thinking (to be clear it does not, at least IMO). There was a time when it was common to slam tons of strings, some of which may have come from an end user, together in code to assemble a query. This practice is what made SQL injection attacks so effective at the time. Far and away the most popular mitigation against SQL injection has been various kinds of parameterized SQL such as the sp_executesql procedure used above or using parameters with the humble SQLCommand object. Over time as the community continues to hammer into developers heads that parameterized SQL is a very very good thing, many have lost sight of the fact that the real issue is including user-supplied input into dynamically assembled SQL … many today are of the opinion that ALL dynamic SQL is dangerous (or at least icky).

And I actually agree with them, dynamic SQL is icky. Don’t get me wrong, I’m not saying that dynamic SQL should not be used. I actually think it is often demonstrably the best approach in cases like the above. What I am saying is that it is the best of a handful of bad options. We keep hoping for a good one and that’s one of the reasons LINQ has become so popular over the objections of many many DBAs.

I’m sure different folks have different reasons for disliking dynamic SQL. My issues with it are testing and maintainability. Testing because if there are a large number of parameters and complicated logic the number of potential combinations that could be produced can increase exponentially which makes it impossible to be absolutely sure that there is no possible combination that is missing a space somewhere important. Maintainability because there is no syntax help from the IDE when working inside text strings everything is in a “sea of red” which means anyone who isn’t familiar with how the procedure is put together will need to spend some time orienting themselves before they touch anything.

Not such big problems if you hire some fancy-pants consultant with a blog and never have to modify the code after they leave, but these are problems for mere mortals.

So How Was LINQ Supposed to Help?

I don’t want to get too bogged down in discussing why LINQ is so exciting from a development perspective, but briefly it is an acronym for “Language-Integrated Query” and is pretty awesome because that’s exactly what it does. It allows a developer to drop something that looks an awful lot like a query into the middle of their code and work with the results of the query as objects instead of flat data. All this without tens or hundreds of lines mapping parameters, looping through readers, and stuffing data into objects by hand. As somebody who uses LINQ to Entities in secret, late at night, when nobody else is looking and I don’t care about parameter sniffing issues I can say it really is pretty awesome.

The reason I with I could love LINQ, as a DBA, is that all of this happens through SQL rather than through some proprietary library. In other words, LINQ is a SQL generator that is fully supported by IntelliSense and syntax checkers in Visual Studio. It does not generate concise SQL. It does not always generate comprehensible SQL. But it does generate correct SQL without leaving the developer an opportunity to forget an important space or misspell “SELECT”.

Here is a sample what some LINQ code may look like for this problem. I kept things kind of simple for this example, this isn’t necessarily code that I would put into production. Notice how, even though the syntax and order of clauses is a bit different, the database access code almost looks a little bit like an inline SQL query.

		Ticket[] LongVersion(int? TechnicianID, DateTime StartDate, DateTime EndDate)
		{
			ThrowAway.Blog_2013_11_LameTicketDBEntities ctx = new Blog_2013_11_LameTicketDBEntities();

			IEnumerable<Ticket> pancakes = null;   // "var" is for squids.

			if (TechnicianID == null)
			{
				pancakes = from tkt in ctx.Tickets
						   orderby tkt.TicketID
						   select tkt;
			}
			else
			{
				pancakes = (from tech in ctx.Technicians
							where tech.TechID == (int)TechnicianID
							select tech.Tickets).SingleOrDefault();
			}

			//Narrow down by StartDate
			if (StartDate != null)
			{
				pancakes = from tkt in pancakes
						   where tkt.CreationDate >= StartDate
						   select tkt;
			}

			//Narrow down by EndDate
			if (EndDate != null){
				pancakes = from tkt in pancakes
						   where tkt.CreationDate <= EndDate
						   select tkt;
			}

			return pancakes.ToArray();
		}

Or, more succinctly, if lambda expressions don’t make your head explode and if you aren’t that excited about the SQL-ish syntax

		Ticket[] CompactVersion(int? TechnicianID, DateTime StartDate, DateTime EndDate)
		{
			ThrowAway.Blog_2013_11_LameTicketDBEntities ctx = new Blog_2013_11_LameTicketDBEntities();

			IEnumerable<Ticket> pancakes = null;   // "var" is for squids.

			if (TechnicianID == null)
			{
				pancakes = ctx.Tickets;
			}
			else
			{
				pancakes = ctx.Technicians.Where( x => x.TechID == (int)TechnicianID ).SingleOrDefault().Tickets;
			}

			//Narrow down by StartDate
			if (StartDate != null)
			{
				pancakes = pancakes.Where( x => x.CompletionDate >= StartDate );
			}

			//Narrow down by EndDate
			if (EndDate != null){
				pancakes = pancakes.Where(x => x.CreationDate <= EndDate);
			}

			return pancakes.OrderBy(x => x.TicketID).ToArray();
		}

One key thing to understand about LINQ to Entities is that the query is not executed against the database until the results of the query are accessed. So in the above examples, the only thing that happens between the start of the function until just before the return statement is that the text of a SQL query is assembled. Since the optional parts of the query are enclosed in if/then statements the query will not contain any reference to conditions which do not apply, and since the query is not sent to the server until the results are actually needed no time is wasted generating results that may not be looked at. In both versions of the function, the query is actually executed at the time the ToArray() method is called.

But The Reader Gets the Impression Rick Dislikes LINQ
That would be correct, I do have issues with LINQ at least as it exists today. But this post is probably already more than long enough … stay tuned for my next post on why I feel LINQ fails to live up to its potential.