Introduction
If you follow the various newsgroups on Microsoft SQL Server, you often see people asking why they can't do:
SELECT * FROM @tablename
SELECT @colname FROM tbl
SELECT * FROM tbl WHERE x IN (@list)
For all three examples you can expect someone to answer
Use dynamic SQL and give a quick example on how to do it. Unfortunately, for all three examples above, dynamic
SQL is a poor solution. On the other hand, there are situations where dynamic
SQL is the best or only way to go.
In this article I will discuss the use of dynamic SQL in stored procedures and to a minor extent from client languages. To set the scene, I start with a very quick overview on application architecture for data access. I then proceed to describe the feature dynamic SQL as such, with a quick introduction followed by the gory syntax details. Next, I continue with a discussion on SQL injection, a security issue that it is essential to have good understanding of when you work with dynamic SQL. This is followed by a section where I discuss why we use stored procedures, and how that is affected by the use of dynamic SQL. I carry on with a section on good practices and tips for writing dynamic SQL. I conclude by reviewing a number of situations where you could use dynamic SQL and whether it is a good or bad idea to do it.
The article covers all versions of SQL Server from SQL 6.5 to SQL 2008, with emphasis on SQL 2000 and later versions.
Note: many of the code samples in this text works against the
pubs and
Northwind databases that ship with
SQL 2000 and
SQL 7, but not with
SQL 2005 and later. You can download these databases from
Microsoft's web site.
Before I describe dynamic
SQL, I like to briefly discuss the various ways you can access data from an application to give an overview of what I'll be talking about in this article.
(
Note: all through this text I will refer to
client as anything that accesses
SQL Server from the outside. In the overall application architecture that may in fact be a middle tier or a business layer, but as that is of little interest to this article, I use
client in the sake of brevity.)
There are two main roads to go, and then there are forks and sub-forks.
- Send SQL statements from the client to SQL Server.
- Rely on SQL generated by the client API, using options like CommandType.TableDirect and methods like .Update. LINQ falls into this group as well.
- Compose the SQL strings in the client code.
- Build the entire SQL string with parameter values expanded.
- Use parameterised queries.
- Perform access through stored procedures.
- Stored procedures in T-SQL
- Use static SQL only.
- Use dynamic SQL together with static SQL.
- Stored procedures in a CLR language such as C# or VB .Net. (SQL 2005 and later.)
Fork
1-a may be good for simple tasks, but you are likely to find that you outgrow it as the complexity of your application increases. In any case, this approach falls entirely outside the scope of this article.
Many applications are built along the principles of fork
1-b, and as long as you take the sub-fork
1-b-ii, it does not have to be bad. (Why
1-b-i is bad, is something I will come back to. Here I will just drop two keywords:
SQL Injection and
Query-Plan Reuse.) Nonetheless, in many shops the mandate is that you should use stored procedures. When you use stored procedures with only static
SQL, users do not need direct permissions to access the tables, only permissions to execute the stored procedures, and thus you can use the stored procedure to control what users may and may not do.
The main focus for this text is sub-fork
2-a-ii. When used appropriately, dynamic
SQL in stored procedures can be a powerful addition to static
SQL. But some of the questions on the newsgroups leads to dynamic
SQL in stored procedures that are so meaningless, that these people would be better off with fork
1-b instead.
Finally, fork
2-b, stored procedures in the
CLR, is in many regards very similar to fork
1-b, since all data access from
CLR procedures is through generated
SQL strings, parameterised or unparameterised. If you have settled on
SQL procedures for your application, there is little point in rewriting them into the
CLR. However,
CLR code can be a valuable supplement for tasks that are difficult to perform in
T-SQL, but you yet want to perform server-side.
In this chapter I will first look at some quick examples of dynamic
SQL and point out some very important implications of using dynamic
SQL. I will then describe
sp_executesql and
EXEC() in detail, the two commands you can use to invoke dynamic
SQL from
T-SQL.
Understanding dynamic
SQL itself is not difficult. Au contraire, it's rather too easy to use. Understanding the fine details, though, takes a little longer time. If you start out using dynamic
SQL casually, you are bound to face accidents when things do not work as you have anticipated.
One of the problems listed in the introduction was how to write a stored procedure that takes a table name as its input. Here are two examples, based on the two ways to do dynamic
SQL in Transact-
SQL:
CREATE PROCEDURE general_select1 @tblname sysname,
@key varchar(10) AS
DECLARE @sql nvarchar(4000)
SELECT @sql = ' SELECT col1, col2, col3 ' +
' FROM dbo.' + quotename(@tblname) +
' WHERE keycol = @key'
EXEC sp_executesql @sql, N'@key varchar(10)', @key
CREATE PROCEDURE general_select2 @tblname nvarchar(127),
@key varchar(10) AS
EXEC('SELECT col1, col2, col3
FROM ' + @tblname + '
WHERE keycol = ''' + @key + '''')
Before I say anything else, permit me to point out that these are examples of
bad usage of dynamic
SQL. Passing a table name as a parameter is not how you should write stored procedures, and one aim of this article is to explain this in detail. Also, the two examples are not equivalent. While both examples are bad, the second example has several problems that the first does not have. What these problems are will be apparent as you read this text.
Whereas the above looks very simple and easy, there are some very important things to observe. The first thing is permissions. You may know that when you use stored procedures, users do not need permissions to access the tables accessed by the stored procedure.
This does not apply when you use dynamic SQL! For the procedures above to execute successfully, the users must have
SELECT permission on the table in @tblname. In
SQL 2000 and earlier this is an absolute rule with no way around it.
Starting with SQL 2005, there are alternatives, something I will come back to in the section
The Permission System.
Next thing to observe is that the dynamic
SQL is not part of the stored procedure, but constitutes
its own scope. Invoking a block of dynamic
SQL is akin to call a nameless stored procedure created ad-hoc. This has a number of consequences:
- Within the block of dynamic SQL, you cannot access local variables (including table variables) or parameters of the calling stored procedure. But you can pass parameters – in and out – to a block of dynamic SQL if you use sp_executesql.
- Any USE statement in the dynamic SQL will not affect the calling stored procedure.
- Temp tables created in the dynamic SQL will not be accessible from the calling procedure since they are dropped when the dynamic SQL exits. (Compare to how temp tables created in a stored procedure go away when you exit the procedure.) The block of dynamic SQL can however access temp tables created by the calling procedure.
- If you issue a SET command in the dynamic SQL, the effect of the SET command lasts for the duration of the block of dynamic SQL only and does not affect the caller.
- The query plan for the stored procedure does not include the dynamic SQL. The block of dynamic SQL has a query plan of its own.
As you've seen there are two ways to invoke dynamic
SQL,
sp_executesql and
EXEC().
sp_executesql was added in
SQL 7, whereas
EXEC() has been around since
SQL 6.0. In application code,
sp_executesql should be your choice 95% of the time for reasons that will prevail. For now I will only give two keywords:
SQL Injection and
Query-Plan Reuse.
EXEC() is mainly useful for quick throw-away things and
DBA tasks, but also comes to the rescue in
SQL 2000 and
SQL 7 when the
SQL string exceeds 4000 characters. And, obviously, in
SQL 6.5,
EXEC() is the sole choice. In the next two sections we will look at these two commands in detail.
sp_executesql is a built-in stored procedure that takes two pre-defined parameters and any number of user-defined parameters.
The first parameter
@stmt is mandatory, and contains a batch of one or more
SQL statements. The data type of @stmt is
ntext in
SQL 7 and
SQL 2000, and
nvarchar(MAX) in
SQL 2005 and later. Beware that you must pass an
nvarchar/
ntext value (that is, a Unicode value). A
varchar value won't do.
The second parameter
@params is optional, but you will use it 90% of the time. @params declares the parameters that you refer to in @stmt. The syntax of @params is exactly the same as for the parameter list of a stored procedure. The parameters can have default values and they can have the
OUTPUT marker. Not all parameters you declare must actually appear in the
SQL string. (Whereas all variables that appear in the
SQL string must be declared, either with a
DECLARE inside @stmt, or in @params.) Just like @stmt, the data type of @params is
ntext SQL 2000 and earlier and
nvarchar(MAX) since
SQL 2005.
The rest of the parameters are simply the parameters that you declared in @params, and you pass them as you pass parameters to a stored procedure, either positional or named. To get a value back from your output parameter, you must specify
OUTPUT with the parameter, just like when you call a stored procedure. Note that the first two parameters, @stmt and @params, must be specified positionally. You can provide the parameter names for them, but these names are blissfully ignored.
Let's look at an example. Say that in your database, many tables have a column
LastUpdated, which holds the time a row last was updated. You want to be able to find out how many rows in each table that were modified at least once during a period. This is not something you run as part of the application, but something you run as a
DBA from time to time, so you just keep it as a script that you have a around. Here is what it could look like:
DECLARE @tbl sysname,
@sql nvarchar(4000),
@params nvarchar(4000),
@count int
DECLARE tblcur CURSOR STATIC LOCAL FOR
SELECT object_name(id) FROM syscolumns WHERE name = 'LastUpdated'
ORDER BY 1
OPEN tblcur
WHILE 1 = 1
BEGIN
FETCH tblcur INTO @tbl
IF @@fetch_status <> 0
BREAK
SELECT @sql =
N' SELECT @cnt = COUNT(*) FROM dbo.' + quotename(@tbl) +
N' WHERE LastUpdated BETWEEN @fromdate AND ' +
N' coalesce(@todate, ''99991231'')'
SELECT @params = N'@fromdate datetime, ' +
N'@todate datetime = NULL, ' +
N'@cnt int OUTPUT'
EXEC sp_executesql @sql, @params, '20060101', @cnt = @count OUTPUT
PRINT @tbl + ': ' + convert(varchar(10), @count) + ' modified rows.'
END
DEALLOCATE tblcur
I've put the lines that pertain directly to the dynamic
SQL in bold face. You can see that I have declared the @sql and @params variables to be of the maximum length for
nvarchar variables in
SQL 2000. In
SQL 2005 and later, you may want to make it a routine to declare @sql as
nvarchar(MAX), more about this just below.
When I assign the @sql variable, I am careful to format the statement so that it is easy to read, and I leave in spaces to avoid that two concatenated parts are glued together without space in between, which could cause a syntax error. I put the table name in
quotename() in case a table name has any special characters in it. I also prefix the table name with "
dbo.", which is a good habit, as we will see when we look at dynamic
SQL and
query plans. Overall, I will cover this sort of
good practices more in detail later in the text. Note also the appearance of
''
around the date literal – the rule in
T-SQL is that to include the string delimiter in a string, you must double it.
In this example, the dynamic
SQL has three parameters: one mandatory input parameter, one optional input parameter, and one output parameter. I've assumed that this time the
DBA wanted to see all changes made after 2006-01-01, which is why I've left out @todate in the call to
sp_executesql. Since I left out one variable, I must specify the last, @cnt by name – the same rules as when you call a stored procedure. Note also that the variable is called @cnt in the dynamic
SQL, but @count in the surrounding script. Normally, you might want to use the same name, but I wanted to stress that the @cnt in the dynamic
SQL is only visible within the dynamic
SQL, whereas @count is
not visible there.
You may note that I've prefix the string literals with
N to denote that they are Unicode strings. As @sql and @params are declared as
nvarchar, technically this is not necessary (as long as you stick to your 8-bit character set). However, when you provide any of the strings directly in the call to
sp_executesql, you must specify the N, as in this fairly silly example:
EXEC sp_executesql N'SELECT @x', N'@x int', @x = 2
If you remove any of the
Ns, you will get an error message. Since
sp_executesql is a built-in stored procedure, there is no implicit conversion from
varchar.
You may wonder why I do not pass @tbl as a parameter as well. The answer is that you can't. Dynamic
SQL is just like any other
SQL. You can't specify a table name through a variable in
T-SQL, that's the whole story. Thus, when you need to specify things like table names, column names etc dynamically, you must interpolate them into the string.
If you are on
SQL 2000 or
SQL 7, there is a limitation with
sp_executesql when it comes to the length of the
SQL string. While the parameter is
ntext, you cannot use this data type for local variables. Thus, you will have to stick to
nvarchar(4000). In many cases this will do fine, but it is not uncommon to exceed that limit. In this case, you will need to use
EXEC(), described just below.
Since
SQL 2005, this is not an issue. Here you can use the new data type
nvarchar(MAX) which can hold as much data as
ntext, but without the many restrictions of
ntext.
EXEC() takes one parameter which is an
SQL statement to execute. The parameter can be a concatenation of string variables and string literals, but cannot include calls to functions or other operators. For very simple cases,
EXEC() is less hassle than
sp_executesql. For instance, say that you want to run
UPDATE STATISTICS WITH FULLSCAN on some selected tables. It could look like this:
FETCH tblcur INTO @tbl
IF @@fetch_status <> 0 BREAK
EXEC('UPDATE STATISTICS [' + @tbl + '] WITH FULLSCAN')
In the example with
sp_executesql, I used
quotename(), but here I've let it suffice with adding brackets, in case there is a table named
Order Details (which there is in the
Northwind database). Since
EXEC only permits string literals and string variables to be concatenated and not arbitrary expressions, this is not legal:
EXEC('UPDATE STATISTICS ' + quotename(@tbl) + ' WITH FULLSCAN')
Best practice is to always use a variable to hold the
SQL statement, so the example would better read:
FETCH tblcur INTO @tbl
IF @@fetch_status <> 0 BREAK
SELECT @sql = 'UPDATE STATISTICS ' + quotename(@tbl) + ' WITH FULLSCAN'
EXEC(@sql)
The fact that you can concatenate strings within
EXEC() permits you to make very quick things, which can be convenient at times, but it can lead to poor habits in application code. However, there are situations where this is an enormous blessing. As I mentioned, in
SQL 7 and
SQL 2000, you can in practice only use 4000 characters in your
SQL string with
sp_executesql.
EXEC does not have this limitation, since you can say:
EXEC(@sql1 + @sql2 + @sql3)
Where all of @sql1, @sql2 and @sql3 can be 4000 characters long – or even 8000 characters as
EXEC() permits you to use
varchar.
Since you cannot use parameters, you cannot as easily get values out from
EXEC() as you can with
sp_executesql. You can, however, use
INSERT-EXEC to insert the result set from
EXEC() into a table. I will show you an example
later on, when I also show you how you can use
EXEC() to pass longer strings than 4000 characters to
sp_executesql.
In
SQL 2005 and later,
EXEC() permits impersonation so that you can say:
EXEC(@sql) AS USER = 'mitchell'
EXEC(@sql) AS LOGIN = 'CORDOBA\Miguel'
This is mainly a syntactical shortcut that saves you from embedding the invocation of dynamic
SQL in
EXECUTE AS and
REVERT. (I discuss these statements more in detail in my article
Granting Permissions Through Stored Procedures.)
SQL 2005 adds a valuable extension to
EXEC(): you can use it to execute strings on
linked servers. I will cover this form of
EXEC() in a separate section later in this text.
Before you start to use dynamic
SQL all over town, you need to learn about
SQL injection and how you protect your application against it.
SQL injection is a technique whereby an intruder enters data that causes your application to execute
SQL statements you did not intend it to.
SQL injection is possible as soon there is dynamic
SQL which is handled carelessly, be that
SQL statements sent from the client, dynamic
SQL generated in
T-SQL stored procedures, or
SQL batches executed from
CLR stored procedures. This is not a line of attack that is unique to
MS SQL Server, but all
RDBMS are open to it.
Here is an example. The purpose of the procedure below is to permit users to search for orders by various conditions. A real-life example of such a procedure would have many more parameters, but I've cut it down to two to be brief. (This is, by the way, a problem for which dynamic
SQL is a very good solution.) As the procedure is written, it is open for
SQL injection:
CREATE PROCEDURE search_orders @custid nchar(5) = NULL,
@shipname nvarchar(40) = NULL AS
DECLARE @sql nvarchar(4000)
SELECT @sql = ' SELECT OrderID, OrderDate, CustomerID, ShipName ' +
' FROM dbo.Orders WHERE 1 = 1 '
IF @custid IS NOT NULL
SELECT @sql = @sql + ' AND CustomerID LIKE ''' + @custid + ''''
IF @shipname IS NOT NULL
SELECT @sql = @sql + ' AND ShipName LIKE ''' + @shipname + ''''
EXEC(@sql)
Before we look at a real attack, let's just discuss this from the point of view of user-friendliness. Assume that the input for the parameters @custid and @shipname comes directly from the user and a naïve and innocent user wants to look for orders where
ShipName is
Let's Stop N Shop, so he enters
Let's. Do you see what will happen? Because @shipname includes a single quote, he will get a syntax error. So even if you think that
SQL injection is no issue to you, because you trust your users, you still need to read this section, so that they can search for Brian O'Brien and Samuel Eto'o.
So this is the starting point. A delimiter, usually a single quote, affects your dynamic
SQL, and a malicious user can take benefit of this. For instance, consider this input for @shipname:
' DROP TABLE Orders --
The resulting
SQL becomes:
SELECT * FROM dbo.Orders WHERE 1 = 1 AND ShipName LIKE '' DROP TABLE orders --'
This is a perfectly legal batch of
T-SQL, including the text in red. Since there is something called permissions in
SQL Server, this attack may or may not succeed. A plain user who runs a Windows application and who logs into
SQL Server with his own login, is not likely to have permissions to drop a table. But it is not uncommon for web applications to have a general login that runs
SQL queries on behalf of the users. And if this web app logs into
SQL Server with
sysadmin or
db_owner privileges, the attack succeeds. Mind you, with
sysadmin rights, the attacker can add users and logins as he pleases. And if the service account for
SQL Server has admin privileges in Windows, the attacker has access into your network far beyond
SQL Server through
xp_cmdshell. (Which is disabled by default on
SQL 2005 and later, but if the attacker has achieved
sysadmin rights on the server, he can change that.)
Typically, an attacker first tests what happens if he enters a single quote (
'
) in an input field or a
URL. If this yields a syntax error, the attacker knows that there is a vulnerability. He then finds out if he needs any extra tokens to terminate the query, and then he can add his own
SQL statement. Finally he adds a comment character to kill the rest of the
SQL string to avoid syntax errors. Single quote is the most common character to reveal openings for
SQL injection, but if you have dynamic table and column names, there are more options an attacker could succeed with. Take this dreadful version of
general_select:
CREATE PROCEDURE general_select2 @tblname nvarchar(127),
@key varchar(10) AS
EXEC('SELECT col1, col2, col3
FROM ' + @tblname + '
WHERE keycol = ''' + @key + '''')
and assume that @tblname comes from a
URL. There are quite some options that an attacker could use to take benefit of this hole.
And don't overlook numeric values: they can very well be used for
SQL injection. Of course, in a
T-SQL procedure where the value is passed as an
int parameter there is no risk, but if a supposedly numeric value is directly interpolated into an
SQL string in client code, there is a huge potential for
SQL injection.
Keep in mind that user input comes from more places than just input fields on a form. The most commonly used area for injection attacks on the Internet is probably parameters in URLs and cookies. Thus, be very careful how you handle anything that comes into your application from the outside.
You may think that it takes not only skill, but also luck for someone to find and exploit a hole for
SQL injection. But remember that there are too many hackers out there with too much time on their hands.
SQL injection is a serious security issue, and you must take precautions to protect your applications against it.
One approach I seen mentioned from time to time, is to validate input data in some way, but in my opinion that is not the right way to go. Here are are the three steadfast principles you need to follow:
- Never run with more privileges than necessary. Users that log into an application with their own login should normally only have EXEC permissions on stored procedures. If you use dynamic SQL, it should be confined to reading operations so that users only need SELECT permissions. A web site that logs into a database should not have any elevated privileges, preferably only EXEC and (maybe) SELECT permissions. Never let the web site log in as sa!
- For web applications: never expose error messages from SQL Server to the end user.
- Always used parameterised statements. That is, in a T-SQL procedure use sp_executesql, not EXEC().
The first point is mainly a safeguard, so that if there is a injection hole, the intruder will not be able to do that much harm. The second point makes the task for the attacker more difficult as he cannot get feedback from his attempts.
But it is the third point that is the actual protection, and that we will look a little closer at. The procedure
search_orders above should be coded as:
CREATE PROCEDURE search_orders @custid nchar(5) = NULL,
@shipname nvarchar(40) = NULL AS
DECLARE @sql nvarchar(4000)
SELECT @sql = ' SELECT OrderID, OrderDate, CustomerID, ShipName ' +
' FROM dbo.Orders WHERE 1 = 1 '
IF @custid IS NOT NULL
SELECT @sql = @sql + ' AND CustomerID LIKE @custid '
IF @shipname IS NOT NULL
SELECT @sql = @sql + ' AND ShipName LIKE @shipname '
EXEC sp_executesql @sql, N'@custid nchar(5), @shipname nvarchar(40)',
@custid, @shipname
Since the
SQL string does not include any user input, there is
no opening for
SQL injection. It's as simple as that. By the way, note that since we can include parameters in the parameter list, even if they don't actually appear in the
SQL string, we don't need any complicated logic to build the parameter list, but can keep it static. In the same vein, we can always pass all input parameters to the
SQL string.
As you may recall, you cannot pass everything as parameters to dynamic
SQL, for instance table and column names. In this case you must enclose all such object names in
quotename(), that I will return to in the section
Good Coding Practices and Tips for Dynamic SQL.
The example above was for dynamic
SQL in a
T-SQL stored procedure. The same advice applies to
SQL generated in client code or in a
CLR stored procedure. Since this is so important, here is an example of coding the above in VB6 and
ADO:
Set cmd = CreateObject("ADODB.Command")
Set cmd.ActiveConnection = cnn
cmd.CommandType = adCmdText
cmd.CommandText = " SELECT OrderID, OrderDate, CustomerID, ShipName " & _
" FROM dbo.Orders WHERE 1 = 1 "
If custid <> "" Then
cmd.CommandText = cmd.CommandText & " AND CustomerID LIKE ? "
cmd.Parameters.Append
cmd.CreateParameter("@custid", adWChar, adParamInput, 5, custid)
End If
If shipname <> "" Then
cmd.CommandText = cmd.CommandText & " AND ShipName LIKE ? "
cmd.Parameters.Append cmd.CreateParameter("@shipname", _
adVarWChar, adParamInput, 40, shipname)
End If
Set rs = cmd.Execute
Since the main focus of this text is dynamic
SQL in
T-SQL procedures, I will explain this example only briefly. In
ADO you use
?
as a parameter marker, and you can only pass parameters that actually appear in the
SQL string. (If you specify too many parameters, you will get a completely incomprehensible error message.) If you use the
SQL Profiler to see what
ADO sends to
SQL Server, you will find that it invokes –
sp_executesql.
Protection against
SQL injection is not the only advantage of using parameterised queries. In the section
Caching Query Plans, we will look more in detail on parameterised queries and at a second very important reason to use them. This section also includes an example of composing and sending a parameterised
SQL statement for SqlClient in VB .Net.
You may think that an even better protection against
SQL injection is to use stored procedures with static
SQL only. Yes, this is true,
but! It depends on how you call your stored procedures from the client. If you compose an
EXEC command into which you interpolate the input values, you are back on square one and you are as open to
SQL injection as ever. In
ADO, you need to call your procedure with the command type
adCmdStoredProc and use
.CreateParameter to specify the parameters. By specifying
adCmdStoredProc, you call the stored procedure through
RPC,
Remote Procedure Call, which not only protects you against
SQL injection, but it is also more efficient. Similar measures apply to other client APIs; all APIs I know of supply a way to call a stored procedure through
RPC.
In the
introduction, I presented various strategies for data-access for an application, and I said that in many shops all data access is through stored procedures. In this section, I will look a little closer at the advantages with using stored procedures over sending
SQL statements from the client. I will also look at what happens when you use dynamic
SQL in a stored procedure, and show that you lose some of the advantages with stored procedures, whereas other are unaffected.
Historically, using stored procedures has been
the way to give users access to data. In a locked-down database, users do not have permissions to access tables directly. Instead, the application performs all access through stored procedures that retrieve and update data in a controlled way, so that users only get to see data they have access to, and they cannot perform updates that violate business rules. This works as long as the procedure and the tables have the same owner, typically
dbo (the database owner), through a mechanism known as
ownership chaining.
As I have already mentioned,
ownership chaining does not work when you use dynamic SQL. The reason for this is very simple: the block of dynamic
SQL is not a procedure and does not have any owner. Thus the chain is broken.
SQL 2005 and later
In
SQL 2005 and later versions of
SQL Server, this can be addressed by signing a procedure that uses dynamic
SQL with a certificate. You associate the certificate with a user, and grant that user (which is a user that cannot log in) the rights needed for the dynamic
SQL to execute successfully. A second method is to use the
EXECUTE AS clause to impersonate a user that has been granted the necessary permissions. This method is easier to use, but has side effects that can have unacceptable consequences for auditing, row-level security schemes and system monitoring. For this reason, my strong recommendation is to use certificates.
Describing these methods more closely, would take up too much space here. Instead I've written a separate article about them,
Giving Permissions through Stored Procedures, where I discusses both certificates and impersonation in detail, and I also take a closer look on ownership chaining.
If you write
CLR procedures that perform data access, the same is true for them. Ownership chaining never applies since all data access in a
CLR procedure is through dynamic
SQL. But you can use certificates or impersonation to avoid having to give users direct permissions on the tables.
SQL 2000 and earlier
On
SQL 2000 there is
no way to combine dynamic
SQL with the encapsulation of permissions that you can get through stored procedures. Any use of dynamic
SQL requires that the users have direct permissions on the accessed tables. If your security scheme precludes giving users permissions to access tables directly,
you cannot use dynamic SQL. It is that plain and simple. Depending on the sensitivity of the data in the application, it may be acceptable to give the users
SELECT permissions on the tables (or on some tables) to permit the use of dynamic
SQL. I strongly recommend against granting users
INSERT, UPDATE and
DELETE rights on tables only to permit dynamic
SQL in some occasional procedure.
There are however, some ways to arrange so that users only have access to the data through the application. All and all, there are three alternatives, application roles, "application proxies" and Terminal Server. All require you to change the application architecture or infrastructure, so it is nothing you introduce at whim.
Application roles were introduced in
SQL 7. Users log into
SQL Server but have no permissions on their own beyond the database access. Instead, the application activates the application role by sending a password somehow embedded into it, and this application role has the permissions needed. With "application proxies", the application authenticates the users outside
SQL Server and logs into
SQL Server on their behalf with a proxy login. This proxy login impersonates the users in
SQL Server, and thus their permissions apply. However, since the users do not have any login on their own, they cannot log into
SQL Server outside the application. In
Giving Permissions..., I discuss these two methods a little further.
The final possibility is to put the application on Terminal Server. Users log into the terminal server which is set up so that all they can do is to run this application. Furthermore, the network is configured so that they cannot access
SQL Server from their regular computers. Thus, the application is their only way to the data.
For all these methods, keep in mind about
SQL injection, and do not grant more permissions than needed.
Every query you run in
SQL Server requires a query plan. When you run a query the first time,
SQL Server builds a query plan for it – or as the terminology goes – it
compiles the query.
SQL Server saves the plan in cache, and next time you run the query, the plan is reused. The query plan stays in cache until it's aged out because it has not been used for a while, or it is invalidated for some reason. (Why this happens falls outside the scope of this article.)
The reuse of cached query plans is very important for the performance of queries where the compilation time is in par with the execution time or exceeds it. If a query needs to run for four minutes, it does not matter much if the query is recompiled for an extra second each time. On the other hand, if the execution time of the query is 40 ms but it takes one second to compile the query, there is a huge gain with the cached plan, particularly if the query is executed over and over again.
Up to
SQL 6.5 the only plans there were put into the cache were plans for stored procedures. Loose batches of
SQL were compiled each time. And since the query plan for dynamic
SQL is not part of the stored procedure, that included dynamic
SQL as well. Thus in
SQL 6.5, the use of dynamic
SQL nullified the benefit with stored procedures in this regard.
Starting with
SQL 7,
SQL Server also caches the plans for bare statements sent from a client or generated through dynamic
SQL. Say that you send this query from the client, or execute it with
EXEC():
SELECT O.OrderID, SUM(OD.UnitPrice * OD.Quantity)
FROM Orders O
JOIN [Order Details] OD ON O.OrderID = OD.OrderID
WHERE O.OrderDate BETWEEN '19980201' AND '19980228'
AND EXISTS (SELECT *
FROM [Order Details] OD2
WHERE O.OrderID = OD2.OrderID
AND OD2.ProductID = 76)
GROUP BY O.OrderID
The query returns the total order amount for the orders in February 1998 that contained the product
Lakkalikööri. SQL Server will put the plan into the cache, and next time you run this query, the plan will be reused. But only if it is exactly the same query. Since the cache lookup is by a hash value computed from the query text, the cache is space- and case-sensitive. Thus, if you add a single space somewhere, the plan is not reused. More importantly, it is not unlikely that next time you want to run the query for a different product, or a different period.
All this changes, if you instead use
sp_executesql to run your query with parameters:
DECLARE @sql nvarchar(2000)
SELECT @sql = 'SELECT O.OrderID, SUM(OD.UnitPrice * OD.Quantity)
FROM dbo.Orders O
JOIN dbo.[Order Details] OD ON O.OrderID = OD.OrderID
WHERE O.OrderDate BETWEEN @from AND @to
AND EXISTS (SELECT *
FROM dbo.[Order Details] OD2
WHERE O.OrderID = OD2.OrderID
AND OD2.ProductID = @prodid)
GROUP BY O.OrderID'
EXEC sp_executesql @sql, N'@from datetime, @to datetime, @prodid int',
'19980201', '19980228', 76
The principle for cache lookup is the same as for a non-parameterised query:
SQL Server hashes the query text and looks up the hash value in the cache, still in a case- and space-sensitive fashion. But since the parameter values are not part of the query text, the same plan can be reused even when the input changes.
To make this really efficient there is one more thing you need to observe. Do you see that I've prefixed all tables in the query with
dbo? There is a very important reason for this. Users can have different default schema, and up to
SQL 2000, all users had a default schema equal to their username. Thus, if default schema for
user1 is
user1, and this users runs a query that goes "
SELECT ... FROM Orders",
SQL Server must first check if there is a table
user1.Orders, before it looks for
dbo.Orders. Since
user1.Orders could appear on the scene at any time,
user1 cannot share cache entry with a user different default schema. Yes, in
SQL 2005, it is perfectly possible that all users have
dbo as their default schema, but it seems to be a bad idea to rely on it.
If you instead use stored procedures, it is not equally important to prefix tables with
dbo. Microsoft still recommends that you do, but even if you don't, users with different default schema can share the same query plan.
From what I have said here, it follows that if you use dynamic
SQL with
EXEC() you lose an important benefit of stored procedures whereas with
sp_executesql you don't. At least in theory. It's easy to forget that
dbo, and if you leave it out in just a single place in the query, you will get as many entries in the cache for the query as there are users running it. Recall also that the cache is space- and case-sensitive, so if you generate the same query in several places, you may inadvertently have different spacing or inconsistent use of case. And this is not restricted to the
SQL statement, the parameter list is as much part of the cache entry. Furthermore, since the cache lookup is by a hash value computed from the query text, I would assume that this is somewhat more expensive than looking up a stored procedure. In fact, under extreme circumstances, heavy use of dynamic
SQL, can lead to serious performance degradation. Some of my
MVP colleagues have observed systems with lots of memory (> 20 GB) when the plan cache has been so filled with plans for
SQL statements, that there have been hash collisions galore, and the cache lookup alone could take several seconds. Presumably, the applications in question either did not use parameterised queries at all, or they failed to prefix tables with
dbo.
So far, I've only talked about dynamic
SQL in stored procedures. But in this regard there is very little difference to
SQL statements sent from the client, or
SQL statements generated in
CLR procedures. The same rules apply: unparameterised statements are cached but with little probability for reuse, whereas parameterised queries can be as efficient as stored procedures if you remember to always prefix the tables with
dbo. (And still with the caveat that the cache lookup is space- and case-sensitive.) Most client APIs implement parameterised queries by calling
sp_executesql under the covers.
In the section on
SQL Injection, I included an example on how to do parameterised queries with
ADO and VB6. Here is an example with VB .Net and SqlClient:
cmd.CommandType = System.Data.CommandType.Text
cmd.CommandText = _
" SELECT O.OrderID, SUM(OD.UnitPrice * OD.Quantity)" & _
" FROM dbo.Orders O " & _
" JOIN dbo.[Order Details] OD ON O.OrderID = OD.OrderID" & _
" WHERE O.OrderDate BETWEEN @from AND @to" & _
" AND EXISTS (SELECT *" & _
" FROM dbo.[Order Details] OD2" & _
" WHERE O.OrderID = OD2.OrderID" & _
" AND OD2.ProductID = @prodid)" & _
" GROUP BY O.OrderID"
cmd.Parameters.Add("@from", SqlDbType.Datetime)
cmd.Parameters("@from").Value = "1998-02-01"
cmd.Parameters.Add("@to", SqlDbType.Datetime)
cmd.Parameters("@to").Value = "1998-02-28"
cmd.Parameters.Add("@prodid", SqlDbType.Int)
cmd.Parameters("@prodid").Value = 76
In contrast to
ADO, SqlClient uses names with @ for parameters. The syntax for defining parameters is similar to
ADO, but not identical. This article is long enough, so I will not go into details on how the
Parameters collection works. Instead, I refer you to
MSDN where both SqlClient and
ADO are documented in detail. Whatever client
API you are using,
please learn how to use parameterised commands with it. Yes, there is a tone of desperation in my voice. I don't know how many posts I've seen on the newsgroups over the years where people build their
SQL strings by interpolating the values from input fields into the
SQL string, and thereby degrading the performance of their application, and worst of all opening their database to
SQL injection.
... and just when you thought you were safe, I need to turn this upside down. Recall what I said in the beginning of this section, that if the query is going to run for four minutes, one second extra for compilation is not a big deal. And if that recompilation slashes the execution time from forty minutes to four, there is a huge gain. Most queries benefit from cached parameterised plans, but not all do. Say that you have a query where the user can ask for data for some time span. If the user asks for a summary for a single day, there is a good non-clustered index that can be used for a sub-second response time. But if the request is for the entire year, the same index would be a disaster, and a table scan is better. Starting with
SQL 2005 you can force a query to be recompiled each time it is executed by adding
OPTION (RECOMPILE) to the end of the query, and thus you can still use
sp_executesql to get the best protection against
SQL injection. On
SQL 2000 and earlier, it may in fact be better to interpolate critical parameters into the query string when you need to force recompilation each time.
For the sake of completeness, I should mention that
SQL Server is able to auto-parameterise queries. If you submit:
SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = N'ALFKI'
SQL Server may recast this as
SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @P1
so if next time you submit
BERGS instead of
ALFKI, the query plan will be reused. Auto-parameterisation comes in two flavours: simple and forced. Simple is the default and is the only option on
SQL 2000 and earlier. With simple parameterisation, auto-parameterisation happens only with very simple queries, and, it seems, with some inconsistency. With forced parameterisation,
SQL Server parameterises all queries that comes its way (with some exceptions documented in Books Online). Forced parameterisation is, in my opinion, mainly a setting to cover up for poorly designed third-party application that uses unparameterised dynamic
SQL. For your own development you should not rely on any form of auto-parameterisation. (But in the situation you really a want a new query plan each time, you may have to verify that it doesn't happen when you don't want to.)
They say seeing is believing. Here is a demo that you can try on yourself, if you have
SQL 2005. First create this database:
CREATE DATABASE many_sps
go
USE many_sps
go
DECLARE @sql nvarchar(4000),
@x int
SELECT @x = 200
WHILE @x > 0
BEGIN
SELECT @sql = 'CREATE PROCEDURE abc_' + ltrim(str(@x)) +
'_sp @orderid int AS
SELECT O.OrderID, O.OrderDate, O.CustomerID, C.CompanyName,
Prodcnt = OD.cnt, Totalsum = OD.total
FROM Northwind..Orders O
JOIN Northwind..Customers C ON O.CustomerID = C.CustomerID
JOIN (SELECT OrderID, cnt = COUNT(*), total = SUM(Quantity * UnitPrice)
FROM Northwind..[Order Details]
GROUP BY OrderID) AS OD ON OD.OrderID = O.OrderID
WHERE O.OrderID = @orderid'
EXEC(@sql)
SELECT @x = @x - 1
END
Then in
SQL Server Management Studio 2005, press F7 navigate down to the list of stored procedures. Select all procedures. Then from the context menu select to script them as
CREATE TO to a new query window. How long time this takes depends on your hardware, but on my machine it took 90 seconds and at the same time
SQL Server grabbed over 250
MB of memory. If you use the Profiler to see what Mgmt Studio is up to, you will see that for each procedure, Mgmt Studio emits a couple of queries with the procedure name embedded. That is, no parameterised statements. Once scripting is complete, issue this command:
ALTER DATABASE many_sps SET PARAMETERIZATION FORCED
and redo the operation. On my machine scripting now completed in five seconds!. This demonstrates that the difference between parameterised and unparameterised can be dramatic. (And that Microsoft can not use their own products properly.) If you run
SQL Server on your local machine, you can see this from one more angle, you can stop and restart
SQL Server before the two scripting operations, and then use Task Manager to see how much physical memory
SQL Server uses in the two cases. That difference lies entirely in the plan cache.
This particular issue have been addressed in
SQL Server Management Studio 2008.
SSMS 2008 has its own scripting issues, but they have nothing to do with the topic of this article.
Another advantage with stored procedures over
SQL sent from the client is that less bytes travel the network. Rather than sending a 50-line query over the network, you only need to pass the name of a stored procedure and a few parameters. This gets more significant if the computation requires several queries, possibly with logic in between. If all logic is outside the database, this could mean that data has to travel up to the client, only to travel back in the next moment. With stored procedures you can use temp tables to hold intermediate results. (You can use temp tables from outer layers as well, although it may require some careful use of your client
API.)
In this case, the dividing line goes between sending
SQL from the client or running stored procedures. If the stored procedures use static
SQL only, or invoke dynamic
SQL does not matter, nor does it matter if it is a
CLR procedure. You still get the gains of reduced network traffic.
This is not a question of security or performance, but one of good programming practice and modularising your code. By using stored procedures, you don't have to bog down your client code with the construction of
SQL statements. Then again, it depends a little on what you put into those stored procedure. Myself, I am of the school that the business logic should be where the data is, and in this case there is no dispute that you should use stored procedures to encapsulate your logic.
But there are also people who like to see the database as a unintelligent container of data, and who prefer to have the business logic elsewhere. In this case, the arguments for using stored procedures for encapsulation may not be equally compelling. You could just as well employ careful programming practices in your client language and send
SQL strings.
Nothing of this changes if you use dynamic
SQL in your stored procedures. The stored procedure is still a container for some piece of logic, and how it looks on the inside does not matter. I'm here assuming that most of your procedures use static
SQL only. If all your stored procedures generate dynamic
SQL, then you are probably better off in this regard to do it all in client code. Then again, sometimes there is no other application than Query Analyzer or
SQL Server Management Studio. (Typically this would be tasks that are run by an admin.) In this case, the only container of logic available is stored procedures, and it's immaterial whether they use dynamic
SQL or not.
In a complex system with hundreds of tables, you may need to know where a certain table or column is referenced, because you are considering changing or dropping it. If all access to tables is from static
SQL in stored procedures, you may be able find all references by using the system stored procedure
sp_depends or query a system table directly. (
sysdepends in
SQL 2000,
sys.sql_dependencies in
SQL 2005 and later. In
SQL 2008 there is also
sys.sql_expression_dependencies.) I say may, because it is very difficult to maintain complete dependency information in
SQL Server. If you drop and recreate a table, all dependency information for the table is lost. What I do myself is to regularly build an empty database from our version-control system, and since our build tool loads all tables before any stored procedure or trigger, I know that I can trust the dependency information in that database.
If you throw dynamic
SQL into the mix – be that
SQL sent from client, dynamic
SQL in
T-SQL procedures, or
SQL generated by
CLR stored procedures - you lose this opportunity. The alternative is to employ brute-force search, and if the construction of dynamic
SQL is confined to some well-defined set of modules, this may work. If not, you may end up with a database where no one ever dares to drop or change a column or a table, and which eventually becomes unbearable complex and inefficient because of all the legacy baggage it's carrying around.
While the main dividing line here is between static
SQL and any form of dynamic
SQL, dynamic
SQL in
T-SQL stored procedures is probably the least harmful, as there is less code to search. You can even search the column
sys.sql_modules.definition using
SQL. Available since
SQL 2005. In
SQL 2000 you can search
syscomments, but as the procedure text there is chopped into 4000-char slices, this is less reliable.
In any case, an occasional stored procedure that uses dynamic
SQL is not likely cause the Armageddon I pictured above. But it is a good argument for being restrictive with dynamic
SQL in any form.
One distinct advantage of writing stored
T-SQL procedures is that you get a syntax check directly. With dynamic
SQL, a trivial syntax error may not show up until run time. Even if you test your code carefully, there may be some query, or some variation of a query, that is only run in odd cases and not covered in your test suite.
It has to be admitted that the strength of this argument is somewhat reduced by the fact that
T-SQL is not too industrious on reporting semantic errors. Because of deferred name resolution,
SQL Server will not examine queries in stored procedures, where one or more tables are missing, be that misspellings or temp tables created within the procedure. Nevertheless,
SQL Server does report sufficiently many errors, for this to be a very important reason to use stored procedures.
Another side of this coin is that when you write dynamic
SQL, you embed the
SQL code into strings, which makes programming far more complex. Your
SQL code is a string delimited by single quotes(
'
), and this string may include strings itself, and to include a single quote into the string you need to double it. You can easily get lost in a maze of quotes if you don't watch out. (In the section
Good Coding Practices and Tips for Dynamic SQL, we will look a little closer on how to deal with this problem.) The most commonly used client languages with
T-SQL - Visual Basic,
C#, C++, VBScript – all use the double quote (
"
) as their string delimiter, so dynamic
SQL in client code or
CLR stored procedures is less prone to that particular problem. Then again, in VB you don't have multi-line strings, so at the end of each line you have to have a double quote, an ampersand and an underscore for continuation. It sure does not serve to make coding easier. You are relieved from all this hassle, if you use stored procedures with static
SQL only.
Somewhat surprisingly, one of the strongest arguments for stored procedures today may be that they permit you to quickly address bugs and performance problems in the application.
Say that you generate
SQL statements in your application, and that there is an error in it. Or that it simply performs unbearably slow. To fix it, you need to build a new executable or
DLL, which is likely to contain other code that also has changed since the module was shipped. This means that before the fix can be put into production, the module will have to go through QA and testing.
On the other hand, if the problem is in a stored procedure, and the fix is trivial, you may be able to deploy a fix into production within an hour after the problem was reported.
This difference is even more emphasised, if you are an
ISV and you ship a product that the customer is supposed administer himself. If your application uses stored procedures, a
DBA may be able to address problems directly without opening a support case. For instance, if a procedure runs unacceptably slow, he may be able to fix that by adding an index hint. In contrast, with an application that generates
SQL in the client, his hands will be tied. Of course, as an
ISV you may not want your customers to poke around in your code, even less to change it. You may also prefer to ship your procedures
WITH ENCRYPTION to protect your intellectual property, but this is best controlled through license agreements. (If you encrypt your procedures, the
DBA can still change them, as long as he is able to find a way to decrypt them. Which any
DBA that knows how to use Google can do.)
In this case, it does not matter whether the stored procedure uses static
SQL only, or if it also uses dynamic
SQL. For
CLR procedures it depends on many objects you have in your assemblies. If you have one assembly per object, installing a new version of a
CLR procedure is as simple as replacing a
T-SQL procedure.
(I should add that
SQL 2005 offers a new feature that permits the
DBA to change the plan for a query without altering the code, by adding a
plan guide. This feature has been further enhanced in
SQL 2008. This is quite an advanced feature, and I refer to Books Online for details.)
Writing dynamic
SQL is a task that requires discipline to avoid losing control over your code. If you just go ahead, your code can become very messy, and be difficult to read, troubleshoot and maintain. In this section, we will look at how to avoid this. I will also discuss some special cases: how you can use
sp_executesql for input longer than 4000 chars in
SQL 2000, and how to use dynamic
SQL with cursors, and the combination of dynamic
SQL and user-defined functions.
When you write a stored procedure that generates dynamic
SQL, you should always include a @debug parameter:
CREATE PROCEDURE dynsql_sp @par1 int,
...
@debug bit = 0 AS
...
IF @debug = 1 PRINT @sql
When you get a syntax error from the dynamic
SQL, it can be very confusing, and you may not even discern where it comes from. And even when you do, it can be very difficult to spot the error only by looking at the code that constructs the
SQL. Once the
SQL code is slapped in your face, the error is much more likely to be apparent to you. So always include a @debug parameter and a
PRINT!
As I've already mentioned, one problem with dynamic
SQL is that you often need to deal with nested string delimiters. For instance, in the beginning of this article, I showed you the procedure
general_select2. Here it is again:
CREATE PROCEDURE general_select2 @tblname nvarchar(127),
@key varchar(10) AS
EXEC('SELECT col1, col2, col3
FROM ' + @tblname + '
WHERE keycol = ''' + @key + '''')
(Again, I like to emphasise that this sort of procedure is poor use of dynamic
SQL.)
SQL is one of those language where the method to include a string delimiter itself in a string literal is to double it. So those four consecutive single quotes
('''')
is a string literal with the value of a one single quote (
'
). This is a fairly simple example; it can get a lot worse. If you work with dynamic
SQL, you must learn to master nested strings. Obviously, in this case you can easily escape the mess by using
sp_executesql instead – yet another reason to use parameterised statements. However, there are situations when you need to deal with nested quotes even with
sp_executesql. For instance, earlier in this article, I had this code:
N' WHERE LastUpdated BETWEEN @fromdate AND '
N' coalesce(@todate, ''99991231'')'
We will look at some tips of dealing with nested strings later in this section.
Another thing to be careful with is the spacing as you concatenate the parts of a query. Here is an example where it goes wrong:
EXEC('SELECT col1, col2, col3
FROM' + @tblname + '
WHERE keycol = ''' + @key + '''')
See that there is a space missing after
FROM? When you compile the stored procedure you will get no error, but when you run it, you will be told that the columns
keycol,
col1, col2, col3 are missing. And since you know that the table you passed to the procedure has these columns you will be mighty confused. But this is the actual code generated, assuming the parameters
foo and
abc:
SELECT col1, col2, col3
FROMfoo
WHERE keycol = 'abc'
This is not a syntax error, because
FROMfoo
is a column alias to
col3. And, yes, it's legal to use a
WHERE clause, even if there is no
FROM clause. But since the columns cannot exist out of the blue, you get an error for that. This is also a good example why you should use debug prints. If the code looks like this:
SELECT @sql =' SELECT col1, col2, col3
FROM' + @tblname + '
WHERE keycol = ''' + @key + ''''
IF @debug = 1 PRINT @sql
EXEC(@sql)
It would be much easier to find the error by running the procedure with @debug = 1. (Obviously, had we included the
dbo prefix, this error could not occur at all.)
Overall, good formatting is essential when working with dynamic
SQL. Try to write the query as you would have written it in static
SQL, and then add the string delimiters outside of that.
T-SQL permits you to embed newlines in string literals (as testified by the example above), so in contrast to VB, you don't need a string delimiter on each line. An advantage of this is that your debug
PRINT is easier to read, and in the case of a syntax error the line number in the error message may guide you.
You may prefer, though, to have a string terminator on each line. A tip in such case is to do something like this:
EXEC(' SELECT col1, col2, col3 ' +
' FROM ' + @tblname +
' WHERE keycol = ''' + @key + '''')
As you see, I have a space after the opening single quote on each line to avoid syntax problems due to missing spaces.
Passing table and column names as parameters to a procedure with dynamic
SQL is rarely a good idea for application code. (It can make perfectly sense for admin tasks). As I've said, you cannot pass a table or a column name as a parameter to
sp_executesql, but you must interpolate it into the
SQL string. Still you should protect it against
SQL injection, as a matter of routine. It could be that bad it comes from user input.
To this end, you should use the built-in function
quotename() (added in
SQL 7).
quotename() takes two parameters: the first is a string, and the second is a pair of delimiters to wrap the string in. The default for the second parameter is []. Thus,
quotename('Orders')
returns
[Orders]
.
quotename() takes care of nested delimiters, so if you have a really crazy table name like
Left]Bracket,
quotename() will return
[Left]]Bracket]
.
Note that when you work with names with several components, each component should be quoted separately.
quotename('dbo.Orders')
returns
[dbo.Orders]
, but that is a table in an unknown schema of which the first four characters are
d,
b,
o and a dot. As long as you only work with the
dbo schema, best practice is to add
dbo in the dynamic
SQL and only pass the table name. If you work with different schemas, pass the schema as a separate parameter. (Although you could use the built-in function
parsename() to split up a @tblname parameter in parts.)
While
general_select still is a poor idea as a stored procedure, here is nevertheless a version that summarises some good coding virtues for dynamic
SQL:
CREATE PROCEDURE general_select @tblname nvarchar(128),
@key varchar(10),
@debug bit = 0 AS
DECLARE @sql nvarchar(4000)
SET @sql = 'SELECT col1, col2, col3
FROM dbo.' + quotename(@tblname) + '
WHERE keycol = @key'
IF @debug = 1 PRINT @sql
EXEC sp_executesql @sql, N'@key varchar(10)', @key = @key
- I'm using sp_executesql rather than EXEC().
- I'm prefixing the table name with dbo.
- I'm wrapping @tblname in quotename().
- There is a @debug parameter.
The main purpose of
quotename() is to quote object names, which is why the default for the second parameter is brackets. But you can specify other delimiters as well, including single quotes, which means that any single quote in the input is doubled. Thus, if you for some reason prefer to use
EXEC(), you can use
quotename() to protect yourself against
SQL injection by help of this function. Here is an example.
IF @custname IS NOT NULL
SELECT @sql = @sql + ' AND custname = ' + quotename(@custname, '''')
Say that @custname has the value
D'Artagnan. This part of the dynamic
SQL becomes:
AND custname = 'D''Artagnan'
There is a limitation with
quotename(): its input parameter is
nvarchar(128), so it does not handle long strings. A remedy is this user-defined function:
CREATE FUNCTION quotestring(@str nvarchar(1998)) RETURNS nvarchar(4000) AS
BEGIN
DECLARE @ret nvarchar(4000),
@sq char(1)
SELECT @sq = ''''
SELECT @ret = replace(@str, @sq, @sq + @sq)
RETURN(@sq + @ret + @sq)
END
This version is for
SQL 2000. On
SQL 2005 and later, replace 1998 and 4000 with
MAX, to make it work for any string length. Here is an example of using this function:
IF @custname IS NOT NULL
SELECT @sql = @sql + ' AND custname = ' + dbo.quotestring(@custname)
The result is the same as above.
On
SQL 7, you would have to implement
quotestring as a stored procedure.
SQL 6.5 does not have
replace(), so you are a bit out of luck there.
So with
quotename() and
quotestring(), do we have as good protection against
SQL injection as we have with parameterised commands? Maybe. I don't know of any way to inject
SQL that slips through
quotename() or
quotestring(). Nevertheless, you are interpolating user input into the
SQL string, whereas with parameterised commands, you don't.
(I should add that I got the suggestion to use
quotename() or a user-defined function from
SQL Server
MVP Steve Kass.)
Another alternative to escape the mess of nested quotes, is make use of the fact that
T-SQL actually has two string delimiters. To wit, if the setting
QUOTED_IDENTIFIER is
OFF, you can also use double quotes(
"
) as a string delimiter. The default for this setting depends on context, but the preferred setting is
ON, and it must be
ON in order to use XQuery, indexed views and indexes on computed columns. Thus, this is not a first-rate alternative, but if you are aware of the caveats, you can do this:
CREATE PROCEDURE general_select @tblname nvarchar(127),
@key key_type,
@debug bit = 0 AS
DECLARE @sql nvarchar(4000)
SET @sql = 'SET QUOTED_IDENTIFIER OFF
SELECT col1, col2, col3
FROM dbo.' + quotename(@tblname) + '
WHERE keycol = "' + @key + '"'
IF @debug = 1 PRINT @sql
EXEC(@sql)
Since there are two different quote characters, the code is much easier to read. The single quotes are for the
SQL string and the double quotes are for the embedded string literals.
All and all, this is an inferior method to both
sp_executesql and
quotestring(), since you are not protected against
SQL injection (what if @key includes a double quote?). But it would be OK to do for some sysadmin task (where
SQL injection is not likely to be an issue), and it may be the best way to go on
SQL 6.5.
There is a limitation with
sp_executesql on
SQL 2000 and
SQL 7, since you cannot use longer
SQL strings than 4000 characters. (On
SQL 2005 and later, you should use
nvarchar(MAX) to avoid this problem.) If you want to use
sp_executesql when your query string exceeds this limit to make use of parameterised query plans, there is actually a workaround. To wit, you can wrap
sp_executesql in
EXEC():
DECLARE @sql1 nvarchar(4000),
@sql2 nvarchar(4000),
@state char(2)
SELECT @state = 'CA'
SELECT @sql1 = N'SELECT COUNT(*)'
SELECT @sql2 = N'FROM dbo.authors WHERE state = @state'
EXEC('EXEC sp_executesql N''' + @sql1 + @sql2 + ''',
N''@state char(2)'',
@state = ''' + @state + '''')
This works, because the @stmt parameter to
sp_executesql is
ntext, so by itself, it does not have any limitation in size.
You can even use output parameters by using
INSERT-EXEC, as in this example:
CREATE TABLE #result (cnt int NOT NULL)
DECLARE @sql1 nvarchar(4000),
@sql2 nvarchar(4000),
@state char(2),
@mycnt int
SELECT @state = 'CA'
SELECT @sql1 = N'SELECT @cnt = COUNT(*)'
SELECT @sql2 = N'FROM dbo.authors WHERE state = @state'
INSERT #result (cnt)
EXEC('DECLARE @cnt int
EXEC sp_executesql N''' + @sql1 + @sql2 + ''',
N''@state char(2),
@cnt int OUTPUT'',
@state = ''' + @state + ''',
@cnt = @cnt OUTPUT
SELECT @cnt')
SELECT @mycnt = cnt FROM #result
You have my understanding if you think this is too messy to be worth it.
This very simple: you
cannot use dynamic
SQL from used-defined functions written in
T-SQL. This is because you are not permitted do anything in a
UDF that could change the database state (as the
UDF may be invoked as part of a query). Since you can do anything from dynamic
SQL, including updates, it is obvious why dynamic
SQL is not permitted.
I've seen more than one post on the newsgroups where people have been banging their head against this. But if you want to use dynamic
SQL in a
UDF, back out and redo your design. You have hit a roadblock, and in
SQL 2000 there is no way out.
In
SQL 2005 and later, you could implement your function as a
CLR function. Recall that all data access from the
CLR is dynamic
SQL. (You are safe-guarded, so that if you perform an update operation from your function, you will get caught.) A word of warning though: data access from scalar UDFs can often give performance problems. If you say
SELECT ... FROM tbl WHERE dbo.MyUdf(somecol) = @value
and
MyUdf performs data access, you have more or less created a hidden cursor.
Not that cursors are something you should use very frequently, but people often ask about using dynamic
SQL with cursors, so I give an example for the sake of completeness. You cannot say
DECLARE CURSOR EXEC(); you have to put the entire
DECLARE CURSOR statement in dynamic
SQL:
SELECT @sql = 'DECLARE my_cur INSENSITIVE CURSOR FOR ' +
'SELECT col1, col2, col3 FROM ' + @table
EXEC sp_executesql @sql
You may be used to using the
LOCAL keyword with your cursors. However, it is important to understand that you must use a global cursor, as a local cursor will disappear when the dynamic
SQL exits. (Because, as you know by now, the dynamic
SQL is its own scope.) Once you have declared the cursor in this way, you can use the cursor in a normal fashion. You must be extra careful with error-handling though, so that you don't exit the procedure without deallocating the cursor.
There is however a way to use locally-scoped cursors with dynamic
SQL. Anthony Faull pointed out to me that you can achieve this with cursor variables, as in this example:
DECLARE @my_cur CURSOR
EXEC sp_executesql
N'SET @my_cur = CURSOR STATIC FOR
SELECT name FROM dbo.sysobjects;
OPEN @my_cur',
N'@my_cur cursor OUTPUT', @my_cur OUTPUT
FETCH NEXT FROM @my_cur
You refer to a cursor variable, just like named cursors, but there is an @ in front, and, as you see from the example, you can pass them as a parameters. (I have to confess I have never seen any use for cursor variables until Anthony Faull was kind to send me this example.)
A special feature added in
SQL 2005 is that you can use
EXEC() to run pass-through queries on a linked server. This could be another instance of
SQL Server, but it could also be an Oracle server, an Access database, Active directory or whatever. The
SQL could be a single query or a sequence of statements, and could it be composed dynamically or be entirely static. The syntax is simple, as seen by this example:
EXEC('SELECT COUNT(*) FROM ' + @db + '.dbo.sysobjects') AT SQL2K
SQL2K is here a linked server that has been defined with
sp_addlinkedserver.
There is one thing that you can do with
EXEC() at a linked server, that you cannot do with
EXEC() on a local server: you can use parameters, both for input and output. The confuse matters, you don't use parameters with names starting with @, instead you use question marks (
?
) as parameter holders. Say that you are on an
SQL 2005 box, and you are dying to know how many orders
VINET had in the
Northwind database. Unfortunately,
SQL 2005 does not ship with
Northwind, but you have a linked server set up to an instance of
SQL 2000 with
Northwind. You can run this:
DECLARE @cnt int
EXEC('SELECT ? = COUNT(*) FROM Northwind.dbo.Orders WHERE CustomerID = ?',
@cnt OUTPUT, N'VINET') AT SQL2K
SELECT @cnt
Note here that the parameter values must appear in the order the parameter markers appear in the query. When passing a parameter, you can either specify a constant value or a variable.
You may ask why the inconsistency with a different parameter marker from
sp_executesql? Recall that linked servers in
SQL Server are always accessed through an
OLE DB provider, and
OLE DB uses
?
as the parameter marker, a convention inherited from
ODBC. OLE DB translates that parameter marker as is appropriate for the data source on the other end. (Not all
RDBMS use @ for variables.)
As with regular
EXEC(), you can specify
AS USER/LOGIN to use impersonation:
EXEC('SELECT COUNT(*) FROM ' + @db + '.dbo.sysobjects')
AS USER = 'davidson' AT SQL2K
This begs the question: is
davidson here a local user or a remote user at
SQL2K? Books Online is not very clear about this, but I did some quick experimenting, and found that what you are impersonating is a local user or login, not a login on the remote server. (The login to use on the remote server can be defined with
sp_addlinkedsrvlogin.)
When you read the various newsgroups on
SQL Server, there is almost every day someone who asks a question that is answered with
use dynamic SQL with a quick example to illustrate, but ever so often the person answering forgets to tell about the implications on permissions or
SQL injection. On top of that, far too many examples use
EXEC() without any thought of query plans. And while many of these questions taken by the letter have no other answer than dynamic
SQL, there is often a real business problem which has a completely different solution without dynamic
SQL – and a much better one.
So, in this section I will explore some situations where you
could use dynamic
SQL. You will see that sometimes dynamic
SQL is a good choice, but also that in many cases that it is an outright bad idea.
A common question is why the following does not work:
CREATE PROCEDURE my_proc @tablename sysname AS
SELECT * FROM @tablename
As we have seen, we can make this procedure work with help of dynamic
SQL, but it should also be clear that we gain none of the advantages with generating that dynamic
SQL in a stored procedure. You could just as well send the dynamic
SQL from the client. So, OK: 1) if the
SQL statement is very complex, you save some network traffic and you do encapsulation. 2) As we have seen, starting with
SQL 2005 there are methods to deal with permissions. Nevertheless, this
is a bad idea.
There seems to be several reasons why people want to parameterise the table name. One camp appears to be people who are new to
SQL programming, but have experience from other languages such as
C++, VB etc where parameterisation is a good thing. Parameterising the table name to achieve generic code and to increase maintainability seems like good programmer virtue.
But it is just that when it comes to database objects, the old truth does not hold. In a proper database design, each table is unique, as it describes a unique entity. (Or at least it should!) Of course, it is not uncommon to end up with a dozen or more look-up tables that all have an id, a name column and some auditing columns. But they do describe different entities, and their semblance should be regarded as mere chance, and future requirements may make the tables more dissimilar.
Furthermore, when it comes to building a query plan, each table has its set of statistics and presumptions that are by no means interchangeable, as far as
SQL Server is concerned. Finally, in a complex data model, it is important to get a grip of what's being used. When you start to pass table and column names as parameters, you definitely lose control.
So if you want to do the above (save the fact that
SELECT * should not be used in production code), to save some typing, you are on the wrong path. It is much better to write ten or twenty stored procedures, even if they are similar to each other.
(If your
SQL statements are complex, so that there actually is a considerable gain in maintainability to only have them in one place, despite different tables being used, you could consider using a pre-processor like the one in
C/C++. You would still have one set of procedures per table, but the code would be in one single include file.)
This is a variation of the previous case, where there is a suite of tables that actually do describe the same entity. All tables have the same columns, and the name includes some partitioning component, typically year and sometimes also month. New tables are created as a new year/month begins.
In this case, writing one stored procedure per table is not really feasible. Not the least, because the user may want to specify a date range for a search, so even with one procedure per table you would still need a dynamic dispatcher.
Now, let's make this very clear: this is a flawed table design. You should not have one
sales table per month, you should have one single
sales table, and the month that appear in the table name, should be the first column of the primary key in the united
sales table. But you may be stuck with a legacy application where you cannot easily change the table design. And, admittedly, there are situations where partitioning makes sense. The table may be
huge (say over 10 GB in size), or you want to be able age to out old data quickly. But in such case you should do partitioning properly.
In the following, I will look at three approaches to deal with partitioning without using dynamic
SQL.
Partitioned Tables
Partitioned tables were added in
SQL 2005. You can divide a table in up to 999 partition according to a partition function. These partitions can be split up over different filegroups to spread out the load. Another important benefit of partitioned tables is that deleting a partition is a pure meta-data operation, which means that if you want to throw away all orders that are more than, say, 12 months old, you can do this with the wink of an eye.
Table partitioning is only available in Enterprise and Developer Edition, not in Standard. For this reason, I'm not going into further details, but refer you to Books Online.
Views and Partitioned Views
If you have an old application, where you cannot easily merge the umpteen sales tables into one, because it would break other parts of the application, a simple approach is to define a view like this:
CREATE VIEW sales AS
SELECT year = '2006', col1, col2, ... FROM dbo.sales2006
UNION ALL
SELECT year = '2005', col1, col2, ... FROM dbo.sales2005
UNION ALL
...
Instead of composing the table name dynamically, you can now say:
SELECT ... FROM sales WHERE year = '2006' AND ...
Also, it's easy to add new tables to the view or remove old tables as the data is aged out. Unfortunately, this view is not terribly efficient, as the query will access all tables in the view. Furthermore, the view is not updateable. But with a few more steps, you could make it into what
SQL Server knows as a
partitioned view, a feature added in
SQL 2000 (and available in all editions of
SQL Server). A true partitioned view can be very efficient, because for queries that include the partitioning column in the
WHERE clause,
SQL Server will only access the relevant table(s). And such a view is updatable, so you can insert data into it, and the data will end up in the right table.
Here is a quick example/demo on how to properly set up a partitioned view. Assume that as legacy of a poor design we have these three tables:
SELECT OrderID + 0 AS OrderID, OrderDate, CustomerID, EmployeeID
INTO Orders96 FROM Northwind..Orders WHERE year(OrderDate) = 1996
ALTER TABLE Orders96 ALTER COLUMN OrderID int NOT NULL
SELECT OrderID + 0 AS OrderID, OrderDate, CustomerID, EmployeeID
INTO Orders97 FROM Northwind..Orders WHERE year(OrderDate) = 1997
ALTER TABLE Orders97 ALTER COLUMN OrderID int NOT NULL
SELECT OrderID + 0 AS OrderID, OrderDate, CustomerID, EmployeeID
INTO Orders98 FROM Northwind..Orders WHERE year(OrderDate) = 1998
ALTER TABLE Orders98 ALTER COLUMN OrderID int NOT NULL
go
ALTER TABLE Orders97 ADD CONSTRAINT pk97 PRIMARY KEY (OrderID)
ALTER TABLE Orders96 ADD CONSTRAINT pk96 PRIMARY KEY (OrderID)
ALTER TABLE Orders98 ADD CONSTRAINT pk98 PRIMARY KEY (OrderID)
First step is to a add
Year column to each table. These columns need a default (so that processes that insert directly into these tables are unaffected) and a
CHECK constraint. Here is how it looks for
Orders96:
ALTER TABLE Orders96 ADD Year char(4) NOT NULL
CONSTRAINT def96 DEFAULT '1996'
CONSTRAINT check96 CHECK (Year = '1996')
This column must be the first column in the primary key, so we need to drop the current primary key and recreate it:
ALTER TABLE Orders96 DROP CONSTRAINT pk96
ALTER TABLE Orders96 ADD CONSTRAINT pk96 PRIMARY KEY (Year, OrderID)
Again, this must be performed for all three tables. Finally, you can create the view:
CREATE VIEW Orders AS
SELECT * FROM dbo.Orders96
UNION ALL
SELECT * FROM dbo.Orders97
UNION ALL
SELECT * FROM dbo.Orders98
Note: I have here use
SELECT *
to save some space in the article, but when you define your real view, you should list the colunms explicitly. There is a risk that columns could come in different order in the tables.
You now have a proper partitioned view that you can perform inserts and updates through. For instance you can run:
INSERT Orders(Year, OrderID, OrderDate, CustomerID, EmployeeID)
VALUES ('1997', 12000, '19970101', 'BERGS', 2)
And if you run a query like:
SELECT OrderID, OrderDate, EmployeeID
FROM Orders
WHERE Year = @year
AND CustomerID = N'BERGS'
SQL Server will at run-time only access the OrdersNN table that maps to @year. If you look at the query plan casually, it may seem that all three tables are accessed, but if you check the
Filter operators you will find something called
STARTUP EXPR. This means that
SQL Server determines at
run-time whether to access the table or not. (In fact, when I tested this, I only got this result on
SQL 2005 and
SQL 2008. On
SQL 2000, the start-up expression was not included for some reason I have not been able to understand.)
For your real-world case you may find it prohibitive to change the primary key. In this case you could add a
UNIQUE constraint with the partitioning column + the real primary key. This will not be a proper partitioned view, and the view will not be updatable, but with some luck
SQL Server may still apply start-up expressions, and access only one of the base tables. At least I got it to work, when I ran a quick test. You should verify that it works for your situation.
When a new table is added with a new year, the view needs to be redefined. If this happens frequently, for instance by each month, you should probably set up a job for this. I leave out example code, but it requires running a cursor over
sysobjects to compose a
CREATE VIEW statement that you then execute with
sp_executesql or
EXEC(). That would be an example of good use of dynamic
SQL.
This was a concentrated introduction to partitioned views. You can find the full rules for partitioned views under the topic for
CREATE VIEW in Books Online. Good reading is also Stefan Delmarco's detailed article
SQL Server 2000 Partitioned Views.
Compatibility Views
If you have very many tables, there is a risk that you will hit a roadblock with a partitioned view:
SQL Server only permits 256 tables in a query. Henrik Staun Poulsen suggested an alternate solution that evades this restriction. You first create that new table, with all the data in it. Then you drop the old tables, but replace them with views:
CREATE VIEW sales200612 AS
SELECT col1, col2, col3
FROM sales
WHERE yearmonth = '200612'
Old functions that uses dynamic
SQL or whatever they do, can continue to do so. If they perform
INSERT, UPDATE or
DELETE operations, you need to implement
INSTEAD OF triggers to support this.
Obviously, this solution requires you to produce a lot of code, but you don't have to write it by hand; you can easily write a program in the language of your choice to generate the views and triggers.
In this case people want to update a column which they select at run time. The above is actually legal in
T-SQL, but what happens is simply that the variable @colname is assigned the value in @value for each affected row in the table.
In this case dynamic
SQL would call for the user to have
UPDATE permissions on the table, something not to take lightly. So there is all reason to avoid it. Here is a fairly simple workaround:
UPDATE tbl
SET col1 = CASE @colname WHEN 'col1' THEN @value ELSE col1 END,
col2 = CASE @colname WHEN 'col2' THEN @value ELSE col2 END,
...
If you don't know about the
CASE expression, please look it up in Books Online. It's a very powerful
SQL feature.
Then again, one would wonder why people want to do this. Maybe it's because their tables look like this:
CREATE TABLE products (prodid prodid_type NOT NULL,
prodname name_type NOT NULL,
...
sales_1 money NULL,
sales_2 money NULL,
...
sales_12 money NULL,
PRIMARY KEY (prodid))
It could make more sense to move these
sales_n columns to a second table:
CREATE TABLE product_sales (prodid prodid_type NOT NULL,
month tinyint NOT NULL,
sales money NOT NULL,
PRIMARY KEY (prodid, month))
The request here is to determine the name for a column in a result set at run-time. My gut reaction, is that this should be handled client-side. But if your client is a query window is Management Studio or similar, this is kind of difficult. In any case, this is simple to do without any dynamic
SQL on
SQL 2005 and later:
DECLARE @mycolalias sysname
SELECT @mycolalias = 'This week''s alias'
CREATE TABLE #temp (a int NOT NULL,
b int NOT NULL)
INSERT #temp(a, b) SELECT 12, 17
EXEC tempdb..sp_rename '#temp.b', @mycolalias, 'COLUMN'
SELECT * FROM #temp
That is, you first get the data into a temp table, and then you use
sp_rename to rename the column along your needs. (You need to qualify
sp_rename with tempdb to have it to operate in that database.) You will get an informational message
Caution: Changing any part of an object name could break scripts and stored procedures, but you may be able to live with that.
This trick works on
SQL 2000 too, although not entirely without dynamic
SQL: you need put the
SELECT from the temp table in
EXEC():
EXEC('SELECT * FROM #temp')
This is because on
SQL 2000, sp_rename apparently does not trigger a recompile, so if the the
SELECT is in the same batch, the statement fails with
Invalid column name 'b'. There is yet one thing to be aware of on
SQL 2000: you cannot use sp_rename in a stored procedure that is to be run by plain users, as sp_rename thinks you need to be a member of the
db_owner or
db_ddladmin database roles, even if this is only a temp table. This issue has been addressed in
SQL 2005.
In this case the table is in another database which is somehow determined dynamically. There seems to be several reasons why people want to do this, and depending on your underlying reason, the solution is different.
If you for some reason have your application spread over two databases, what you absolutely not should do is to have code that says:
SELECT ... FROM otherdb.dbo.tbl JOIN ...
This is bad, because if someone asks for a second environment on the same server, you have a lot of code to change.
The best solution for this particular problem is to use synonyms, added in
SQL 2005:
CREATE SYNONYM otherdbtbl FOR otherdb.dbo.tbl
You can then refer to
otherdb.dbo.tbl as just
otherdbtbl. If there is a need for a second set of databases, you only have to update the synonyms, and there is no need to use dynamic
SQL.
Yet a way to avoid dynamic
SQL is to use stored procedures for all inter-database communication. That is, if you are in
db1 and need to get data from
db2, you call a stored procedure in
db2. This can be dynamic, because
EXEC permits you to specify a variable that holds the name of the procedure to execute.
SELECT @dbname = quotename(dbname) FROM ...
SELECT @sp = @dbname + '..some_sp'
EXEC @ret = @sp @par1, @par2...
If you want to get result sets back from
db2, look at my article
How to Share Data between Stored Procedures for suggestions.
There may still be cases you may find that dynamic
SQL is the only feasible situation. This can be done in two ways. The most obvious is:
SELECT @dbname = quotename(dbname) FROM ...
SELECT @sql = ' SELECT ... FROM ' + @dbname + ' .dbo.otherdbtbl ' +
' JOIN dbo.localtbl ... '
EXEC sp_executesql @sql, @params, ...
But, if the query is complex, and most of the tables are in the remote database you can also do:
SELECT @sql = ' SELECT ... FROM dbo.othertbl ' +
' JOIN ' + quotename(db_name()) + '.dbo.localtbl ... '
SELECT @dbname = quotename(dbname) FROM ...
SELECT @sp_executesql = @dbname + '..sp_executesql'
EXEC @sp_executesql @sql, @params, ...
As above, I make use of that you can specify the procedure name dynamically with
EXEC. The trick here is that when you specify a system stored procedure in three-part notation with the database name, the procedure executes in the context of that database. Thus, the dynamic
SQL in this example runs in @dbname, not the current database.
This sounds to me like some sysadmin venture, and for sysadmin tasks dynamic
SQL is usually a fair game, because neither caching nor permissions are issues. Nevertheless there is an kind of alternative:
sp_MSforeachdb, demonstrated by this example:
sp_MSforeachdb 'SELECT ''?'', COUNT(*) FROM sysobjects'
As you might guess,
sp_MSforeachdb uses dynamic
SQL internally, so what you win is that you don't have to write the control loop yourself. I should hasten to add that
sp_MSforeachdb is not documented in Books Online, which also means that use of it is not supported by Microsoft and it could be changed or withdrawn from
SQL Server without notice.
The scenario here is that you have a suite of databases with identical schema. The typical reason they are different databases and not one, is that every database serves a different customer, and each customer can access his database (but of course no one else's). Some people see a problem with the same stored procedures in fifty databases, and believe that they face a maintenance nightmare. So they get the idea that they should put the procedures in a "master" database. Yes, you can do that. It will give you a much bigger maintenance problem, because your code will entirely littered with dynamic
SQL. In fact, if you feel that this is the only alternative, you are better off skipping stored procedures altogether and do all access from client code instead. In such case there is only one place you need to specify the database: the connection string.
What else can you do? Some people might suggest that you should collapse the databases into one, and employ a strict row-level security scheme. Personally, I would never accept such a solution as a potential customer. In a complex application, bugs can easily lead to that information is exposed to people who should not see it. Besides, row-level security cannot be implemented entirely waterproof in
SQL Server. Whereas queries only would return the data they should, query plans and error messages may indirectly disclose information to users who are not authorised to see it.
Another wild approach is to use
SQL Server's own
master database and install the application procedures as system procedures. I have not played with this for a long time, but I am told that it still works in
SQL 2008. In any case, this is entirely unsupported. So while I mention the possibility, I don't give you the details on how to do it and I strongly recommend that you don't go there.
What then is the real solution? Install the stored procedures in each database and develop rollout routines for your
SQL objects. You need this anyway, the day you want to update the table definitions. This also permits you to have some flexibility. Some customers may prefer to skip an upgrade. Other customers may be prepared to pay for extra functions that only they have access to. Even more importantly, it permits you to easily scale out and move some databases to a second server. I mentioned that as a customer, I would not accept to share database with other customers. In fact, a security-aware customer would not even accept to share the same instance of
SQL Server, but require his own instance.
(You may ask whether not synonyms could be used to implement the "master" database. I have not been able to think of anything useful, but if you find out something, please drop me a line.)
This question sometimes comes up. Most often people have problems with the
USE command. The correct solution is to avoid
USE altogether in this case. In fact, we have already seen how to do this:
SELECT @sql = 'CREATE VIEW ...'
SELECT @sp_executesql = quotename(@dbname) + '..sp_executesql'
EXEC @sp_executesql @sql
That is, make use of that you can set the database context by calling
sp_executesql with three-part notation.
It is fascinating how may people who put '1,2,3,4' in @list, and then are puzzled why the query above does not return any rows. Well, if there is a row where
col has the value '1,2,3,4', you will get a match. These two conditions are the same:
col IN (@list)
col = @list
IN does
not mean "parse whatever data there is at runtime as a comma-separated list". It's a
compile-time shortcut for
col = @a OR col = @b OR
...
This is a very common question on the newsgroups, and
Use dynamic SQL is a far too common answer. Yes, you can do this with dynamic
SQL, but it is an extremely poor solution. You cannot pass the list as a parameter to
sp_executesql, so you would have to use
EXEC() and be open to
SQL injection. On top of that, for long lists,
IN has extremely poor performance – in some tests I did, it took
SQL Server 15 seconds to build the query plan for a list with 10000 elements.
The correct method is to unpack the list into a table with a user-defined function or a stored procedure. In my article,
Arrays and Lists in SQL Server, I describe a whole range of ways to do this. I also present performance data for the various methods. (Dynamic
SQL is at the bottom of that list!) This is a long article, but there are jump-start links in the beginning of the article, depending on which version of
SQL Server you are using.
If you are considering to write the procedure
CREATE PROCEDURE search_sp @condition varchar(8000) AS
SELECT * FROM tbl WHERE @condition
Just forget it. If you are doing this, you have not completed the transition to use stored procedure and you are still assembling your
SQL code in the client. But this example lapses into
A not too uncommon case is that the users should be able to select data from a broad set of parameters. The procedure
search_orders in the section on
SQL injection is a very simple example of this.
Any programmer that tackles this realises that writing a static solution with a tailor-made query for each combination of input parameters is impossible. It most cases, it's simple to write a single static query with conditions like:
AND (CustomerID = @custid OR @custid IS NULL)
But in
SQL 2005 and earlier if is not possible to get good performance from such a query, but the only option for good performance is to use dynamic
SQL. This changed in
SQL 2008, provided that you use the
RECOMPILE hint. However that is a bit complicated, because the original implementation had a serious bug, so Microsoft reverted on that change for a while. Rather than going into details here, I refer you to my article,
Dynamic Search Conditions, where I discuss this type of searches in more detail and where I present several methods, both with dynamic
SQL and static
SQL. This article exists in two versions, one for
SQL 2008 and later, and one
for earlier versions.
Another common request is to make a dynamic crosstab query, where you transform rows into columns. For instance, say that you want to display the number of orders handled by each employee in Northwind with one column per year. This query works well:
SELECT E.LastName,
[1996] = SUM(CASE Year(OrderDate) WHEN '1996' THEN 1 ELSE 0 END),
[1997] = SUM(CASE Year(OrderDate) WHEN '1997' THEN 1 ELSE 0 END),
[1998] = SUM(CASE Year(OrderDate) WHEN '1998' THEN 1 ELSE 0 END)
FROM Orders O
JOIN Employees E ON O.EmployeeID = E.EmployeeID
GROUP BY E.LastName
But in many situations you don't exactly which columns you will have in the data, or even how many there will be. For instance, in this example, we may not know beforehand for which years there are orders.
One approach is to set an upper limit of how many output columns you support and use dummy names for the columns. Once you have run the query, you use the technique I described in the section
SELECT col AS @myname.
However, in the very most cases, you will want to employ dynamic
SQL for this. And there is not really any alternative. A
SELECT statement returns a table, and a table has a known number of columns with known names, so there is no way you can write a static
SELECT statement to achieve this.
The general technique is a two-step operation: 1) Get the unique values to pivot on. 2) with those values, generate a query like the one above. While it's a short description, it takes some time to get everything in place. You can make a shortcut with the stored procedure
pivot_sp, something I have adapted from a procedure originally written by
SQL Server
MVP Itzik Ben-Gan. The procedure takes a number of parameters permitting you to specify the query, the rows to group by and to pivot by, and which aggregation operation you want.
As this article is already long enough, I don't go into details to try to explain how it works, but leave it to you to explore it on your own. I like to stress one thing though: The way
pivot_sp is written, it is wide-open to
SQL injection, and it is very difficult, not to say impossible to make the procedure fool-proof since it accepts query text as parameter. This is no problem as long as you use it as your own utility procedure and have full control over the input, but you should not make a procedure like this one accessible to anyone. Rather I recommend that you do
DENY EXECUTE ON pivot_sp TO public
To make sure that plain users cannot run it. If you make a call to
pivot_sp in a stored procedure, this call will succeed as ownership chaining applies. The file for
pivot_sp includes an example to demonstrate this.
Another option for dynamic crosstab is
RAC, which is a third-party tool. I have never used it myself, but I have heard several good comments about it.
This can easily be handled without dynamic
SQL in this way:
SELECT col1, col2, col3
FROM dbo.tbl
ORDER BY CASE @col1
WHEN 'col1' THEN col1
WHEN 'col2' THEN col2
WHEN 'col3' THEN col3
END
Again, review the
CASE expression in Books Online, if you are not acquainted with it.
Note that if the columns have different data types you cannot lump them into the same
CASE expression, as the data type of a
CASE expression is always one and the same. Instead, you can do this:
SELECT col1, col2, col3
FROM dbo.tbl
ORDER BY CASE @col1 WHEN 'col1' THEN col1 ELSE NULL END,
CASE @col1 WHEN 'col2' THEN col2 ELSE NULL END,
CASE @col1 WHEN 'col3' THEN col3 ELSE NULL END
If you also want to make it dynamic whether the order should be ascending or descending, add one more
CASE:
SELECT col1, col2, col3
FROM dbo.tbl
ORDER BY CASE @sortorder
WHEN 'ASC' THEN CASE @col1
WHEN 'col1' THEN col1
WHEN 'col2' THEN col2
WHEN 'col3' THEN col3
END
ELSE NULL
END ASC,
CASE @sortorder
WHEN 'DESC' THEN CASE @col1
WHEN 'col1' THEN col1
WHEN 'col2' THEN col2
WHEN 'col3' THEN col3
END
ELSE NULL
END
Or use the form in the second example to deal with different data types.
SQL Server
MVP Itzik Ben-Gan had a good article on this topic in the March 2001 issue of
SQL Server Magazine, where he offers other suggestions.
It should be added that these solutions has the disadvantage that they will always cause a sort which for a large data set could be expensive. If you add an
ORDER BY clause in dynamic
SQL, the optimizer may avoid the sort if there is a suitable index.
This is no longer an issue, since
SQL 2005 added new syntax that permits a variable:
SELECT TOP(@n) col1, col2 FROM tbl
On
SQL 2000,
TOP does not accept variables, so you need to use dynamic
SQL to use
TOP. But there is an alternative:
CREATE PROCEDURE get_first_n @n int AS
SET ROWCOUNT @n
SELECT au_id, au_lname, au_fname
FROM authors
ORDER BY au_id
SET ROWCOUNT 0
It can be disputed whether
SET ROWCOUNT @n is really a better solution than running a dynamic
SQL statement with
TOP. A dynamic
TOP is probably a better choice, as long as you can accept the security implications. (But it's not worth to change the permissions only for this.)
I guess a common reason for wanting to do this is to implement paging in web applications.
SQL Server
MVP Aaron Bertrand has an article which is the
standard reference on this topic.
The desire here is to create a table of which the name is determined at run-time.
If we just look at the arguments against using dynamic
SQL in stored procedures, few of them are really applicable here. If a stored procedure has a static
CREATE TABLE in it, the user who runs the procedure must have permissions to create tables, so dynamic
SQL will not change anything. Plan caching obviously has nothing to do with it. Etc.
Nevertheless:
Why? Why would you want to do this? If you are creating tables on the fly in your application, you have missed some fundamentals about database design. In a relational database, the set of tables and columns are supposed to be constant. They may change with the installation of new versions, but not during run-time.
Sometimes when people are doing this, it appears that they want to construct unique names for temporary tables. This is completely unnecessary, as this is a built-in feature in
SQL Server. If you say:
CREATE TABLE #nisse (a int NOT NULL)
then the actual name behind the scenes will be something much longer, and no other connections will be able to see this instance of
#nisse.
If you want to create a permanent table which is unique to a user, but you don't want to stay connected and therefore cannot use temp tables, it may be better to create one table that all clients can share, but where the first column is a key which is private to the client. I discuss this method a little more closely in my article
How to Share Data between Stored Procedures.
Sometimes I see persons on the newsgroups that are unhappy, because they create a temp table from dynamic
SQL, and then they can't access it, because it disappeared when the dynamic
SQL exited. When told that they have to create the table outside the dynamic
SQL, they respond that they can't, because they don't know the structure of the table until run-time.
One solution is to create a global temp table, one with two # in the name, for instance
##temp. Such a table is visible to all processes (so you may have to take precautions to make the name unique), and unless you explicitly drop it, it exists until your process exits.
But the real question is:
what are these guys up to? If you are working with a relational database, and you don't know the structure of your data until run-time, then there is something fundamentally wrong. As I have never been able to fully understand what the underlying business requirements are, I can't really provide any alternatives. But I would suggest that if you need to go this road, you should seriously consider to run your
SQL from a client program. Because, all access to that table would have to be through dynamic
SQL, and composing dynamic
SQL strings is easier in languages with better string capabilities, be that
C#, VB or Perl.
This is similar to parameterising the
database name, but in this case we want to access a linked server of which the name is determined at run-time.
Two of the solutions for dynamic database names apply here as well:
- On SQL 2005 and later, the best solution is probably to use synonyms:
CREATE SYNONYM myremotetbl FOR Server.db.dbo.remotetbl
- If you can confine the access to the linked server to a stored procedure call, you can build the SP name dynamically:
SET @sp = @server + 'db.dbo.some_sp'
EXEC @ret = @sp @par1, @par2...
If you want to join a local table with a remote table on some remote server, determined in the flux of the moment, dynamic
SQL is probably the best way if you are on
SQL 2000. There exists however an alternative, although it's only usable in some situations. You can use
sp_addlinkedserver to define the linked server at run-time, as demonstrated by this snippet:
EXEC sp_addlinkedserver MYSRV, @srvproduct='Any',
@provider='SQLOLEDB', @datasrc=@@SERVERNAME
go
CREATE PROCEDURE linksrv_demo_inner WITH RECOMPILE AS
SELECT * FROM MYSRV.master.dbo.sysdatabases
go
EXEC sp_dropserver MYSRV
go
CREATE PROCEDURE linksrv_demo @server sysname AS
IF EXISTS (SELECT * FROM master..sysservers WHERE srvname = 'MYSRV')
EXEC sp_dropserver MYSRV
EXEC sp_addlinkedserver MYSRV, @srvproduct='Any',
@provider='SQLOLEDB', @datasrc=@server
EXEC linksrv_demo_inner
EXEC sp_dropserver MYSRV
go
EXEC linksrv_demo 'Server1'
EXEC linksrv_demo 'Server2'
There are two procedures.
linksrv_demo_inner is the procedure where we actually access the linked server. As the linked server must exist when the procedure is created, I first create a dummy entry for
MYSRV, which I subsequently drop once the procedure has been created. (Not only must the linked server exist, it must also have the database and tables that you access.)
linksrv_demo is the outside interface which takes a server name as a parameter, and then at run-time defines
MYSRV to point to @server.
The above is only possible under certain conditions:
- The procedure must be run by someone who has privileges to set up linked servers, normally only the roles sysadmin and setupadmin have these permissions. Thus, plain users do not apply.
- Since you change a server-wide definition, you cannot have several instances of the procedure running. (It goes without saying, that you should use the alias in this procedure only.)
As you can see in the example, I've added
WITH RECOMPILE to
linksrv_demo_inner. This is a safety precaution, to prevent that a cached plan does not access a different server. I don't think this is really necessary, as
SQL Server should sense the changed definition. In fact, you may not even have to split the code over two procedures, but as they say, better safe than sorry.
The rowset functions
OPENQUERY and
OPENROWSET often calls for dynamic
SQL. Their second argument is an
SQL string, and they do no accept variables. (This is because the optimizer builds a plan for the distributed query when the procedure is compiled.) So any single parameter you want to pass to the
SQL statement for that remote server requires you to use dynamic
SQL. Since the remote
SQL string can include string literals, you may have to deal with up to three levels of nested quotes. If you don't watch out, you can spend a full day looking at things like:
DECLARE @sql varchar(8000)
SELECT @sql = 'SELECT * FROM OPENQUERY(MYSRV, ' +
'''SELECT * FROM Northwind.dbo.Orders ' +
'WHERE CustomerID = N''''VINET'''''')'
PRINT @sql
EXEC(@sql)
to try to find out if you might you have one
'
too many or too few.
Strict discipline is absolutely necessary when working with dynamic
SQL for
OPENQUERY. The function
quotestring() that I showed you earlier can be of great help:
DECLARE @remotesql nvarchar(4000),
@localsql nvarchar(4000),
@state char(2)
SELECT @state = 'CA'
SELECT @remotesql = 'SELECT * FROM pubs.dbo.authors WHERE state = ' +
dbo.quotestring(@state)
SELECT @localsql = 'SELECT * FROM OPENQUERY(MYSRV, ' +
dbo.quotestring(@remotesql) + ')'
PRINT @localsql
EXEC (@localsql)
The built-in function
quotename() is usually not useful here, as the
SQL statement easily can exceed the limit of 129 characters for the input parameter to
quotename().
On
SQL 2005 and later, you can use
EXEC() to run an
SQL statement on a
linked server. Since
EXEC() at linked servers can take parameters, this can make things considerably easier. Then again, you can join
OPENQUERY with local tables, so that only rows of interest are brought across the wire. This you cannot do with
EXEC().
Say that you write a stored procedure that is to present some data, and the
GUI it is to be run from is Query Analyzer or
SQL Server Management Studio (presumably because it is a sysadmin procedure). To make the output easy to digest, you want the column width to be so wide that no data is truncated, but neither do you want any extraneous spaces. This is something you can achieve with dynamic
SQL. Typically you would use a temp table to hold the data, in which case there are no permission issues.
Rather than giving an example, I refer you to the source code for the popular (but undocumented) system procedure
sp_who2. You can find the code by entering
exec master..sp_helptext sp_who2.
I've written this text with a main focus on application code, because it is mainly in application tasks, bad usage of dynamic
SQL can cause serious harm by opening for
SQL injection, poor
query-plan reuse, and result in code that is difficult to read and maintain.
Here, I like to briefly discuss code is for maintenance jobs, code that runs once a night or once a week or even less frequently. Generally, for this sort of code, dynamic
SQL is almost always a fair game. Query plans are rarely an issue. And if the code is to be run by users with
sysadmin privileges, there are no permissions issues. The same applies to code that does not require permissions outside the database, and is to be run by users with
db_owner privileges.
There are however, two points about
SQL injection I like to make.
- If you are a DBA that writes some stored procedure to be run by junior operators that do not have sysadmin privilege themselves, you must of course take precaution against SQL injection, so that they don't outsmart you.
- If you write a job that performs operations on tables in every database, be careful to use quotename() when you build the SQL strings. This is particularly important if there are non-sysadmin users that own databases. A user could create a table with a name that injects an SQL command into your maintenance script when you run it. If you are the DBA at a hosting company, this is a risk that you definitely should not neglect.