//
you're reading...
SQL Server, T-SQL

A stored procedure "best practices" checklist

When developing stored procedures, there seems to be a lot of emphasis on “get it done fast.” Which means type all lower case, pay little attention to formatting, and sometimes throw best practices out the window. Personally, I would rather front-load my development time; I think that the costs I pay in initial development far outweigh what I might have paid in maintenance down the road. Not to mention you will want whoever comes behind you to encounter a well documented well developed piece of work. The main objective of any T-SQL developer should be to deliver readable and maintainable code that also performs well that is delivered in a timely manner. Although we may not always have the luxury of time I have found that it is fairly easy to fall into good development habits.

As the  popular adage goes, “you can have it fast, cheap, or good. Pick two…” Good  & Fast is usually expensive, Good & Cheap is usually slow and Fast & Cheap is usually of poor quality. In my experience the business will normally opt for “Good as long as it is Fast & Cheap” with caveat that we will fix it later, and of course, later never comes. If you develop good basic development habits then the code you deliver will hopefully be fast and good.

For demonstration purposes here is a fictitious piece of code. It is fake but realistic example of the kinds of procedures you might encounter:

create proc foo(@i int,@bar int=null,@hr int output,@xd datetime) as
declare @c varchar
declare @s nchar(2)
declare @x int
set @grok=’Beverly’
set @korg=’MA’
set @x=5
select customers.customerid,firstname,lastname,orderdate from customers join orders on
customers.customerid=orders.customerid where status=@i or status<=@bar and orderdate<=@xd
set @hr = @@rowcount
select customers.customerid,count(*) from customers left join orders on
customers.customerid=orders.customerid where customers.city=@c and customers.state=@s
group by customers.customerid having count(*)>=@x
return (@@rowcount)

So, what is wrong with the above sample, you may ask? Well, let me go through my own personal (and quite subjective) subconscious checklist of best practices when I write my own stored procedures.

Upper casing T-SQL keywords and built-in functions

Use upper case for T-SQL keywords. For example use CREATE PROCEDURE and not create procedure or Create Procedure. Same goes for all of the code throughout my objects… you will always see SELECT, FROM, WHERE and not select, from, where. I find if much more readable when all of the keywords are capitalized.

Using a proper and consistent naming scheme

Name objects using {target}_{verb}. So for example, if I have a Customers table, I would have procedures such as:

dbo.Customer_Create
dbo.Customer_Update
dbo.Customer_Delete
dbo.Customer_GetList
dbo.Customer_GetDetails

This allows them to sort nicely in Object Explorer / Object Explorer Details, and also narrows down my search quickly in an IntelliSense auto- complete list. If you have a stored procedures named in the style dbo.GetCustomerList, they get mixed up in the list with dbo.GetClientList and dbo.GetCreditList. You could argue that maybe these should be organized by schema, which I also do but more for security reasons then organizational reaseons. However for many of the applications I develop, ownership/schema is pretty simple and doesn’t need to be made more complex.

Of course I rarely name stored procedures using the sp_ prefix, unless I have a global stored procedure that I want to reside in the master schema that will transcend a specific database. Normally I prefer to prefix my user\database specific stored procedures with “usp_” .

Most importantly is coming up with a proper naming scheme that  you apply consistently. No one wants to see procedures named inconsistently like dbo.Customer_Create, dbo.Update_Customer and dbo.GetCustomerDetails.

Using the schema prefix

Specify the database and schema prefix when creating stored procedures. This way I know which database the object resides in and that it will be schema.procedure_name no matter who I am logged in as when I create it. Similarly, my code always has the database\schema prefix on all object references. This prevents the database engine from checking for an object under my schema first, and also avoids the issue where multiple plans are cached for the exact same statement/batch just because they were executed by users with different default schemas.

Code alignment (Lining up parameter names, data types, and default values)

 

This much easier to read:

CREATE PROCEDURE dbo.User_Update
@CustomerID      INT,
@FirstName         VARCHAR(32)         = NULL,
@LastName         VARCHAR(32)         = NULL,
@Password          VARCHAR(16)         = NULL,
@EmailAddress   VARCHAR(320)      = NULL,
@Active                   BIT                            = 1,
@LastLogin           SMALLDATETIME  = NULL
AS
BEGIN

…than this:

CREATE PROCEDURE dbo.User_Update
@CustomerID INT,
@FirstName VARCHAR(32) = NULL,
@LastName VARCHAR(32) = NULL,
@Password VARCHAR(16) = NULL,
@EmailAddress VARCHAR(320) = NULL,
@Active BIT = 1,
@LastLogin SMALLDATETIME = NULL
AS
BEGIN

Using spaces and line breaks liberally

This is a simple one, use spaces in all comparison operators  between column/variable and operator. For example instead of @foo int=null or where @foo>1 I try @foo INT = NULL or WHERE @foo > 1.

Also try to place at least a carriage return between individual statements, especially in stored procedures where many statements spill over multiple lines.

Both of these are just about readability, nothing more. While in some interpreted languages like JavaScript, where compressing / obfuscating code to make it as small as possible may provide a slight benefit, in T- SQL this is not the case as all whitespace is stripped out before it gets to the optimizer. So, its ok to lean to the side of readability.

Avoiding data type / function prefixes on column / parameter names

I have mixed feelings on this one. You will often see prefixes like @iCustomerID, @prmInputParameter, @varLocalVariable, @strStringVariable in stored procedures and although imparts a certain amount of information simply by its name, I am not a big fan. Here is the mixed feelings part, even though you will see me utilizing a similar naming convention in SSIS packages for example,  I tend to avoid it in T-SQL and here’s why:  It makes it much harder to change the data type of a column in the stored proc when not only do you have to change all the column and variable declarations but you must now change any related parameter declarations. i.e. @iVarName to @bigintVarName, etc. It becomes much easier just to name any variable\parameter  for what it is. If you have a column EmailAddress VARCHAR(500), then make your variable/parameter declaration @EmailAddress VARCHAR(500). There is no need to include the datatype in the naming convention. No need to use @strEmailAddress … if you need to find out the data type, just go to the declaration line! or make sure it is documented in your sproc parameters section.

Using lengths on parameters, even when optional

This is a simple one. Always specify a length for parameters and variables period!. If you need a Varchar(MAX) then code it. Otherwise not coding a length can be very dangerous and lead to some unpredictable side effects. You may get silent truncations at 30 characters, and\or possibly a silent truncations at 1 character. Obviously this can result in data loss with no overt warning or indication.

Listing output parameters last

Try to list OUTPUT parameters last. No specific reasoning other then it is the order that most people conceptually think about the parameters… in then out.

Using BEGIN / END liberally

Not much explanation needed here, it greatly aids readability.

Using statement terminators

I have quickly adapted to the habit of ending all statements with proper statement terminators (;). This was always a habit in languages like JavaScript (where it is optional) and C# (where it is not). But as T-SQL gets more and more extensions (e.g. CTEs) that require it,it does not hurt anything and makes for good programming practice.

Using SET NOCOUNT ON

Use SET NOCOUNT ON whenever feasible. This prevents DONE_IN_PROC messages from needlessly being sent back to the client after every row-affecting statement, which increases network traffic and in many cases can fool applications, SSIS for one, into believing there is an additional recordset available for consumption. However there are some cases where consumers are utilizing record counts in which case you would not want to set the NOCOUNT ON.

Using local variables

When possible, use a single DECLARE statement to initialize local variables. Similarly, try to use a single SELECT to apply values to those variables that are being used like local constants. You’ll see a lot of like this:

declare @foo int
declare @bar int
declare @x int
set @foo = 5
set @bar = 6
set @x = -1

It’s much harder to track down variables in longer and more complex procedures when the declaration and/or assignments can happen anywhere… I would much rather have as much of this as possible occurring in the beginning of the code.

DECLARE
@foo    INT,
@bar    INT,
@x      INT;
SELECT
@foo    = 5,
@bar    = 6,
@x      = -1;

And to simplify even more SQL Server 2008 forward, allows the following syntax:

DECLARE
@foo    INT = 5,
@bar    INT = 6,
@x      INT = -1;

Declaration and initialization in the same statement, so much nicer. Not to mention using meaningful variables names, rather than something like @i, @x, etc.

I tend to place the commas at the beginning of each new line, e.g.:

DECLARE
@foo    INT = 5
,@bar   INT = 6
,@x     INT = -1;

Some may not like this suggestion, I find it easier when inserting or commenting out code.

Using table aliases

I use aliases a lot. Nobody wants to read (never mind type) the full schema\tablename for every column, even though you will see *many* examples of this posted to the public SQL Server newsgroups:

SELECT
dbo.table_X_with_long_name.column1,
dbo.table_X_with_long_name.column2,
dbo.table_X_with_long_name.column3,
dbo.table_X_with_long_name.column4,
dbo.table_X_with_long_name.column5,
dbo.table_H_with_long_name.column1,
dbo.table_H_with_long_name.column2,
dbo.table_H_with_long_name.column3,
dbo.table_H_with_long_name.column4
FROM
dbo.table_X_with_long_name
INNER JOIN
dbo.table_H_with_long_name
ON
dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1
OR dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1
OR dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1
WHERE
dbo.table_X_with_long_name.column1 >= 5
AND dbo.table_X_with_long_name.column1 < 10;

Sensible aliasing code can make for a more readable query:

SELECT
X.column1,
X.column2,
X.column3,
X.column4,
X.column5,
H.column1,
H.column2,
H.column3,
H.column4
FROM
dbo.table_X_with_long_name AS X
INNER JOIN
dbo.table_H_with_long_name AS H
ON
X.column1 = H.column1
OR X.column2 = H.column2
OR X.column3 = H.column3
WHERE
X.column1 >= 5
AND X.column1 < 10;

Note: The “AS” when aliasing tables and columns is optional but I have been trying very hard to make myself use it (only because the standard defines it that way). When writing multi-table queries, I try not to give tables meaningless shorthand like a, b, c or t1, t2, t3. Even though this might fly for simple queries, but for complex queries, you will regret it when you have to go back and edit it. Also same rule of thumb for CTEs as well, give em meaningful names.

Using column aliases

I am guilty of going both ways here, but the proper syntax is alias=[column expression] . A lot of people prefer to alias expressions / columns using this syntax:

SELECT [column expression] AS alias

The proper approach is:

SELECT alias = [column expression]

One reason for the alias=[column expression] is that if all of your column names are listed down the left hand side of the column list, instead of at the end. It is much easier to scan column names when they are vertically aligned.

Another justification for the alias=[column expression]  is that it makes it easier should you ever have to move the query into a subquery, or cte, or derived table, etc.

Using consistent formatting

I am very fussy about formatting. I like my queries to be consistently readable and laid out in a predictable way. For example for a join that includes a CTE and a subquery, this is how it would look:

WITH cte AS
(
SELECT
t.col1,
t.col2,
t.col3
FROM
dbo.sometable AS t
)
SELECT
cte.col1,
cte.col2,
cte.col3,
c.col4
FROM
cte
INNER JOIN
dbo.Customers AS c
ON c.CustomerID = cte.col1
WHERE EXISTS
(
SELECT 1
FROM dbo.Orders o
WHERE o.CustomerID = c.CustomerID
)
AND c.Status = ‘LIVE’;

In keeping all of the columns in a nice vertical line, and visually separating each table in the join and each where clause. Inside a subquery or derived table, I am less strict about the visual separation, although I still put each fundamental portion on its own line. In addition I always use SELECT 1 in this type of EXISTS() clause, instead of SELECT * or SELECT COUNT(*), to make it immediately clear to others that the query inside does NOT retrieve actual data.

Matching case of underlying objects / columns

Always try to match the case of the underlying object, as I can never be too certain that my application will always be on a case-sensitive collation. This is much easier if you are using SQL Server 2008 Management Studio against a SQL Server 2008 instance or later release, as you will automatically get the correct case when selecting from the auto-complete list.

Qualifying column names with table/alias prefix

We touched on this earlier.Always qualify column names with the table name and\or table alias period. In addition I prefix the table name in the FROM or JOIN clause with the database name, then there is no ambiguity about the source of the data or intent of the query.

Using RETURN and OUTPUT appropriately

Try not to use RETURN to provide any data back to the client (e.g. the SCOPE_IDENTITY() value or @@ROWCOUNT). The RETURN should be used exclusively for returning stored procedure status, such as ERROR_NUMBER() / @@ERROR. If you need to return data to the caller, use a resultset or an OUTPUT parameter instead.

Avoiding keyword shorthands

Use full keywords as opposed to their shorthand equivalents like  “BEGIN TRAN” and “CREATE PROC” might save me a few keystrokes, and I’m sure the shorthand equivalents are here to stay, but it just looks more professional in my opinion with the keywords typed out fully. It’s the same with the parameters for built-in functions like DATEDIFF(), DATEADD() and DATEPART(). Why use WK or DW when you can use WEEK or WEEKDAY? Additionally use the fullhand notation for your joins as well, “INNER JOIN or “LEFT OUTER JOIN”… never just “join” or “left join.” Again, no real good reason behind that, just readability and clarity.

Using parentheses liberally around AND / OR blocks

Always group clauses when mixing AND and OR.

So, after all of this, here is an example of the procedure I listed at the start of the post :

CREATE PROCEDURE dbo.Customer_GetOlderOrders
@OrderStatus        INT,
@MaxOrderStatus     INT = NULL,
@OrderDate          SMALLDATETIME,
@RC1                INT OUTPUT,
@RC2                INT OUTPUT
AS
BEGIN
SET NOCOUNT ON;
DECLARE
@City           VARCHAR(32) = ‘Beverly’,
@State          CHAR (2)    = ‘MA’,
@MinOrderCount  INT         = 5;
SELECT
c.CustomerID,
c.FirstName,
c.LastName,
c.OrderDate
FROM
dbo.Customers c
INNER JOIN
dbo.Orders o
ON c.CustomerID = o.CustomerID
WHERE
(
o.OrderStatus       = @OrderStatus
OR o.OrderStatus    <= @MaxOrderStatus
)
AND o.OrderDate         <= @MaxOrderDate;
SET @RC1 = @@ROWCOUNT;
SELECT
c.CustomerID,
OrderCount = COUNT(*)
FROM
dbo.Customers c
LEFT OUTER JOIN
dbo.Orders o
ON c.CustomerID = o.CustomerID
WHERE
c.City = @City
AND c.State = @State
GROUP BY
c.CustomerID
HAVING
COUNT(*) >= @MinOrderCount;
SET @RC2 = @@ROWCOUNT;
RETURN;
END
GO

OK you be the judge… Copy both procedures to SSMS or Query Analyzer, and which one is easier to read / understand? And is it worth the three extra minutes it would take you to convert the original query?

Just to recap:

  • Upper casing T-SQL keywords and built-in functions
  • Using a proper and consistent naming scheme
  • Using the schema prefix
  • Code alignment (Lining up parameter names, data types, and default values)
  • Using spaces and line breaks liberally
  • Avoiding data type / function prefixes on column / parameter names
  • Using lengths on parameters, even when optional
  • Listing output parameters last
  • Using BEGIN / END liberally
  • Using statement terminators
  • Using SET NOCOUNT ON
  • Using local variables
  • Using table aliases
  • Using column aliases
  • Using consistent formatting
  • Matching case of underlying objects / columns
  • Qualifying column names with table/alias prefix
  • Using RETURN and OUTPUT appropriately
  • Avoiding keyword shorthands
  • Using parentheses liberally around AND / OR blocks
Advertisements

About ldgaller

Accomplished Data Warehouse Architect, DBA and Software Architect with over 15 years of professional experience and demonstrated success designing and implementing solutions that improve business functionality and productivity. Highly diverse technical background with proven ability to design, develop and implement technology on an enterprise level. I approach all projects with passion, diligence, integrity, and exceptional aptitude.

Discussion

No comments yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: