An interesting take on “Stored Procedures” (link)

Another great post SQL-related post over at TheDailyWTF regarding a, umm, "creative" use of "Stored Procedures":

http://thedailywtf.com/Articles/For-the-Ease-of-Maintenance.aspx



Programming is like dreaming?

This is from March, so it's a little old, but I just stumbled upon it and found it a bit interesting. I never thought of programming this way before, but it does make sense:

Programming is like a dream


Is it just me …

... or is about time I got back to some blogging?


Processing an OLAP cube with a T-SQL Stored Procedure

Here's a simple SQL Server stored procedure that you can call to process an OLAP cube using T-SQL.  The parameters should be self-explanatory.  To me, this is a little easier and more flexible than processing using DTS packages.

create procedure ProcessCube

    @Database varchar(100),

    @Cube varchar(100),

    @Partition varchar(100)  = null, -- If NULL, process the entire Cube

    @Server varchar(100) = 'localhost'

as

 

/*

    Author:        Jeff Smith

    Version:    10/27/2008

*/

 

/* variables used to store object handles */

declare @o_svr int, @o_db int, @o_cube int, @o_part int, @o_mds int

declare @hr int

 

/* different cube processing options. This SP uses "default" */

declare @PROCESS_DEFAULT int

declare @PROCESS_FULL int

declare @PROCESS_REFRESH_DATA int

 

set @PROCESS_DEFAULT = 0

set @PROCESS_FULL = 1

set @PROCESS_REFRESH_DATA = 2

 

-- create a DSO.Server object:

exec @hr = sp_OACreate 'DSO.Server', @o_svr out

if @hr <> 0

    begin

        print 'Error at create server:'

        exec sp_OAGetErrorInfo @o_svr

        goto cleanup

    end

 

-- Connect to the server:

exec @hr = sp_OAMethod @o_svr, 'Connect', null, @Server

if @hr <> 0

    begin

        print 'Error at connect to server:'

        exec sp_OAGetErrorInfo @o_svr

        goto cleanup

    end

 

-- Get the MDStores property from the Server:

exec @hr = sp_OAGetProperty @o_svr,'MDStores', @o_mds OUT

if @hr <> 0

    begin

        print 'Error at get getting Server MDStores:'

        exec sp_OAGetErrorInfo @o_svr

        goto cleanup

    end

 

-- Get the database from the MDStores:

exec @hr = sp_OAGetProperty @o_mds,'Item',@o_db OUT,@Database

if @hr <> 0

    begin

        print 'Error at get database:'

        exec sp_OAGetErrorInfo @o_mds

        goto cleanup

    end

 

-- get the MDStores property from the database:

exec sp_OADestroy @o_mds

exec @hr = sp_OAGetProperty @o_db,'MDStores', @o_mds OUT

if @hr <> 0

    begin

        print 'Error at get database MDStores:'

        exec sp_OAGetErrorInfo @o_db

        goto cleanup

    end

 

-- get the Cube from the MDStores

exec @hr = sp_OAGetProperty @o_mds,'Item',@o_cube OUT, @Cube

if @hr <> 0

    begin

        print 'Error at get Cube:'

        exec sp_OAGetErrorInfo @o_mds

        goto cleanup

    end

 

if @Partition is null -- Process the entire Cube, not just a single partition

    begin

        exec @hr = sp_OAMethod @o_cube, 'Process', null, @PROCESS_DEFAULT

        if @hr <> 0

            begin

                print 'Error at process Cube:'

                exec sp_OAGetErrorInfo @o_cube

                goto cleanup

            end

    end

 

else    -- just process the specified Partition

 

    begin

        -- Get the MDStores property of the Cube:

        exec sp_OADestroy @o_mds

        exec @hr = sp_OAGetProperty @o_cube,'MDStores', @o_mds OUT

        if @hr <> 0

            begin

                print 'Error at get Cube MDStores:'

                exec sp_OAGetErrorInfo @o_cube

                goto cleanup

            end

 

        -- Get the partition to process:

        exec @hr = sp_OAGetProperty @o_mds,'Item',@o_part OUT, @Partition

        if @hr <> 0

            begin

                print 'Error at get Parition:'

                exec sp_OAGetErrorInfo @o_mds

                goto cleanup

            end

 

        -- Process the partition:

        exec @hr = sp_OAMethod @o_part, 'Process', null, @PROCESS_DEFAULT

        if @hr <> 0

            begin

                print 'Error at process Partition:'

                exec sp_OAGetErrorInfo @o_part

                goto cleanup

            end

    end

 

-- And unlock all objects on the server:

exec @hr = sp_OAMethod @o_svr, 'UnlockAllObjects'

if @hr <> 0

    begin

        print 'Error at unlock all server objects:'

        exec sp_OAGetErrorInfo @o_svr

        goto cleanup

    end

 

cleanup:

 

if @o_mds is not null exec sp_OADestroy @o_mds

if @o_Part is not null exec sp_OADestroy @o_Part

if @o_cube is not null exec sp_OADestroy @o_cube

if @o_db is not null exec sp_OADestroy @o_db

if @o_svr is not null exec sp_OADestroy @o_svr

I’m back … with some news!

I apologize for not posting any new content in quite some time, but now I am back and will soon start posting on a (hopefully!) regular basis once again.

The reason for my hiatus was a pretty good one, though: I recently became a father with the birth of my son Benjamin on October 2, 2008!   He is doing great and already knows that he should always do his formatting at the client and never within the database.  He's a natural!

As if that wasn't enough, I also found out recently I have been named as a 2009 SQL Server MVP!  Now I can finally get a decent seat at a restaurant by pulling the "do you know who I am?" routine!  In all seriousness, though, I am very honored and excited by the award and I intend to live up to it in 2009 by continuing to update this blog and help users out in the SQLTeam forums.

Thank you to everyone who reads this blog and participates in the discussions, and I promise more updates are coming soon.  If you have any specific topics or ideas for a blog post that you think I may be able to cover effectively, just let me know.

Best Regards,

Jeff Smith
SQL Server MVP
 
(Hey -- I like the way that looks!)

Foreign Key Constraints: SET NULL and SET DEFAULT

Most people know about cascading updates and deletes, but did you know there are two other foreign key constraint options you can use to maintain referential integrity?

Read all about them in my latest article over at SQLTeam.com.

These features, introduced with SQL Server 2005, haven't got a lot of publicity, but they can be very useful.  I just used the SET NULL option recently for the first time (inspiring me to put together an article on it) and it works great.

Database Column Names != Report Headings

Always remember that the column names returned in a result set do not have to be the same as what you eventually output at your presentation layer.  

For example, suppose you have stored procedure that accepts a @CurrentYear parameter and returns a sales variance between the current year and the previous year for each customer.  I often see programmers struggling with writing dynamic SQL to produce output like this:

CustomerID   2008 Total    2007 Total   Variance
----------   ----------    ----------   --------
ABC          $100          $50          $50
DEF          $200          $250         -$50

That is, the names of the columns vary based on the data; that is not a good way to return data from your database!  A much better result set to return is simply this:

CustomerID   CurrentYear   PrevYear   Variance
----------   ----------    ---------  --------
ABC          $100          $50        $50
DEF          $200          $250       -$50

Notice that with that set of columns, no dynamic SQL is needed, and the column names returned are always constant regardless of the value of the @CurrYear parameter.  

As mentioned, the fact that your data set has columns labelled "CurrentYear" and "PrevYear" does not mean that you cannot re-label them any way that you like on your report or web page. 

If your client code called the stored procedure and provided a @CurrentYear parameter, then it knows exactly what "CurrentYear" and "PrevYear" represent, and you can easily label the columns in the final result exactly as needed with simple formulas or a few lines of code. 

Remember that in the world of relational database programming, table names and column names should be constant -- only the data itself should change.   Focus on returning consistently structured data from your database, and let your client applications handle the labeling of columns to make them look nice. 

How To Calculate the Number of Week Days Between two Dates

If the start date and end date are both week days, then the total number of week days in between is simply:

(total difference in days) - (total difference in weeks) * 2

or
 DateDiff(dd, @start, @end) - DateDiff(ww, @start, @end)*2

... since the DateDiff() function with weeks returns the number of week "boundaries" that are crossed; i.e., the number of weekends.

If you have a table of holidays, then you can simply subtract them out as well:

DateDiff(dd, @start, @end) -
DateDiff(ww, @start, @end)*2 - 
(select count(*) from holidays where holiday_date between @start and @end)

Now, what if the start day or the end day is on a weekend?  In that case, you need to define what to do in those situations in your requirements.

For example, if the start date is Sunday, Nov 20th, and the end day is Monday, Nov 21st -- how many week days are between those dates? There's no universal correct answer; it could be 0, or 1, or perhaps even "undefined" (null) depending on your needs.

Convert input explicitly at your client; don’t rely on the database to “figure it out”

A common mistake beginners make when working with SQL is trying to format their output at the database layer, as opposed to simply doing this at the presentation layer (i.e., client application, reporting tool, web page, etc).  I've covered that quite a bit in various blog posts, but I've only touched upon another similar issue which I feel is equally as important and also commonly mishandled.

In the SqlTeam forums, I often see code that accepts DateTime input in the form of a string value (say, from a TextBox on a web form) and uploads that value to the database written like this:

SqlCommand c = new SqlCommand();
c.CommandText = "insert into SomeTable (DateCol) values ('" + txtDate.Text + "')";
c.ExecuteNonQuery();

Now, I think that hopefully even most beginners will agree that this is bad code.   The primary issue, of course, is SQL Injection.  Avoiding SQL Injection is very easy to do using Parameters.  So, let's say that you rewrite this code using parameters like this:

SqlCommand c = new SqlCommand();
c.CommandText = "insert into SomeTable (DateCol) values (@DateVal)";
c.Parameters.AddWithValue("@DateVal",txtDate.Text);
c.ExecuteNonQuery();

Looking at that, it seems we have done quite a bit better and should be happy with the code.  It works well, and no injection is possible.  But there is still an issue!  Why?  The txtDate.Text property returns a string, not a DateTime!   And, since we are not setting the data type of the parameter explicitly, the parameter being passed is a string (i.e., VARCHAR or NVARCHAR) value, not a true DateTime value.  This means that SQL Server must implicitly cast your string to a DateTime to store it in your table, and this may or may not work successfully, or as expected, depending on how the string is formatted. 

I've said it over and over and I'll say it again:  The concept of formatting dates should never be something that your database code should ever worry about.  The database layer should be accepting DateTime data from clients, and returning DateTime data to your clients. Where and how the client got the data before passing it to the database, or what the client does with the data in terms of formatting after receiving it from the database is of no concern to the database itself. 

So, we might decide that to fix this, we can simply declare the data type of the parameter explicitly:

SqlCommand c = new SqlCommand();
c.CommandText = "insert into SomeTable (DateCol) values (@DateVal)";
c.Parameters.Add("@DateVal", SqlDbType.DateTime).Value = txtDate.Text;
c.ExecuteNonQuery();

It appears that now we are in good shape, right?  Actually -- no! There is still an implicit conversion happening, because we are still passing a string value -- the txtDate.Text property -- to the parameter, not a true DateTime!

Let's try one more time.  How can we avoid these implicit conversions?  The answer that question is always the same: Convert explicitly!  Your client application is fully capable of handling the parsing, validation, and conversion of that string to a true DateTime value, so go ahead and do it:

DateTime dateval = DateTime.Parse(txtDate.Text); // plus more code to validate, of course

SqlCommand c = new SqlCommand();
c.CommandText = "insert into SomeTable (DateCol) values (@DateVal)";
c.Parameters.Add("@DateVal", SqlDbType.DateTime).Value = dateval;
c.ExecuteNonQuery();

Now we are in business!  Before we even create the SqlCommand object, we have a true DateTime value that we are ready to pass along to SQL Server.  Our SQL code doesn't need to worry about formatting, parsing, converting, or anything -- it is being passed a completely valid piece of data with the correct type.   In short, we can now be sure that whatever value we came up with for the date in our client code is exactly the value that will be stored in our database.  That's the idea, right?

So, please, don't rely on your database code to validate your input.    Don't just pass along generic string data and "hope" that at the end of the day the database can "handle it".  Eliminate the chance of anything going wrong and write your code to explicitly cast and convert and validate any and all input before the database even comes into the picture.  

The MailBag — Super-Sized Edition! String Parsing, Crosstabs, SQL Injection, and more.

OK, boys and girls, it's time for the mailbag!  There's lots of stuff to cover, so let's get to it!

---

Greg E
writes:

Hello Jeff,

I just found your blog and wanted to know if you could point me in the right direction or possibly toss me a solution.

I am looking at a badly formed telelphone number column in a MS SQL Server db. Entries contain '(555) 555-1212' or '555.555.1212, etc. Do you know how I would go about stripping out unwanted characters from the telephone number?

Thanks for the brain cycles.

Greg -- A simple UDF should do the trick for you.  For example, something like this:

create function NumbersOnly(@txt varchar(1000))
returns varchar(100)
as
begin
    declare @i int
    declare @ret varchar(100)
  
    select @i = 1, @ret = ''

    while (@i <= len(@txt))
        select @ret = @ret + case when substring(@txt,@i,1) like '[0-9]'
                                  then substring(@txt,@i,1) else ''
                             end,
               @i = @i + 1

    return @ret
end

With that, you can write something like this:

select ID, dbo.NumbersOnly(PhoneColumn) as PhoneNumbersOnly
from YourTable

Over at SQLTeam's script library forum, there is a thread with a bunch of parsing functions that you may find useful if your needs are more complex.

And, in case you missed it, be sure to read this post.

---

In response to my blog post on passing arrays to stored procedures, Juan writes:

I know is not the right solution, but I have to say it for the sake of completeness of the discussion: if the amount of items in your "array parameter" is limited (say, for example 5 or 10 items), you can always use optional parameters (i.e. assign them to null when declaring them in the SP), then insert them in a temp table or do whatever you want with them, without using dynamic,nor xml, nor string manipulation.

Great point, something I missed in my article entirely.  Sometimes, it may make sense to declare @Val1, @Val2, ... @ValN parameters if there aren't too many and there's a clearly defined limit.  Thanks for bringing that up, Juan.  The simplest solution is usually the best, and in some cases that's probably all you need.  You still have clearly defined parameters with strong typing and no parsing, and those are the main issues with CSV parameters that I wanted to avoid.

---

Marc writes:

We have three tables.  They all share the same "type" of primary key: let's say ActivityCode.  I need to pull data using an ActivityCode, but there is a catch.  If table 1 has the data, I want to use it.  If table 2 has the data and Table 1 does not I want to use Table 2.  If table 3 has the data and Table 1 and Table 2 does not, I want to use that.  The ActivityCode can be found in both Table 1 and Table 2.  Once I determine which table i am using I will need to do several other inner and/or outer joins with other tables.  I am using JDBC.  I want to be able to do this using a single SQL statement, but I am willing to use multiple statements if it makes more sense.  I just need to keep it to a single transaction under JDBC.

Marc -- I think what you are looking for is described here.  The key is to OUTER JOIN to all of your tables, and then use a CASE expression to determine which of those joined tables has the data you need.

--

Mary writes:

I struggled with a thorny SQL problem all day yesterday and found your post on set based thinking very helpful.  I needed to write an update query that updated a table with many records with the same key from a table with the key and the corresponding new value.  The table with new values didn't exist - I had to derive it from a different table showing the key, new value and date (the new value changed over time.)

Your observations that one needs to break the problem down into its simplest components helped me realize something else.

I made the classic rookie error of grabbing some code that did a similar type of update and try to hack it into my solution.  When I finally realized I was going in the wrong direction (because my solution was getting messier and messier), I went back to the beginning.

I defined the problem in its simplest terms and learned I could do a simple "update  A set A.value = B.value from A join B on B.key = A.key" .

I didn't realize I could update from but once the problem was simply defined a quick question to one of our senior engineers resulted in a quick answer leading to an elegant solution.  The whole thing was completed in less than half an hour.

The moral of the story:  Define the problem first!  Don't even think about syntax until you have written a clear, concise spec from the problem just defined.  Then if you find yourself spending an inordinate amount of time and/or the solution seems too messy or seems to run too long - google or talk to your colleagues.

Thanks for a great blog; your post made me realize it's more about how we think than throwing code at the problem - the code should be the last thing!

Thanks, Mary!  I'm glad I would be of assistance.  The "moral" that you wrote says it all. 90% of programming isn't writing code at all, it is simply defining what your code will do -- and that's always the hardest part! 

---

In response to my post on grouping by month, Mark writes:

I'm so close! I've tried all the things in this article, but can't seem to do what I want to do. I've been tearing my hair out for days! Here's what I'm trying to do.

Basically I need a sql procedure that looks at an invoicing table that totals amounts by month/year and quarter at the same time. Here's how my table looks:

Project ID  Date      Amount
1                3/11/08    10.00
1                4/18/08    10.00
1                6/22/08    10.00
2                3/01/08    10.00
2                9/15/08    10.00

I would like the output to have dynamic columns, so an output may look like:
Project ID  Jan'08  Feb'08  Mar'08  Q1'08  Apr'08  May'08  Jun'08  Q2'08  Jul'08  Aug'08  Sep'08  Q3'08
1              0.00    0.00      10.00    10.00  10.00    0.00      10.00  20.00  0.00    0.00      0.00      0.00
2              0.00    0.00      10.00    10.00  0.00      0.00      0.00    0.00  0.00    0.00      10.00    10.00

I would like the query to know if there was no value in Jan &amp; Feb'08, but still list all the months in Q1.

I'm not opposed to using a calendar table, but would like to try to avoid it if possible.

Any help would be greatly appreciated!

Hi Mark -- First off, never be afraid to use a calendar table!  There is nothing hacky or unusual or tricky about them, they can make your life much easier, your code much shorter, and everything much more efficient.  If grouping by month or some other time period is important to your reporting,  then defining those months in a permanent, nicely indexed table makes perfect sense.

In this case, though, since you are outputting one column per month for a single year, I recommend to simply use CASE expressions to "cross tab" your data.  You can alias your columns as M1,M2,M3...M12 and Q1-Q4 so that no matter what year you are running the report for, your columns will be consistently named, and you can let your presentation layer handle outputting nice column headers with the current year/month for each one.

So, all you really need is something like this:

select projectID, Y as [Year],
  sum(case when m=1 then amount else 0 end) as M1,
  sum(case when m=2 then amount else 0 end) as M2,
  sum(case when m=3 then amount else 0 end) as M3,
  sum(case when m in (1,2,3) then amount else 0 end) as Q1,
  ...
  sum(case when m=12 then amount else 0 end) as M12,
  sum(amount) as Total
from
(
  select projectID, Amount, DatePart(Month, [Date]) as M, DatePart(Year, [Date]) as Y
  from YourTable
  where [Date] >= @StartDate and Date < @EndDate
) x

Of course, you'd define @StartDate and @endDate as '01-01-2008' and '01-01-2009', respectively.

---

In this comment, Stewy writes:

I have an issue with both DISTINCT and GROUP BY.

The issue is that using either one, the results comes back ordered as if using order by.

I need the unique results in the order they are in the database. How can I do this? Thanks

Stewy -- Relational databases have no obligation to store data in any specific order, or to keep track of the order that things were entered, or to return things "as they are in the database."  There is no such thing as getting data out "the way it is stored" because a relational database may move or re-order data temporarily to efficiently execute a query depending on indexes available.   You must always explicitly specify how you want your results using an ORDER BY clause.  If you want to keep track of the order that you added data to a table, you should have a "timestamp" column that records the exact moment each row was added via a DEFAULT value or a trigger.  Or, at the very least, you can use an IDENTITY.  Then, you can simply order by that column.  This is a very important concept to understand when working with relational databases.  Things are returned based on the data itself, not based on physical storage characteristics.  I hope this helps.

---

In response to Always Use Parameters, Karuna writes:

Hi Jeff,
Just wondering if I build the Sql in Stored Procedure (Dynamic Sql) based on the parameters passed to stored proc, will it still be a possible candidate for Sql Injection? Basically I want to build the Sql in the stored procedure instead of doing it in .Net code as displayed in the article.

Dim cm As New SqlCommand("", YourConnection)
cm.CommandText = "DELETE FROM YourTable WHERE ID=@ID "
cm.Parameters.Add("@ID", SqlDbType.Int).Value = ID

If Name <> "" Then
cm.CommandText &= " And Name=@name"
cm.Parameters.Add("@Name", SqlDbType.VarChar).Value = Name
End If
If TranDate <> DateTime.MinValue Then
cm.CommandText &= " And TranDate = @TranDate"
cm.Parameters.Add("@TranDate", SqlDbType.DateTime).Value = TranDate

Hi Karuna -- you are absolutely 100% safe from SQL Injection by doing this.  Remember, SQL Injection is not about genereal SQL concatenation or about building a SQL statement dynamically.  It only can happen when you concatenate user input into a SQL string and execute it.  If you put together a big SQL statement via concatenation but you only incorporate user input via parameters, there's no need for scrubbing data or worrying in any way about SQL Injection -- it will never happen, under any circumstance. 

Avoiding SQL Injection is the easiest thing in the world -- simply do things the easy and correct way and you'll never need to worry about it.   It's like if there was a big controversy in the news about thousands of people crashing their cars because they are driving them with decorative tin foil covering their windshields, and asking the experts "how can we solve this crisis?"   Should we cut holes in the tin foil, or add mirrors, or incorporate a camera and a tv monitor?  Uh .. no.  You should just take the tin foil off of your windshield and do things the easy, simple and correct way and don't make things over complicated.   That's basically what this whole SQL Injection thing is about -- bad programmers doing stupid things when all they need to do is write decent code the easy way -- simply by using parameters.

---

Gocs writes:

I have tried to compute the number of hours based on the datetime in MS SQL 2005.  However, I am not sure the hours is correct.  Do you have any idea on how to do it correctly?


Gocs -- I think you really need to read this very carefully.  I'll be waiting!