Identifying SSIS execution errors in SQL Server 2012+

First of, if you are not using the project deployment model in SSIS 2012+ this blog post is not for you – sorry. The techniques described below assume you are storing your packages in SSISDB. However, you don’t necessarily need to execute your packages through SQL Agent jobs.

Okay, let’s get right into it. After getting too frustrated with waiting for SSIS execution reports out of SSMS, which are not only pretty slow but also give you a lot of information you don’t really need, I figured there has to be a better way to retrieve relevant information directly. What better place to look than tapping directly into SSISDB, the system database which is created automatically when you create an Integration Services Catalog?

Even though this is far from a deep dive into the table structure of SSISDB, I’d like to give you a little more background information on the tables we care about for this script (note that most of the documentation refers to objects with the catalog schema (Microsoft Documentation), those are views that sit on top of the internal schema tables which we use here):

  • internal.executions: lots of helpful information in this table, in particular the [package_name], which account was used to execute the package ([executed_as_name]), additional information about the project the package is part of ([project_name]) and if an environment was used to schedule the package we get information about this as well ([environment_name])
  • internal.operations: this table will tell us when exactly the package was executed ([start_time] and [end_time])
  • internal.operation_messages: holds the error message ([message]) we came here for in the first place. This table holds a LOT of information, so we will have to limit our query to those lines that hold errors, we use the [message_type] column to identify what we want to bring back. It might also help to know the exact time the error occurred ([message_time])
  • internal.event_messages: tells us which step in the SSIS package failed, given you name each component in your package ([message_source_name]), otherwise you’ll see a lot of [Data Flow Task]’s and [Execute SQL Task]’s here
  • internal.execution_parameter_values: all package parameters are listed in this table ([parameter_name] & [parameter_value])
  • internal.event_message_context: last but not least the probably second most useful piece of information is right here, execution-time variable and connection string values ([context_source_name]/[package_path] & [property_value])

To tie it all together there are 2 key identifiers we have to use. Unfortunately Microsoft decided to use [execution_id] on some tables as the primary key and [operation_id] on others, just like you find [event_message_id] here and [operation_message_id] there. Confusing at times and doesn’t make it easy to get a grasp of the structure at first but I am sure they have good reasons for doing so…?!

Now lets have a look at the script (finally!) and I’ll walk you through it:

Since you might want to limit your output to a specific time window (I usually only care about the last ~3 hours) we have the variable @Offset to allow us to define how many hours you want to go back in time. The script itself is pretty straight forward, a few INNER JOINs to link all the tables together and two OUTER APPLYs to generate XML outputs which come in handy when you try to identify if the error is caused by a wrong connection string, incorrect package parameter or variable. The WHERE clause limits the output to anything within the specified time window based on [message_time] and [message_type] = 120 ensures that we only return errors. I like to see most recent errors on top, but feel free to remove the ORDER BY in case you don’t care. Just go ahead and remove it!

If you check the code within the OUTER APPLYs you might notice that I limit the output somewhat, for example I am not returning any sensitive package parameters as those will be scrambled anyways and won’t help you figuring out anything. Once again, feel free to add or remove WHERE clauses so it works best for you!

Since I am lazy I chose to cast the error message as XML so you can quickly click on it and see it in a new window for convenience. You might ask what in the devils name made me use WITH (NOLOCK) all over the place, but in a busy environment you might find yourself waiting for quite some time as SSISDB tables are often used heavily, the logging is quite verbose but of course it mostly depends on your SSIS usage. I also threw in an OPTION(RECOMPILE) for good measure as the number of records you want to retrieve can vary quite a bit and more often than not it does more good than harm for this specific query.

Now let’s not stop here. Say we also want to gather any information we can get about the SQL Agent job which was actually executing the SSIS package (you are executing packages that way, RIGHT?!). Back in the old days™ you were able to use the SQL Agent job history to troubleshoot SSIS errors, but nowadays all you get is a generic error message:

Executed as user: [USERACCOUNT HERE]. Microsoft (R) SQL Server Execute Package Utility Version xx.x.xxxx.x for 64-bit Copyright (C) Microsoft Corporation. All rights reserved. Started: xx:xx:xx PM Package execution on IS Server failed. Execution ID: 12345, Execution Status:4. To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report Started: xx:xx:xx PM Finished: xx:xx:xx PM Elapsed: xx.xxx seconds. The package execution failed. The step failed.

They tell you to open the reports for more information. But we don’t want to do that anymore, so I am eternally grateful that the error message contains the [execution_id], we will use it to tie everything back to SSISDB (hooray!). All we need to do now is join a few system tables in msdb and as if by magic we know the job name and step along with the command to execute the package (or whatever else you might want to retrieve):

And there you have it. Let me know if it helped you troubleshoot your painful SSIS issues and how you would improve it. Who knows, maybe I even got it wrong somewhere?

 

 

Okay, you are still here, ey? I got one more for you then. Here is a bonus index for you to improve query performance:

That’s all folks!

Getting started with XML in SQL Server – PART 2 – Import XML files into your SQL Server

In order to parse XML data you need to have it ready in your SQL server. Since you will not always want to use SSIS or the Import Wizard let’s have a look at 2 different ways to import XML data into a table using TSQL.

 

1. OPENROWSET:

Probably the most common and easiest way to load any sort of BLOB data into your SQL server is the OPENROWSET function that comes with SQL server. To utilize it you need to ensure that it is enabled. Use the following code to enable it:

Once that is taken care of you can import any XML file residing on your machine into a table. Let’s assume you have an XML file in your C:\ drive in a folder called SQL with the filename Collection.xml. The code below will insert it into a temporary table which we create on the fly:

You can now query the column XML_Data_Column and display the file you just loaded.

 

2.  OLE Automation Stored Procedures

Not always does the XML data reside on your server or local machine, often times it needs to be retrieved from the internet using various APIs out there. Even though this is a little more complicated than using OPENROWSET there is no need to worry, SQL server got you covered. Using OLE Automated Stored Procedures you will be able to retrieve your data directly with a few lines of TSQL code. Once again we need to ensure that the functionality is enabled in our SQL server, to make sure it works use the code below:

In order to make the method call we need to do a few things:

  1. we need to create an object
  2. we need to open this object
  3. we need to let the object know what exactly we want to do
  4. we need to send the request
  5. optionally we can check the response status to see if our request was successful
  6. finally we need to save the response to a table.

Now this seems to be an awful lot to do, but it allows us very conveniently to do everything straight out of SQL server without having to set up a retrieval process outside of our code, so jumping through a few extra hoops might be worth it.

Let’s have a look at an example. The code below makes an API call to the website www.boardgamegeek.com and retrieves a list of all board games I own and have actually found the time to play. Since the site usually does not send the response we are looking for in the first go – they send an acknowledgement response first – we need to put the request in a loop until we get the output we really want. The code looks like this:

You can filter out any responses that do not have a HTTPStatus of 200 so you won’t save any trash in your DB, but for this example I decided it makes sense to leave it in there.

And there you have it. 2 ways to import XML using TSQL into your database. I am sure there are other ways (using a linked server connection comes to mind *ugh*) but for now these 2 will have to do.

Feel free to leave any feedback in the comments below and look out for Part 3 where I’ll show you how to get XML into your SQL server using SSIS.

 

Getting started with XML in SQL Server

Part 1 – Parsing XML

Part 3 (not available yet) will use SSIS to achieve the same results as in Part 1 & 2

Part 4 (not available yet) will show you how to use XQuery and create XML files out of SQL Server

Getting started with XML in SQL Server – Part 1 – Parsing XML

A lot of people don’t like XML. It has a lot of overhead, it’s not easy to read for those unfamiliar with the structure and parsing it is a pain in the ass. Well, let’s see what SQL Server offers to load it anyways.

 

Let’s use this simple XML as an example:

This example represents the data of 2 items. Each item has 3 data elements and one attribute. It is important to know the difference between elements and attributes as they are parsed differently. To keep it simple this example does not use a namespace.

One thing you will notice quickly when dealing with XML is that you need to know the structure as well as the datatypes, you will not be able to do much without that knowledge.

The code below assigns our sample data to a variable with the datatype XML, a SELECT statement displays the assigned value of the variable:

 

XML output columns usually are represented as hyperlinks in SSMS (XML columns can – just like VARBINARY – hold quite a lot of data, so think twice before including them in your queries directly). If the XML is not too big you can click on it and a new tab will pop up with the XML code inside:

XML_in_SSMS

XML displayed in SSMS

 

Notice something? SQL Server got rid of the XML declaration. Keep this in mind when dealing with XML files, sometimes SQL Server refuses to convert a string to XML because of the declaration. In those cases you will have to get rid of it. A simple statement using STUFF in combination with PATINDEX will do the trick:

 

No how do we parse our variable? SQL Server offers different ways to parse XML in SQL server, see 2 different approaches below.

 

1. Using CROSS APPLY and the value() method:

If you execute this statement and have a look at the result you will hopefully see why more than one CROSS APPLY is needed to actually parse the data. Value() requires to specify an ordinal number to ensure it will return a singleton, this is done in square brackets. [1] will return the first available value matching the expression used, [2] the second and so on. To flatten the output a CROSS APPLY for each child element with multiple elements will do the trick, if only the root node is referenced the query will always return the first value taken from the first match for all results (take a look at the output on the columns id_without_crossaply / name_without_crossapply / name_with_one_crossapply).

As mentioned earlier attributes and elements are treated differently. To retrieve the value of an attribute you will add the @ sign to the name of the attribute. Elements will require the exact path, their value is retrieved using text().

 

2. Using OPENXML:

This approach is quite different. The system stored procedure sp_xml_preparedocument is used to create an object, referenced by an integer that is stored in a variable. This integer is required when using OPENXML to reference the right rowset provider. To access the data a WITH clause is used to define the output columns, allowing the use of aliases or the given name of attributes and elements. If you chose to use the same name for attributes as used in the XML you do not need to specify the path to the attribute (‘./@id’ is not required in this case). Elements however require a path and text() is used once again to retrieve the value.

Note that you should always destroy the object using sp_xml_removedocument once the XML is no longer required to free system resources.

In this example we chose to directly use the /items/item path, which allows to flatten the XML directly. See how the result changes if you change the path to /items (adding /item to the path of each element in the WITH clause): only the first row will be displayed.

 

Which approach is better? I don’t know. Each XML will be different, these example will not work for every XML out there. But at least now you have the basic means to start tackling XML.

 

Getting started with XML in SQL Server

Part 2 – Import XML files into your SQL Server

Part 3 (not available yet) will use SSIS to achieve the same results as in Part 1 & 2

Part 4 (not available yet) will show you how to use XQuery and create XML files out of SQL Server