Friday, August 10, 2012

SSIS Event Handlers: Behavior In Package Hierarchy


I recently stumbled onto what seemed to be an odd performance problem in an SSIS project I was working on. The problem was that I had a number of straightforward packages that ran extremely fast but sometimes, without explanation, ran really slow.

Luckily the project included a robust logging system that provided, among other things, the execution duration of every package and Control Flow task.  The logs showed that the Data Flows in the packages in question were running in 10 to 20 seconds yet the packages were at times taking upwards of 20 minutes. Furthermore, I found that when called on their own, each package did not run much longer than its single data flow, yet when run as a child of another package (via an execute package task) the execution duration more than doubled. In one example I had a package that ran in 10 seconds on its own, 23 seconds as a child, 56 seconds as a child 3 levels deep, and almost 3 minutes when run as a child four levels deep. In that same example the Data Flow took only 7 seconds. So why was the package so slow when run as a child of one or more packages?

Time for SQL Server Profiler.

The logging mechanisms in place were stored procedures being called by Execute SQL tasks and were being passed things like a batch Id, Package Id, Source Name, Source GUID, Start Time, End Time, etc. Once I began profiling it was clear that there were way too many Pre and Post execute logging stored procedure calls.

Let me take a step back and explain the project/package design. The project contained about 50 SSIS packages, some of which were being called by Execute Package tasks within the project. For example, there was a Master package which called some staging packages and some staging packages that called lower level staging packages, and those in turn called some generic logging packages. There were as many as four packages in any given package execution hierarchy. In each package, regardless of where in the package hierarchy it fell, there were four event handlers:
  1. OnError: Handled exceptions that were then logged to the log database.
  2. OnVariableValueChanged: Handled the change in those variables we were interested in and logged to the log database.
  3. OnPreExecute: Handled the logging of the pre-execution data (e.g. start time, source name, etc.) for all package tasks.
  4. OnPostExecute: Handled the logging of post-execution data (e.g. end time) for all package tasks.
Getting back to what I was seeing in Profiler (many calls to the Pre and Post execute logging stored procedures), it only made sense that the OnPreExecute and OnPostExecute event handlers were being fired each time the logging stored procedures were being called. So why would those event handlers fire so many times?

I started to research event handlers propagation to understand what happens in SSIS when we have package level event handlers and a package design in which packages call other packages. I quickly ran into Jamie Thomson’s “SSIS: Understanding event propagation” blog post. In his post he talks about three things:
  1. Events being caught by more than one event handler.
  2. Event handlers firing events.
  3. The System::Propagate variable.

Numbers 1 and 2 were what I was most interested in. Number 3 is something I’ve written about in the past in a blog post title “SSIS Foreach Loop Container: Continue on Error” when explaining how to continue a Foreach Loop container when a child task fails.

I took Jamie’s example and modified it to fit my situation which was a four package hierarchy with a single Control Flow task (empty script task just to get the event handler to fire) in the lowest level package and an OnPostExecute event handler in all packages. Instead of an empty Control Flow task in the parent packages, an Execute Package task was used to call the package one level down (e.g. Package_Level2 called Package_Level1, PacakageLevel3 called Package_Level2, etc.). Screen shots below depict the design.

Package_Level1 Control Flow

Package_Level1 OnPostExecute Event Handler

Package_Level2 Control Flow w/ Execute Package Edit Dialog


When running this sample package(s) I found that each subsequent level in the hierarchy more than doubles the number of events fired. When running Package_Level1 the OnPostExecute event handler is executed twice. Once for the Level1_EmptyControlFlowTask task and then again for the OnPostExecute of the package itself.  When running Package_Level2 the OnPostExecute event handler is fired a total of 8 times. See the table below. Package_Level1 bubbles up the Level1_EventHandler (twice), the Level1_EmptyControlFlowTask, and the Package_Level1 OnPostExecute events to the Package_Level2 event handler. In addition to those 4 events we still have the original two events we say when executing Package_Level1 on its own. And, we have the Package_Level2 OnPostExecute events for the Level2_ExecuteChildPackage (call to execute Pacakge_Level1 package) and the Package_Level2 package execution.

Sequence
Event Handler Package
Task
Event
1
Package_Level1
Level1_EmptyControlFlowTask
OnPostExecute
2
Package_Level2
Level1_EventHandler
OnPostExecute
3
Package_Level2
Level1_EmptyControlFlowTask
OnPostExecute
4
Package_Level1
Package_Level1
OnPostExecute
5
Package_Level2
Level1_EventHandler
OnPostExecute
6
Package_Level2
Package_Level1
OnPostExecute
7
Package_Level2
Level2_ExecuteChildPackage
OnPostExecute
8
Package_Level2
Package_Level2
OnPostExecute

When we execute Package_Level4 we see a total of 52 events fired, all triggered by the single Level1_EmptyControlFlowTask task in Package_Level1.
Below is the formula for determining how many additional events will be fired by the top-level package in a package hierarchy, per Control Flow task when there is a single OnPostExecute event handler task in all packages in the hierarchy.

y = 2^n + x

x = previous package level’s additional events
n = level in package hierarchy (starting at base package level 1)
y = total additional events fired at level n

Package Level
Calculation
Additional Events Fired
Total Events Fired
1
2^1 + 0
2
2
2
2^2 + 2
6
8
3
2^3 + 6
14
22
4
2^4 + 14
30
52
5
2^5 + 30
62
114

As you can see, having nested packages with package level event handlers can create quite a bit of overhead. I initially failed to see this in my logs because my logging stored procedures were ignoring the event handlers that were fired by other event handlers. However, the process still had the overhead of all the events firing and the stored procedure calls, the data just wasn’t being persisted to the database.

Event handler behavior can be very tricky. Don’t be like me, understand what’s happening before implementing your event handler logic.

2 comments: