Wednesday, March 5, 2014

SSIS Lookup Transformation Using Salesforce as a Data Source

Pragmatic Works offers a Salesforce Source component as part of its Task Factory SSIS component library. Using this component along with the out-of-the-box SSIS Cache Transform, it is possible to populate a Lookup Transformation with Salesforce data.

*Note: this post assumes you have, at a minimum, installed the trial version of the Pragmatic Works Task Factory product.

Example scenario where this is useful: We have a data warehouse that contains customer account data. Each customer account can have a broker who is responsible for servicing the account (each account has zero or one broker, and each broker can have zero or many accounts). We want to write this data warehouse account data to the Salesforce Account object. The Account object also has a relationship to a Broker object. To maintain this relationship in Salesforce, we must pass the internal Salesforce Broker Id when writing the data to Salesforce. Since we do not store the Broker Id in our data warehouse, we must look it up in Salesforce using the Broker Number, which is a unique broker identifier in our data warehouse. Because we cannot use the Salesforce Connection Manager in a Lookup Transformation, we will use an approach that leverages the Cache Transform so that our Lookup Transformation can use Salesforce data.

Let's take a look at the Control Flow of our package. 

As you can see, this simple package contains two Data Flows. The first (PopulateLookupCache) populates the Cache Transform, and the second (OleDBSourceToSalesforce) writes data to Salesforce. I've highlighted three of the components necessary to make this happen: the PopulateLookupCache Data Flow and two Connection Managers of types CACHE and SALESFORCE.

Let's look at the PopulateLookupCache Data Flow.

This Data Flow uses the Pragmatic Works Salesforce Source component to retrieve data from Salesforce using a SOQL query. It then populates a Cache Transform (LookupCache) with the internal Salesforce Id and the external Id (BrokerNumber). Below is the Cache Connection Manager definition. Note the Index Position value of 1 for the BrokerNumber column. This means that BrokerNumber will be the column used to perform the lookup when we define our Lookup Transformation.
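To make the role of the Index Position more concrete, here is a minimal Java sketch of what the cache amounts to conceptually: an in-memory index from BrokerNumber to the internal Salesforce Id. It is an analogy, not how SSIS or Task Factory implement the Cache Transform. The SOQL query in the comment, the class and method names, and the sample Ids are all hypothetical. The sketch rejects duplicate BrokerNumber values, because a lookup keyed on BrokerNumber would otherwise be ambiguous.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BrokerCache {

    /** One row from a hypothetical SOQL query such as: SELECT Id, BrokerNumber__c FROM Broker__c */
    public static final class BrokerRow {
        final String salesforceId;  // internal Salesforce Id
        final String brokerNumber;  // external Id shared with the data warehouse

        public BrokerRow(String salesforceId, String brokerNumber) {
            this.salesforceId = salesforceId;
            this.brokerNumber = brokerNumber;
        }
    }

    /** Indexes rows on BrokerNumber, the column given Index Position 1 in the cache. */
    public static Map<String, String> build(List<BrokerRow> rows) {
        Map<String, String> index = new HashMap<>();
        for (BrokerRow row : rows) {
            // putIfAbsent returns the value already stored when the key is present
            String existing = index.putIfAbsent(row.brokerNumber, row.salesforceId);
            if (existing != null) {
                throw new IllegalStateException("Duplicate BrokerNumber: " + row.brokerNumber);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the SOQL result set
        Map<String, String> cache = build(List.of(
                new BrokerRow("a0B000000000001AAA", "B-1001"),
                new BrokerRow("a0B000000000002AAA", "B-1002")));
        System.out.println(cache.get("B-1002")); // prints a0B000000000002AAA
    }
}
```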

Now that we've populated our Cache Transform we can move on to our primary Data Flow that will perform our lookup using our cached data.

In this Data Flow, we retrieve Account data from a SQL Server database using our OleDBSource Connection Manager and write it to the Account object in Salesforce. Between our source and destination components we have 1) a Data Conversion transformation that converts our source system's BrokerNumber from non-Unicode (VARCHAR/DT_STR) to Unicode (NVARCHAR/DT_WSTR) so that it matches the BrokerNumber column in our cache, and 2) our broker lookup, which uses Salesforce data to retrieve the Account's internal Salesforce Broker Id from our source system's broker number.
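The Data Conversion step matters because the Lookup Transformation only joins columns whose data types match. The Java sketch below is a rough analogy, not SSIS code. It decodes a value stored in a single-byte code page into a Unicode (UTF-16) string, which is essentially what converting DT_STR to DT_WSTR does. The code page 1252 assumption and the class name are hypothetical. Substitute whatever code page your source column's collation actually uses.

```java
import java.nio.charset.Charset;

public class BrokerNumberConversion {

    // Assumption: the VARCHAR source column uses code page 1252,
    // the code page behind SQL Server's Latin1_General collations.
    private static final Charset SOURCE_CODE_PAGE = Charset.forName("windows-1252");

    /** Decodes single-byte code-page text (like DT_STR) into a UTF-16 Java String (like DT_WSTR). */
    public static String toUnicode(byte[] nonUnicodeValue) {
        return new String(nonUnicodeValue, SOURCE_CODE_PAGE);
    }

    public static void main(String[] args) {
        byte[] raw = "B-1001".getBytes(SOURCE_CODE_PAGE);  // simulate the VARCHAR bytes
        String converted = toUnicode(raw);
        System.out.println(converted.equals("B-1001"));  // prints true
    }
}
```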

Let's take a look at our lookup. As you can see, we're using the Cache connection manager connection type as opposed to the OLE DB connection manager connection type that we're so used to. This allows us to use the data that we wrote to the Cache Transform in our previous data flow.

Next, we set the Connection to our LookupCache Connection Manager.

Once we've set our data source we can configure the lookup columns.

We join on our data warehouse BrokerNumber to retrieve the Salesforce Broker Id that we alias as Broker__c since this is the name of our Account object's broker field.
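Conceptually, the lookup is a key match plus a policy for rows that do not match. Because an account can have zero brokers, some rows will find no match. The Lookup Transformation's default for rows with no matching entries is to fail the component, so those rows need either "Ignore failure" (which leaves Broker__c NULL) or a redirect to the no-match output. The Java sketch below mimics the "Ignore failure" option. The method name, the parameter names, and the AccountNumber keys are hypothetical and not part of SSIS. This sketch also treats a NULL BrokerNumber as "no broker".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BrokerLookup {

    /**
     * Resolves Broker__c for each account, mimicking a Lookup set to "Ignore failure":
     * accounts with no BrokerNumber, or one missing from the cache, get null.
     *
     * @param accountBrokerNumbers AccountNumber -> BrokerNumber (null when the account has no broker)
     * @param brokerCache          BrokerNumber -> internal Salesforce Broker Id
     * @return AccountNumber -> Broker__c value, in the input's iteration order
     */
    public static Map<String, String> resolveBrokerIds(Map<String, String> accountBrokerNumbers,
                                                       Map<String, String> brokerCache) {
        Map<String, String> brokerC = new LinkedHashMap<>();
        for (Map.Entry<String, String> account : accountBrokerNumbers.entrySet()) {
            String brokerNumber = account.getValue();
            // Skip the cache for null keys; some Map implementations (e.g. Map.of) reject get(null)
            String brokerId = (brokerNumber == null) ? null : brokerCache.get(brokerNumber);
            brokerC.put(account.getKey(), brokerId);
        }
        return brokerC;
    }

    public static void main(String[] args) {
        Map<String, String> cache = Map.of("B-1001", "a0B000000000001AAA");
        Map<String, String> accounts = new LinkedHashMap<>();
        accounts.put("ACC-1", "B-1001"); // matched
        accounts.put("ACC-2", null);     // account without a broker
        accounts.put("ACC-3", "B-9999"); // broker number not found in the cache
        System.out.println(resolveBrokerIds(accounts, cache));
        // prints {ACC-1=a0B000000000001AAA, ACC-2=null, ACC-3=null}
    }
}
```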

As you can see from this post, using the Pragmatic Works Task Factory Salesforce Source component and the out-of-the-box Cache Transform, it is possible to populate a Lookup Transformation with Salesforce data even though the Lookup Transformation does not directly support a Salesforce Connection Manager.

Friday, February 28, 2014

RapidMiner jTDS Network error: IOException: Connection refused: connect

Network error: IOException: Connection refused: connect

There are quite a number of reasons why you might get this error. However, assuming the following:
  1. You're using jTDS to connect to a SQL Server instance
  2. You're trying to use Windows authentication (not passing SQL Server credentials)
  3. You've installed the ntlmauth.dll library
Confirm the following via SQL Server Configuration Manager (Start->Run->SQLServerManager11.msc, the SQL Server 2012 version):

1.  The TCP/IP protocol is enabled.

2.  Your SQL Server instance is listening on the SQL Server default port of 1433 -or- you have entered the correct port in your jTDS configuration/connection string.

3.  The SQL Server Browser service is running (jTDS relies on it to find the port when you connect to a named instance).
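"Connection refused" is raised at the TCP level, before any authentication takes place. For that reason, it can help to test the port directly from the machine running RapidMiner. The Java sketch below does this with a plain socket. It also builds a jTDS URL for Windows authentication, based on my reading of the jTDS documentation: when the domain property is set and no user or password is given, jTDS uses native single sign-on through ntlmauth.dll. The host, database, and domain values are placeholders. For a named instance, jTDS also accepts an instance property, in which case it asks the SQL Server Browser service for the port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class JtdsConnectivityCheck {

    /**
     * Builds a jTDS URL with a domain but no user/password. Per my reading of the jTDS docs,
     * this makes jTDS use native Windows single sign-on via ntlmauth.dll.
     */
    public static String jtdsUrl(String host, int port, String database, String domain) {
        return "jdbc:jtds:sqlserver://" + host + ":" + port + "/" + database + ";domain=" + domain;
    }

    /** Returns true if something accepts TCP connections on host:port within timeoutMs. */
    public static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // A java.net.ConnectException ("Connection refused") means nothing is listening there,
            // e.g. TCP/IP is disabled or the instance uses a different (dynamic) port.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost"; // placeholder host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1433;
        System.out.println(jtdsUrl(host, port, "MyDatabase", "MYDOMAIN")); // hypothetical names
        System.out.println(isPortOpen(host, port, 3000)
                ? "TCP connection succeeded; look at authentication next"
                : "Connection refused or timed out; check TCP/IP, the port, and the firewall");
    }
}
```

Run it with the host and port you configured in RapidMiner. If the socket test fails, the problem lies in the checklist above rather than in Windows authentication.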

If you confirmed and/or changed your configuration based on the above recommendations, restarted your SQL Server instance, and still cannot successfully connect, refer to the jTDS SourceForge FAQ page.