Tuesday, November 15, 2011

Overwrite Excel File in SSIS: Workaround for Excel Connection Manager

I recently needed to create a process that would ultimately create an Excel file for import into another system. The other system was built and maintained by a third party and I had no control over it. It required that an Excel file be present in a specific directory with a specified name. The solution required that 1) if the file did not already exist it would be created and 2) if the file did already exist that it would be overwritten and a copy of the new file would be written to an archive folder with the date and time appended to the file name (so that the other historical files wouldn't be overwritten and so that it would be easy to know what date and time the file was created).

Current File Name: Extract.xls

Historical File Name: Extract_YYYYMMDD_MMSS.xls

I figured this would be easy. The first thing I did was look for the OverwriteFile or ReplaceFileproperty. What I found was that the only non-connection specific property the Excel Connection Manager exposed was FirstRowHasColumnName. At that point I figured I would experiment and find out what would happen if I tried to write to a pre-existing file. The result was that the data being written to the file was simply appended to the pre-existing data in the file. Definitely not the behavior I was looking for.

I then hit "the Google" as George W. so fondly calls it. I found solutions that proposed deleting data from worksheet(s), deleting worksheets, and creating new worksheets using the Jet 4.0 SQL syntax to "DROP" and "CREATE" tables. In Jet/Excel terms, a table is an Excel worksheet. I found that the DROP didn't actually delete the worksheet but deleted rows from the worksheet. This could have worked except that the next time the worksheet is written to, the new rows start after the last row of the previously deleted data, leaving a bunch of empty rows in the beginning of the worksheet. This approach did not cut the mustard. So ended my search for the wheel I was attempting to not reinvent.

The approach I came up with does the following.

  1. Creates a shell (only the headers) Excel file with the name Extract_wip.xls.
  2. Uses a Data Flow to populate the wip Excel file.
  3. Copies the wip file to an Archive folder but renames to Extract_YYYYMMDD_MMSS.xls.
  4. Renames the wip file to Extract.xls, overwriting the previously created Extract.xls files (if one exists).
This is a screen shot of the Control Flow.

*If you don't have an archival/historical requirement you can remove that functionality and make this solution even simpler.

Connection Managers

The Connection Manager used in the final step of the process to rename the wip file to the final file name. Nothing fancy here although I would suggest making this configurable.

The Connection Manager used to copy the wip file to an archive folder with the date and time in the file name. Below is the expression used to generate the file name. You will want to set the Connection Manager's ConnectionString expression to this.

"C:\\ExcelExtract\\Archive\\Extract_" + RIGHT( "0000" + (DT_STR, 4 , 1252) DATEPART( "year" , GETDATE() ) , 4 ) + RIGHT( "00" + (DT_STR, 2 , 1252) DATEPART( "month" , GETDATE() ) , 2 ) + RIGHT( "00" + (DT_STR, 2 , 1252) DATEPART( "day" , GETDATE() ) , 2 ) + "_" + RIGHT( "00" + (DT_STR, 2 , 1252) DATEPART( "hour" , GETDATE() ) , 2 ) + RIGHT( "00" + (DT_STR, 2 , 1252) DATEPART( "minute" , GETDATE() ) , 2 ) + ".xls"

This Connection Manager should point to the wip file. You'll want to have a ConnectionString expression similar to the following.

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\ExcelExtract\Extract_wip.xls;Extended Properties=\"EXCEL 8.0;HDR=YES\";"

*HDR is whether the file has a header row.

This Connection Manager points to the wip file and is used to copy and rename the populated Excel file.

This is used by the data flow. There could be one or more Connection Managers for your data flow.

Control Flow Tasks

Connection Type: EXCEL
Connection: ExcelFileForDataFlow
SQL Statement:

CREATE TABLE [ThisWillBeYourWorksheetName](
[SomeCharVal1] nvarchar(25),
[SomeCharVal2] nvarchar(255),
[SomeCharVal3] nvarchar(50),
[SomeDecimalVal1] double precision,
[SomeDecimalVal2] double precision,
[SomeIntegerVal1] integer)

*It is important that you use the JET syntax including the appropriate data types when writing your CREATE TABLE script.

This task will create a new Excel file with a worksheet named using the name in your CREATE TABLE script. It will use the file name found in the ExcelFileForDataFlow Connection Manager.

*If a file by the same name exists it will add a worksheet to that file. This is why we want to make sure this file does not exist when we start our process. Hence the wip file.

This is the Data Flow that writes to your Excel File. It doesn't matter what the data flow does, just make sure that it writes to an Excel Destination that uses the ExcelFileForDataFlow Connection Manager.

Destination Connection: ExcelFileForArchive
OverwriteDestination: True
Operation: Copy File
Source Connection: ExcelFileWIP

This File System Task copies the wip file to an Archive (or any other) folder using the Extract_YYYYMMDD_MMSS.xls naming convention.

Destination Connection: ExcelFile
OverwriteDestination: True
Operation: Rename File
Source Connection: ExcelFileWIP

This File System Task renames the wip file to the Extract.xls name, effectively overwriting the previously created Extract.xls file (if one exists).


While it seems like it would make sense for the Excel Connection Manager to allow for the overwriting of an existing file, it doesn't. The above process is the simplest way I have found to reproduce this seemingly lacking functionality.


  1. Not sure how this would work. When you use ExcelFileForDataFlow as destination for df_PopulateExcelFile it creates a WIP file and because it creates the WIP file the CreateExcelFileAndHeader task fails. If you delete the excel file the package wont run because the WIP file used as destination does not exist. Or am I missing something.