Artikelen

Package generation with SSIS - an overview

SSIS

Package generation with SSIS - an overview

Although you can hardly ignore BIML (with bloggers like Cathrine Wilhelmsen and Bill Fellows sharing good stuff), there are still other options for package generation in SSIS. While enjoying a small conversation on LinkedIn about the automation options, I suddenly remembered still having a slidedeck from a presentation I gave early 2014.

Please note that these are my early views on Biml, dating from end-2013. Please do not base any decisions on this: things are outdated (BimlExpress now provides free syntax highlighting for BimlScript inside Visual Studio; BimlT files can alter existing packages) and technology descriptions may not be 100% accurate (nothing confirmed by Varigence)

The presentation was about SSIS 2012, so not everything will work the same way in SSIS 2014+. For example: from what I've heard, EzAPI doesn't have support for SQL Server 2014+. I'm planning to write an update shortly, but meanwhile this might prove useful for gathering a high-level overview.

Several products are available for Data Warehouse automation in SQL Server. Examples are Quipu (Data Vault), LeapFrogBI (Dimensional Model) and BI Accelerator (although they haven't posted anything since 2012). Besides, almost all BI consultancy firms seem to have developed their own fancy ETL / DWH generator. I won't digress into these products, but focus on the frameworks you can use to create these automation tasks instead.

In short, there are three (four) well-known ways to create SSIS-packages programatically:

  • SSIS Automation API
  • EzAPI
  • BIML
  • (Writing out SSIS packages directly - which has become somewhat easier starting with SSIS 2012)

SSIS Automation API uses the SSIS Object Model to perform tasks in packages - AFAIK this is what Visual Studio uses in the background to create your packages (which doesn't mean that everything that Visual Studio does is available via the API - things like detecting column names and file structures aren't, for example). You can access the SSIS Automation API and write code against it in .Net (C# or VB).

Writing code in the Automation API isn't the most modern way of interacting with an API. While the Task Flow is still readable, handling Data Flows feels quite cumbersome..

Creating a Task Flow using SSIS Automation API
Creating a Task Flow using SSIS Automation API
Creating a Data Flow using SSIS Automation API
Creating a Data Flow using SSIS Automation API

The reason Data Flows are so hard, is that we're working directly with COM objects here:

IDTSComponentMetaData100 destination = dataFlowTask.ComponentMetaDataCollection.New();
destination.ComponentClassID = "DTSAdapter.OleDbDestination";
CManagedComponentWrapper destDesignTime = destination.Instantiate();
destDesignTime.ProvideComponentProperties();

The SSIS Automation API can be used for generating new packages as well as altering existing packages. Everything happens in code (C# / VB.Net), but the communication is somewhat outdated: you need to use COM classes.

EzAPI is a wrapper for the SSIS Automation API. The most important thing EzAPI does is providing .Net wrappers for many classes, so you don't need to write this COM-handling-code (code loosely based on Paul Rizza's intro to EzAPI):

//Create package and connection managers:
var package = new EzPackage();

var srcConn = new EzOleDbConnectionManager(package) {
  Name = "SRC Conn",
  ConnectionString = "Data Source=localhost;Provider=sqlncli11;Integrated Security=sspi;Initial Catalog=AdventureWorks2012"
};
var destConn = new EzOleDbConnectionManager(package) {
  Name = "DST Conn",
  ConnectionString = "Data Source=localhost;Provider=sqlncli11;Integrated Security=sspi;Initial Catalog=MSBIP_EZPI_Extract"
};

// Control Flow:
var truncateTableTask = new EzExecSqlTask(package) {
  Name="EST Truncate Table",
  Connection = srcConn,
  SqlStatementSource = "TRUNCATE TABLE " + destinationTableName
};

var dataFlowTask = new EzDataFlow(package) { Name = "DFT Extract" };

dataFlowTask.AttachTo(truncateTableTask);


// Add Data Flow elements:
EzOleDbSource source = Activator.CreateInstance(typeof(EzOleDbSource), new object[] { dataFlow }) as EzOleDbSource;
source.Connection = srcConn;
source.Name = "Source";
source.Table = "SourceTable";

EzOleDbDestination Dest = Activator.CreateInstance(typeof(EzOleDbDestination), new object[] { dataFlow }) as EzOleDbDestination;
Dest.Connection = destConn;
Dest.Name = "Destination";

// Connect the Data Flow:
Dest.AttachTo(source, 0, 0);
Dest.Table = "DestinationTable";
Dest.AccessMode = AccessMode.AM_OPENROWSET_FASTLOAD;
Dest.SetComponentProperty("FastLoadMaxInsertCommitSize", 100000);

This is much easier to read than the original Automation API: this looks like plain .Net-style, offering all pleasures like declarative style programming.

EzApi can also be used both for generating new packages as well as altering existing packages. You can write code in C# or VB.Net, and do so while using regular .Net-like constructions for the SSIS objects too.

BIML stands for BI Markup Language. It is a domain-specific XML to generate packages. I heard somebody once say it uses EzAPI, but I find that somewhat improbable.

Every SSIS-object is an XML element in BIML, which gives a highly declarative way to define how your packages should be generated: all nesting in SSIS (foreach containers, data flows) is modeled nested in BIML too:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package ConstraintMode="Parallel" Name="Voorbeeld">
      <Tasks>
        <Dataflow Name="Import CSV">
          <Transformations>
            <FlatFileSource ConnectionName="CSV connection" Name="Read CSV" FileNameColumnName="filename" />
            <DerivedColumns Name="DC Audit info">
               <Columns>
                <Column Name="Load_DateTime" DataType="DateTime">@[System::StartTime]</Column>
              </Columns>
            </DerivedColumns>
            <DataConversion Name="Convert Project_naam to fixed length">
              <Columns>
                <Column SourceColumn="project_naam ntext" DataType="StringFixedLength" Length="50" TargetColumn="project_naam" />
              </Columns>
            </DataConversion>
            <OleDbDestination ConnectionName="DB" Name="DB Write to table">
              <ExternalTableOutput Table="[schema].[table]" />
            </OleDbDestination>
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>

You can run this BIML script from within SQL Server Data Tools, which - if no errors are found - will result in a brand new SSIS package inside the project where you added your BIML file. You just need to have the free and great plugin BIDS Helper installed - which you should have anyway.

In itself, BIML is static. So out of the BIML above, exactly one type of SSIS package can be generated. The clever simpleness and highly descriptive way of expressing the modeling of your package in BIML allows for easy BIML generation though, in two ways:

  1. Generate BIML from another script (like generating XML)
  2. Use BIMLScript

Generating BIML from another script isn't mentioned often, but is not that hard. You can easily use string templates to insert dynamical data in a BIML template, for example:

<OleDbConnection Name="DB" ConnectionString="Data Source={4};Initial Catalog={3};Provider=SQLNCLI11;Integrated Security=SSPI;Connect Timeout=30;" DelayValidation="true">

Viewed from a high level, what will happen is this:

 

Biml-templatingBad thing: you have this 'in-between' BIML-script, and have to execute two separate processes: one for the string mappings, and one for the BIML to DTSX compilation (I've tried to address the BIML engine directly, but haven't succeeded so far).

Using BIMLScript is the 'native' BIML way for automating things. It's like an ASP or PHP style of generating web pages:

<FileFormats>
 <FlatFileFormat Name="CSVFormat" RowDelimiter="CRLF" ColumnNamesInFirstDataRow="true" IsUnicode="false"  TextQualifer='"'>
  <Columns>
  <# foreach(KeyValuePair<string, string> column in columns) {{ #>
   <Column Name="<#=column.Key#>" DataType="AnsiString" Length="500" Delimiter="<#=column.Value#>" />
  <# }} #>
  </Columns>
 </FlatFileFormat>
</FileFormats>

The key difference with ASP or PHP is that the 'in-between' product (namely, the 'compiled' BIML script) is not visible[ref]Of course, this may be because it doesn't exist at all, but is just 'parsed' directly into the internal BIML engine.[/ref]:

BIMLScript

While syntax highlighting and auto-completion in SSDT works perfect for BIML (as long as you have installed BIDSHelper), it doesn't work for BIMLScript. This makes it hard to debug BIMLScript, because you don't see the in-between BIML code, but the BIML compiler's error messages are about exactly this part (which is invisible to you). The developers of BIML, Varigence, have a solution to that.

Mist is Varigence's development environment for BIMLScript. It is not a free tool, but has a rather impressive feature list (check it out here). It makes the development of BIMLScript much easier because you have syntax highlighting and see what's causing errors on your screen.

Editing BIMLScript in SSDT
Editing BIMLScript in SSDT
Editing BIMLScript in Varigence Mist
Editing BIMLScript in Varigence Mist

I have used BIMLScript sometimes in a kind-of in-between way, by using PowerShell to generate BIMLScript, which compiled to SSIS packages. Although this sounds cumbersome, neither my client nor my employer back then wanted to buy Mist, and developing all logic inside BIMLScript caused too much errors for me.

Finally keep in mind that it all remains SSIS - with all caveats, and 'features' of SSIS still in place. Also, keep in mind that BIML overwrites existing objects inside your project. One issue that I faced was the fact that my project connection managers were overwritten, and my project parameters didn't work. But that was early 2014 - chances are high that this is fixed meanwhile.

 

SSIS API EzAPI BIML
Language C# / VB.Net (+COM interfacing) C# / VB.Net BIML (XML)
Develop in Visual Studio Visual Studio SSDT BI (Bids Helper)
Varigence Mist
Execute Executable Executable SSDT BI (Bids Helper)
Varigence Mist
Automation options Create packages
Alter packages
Create packages
Alter packages
Create packages
Alter packages

Comment (1)

  1. Hi Koos

    There is a data warehouse automation tool called Dimodelo Architect which uses EZAPI to generate SSIS packages for SQL Server 2008, 2012, 2014 and 2016. And it's a Visual Studio plug-in.

Comments are closed.