Notes from the Dutch Biml Usergroup kickoff

Today, I attended the kickoff of the Biml Usergroup in the Netherlands. We had several hours of training from Scott Currie at a wonderful location (Impact Hub Amsterdam). Some hints were also dropped about the future of Biml and the Varigence products. Here are my notes:

Metadata

Varigence sees three "layers" of metadata:

  1. Business metadata: business rules etc.
  2. Technical metadata: mapping, datatype transformation, etc.
  3. Operational metadata: server locations, capabilities, profiles, usage loads, SLAs, scheduling windows, etc.

Few organizations have fully deployed the third layer so far. Personally, I see a lot of businesses doing parts of it (for example server locations and scheduling windows), but few have implemented it completely, in a way that automatically feeds the generation and control of ETL flows.

Varigence also sees a natural evolution of metadata registration in three stages / levels:

  1. Natural
    • Metadata is created as a side-effect of business processes
    • Must be machine readable, relatively correct
    • Examples: self-describing web services, DB schemas, data dictionaries, etc.
  2. Hybrid
    • Extra information needed for your BimlScripts
    • Often stored in Biml annotations, DB tables, or extended properties
  3. Synthetic
    • When you've added around 6 hybrid attributes you're going to have a hard time - most organizations start to switch to a full "external" metadata model
    • Often stored in MDS, web services, SharePoint lists (really) or Biml Metadata Models

Although the stages / levels usually follow each other, one isn't better than the other. If you "jump in" at a "higher" level of metadata administration, you'll probably get it wrong, because you lack the experience of what's needed at a lower level.

BimlFlex

  • Biml is a framework generation tool: you can build your ETL framework, consuming your self-built metadata models.
  • As requests kept coming in for a pre-populated framework as well, BimlFlex was developed
  • Varigence's two years of ETL automation experience is included
  • Scott showed an early version of the metadata model, containing a lot of the metadata needed for most scenarios:
    • schema 'meta' (all as tables defining the possible options):
      • ChangeType (Type 1, Type 2, Hist)
      • CompressionType
      • ConnectionType (OLEDB, ADONET, FILE, ..)
      • DataType (Biml datatypes mapping)
      • IntegrationType (SRC, EXT, STG, DWH)
    • schema 'app'
      • Object definitions like tables
      • Column definitions
      • Connection definitions
  • This early version has been extended, particularly to support Data Vault scenarios
  • BimlFlex will be launched later this month at the World Wide Data Vault Conference
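
To make the shape of that early metadata model a bit more concrete, here is a minimal sketch in Python. This is my own illustration, not the actual BimlFlex schema: the class and field names (`Connection`, `Column`, `TableObject`, `columns_by_change_type`) are hypothetical, and only the option values are taken from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


# 'meta' schema: lookup tables that define the allowed options.
class ChangeType(Enum):
    TYPE1 = "Type 1"
    TYPE2 = "Type 2"
    HIST = "Hist"


class ConnectionType(Enum):
    OLEDB = "OLEDB"
    ADONET = "ADONET"
    FILE = "FILE"


class IntegrationType(Enum):
    SRC = "SRC"
    EXT = "EXT"
    STG = "STG"
    DWH = "DWH"


# 'app' schema: the actual definitions, referring to the 'meta' options.
@dataclass
class Connection:
    name: str
    connection_type: ConnectionType
    integration_type: IntegrationType


@dataclass
class Column:
    name: str
    data_type: str  # assumption: would point at the DataType mapping table
    change_type: ChangeType = ChangeType.TYPE1


@dataclass
class TableObject:
    name: str
    connection: Connection
    columns: List[Column] = field(default_factory=list)

    def columns_by_change_type(self, change_type: ChangeType) -> List[str]:
        """Return the names of all columns that use the given change type."""
        return [c.name for c in self.columns if c.change_type is change_type]


staging = Connection("Staging", ConnectionType.OLEDB, IntegrationType.STG)
customer = TableObject(
    "Customer",
    staging,
    [
        Column("CustomerId", "Int32"),
        Column("Name", "String", ChangeType.TYPE2),
        Column("City", "String", ChangeType.TYPE2),
    ],
)
```

A generator script could then loop over such objects and, for example, ask `customer.columns_by_change_type(ChangeType.TYPE2)` to decide which columns need history tracking.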

In the video below, Scott discusses part of a metadata model, which is essentially an early form of what BimlFlex became.

BimlExpress

  • BimlExpress is the new freemium plugin for Biml development inside SSDT
  • Full syntax highlighting for BimlScript is here
  • Internally, it runs a website - exactly the same as BimlOnline
  • Reasons for not continuing with BidsHelper:
    • Discussion about licensing (according to Scott, basically some people tried to reverse engineer Biml, claiming they "didn't know it was not allowed").
    • Opportunity to count distinct users (which is the first thing the SSIS team asks whenever discussing an issue: "how many users are using this Biml thing?")
    • Independent release cycle

Biml Bundles

Varigence introduces a new concept called "Biml Bundles".

  • File extension of a bundle is '.BimlB'
  • A bundle is an encrypted framework, which lets you deploy your product at a customer without having to worry about anyone stealing your thunder.
  • Although everything is encrypted, there are so-called framework extension points to which users of your bundle can add "hooks" - I think this is a pretty neat way of encapsulation
  • Currently only in Mist (which is being renamed to Biml Studio)
  • BimlFlex is also a bundle
    • So it's not included in BimlExpress
    • ... but if you buy BimlFlex, you'll also receive Mist / Biml Studio

Biml Transformers

  • Transformers don't create new objects, but instead alter existing ones.
  • File extension of a transformer is '.BimlT'
  • This is not for all companies, but for some it'll be pretty useful
    • For example, when moving to SQL Data Warehouse (cloud) after already having everything on a regular database.
    • Think new datatype conversions, workarounds, etc.
  • Will only be in the paid version
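
The "alter, don't create" idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how a '.BimlT' file actually looks: the function name `transform_tables`, the dictionary layout, and the datatype mapping in `TYPE_MAP` are all assumptions I made up for the example.

```python
import copy

# Hypothetical mapping from datatypes used on-premises to replacements
# for the cloud target. The real conversions depend on the platform.
TYPE_MAP = {"xml": "nvarchar(4000)", "geography": "varbinary(8000)"}


def transform_tables(tables):
    """Return altered copies of existing table definitions.

    No tables or columns are added or removed: only the datatypes of
    existing columns are rewritten where the mapping asks for it.
    """
    result = copy.deepcopy(tables)
    for table in result:
        for column in table["columns"]:
            column["type"] = TYPE_MAP.get(column["type"], column["type"])
    return result


tables = [
    {
        "name": "Customer",
        "columns": [
            {"name": "Id", "type": "int"},
            {"name": "Profile", "type": "xml"},
        ],
    },
]
converted = transform_tables(tables)
```

The original definitions stay untouched, so the same metadata can still be used to generate the on-premises solution.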

Data Lineage

Answering a question from one of the attendees (I think it was Frenk van Beekveld), Scott shared some insights into data lineage:

  • Nowadays, data lineage tools just show how columns are built up ("read lineage")
  • In order to include it in a Varigence product, they're re-thinking the way data lineage works: could it become read-write?
  • Write means basically:
    • View the definition / build-up / lineage
    • Alter the definition of a column
    • Be warned: this causes the following n side-effects!
      • Do you want this to happen?
      • Or do you want to "fork" this in a new lineage?
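
The "this causes the following n side-effects" warning boils down to walking the lineage graph downstream from the column you want to change. Below is a small Python sketch of that idea. It says nothing about how Varigence would implement it: the `LINEAGE` dictionary, the column names and the `side_effects` function are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each column maps to the columns derived from it.
LINEAGE = {
    "src.Price": ["stg.Price"],
    "src.Quantity": ["stg.Quantity"],
    "stg.Price": ["dwh.Revenue"],
    "stg.Quantity": ["dwh.Revenue"],
    "dwh.Revenue": ["mart.RevenueYtd"],
}


def side_effects(column, lineage):
    """Return every downstream column affected by changing `column`."""
    affected = []
    seen = {column}
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # visit each column only once
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


impacted = side_effects("stg.Price", LINEAGE)
print(f"Changing stg.Price causes {len(impacted)} side-effects: {impacted}")
```

With a list like this in hand, a tool could ask whether you want to apply the change to all affected columns, or "fork" the altered definition into a new lineage and leave the existing one alone.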

Cloud

  • BimlOnline is open beta, has been for a while and will be for a while too. But hey, is Google Maps out of beta already?
  • ADF (Azure Data Factory) support will come, but Scott reminded us that we really have to see ADF primarily as a SQL Agent rather than as an SSIS. It's far from a mature data integration tool
  • SSIS in the cloud is not yet here. The only way you can scale-out SSIS in the cloud is as follows:
    • Small local coordination node
    • Work Pile (queue of work-to-be-done) stored centrally
    • Workers (VMs in the cloud running SSIS) picking ETL tasks off this queue
    • Coordinator spins up / tears down workers when needed
  • As for migration towards the cloud:
    • Biml Transformers can aid in transforming existing ETL solutions
    • Varigence has some customer examples of moving to the cloud
    • Around 80% of the transformation was automated.
      • Scott wasn't sure whether things like distribution keys (which IMHO could be derived quite well from existing statistics about the currently running ETL) are part of those 80%
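
The scale-out pattern described above (coordinator, work pile, workers) is a classic work-queue setup. Here is a minimal Python sketch of it, under heavy assumptions: threads stand in for the cloud VMs, a string stands in for actually running an SSIS package, and the names `worker` and `coordinate` are mine.

```python
import queue
import threading


def worker(work_pile, results, lock):
    """Pick ETL tasks off the shared work pile until it is empty."""
    while True:
        try:
            task = work_pile.get_nowait()
        except queue.Empty:
            return  # nothing left, so this worker can be torn down
        outcome = f"{task} done"  # stand-in for running an SSIS package
        with lock:
            results.append(outcome)
        work_pile.task_done()


def coordinate(tasks, max_workers=3):
    """Fill the work pile, spin up workers, and wait until they finish."""
    work_pile = queue.Queue()
    for task in tasks:
        work_pile.put(task)
    results = []
    lock = threading.Lock()
    # Never start more workers than there are tasks.
    count = min(max_workers, len(tasks))
    workers = [
        threading.Thread(target=worker, args=(work_pile, results, lock))
        for _ in range(count)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(results)


finished = coordinate(
    ["load_customers", "load_orders", "load_products", "load_sales"]
)
```

The coordinator itself does very little, which matches the "small coordination node" idea: all the heavy lifting happens in the workers, and their number can follow the size of the work pile.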

Testing

Finally, Scott showed some features of the Biml Enabled SSIS Test framework.

  • It's the same framework that @SQLReeves showed last year at SQLRally Nordic
    • https://www.youtube.com/watch?v=O2QTILtxvUs&feature=youtu.be
  • The framework was last updated in 2014, but new additions are coming:
    • New asserts
    • MSBuild integration
    • New outputs (other than db)
  • GitHub (https://github.com/bimlscript/best) will be updated after BimlFlex has been launched
  • BimlFlex will have this testing framework included

On Being Free

It's funny that Scott apologizes a lot for even briefly diving into features that are only in the paid products. That's not necessary, in my opinion: the products are wonderful, and a lot of effort and experience has been put into them! (Having said that: IMHO the price is fair, but around $250/month per dev still requires a productivity boost of at least 5% compared with free products - I still have to prove that to my CFO 😉)

  • Historically, Varigence has always given away lots of their work for free, and they'll continue to do so. There are few (maybe no) companies giving away this percentage of their work for free without setting the barrier at "if you really want to start working, you need our paid product".
  • When features are introduced as free, they will stay free forever. Sometimes this means the introduction of features in the free product needs to be postponed to see the complete impact.
  • According to Scott, this is shown in the release of BimlExpress: they could easily have dropped some features and moved them into the paid versions of Biml, but they didn't. They only added new features.
  • The "free while in beta" announcement on BimlOnline is mainly a lawyer thing - you can expect BimlOnline to remain free too.
  • If a good SaaS-model is developed, the tools will all be free. But we're not there yet...

Conclusion

I had a great time today. I met a lot of peers I usually only see online (which is nice), had some great discussions about test automation (more to come, stay tuned) and gained some new insights. Here's what stood out for me:

  • Scott stated repeatedly he never wants features to be added just to fill checkboxes. When you hear that phrase in isolation, it sounds like a sales soundbite, but I think Varigence lives up to this:
    • No lineage visualization / insight, unless it can change the way ETL is performed today
    • No default framework (but a framework engine) until the experience (after two years of already using parts of it) was gathered and the product matured
    • No SSRS things yet: "We could maybe alleviate 20% of the real pain, while 80% of the work remains in place. Well, in that case we'd better allocate our development resources to developing features that have more impact"
  • Scott has a clear vision on the future of ETL, which he's quite willing to share.
  • Plans are out in the open. Barely any secrets: read all docs and re-build the framework from scratch, if you want (but it's way more cost-effective to just buy Biml / BimlFlex)
  • For an international company, the amount of attention given to Data Vault surprised me. In the past, Data Vault remained a relatively small modeling technique (international companies I've worked with always wondered why the Dutch guys were giving it so much attention). Maybe the times are a-changin'?

If you have the opportunity and are interested in the future of ETL, SSIS and BIML, attend training or sessions from Scott. And ask him questions. He'll answer your question - as well as two questions that you didn't ask, but probably should have asked 😉 . This gives some insight into the rationale behind the BIML development, and the vision of ETL in the future.

Comments (2)

  1. Hello all. Thanks for this day of inspiration and feedback on the Biml way of doing integration & BI. This seems a good add-on to mid-class systems, adding professional functionality you get in most top-class systems, but with a smaller price tag.
    The next day, I tried to give an elevator pitch on Biml to a colleague.
    After some discussion we came to the conclusion that Biml is not really a new way of doing things. My colleague mentioned that it is built on old concepts: scripted frameworks, metadata models and 4GL languages, but with new, much faster technology and integrated simplicity. I couldn't give a good reply as to why a customer should invest in BIML, other than the reduced development time for a project with a lot of repetitive tasks and similar interfaces. What do you think? What should I have replied? What is the best one-liner to sell BIML in a business case?
    (Note: did anyone receive the PowerPoint slides from Scott Currie's presentation?)

    1. Hi Juriaan, thanks for your reply!

      As you concluded correctly, meta-modeling is nothing new: even with SSIS we did this before Biml, using the SSIS API via COM calls in the early days, and later on using EzAPI. See also my earlier post "Package generation with SSIS - an overview" (but note that these are my early personal views from more than two years ago - so don't base any decisions upon my guesses about inner workings or capabilities back then 🙂 ).

      The metadata model that was discussed is inside BimlFlex. With BimlFlex you not only get Biml as a tool to help you build frameworks, you also buy a pre-built framework with it (so you can hit the ground running), plus a way of automatically testing and verifying the workings of your data flows.

      So whenever metamodels are already in place, there is no monetary investment required to start using Biml and BimlScript. You can use Biml just to consume your existing metadata model via Biml Scripts, executed by BimlExpress. So whenever your customer sees the value of a metadata model, then Biml is just the primary tool to generate ETL out of that. Basically what Biml has been all about: a tool to help you build frameworks. I think this is the big "selling point" of Biml: it's basically free, and for 75% of the solutions, the free version will suffice. Of course, there's still investment in time, but it's quite easy to get started (as you already noticed), and non-intrusive in your development stack (output is SSIS, so you can switch out Biml whenever you like).

      Personally, I think that if you're heavy on SSIS, then chances are high you will use Biml, as it significantly improves the way of working. On the other hand, if you've got an SSIS automation framework already in place (for example an SSIS XML-generating framework, or something using the SSIS the question remains what's more expensive: switching, or continuing to build your own tech.
