Notes from the Dutch Biml Usergroup kickoff
7 May 2016, 23:26
Today, I attended the kickoff of the Biml Usergroup in the Netherlands. We had several hours of training from Scott Currie at a wonderful location (Impact Hub Amsterdam). Some hints were also dropped about the future of Biml and the Varigence products. Here are my notes:
Varigence sees three "layers" of metadata:
- Business metadata: business rules etc.
- Technical metadata: mapping, datatype transformation, etc.
- Operational metadata: server locations, capabilities, profiles, usage loads, SLAs, scheduling windows, etc.
Few organizations have fully deployed the third layer so far. Personally, I see a lot of businesses recording parts of it (for example server locations and scheduling windows), but few have implemented it completely, in a form that is used automatically to generate and control ETL flows.
Varigence also sees a natural evolution of metadata registration in three stages / levels:
- Metadata is created as a side-effect of business processes
- Must be machine readable, relatively correct
- Examples: self-describing web services, db schemas, data dictionaries, etc.
- Extra information needed for your BimlScripts
- Often stored in Biml annotations, Db tables, or extended properties
- When you've added around 6 hybrid attributes you're going to have a hard time - most organizations start to switch to a full "external" metadata model
- Often stored in MDS, Web services, Sharepoint lists (really) or Biml Metadata Models
Although the stages / levels usually follow each other, one isn't better than the other. If you "jump in" at a "higher" level of metadata administration, you'll probably get it wrong, because you lack the experience of what's needed at the lower levels.
- Biml is a framework generation tool: you can build your ETL framework, consuming your self-built metadata models.
- As requests came dropping in for a pre-populated framework too, BimlFlex was developed
- Varigence's two years of ETL automation experience is included
- Scott showed an early version of the metadata model, containing a lot of the metadata needed for most scenarios:
- schema 'meta' (all as tables defining the possible options):
- ChangeType (Type 1, Type 2, Hist)
- ConnectionType (OLEDB, ADONET, FILE, ..)
- DataType (Biml datatypes mapping)
- IntegrationType (SRC, EXT, STG, DWH)
- schema 'app'
- Object definitions like tables
- Column definitions
- Connection definitions
- This early version has been extended, particularly to support Data Vault scenarios
- BimlFlex will be launched later this month at the World Wide Data Vault Conference
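To make the shape of that early metadata model a bit more concrete, here is a minimal Python sketch of the 'meta' lookups and the 'app' definitions. This is my own illustration, not the BimlFlex schema: the class and field names are assumptions, and so is the choice to hang the change type on a column rather than on a table. Only the option values (Type 1, Type 2, Hist, OLEDB, and so on) come from the notes above.

```python
from dataclasses import dataclass, field
from enum import Enum


# 'meta' schema: lookup tables listing the allowed options.
class ChangeType(Enum):
    TYPE_1 = "Type 1"
    TYPE_2 = "Type 2"
    HIST = "Hist"


class ConnectionType(Enum):
    OLEDB = "OLEDB"
    ADONET = "ADONET"
    FILE = "FILE"


class IntegrationType(Enum):
    SRC = "SRC"
    EXT = "EXT"
    STG = "STG"
    DWH = "DWH"


# 'app' schema: the actual connection, object and column definitions.
@dataclass
class Connection:
    name: str
    connection_type: ConnectionType
    integration_type: IntegrationType


@dataclass
class Column:
    name: str
    data_type: str  # would point at the 'meta' DataType mapping
    change_type: ChangeType = ChangeType.TYPE_1  # assumption: tracked per column


@dataclass
class TableObject:
    name: str
    connection: Connection
    columns: list[Column] = field(default_factory=list)

    def columns_by_change_type(self, change_type):
        """Return the names of the columns that use the given change type."""
        return [c.name for c in self.columns if c.change_type is change_type]


# Hypothetical example data.
source = Connection("CrmSource", ConnectionType.OLEDB, IntegrationType.SRC)
customer = TableObject(
    "Customer",
    source,
    [Column("CustomerId", "Int32"), Column("Address", "String", ChangeType.TYPE_2)],
)
type_2_columns = customer.columns_by_change_type(ChangeType.TYPE_2)
```

A generator script would loop over objects like `customer` and emit the matching ETL pattern per column or per table.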
In the below video, Scott discusses a part of a Metadata Model, which is essentially an early form of what BimlFlex became.
- BimlExpress is the new freemium plugin for Biml development inside SSDT
- Full syntax highlighting for BimlScript is here
- Internally, it runs a website - exactly the same as BimlOnline
- Reasons for not continuing using BidsHelper are:
- Discussion about licensing (according to Scott basically some people trying to reverse engineer Biml, claiming "they didn't know it was not allowed").
- Opportunity to count distinct users (which is the first thing the SSIS team asks whenever discussing an issue: "how many users are using this Biml thing?")
- Independent release cycle
Varigence introduces a new concept called "Biml Bundles".
- File extension of a bundle is '.BimlB'
- A bundle is an encrypted framework, which you can use to apply your product at a customer without having to worry about anyone stealing your thunder.
- Although everything is encrypted, there are so-called framework extension points to which users of your bundle can add "hooks" - I think this is a pretty neat way of encapsulation
- Currently only in Mist (which is being renamed to Biml Studio)
- BimlFlex is also a bundle
- So it's not included in BimlExpress
- ... but if you buy BimlFlex, you'll also receive Mist / Biml Studio
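The extension point idea is easy to picture in code. The sketch below is plain Python and only mimics the concept: a framework whose internals are fixed, with named points where a user can register hooks. The class name, the extension point names and the `run` method are all hypothetical, and the encryption part of a bundle is not modelled at all.

```python
class Bundle:
    """Stands in for a sealed framework that exposes named extension points."""

    EXTENSION_POINTS = ("pre_load", "post_load")  # hypothetical names

    def __init__(self):
        self._hooks = {name: [] for name in self.EXTENSION_POINTS}

    def add_hook(self, point, func):
        """Register a user-supplied hook at one of the published points."""
        if point not in self._hooks:
            raise KeyError(f"unknown extension point: {point}")
        self._hooks[point].append(func)

    def run(self, table):
        """Build the list of steps for one table, calling hooks around the core."""
        steps = [hook(table) for hook in self._hooks["pre_load"]]
        steps.append(f"load {table}")  # core logic the bundle user cannot change
        steps.extend(hook(table) for hook in self._hooks["post_load"])
        return steps


bundle = Bundle()
bundle.add_hook("post_load", lambda table: f"audit {table}")
steps = bundle.run("Customer")
```

The user of the bundle can add behaviour before or after the core step, but never sees or edits the core step itself.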
- Transformers don't create new objects, but instead alter existing ones.
- File extension of a transformer is '.BimlT'
- This is not for all companies, but for some it'll be pretty useful
- For example, when moving to SQL Datawarehouse (cloud) after having everything already on a regular database.
- Think new datatype conversions, workaround, etc.
- Will only be in the paid version
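As a rough illustration of "alter existing objects instead of creating new ones", here is a small Python sketch that rewrites datatypes on already-defined columns. It is not BimlT syntax. The dictionary layout and the type mapping are invented for the example and say nothing about which datatypes a real target platform supports.

```python
# Hypothetical mapping from a source datatype to its replacement on the target.
TYPE_MAP = {"Xml": "String"}


def retarget_columns(tables, type_map):
    """Change column datatypes in place and return how many columns were altered."""
    changed = 0
    for table in tables:
        for column in table["columns"]:
            new_type = type_map.get(column["data_type"])
            if new_type is not None:
                column["data_type"] = new_type  # alter the existing definition
                changed += 1
    return changed


# Hypothetical table definitions that already exist in the project.
tables = [
    {
        "name": "Customer",
        "columns": [
            {"name": "CustomerId", "data_type": "Int32"},
            {"name": "Profile", "data_type": "Xml"},
        ],
    }
]
changed = retarget_columns(tables, TYPE_MAP)
```

Note that the function returns no new table list: the same objects come out the other end, only modified.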
Answering a question from one of the attendees (Frenk van Beekveld, I think), Scott shared some insights into data lineage:
- Nowadays, data lineage tools just show how columns are built up ("read lineage")
- In order to include it in a Varigence product, they're re-thinking the way data lineage works: could it become read-write?
- Write means basically:
- View the definition / build-up / lineage
- Alter the definition of a column
- Be warned: this causes the following n side-effects!
- Do you want this to happen?
- Or do you want to "fork" this in a new lineage?
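The "write" flow above boils down to an impact analysis followed by a choice. The Python sketch below is my own reading of it, not anything Varigence showed: lineage is kept as a simple mapping from a column to the columns derived from it, `side_effects` lists everything downstream of a change, and `fork` registers a new column next to the old one so existing consumers stay untouched. All column names are made up.

```python
from collections import deque

# column -> columns derived directly from it (hypothetical names)
DOWNSTREAM = {
    "src.Price": ["stg.Price"],
    "stg.Price": ["dwh.Revenue", "dwh.Margin"],
    "dwh.Revenue": ["rpt.RevenueYtd"],
}


def side_effects(lineage, column):
    """Breadth-first walk returning every column affected by changing `column`."""
    affected = []
    pending = deque(lineage.get(column, []))
    while pending:
        current = pending.popleft()
        if current in affected:
            continue
        affected.append(current)
        pending.extend(lineage.get(current, []))
    return affected


def fork(lineage, column, new_name):
    """Return a copy of the lineage with `new_name` added beside `column`."""
    forked = {parent: list(children) for parent, children in lineage.items()}
    for children in forked.values():
        if column in children:
            children.append(new_name)
    return forked


impact = side_effects(DOWNSTREAM, "stg.Price")
forked = fork(DOWNSTREAM, "stg.Price", "stg.PriceV2")
```

Here `impact` holds the n side-effects to warn about, while the forked column starts out with no consumers at all.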
- BimlOnline is open beta, has been for a while and will be for a while too. But hey, is Google Maps out of beta already?
- ADF (Azure Data Factory) support will come, but Scott reminded us that we really have to see ADF primarily as a SQL Agent rather than as an SSIS. It's far from a mature data integration tool
- SSIS in the cloud is not yet here. The only way you can scale-out SSIS in the cloud is as follows:
- Local small coördination node
- Work Pile (queue of work-to-be-done) stored centrally
- Workers (VMs in the cloud running SSIS) picking ETL tasks off this queue
- Coördinator spins up / tears down workers when needed
- As for migration towards the cloud:
- Biml Transformers can aid in transforming existing ETL solutions
- Varigence has some customer examples of moving to the cloud
- Around 80% of the transformation was automated.
- Scott wasn't sure whether things like distribution keys are part of that 80% (IMHO these could be derived quite well from existing statistics about the currently running ETL)
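The scale-out setup with a coordinator, a work pile and workers is a classic queue pattern. Below is a minimal Python sketch of it, under heavy assumptions: threads stand in for the cloud VMs, "running" a task just records who picked it up, and the sizing rule (one worker per two queued tasks, capped at three) is something I made up for the example.

```python
import queue
import threading


def run_work_pile(tasks, max_workers=3, tasks_per_worker=2):
    """Coordinator: fill the work pile, spin up workers, wait until they finish."""
    work_pile = queue.Queue()  # central queue of work-to-be-done
    for task in tasks:
        work_pile.put(task)

    # Assumed sizing rule: ceil(len(tasks) / tasks_per_worker), between 1 and max_workers.
    wanted = -(-len(tasks) // tasks_per_worker)
    worker_count = min(max_workers, max(1, wanted))

    results = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task = work_pile.get_nowait()  # pick the next ETL task off the queue
            except queue.Empty:
                return  # nothing left: this worker can be torn down
            with lock:
                results.append((worker_id, task))

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return worker_count, results


worker_count, results = run_work_pile(["pkg_a", "pkg_b", "pkg_c", "pkg_d", "pkg_e"])
```

Which worker ends up with which task is not deterministic, which is exactly the point: workers just pull whatever is next on the pile.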
Scott finally showed some features of the Biml Enabled SSIS Test framework.
- It is the same framework that @SQLReeves showed last year at SQLRally Nordic
- Framework was updated last in 2014, but new additions to come:
- New asserts
- MSBuild integration
- New outputs (other than db)
- GitHub (https://github.com/bimlscript/best) will be updated after BimlFlex has been launched
- BimlFlex will have this testing framework included
On Being Free
It's funny that Scott apologizes a lot for even briefly diving into features that are only in the paid products. Not necessary, in my opinion: the products are wonderful, and lots of effort and experience have been put into them![ref]Having said that, and although IMHO the price is fair, around $250/month per dev still requires a productivity boost of at least 5% in comparison with free products - I still have to prove that to my CFO 😉 [/ref]
- Historically, Varigence has always given away lots of their work for free, and they'll continue to do so. There are few (maybe no) companies giving away this percentage of their work for free without setting the barrier at "if you really want to start working, you need our paid product"
- When features are introduced as free, they will stay free forever. Sometimes this means the introduction of features in the free product needs to be postponed to see the complete impact.
- According to Scott, this is shown in the release of BimlExpress: they could easily have dropped some features and moved them into the paid versions of Biml, but they didn't. They only added new features.
- The "free while in beta" announcement on Biml Online is mainly a lawyer thing - you can expect BimlOnline to remain free too.
- If a good SaaS-model is developed, the tools will all be free. But we're not there yet...
I had a great time today. I met a lot of peers I usually only see online (which is nice), had some great discussions about test automation (more to come, stay tuned) and gained some new insights. Here's what stood out for me:
- Scott stated repeatedly he never wants features to be added just to fill checkboxes. When you hear that phrase in isolation, it sounds like a sales soundbite, but I think Varigence lives up to this:
- No lineage visualization / insight, unless it can change the way ETL is performed today
- No default framework (but a framework engine) until the experience (after two years of already using parts of it) was gathered and the product matured
- No SSRS things yet: "We could maybe alleviate 20% of the real pain, while 80% of the work remains in place. Well, in that case we'd better allocate our development resources to developing features that have more impact"
- Scott has a clear vision on the future of ETL, which he's quite willing to share.
- Plans are out in the open. Barely any secrets: read all docs and re-build the framework from scratch, if you want (but it's way more cost-effective to just buy Biml / BimlFlex)
- For an international company, the amount of attention given to Data Vault surprised me. In the past, Data Vault remained a relatively small modeling technique (international companies I've worked with always wondered why the Dutch guys were giving it so much attention). Maybe the times are a-changing?
If you have the opportunity and are interested in the future of ETL, SSIS and Biml, attend a training or a session by Scott. And ask him questions. He'll answer your question - as well as two questions that you didn't ask, but probably should have asked 😉 . This gives some insight into the rationale behind the development of Biml, and into the vision of ETL in the future.