Azure ML Thursday

For the past two weeks, I've been exploring some applications of Machine Learning by entering a Cortana Intelligence competition (https://gallery.cortanaintelligence.com/competitions). This is a Kaggle-like challenge, but restricted to the Azure ML environment, which creates some challenges of its own. Because the deadline of the challenge is October 10th, I can't post all my experiences (yet). So I've written a few blog posts about my experiences and scheduled them to appear online in the weeks leading up to the deadline. The goal: for the next 10 weeks, one post about Azure ML every Thursday. Welcome to Azure ML Thursday!

Starting out with Azure ML

Before I started, I was already quite comfortable programming in Python and had done some R programming in the past. That turned out to be pretty handy, though it isn't really needed to get started: in Azure ML, the data flow can be built visually, much like BI specialists are used to in SSIS.

A good place for me to start was the tutorial competition (the Iris Petal Competition). It provides you with a pre-filled workspace with everything in place to train and test your first ML model:

aml1

Ben Taylor recently posted an evaluation of how everything Azure ML visualizes in 8 blocks could be done in 7 lines of Python code or less. I really enjoyed his writing and subscribe to his point: the "real" Data Scientist would never use Azure ML as his primary workspace. On the other hand, the blocks & lines view isn't that bad for starters like me: the 'blocks' are a widespread way of visualizing ETL data flows, and it's immediately visible what data goes where. Even administrators who don't know anything about ML or Python can inspect what goes wrong where - on each block you can view the complete output:

aml5

Although the blocks & lines view isn't bad for starters, I also noticed that as soon as you gain more experience building ML models, you quickly move over to Python, R, etc. on your desktop. From that point on, the data flow view is of very little use.
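To give an impression of how little code that takes, here's a minimal sketch (my own, not Ben Taylor's code) of roughly the same train/score/evaluate flow as the tutorial experiment, using scikit-learn on the Iris data:

```python
# A rough scikit-learn equivalent of the tutorial experiment's data flow:
# load data, split into train/test sets, train a model, score and evaluate it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```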

Here's a screenshot of one of my more recent submissions in the competition:[ref]The 'zip' in this image is a bundle of pickled Python objects; I'll discuss pickling in a later post.[/ref]

aml7
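To already give a rough idea of what such a bundle is: a trained model (or any other Python object) is serialized with pickle and zipped up so the experiment can load it later. The sketch below is illustrative only - the file names are placeholders, not the actual contents of the bundle shown above:

```python
# A hedged illustration: serialize a model with pickle and bundle it into a zip
# that can be attached to an Azure ML experiment. File names are placeholders.
import pickle
import zipfile
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()            # stand-in for an already-trained model
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)               # write the pickled model to disk
with zipfile.ZipFile("bundle.zip", "w") as z:
    z.write("model.pkl")                # zip it up for upload to Azure ML
```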

Deploying as a Web Service

As soon as you've trained and tested your Machine Learning model, it's good to think about how you're going to deliver the results to your customers. Microsoft has built a pretty easy-to-use web service framework. You just add the input and output blocks to your solution - in the screenshot below they're the blue blocks - and hit "deploy". That's it - really. There's no hidden configuration, no need to set up another server - this is real Machine Learning as a Service.

aml2

After the deployment has succeeded, you can call the web service like any other web service. As shown below, Microsoft provides several ways to test it: direct data entry, but also some Excel sheets to query the web service in one of two ways:

  • Request/response (the data flow is run once for every request - perfect for solutions where only a few rows need to be predicted; a sketch of such a call follows below the screenshot)
  • Batch execution (the data is stored in Azure first and then scored in batch, so the data flow is run only once)

aml3
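To illustrate the request/response option: a call to the deployed endpoint is just an HTTPS POST with a JSON body and the API key in an Authorization header. The sketch below follows the general shape of the sample code on the web service's consumption page; the URL, API key and column names are placeholders, so copy the real values from your own service:

```python
# A hedged sketch of calling an Azure ML request/response web service.
# Endpoint URL, API key and column names are placeholders - copy the real
# values from the web service's consumption page.
import json
import urllib.request

url = "https://<region>.services.azureml.net/workspaces/<ws>/services/<svc>/execute?api-version=2.0"
api_key = "<your-api-key>"

body = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
            "Values": [["5.1", "3.5", "1.4", "0.2"]],
        }
    },
    "GlobalParameters": {},
}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer " + api_key},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```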

When you're on Excel 2013 or newer, an Excel Office App is available to help you query your web services. This provides an easy starting point for sharing the ML solution with business users:

aml4

Conclusion

With this post, I've just provided a high-level overview of Azure ML Studio. Upcoming posts will dive deeper into choosing train/test sets, the effects of data quality, using the built-in Machine Learning models, Jupyter notebooks, using custom Machine Learning models in Python and R, and maybe even more.

What stands out for me here is the manageability of the Azure ML offering. In a few minutes you're up and running, and administrators can investigate what's going on at the data level without having to code in Python or R, or even know how ML works. The ease of sharing your ML solutions with minimal service overhead is another feature that stands out.