Azure ML Thursday 3: Tuning hyperparameters
1 September 2016
On this third Azure ML Thursday we continue our series by testing different models and tuning hyperparameters. Before playing with new algorithms or tuning parameters, be sure you know how to train and test your data!
Machine Learning Models
In Azure ML studio, the (very high-level) workflow is
- throw a Machine Learning model onto the canvas
- train it
- verify if the results are robust (for example via cross-validation)
- deploy as a web service
... and your ML application is in place[ref]Of course, that's not the entire story: there are always changes in data structure, handling of NAs and so on, but that's beyond today's scope.[/ref]!
The success of your model depends mainly on its accuracy and robustness when predicting new cases. Increasing the model's accuracy can be done in a few ways. You can:
- Change the data (preprocessing, cleansing, filling NAs)
- Change the model (exchange the Multiclass Logistic Regression for a Multiclass Decision Forest[ref]For an overview of the available models, see Microsoft's "Machine Learning Algorithm Cheat Sheet".[/ref])
- Change the way of training (adjust train / test set size, do cross-validations)
- Tune the model's properties (how fast the model should draw conclusions, error margins)
Today, we'll focus on tuning the model's properties. We won't discuss the details of all properties (you can easily look those up in the docs); instead, we'll look at how to test different parameter combinations inside Azure ML Studio.
As soon as you click on an untrained model inside your experiment, you'll be presented with some parameters - or, in ML parlance, hyperparameters - you can tweak.

There are basically two ways to tweak these parameters: either you fill them in by hand, or you do a so-called "parameter sweep".
Tuning parameters by hand
Tuning parameters by hand might be tedious, but it gives you direct feedback on how the algorithms work. For example, we could try adjusting the memory size for L-BFGS[ref]If you're curious what L-BFGS stands for - and you should be - see the Azure ML documentation for Multiclass Logistic Regression.[/ref] on the default Iris Flower competition workflow:
| Memory size for L-BFGS | Overall accuracy (test) | Average accuracy (test) |
| --- | --- | --- |
| 20 | 0.916667 | 0.944444 |
| 200 | 0.916667 | 0.944444 |
| 2 | 0.916667 | 0.944444 |
... as you can see, the memory size for L-BFGS doesn't matter much in this example. Too bad.
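To get a feel for what this hand-tuning looks like outside the Studio UI, here's a minimal scikit-learn sketch on the same Iris data. The L-BFGS memory size isn't exposed as a parameter there, so the regularization strength C stands in as the hand-tuned hyperparameter; the split size and candidate values are purely illustrative.

```python
# A sketch of "tuning by hand", assuming scikit-learn as a stand-in for Azure ML Studio.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=42)

# Try a few hand-picked values and eyeball the test accuracy, just like the table above.
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(solver="lbfgs", C=C, max_iter=1000)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"C={C:>6}: test accuracy = {acc:.4f}")
```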
Doing a parameter sweep
The other way of tuning parameters is doing a parameter sweep. This simply means you won't set "hard" values for the ML model, but provide ranges by selecting "parameter range" as trainer mode in the properties pane:
Having selected "parameter range" as trainer mode, you set ranges for the parameters, and connect the model to Tune Model Hyperparameters:
After you've connected Tune Model Hyperparameters correctly, it still shows a red sign. That's because it doesn't know what to predict yet (the value we're predicting is often called the "label"). Therefore, select Tune Model Hyperparameters and pick a label column. Select the 'class' column here:
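In code terms, picking the label column simply means separating the 'class' column from the feature columns. A small, hypothetical pandas sketch (the iris.csv file name and layout are assumptions, not something from the Azure ML sample itself):

```python
# Sketch: the code equivalent of pointing Tune Model Hyperparameters at the 'class' column.
import pandas as pd

df = pd.read_csv("iris.csv")        # assumed file with a 'class' column holding the species
X = df.drop(columns=["class"])      # the features
y = df["class"]                     # the label we want to predict
```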
After that, you select the parameter sweeping mode:
- Random sweep: tries a number of random guesses from the parameter ranges you provided along with the model.
- Entire grid: calculates all possibilities. Perfect for testing a limited number of parameter sets. All parameter combinations are covered (which can take a lot of time!)
- Random grid: creates a grid of all possibilities, then samples a limited number of random combinations from that grid. Great for getting insight into how combinations of parameters perform.
The Entire Grid sweep sounds quite thorough, but really takes a lot of time, and research shows it doesn't always lead to better models. By default, I'd choose the Random Grid for a parameter sweep.
Down below you can select the metric for measuring performance. Make sure the metric you select fits the problem and model you're testing! In our example, we'll use the Accuracy metric.
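For readers who want to see the sweep modes in plain code: the Entire Grid roughly corresponds to an exhaustive grid search, while the Random Sweep / Random Grid correspond to a randomized search. A scikit-learn sketch under those assumptions (the parameter ranges are chosen purely for illustration, and accuracy is used as the scoring metric):

```python
# "Entire grid"        ~ GridSearchCV       (tries every combination)
# "Random sweep/grid"  ~ RandomizedSearchCV (samples n_iter combinations)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
model = LogisticRegression(solver="lbfgs", max_iter=1000)

param_grid = {"C": [0.01, 0.1, 1, 10, 100], "tol": [1e-4, 1e-3, 1e-2]}

entire_grid = GridSearchCV(model, param_grid, scoring="accuracy", cv=5)
random_grid = RandomizedSearchCV(model, param_grid, n_iter=5,
                                 scoring="accuracy", cv=5, random_state=0)

entire_grid.fit(X, y)
random_grid.fit(X, y)
print("entire grid best:", entire_grid.best_params_, entire_grid.best_score_)
print("random grid best:", random_grid.best_params_, random_grid.best_score_)
```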
After running the experiment, the left output port of Tune Model Hyperparameters contains the results of the tuning run. Having chosen a random grid, the results may differ from run to run, but here is one possible output:
| OptimizationTolerance | L1Weight | L2Weight | MemorySize | Accuracy |
| --- | --- | --- | --- | --- |
| 0 | 0.1 | 0.01 | 20 | 0.944444 |
| 0.00001 | 0.1 | 0.01 | 5 | 0.944444 |
| 0 | 1 | 0.01 | 50 | 0.925926 |
| 0 | 0.01 | 1 | 20 | 0.888889 |
| 0 | 0.1 | 0.1 | 50 | 0.888889 |
| 0 | 0 | 0.1 | 50 | 0.888889 |
| 0 | 0 | 1 | 5 | 0.888889 |
| 0.00001 | 1 | 1 | 50 | 0.888889 |
| 0.00001 | 0.1 | 1 | 50 | 0.888889 |
| 0 | 1 | 1 | 20 | 0.87038 |
Notice that I've tuned the parameters using only the training set data here! This gives me the ability to test the best-tuned model (which is output on the right side of Tune Model Hyperparameters) using the test data, resulting in the following outcome:
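The same train-only tuning pattern can be sketched in scikit-learn: the sweep only ever sees the training data, and the best estimator is then scored once on the untouched test set. The parameter ranges and split size below are assumptions for illustration:

```python
# Sketch: tune on the training set only, then check the best model on the held-out
# test set - mirroring the left (sweep results) and right (best trained model)
# outputs of Tune Model Hyperparameters.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=42)

search = RandomizedSearchCV(
    LogisticRegression(solver="lbfgs", max_iter=1000),
    {"C": [0.01, 0.1, 1, 10, 100], "tol": [1e-5, 1e-4, 1e-3]},
    n_iter=10, scoring="accuracy", random_state=0)
search.fit(X_train, y_train)          # the sweep only ever sees training data

best_model = search.best_estimator_   # the "right output": the best tuned model
print("test accuracy:", best_model.score(X_test, y_test))
```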
More precision and reliability
If you've read last week's post closely, a question should arise by now: doesn't this way of parameter sweeping carry a huge risk of overfitting? After all, we're feeding the model training data, then tuning it until the score is maximized, and assuming that's the best model! Indeed, overfitting is a real danger here. We can work around it in two ways:
- Use the test set as leading measurement instead of the training set
- Cross-validate the results
Test set as leading measurement for the parameter sweep
To use the test set as the leading measurement, you simply connect your test set to the right input of Tune Model Hyperparameters. It is then used as the leading indicator of success for the model.
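Conceptually, this means scoring every candidate parameter combination on one fixed, held-out dataset instead of on resamples of the training data. A rough scikit-learn equivalent uses PredefinedSplit; again, the split size and parameter range are illustrative, not what Azure ML does internally:

```python
# Sketch: score every parameter combination on a fixed validation set - roughly what
# connecting a second dataset to the right input of Tune Model Hyperparameters does.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=42)

# -1 marks rows that are always used for training; 0 marks the validation rows.
split_index = np.concatenate(
    [np.full(len(X_train), -1), np.zeros(len(X_val), dtype=int)])
cv = PredefinedSplit(split_index)

search = GridSearchCV(
    LogisticRegression(solver="lbfgs", max_iter=1000),
    {"C": [0.01, 0.1, 1, 10, 100]},
    scoring="accuracy", cv=cv)
search.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
print(search.best_params_, search.best_score_)
```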
Cross-validate parameter sweep
To cross-validate the parameter sweep, first divide the dataset into folds using the Partition and Sample block, selecting Assign to Folds as the operation. This element comes before Tune Model Hyperparameters:
Notice that you don't need to split the data before creating folds, as the cross-validation already creates test and training sets[ref]You could still choose to do so, for example if you want to compare how a CV-trained model stacks up against a non-CV-trained model.[/ref].
In the properties of the Partition and Sample element, I choose "Assign to Folds", set the number of folds and indicate whether it should be a stratified[ref]See the previous post if you're wondering what that is.[/ref] split:
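As a rough code counterpart of "Assign to Folds" followed by a sweep, here's a scikit-learn sketch in which every parameter combination is scored with stratified k-fold cross-validation; the fold count and parameter ranges are assumptions:

```python
# Sketch: a cross-validated parameter sweep with stratified folds.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = RandomizedSearchCV(
    LogisticRegression(solver="lbfgs", max_iter=1000),
    {"C": [0.01, 0.1, 1, 10, 100], "tol": [1e-5, 1e-4, 1e-3]},
    n_iter=10, scoring="accuracy", cv=folds, random_state=0)
search.fit(X, y)

# The reported score is the mean accuracy over the folds for the best combination.
print("mean CV accuracy of best combination:", search.best_score_)
```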
Even more precision
To achieve an even higher level of precision, we could try other ML models. Doing so is extremely easy: you just remove the Multiclass Logistic Regression, then drop in another model at the same place. If you want to learn more about which models to use, take a look at Brandon Rohrer's post "How to choose algorithms for Microsoft Azure Machine Learning", which covers exactly that.
Wrapping up - achieving 100%?
In this post we've explored how to tune your Machine Learning model inside Azure ML studio. Some datasets are easier to predict than others - the Iris Flower dataset used here has a very high predictability. Not only is the starter experiment's default prediction rate of 93.3% already extremely high for many real-world situations, but you can drive it up even further - by tuning the model's hyperparameters, but also by training the model with a higher portion of the data than the default 60%. 100% scores are possible using ML, but don't be fooled: the Iris Flower dataset is an open dataset which can be downloaded and used to fit the model for exactly this use case. I'm not saying that the 100% scores you see have been reached by cheating, but you could definitely do so!
Comments (3)
Yeefang Xiao
Hi Koos
Nice post and thank you for sharing your experiments and knowledge!
I set up a cross-validated parameter sweep using neural network regression, as shown in the Azure tutorial "How to perform cross-validation with a parameter sweep": I "Add the Cross-Validate Model module. Connect the output of Partition and Sample to the Dataset input, and connect the output of Tune Model Hyperparameters to the Untrained model input." What puzzles me is that the "mean" evaluation results of the Cross-Validate Model module do not match the "sweep results" output of Tune Model Hyperparameters. In my case, the former is much worse than the latter. Could you provide some insight into the issue? Perhaps my understanding of how the sweep results are calculated is wrong. I would think they are the mean absolute error, the mean coefficient of determination, etc., calculated using the k-fold cross-validation defined in the Partition and Sample module. Thank you.
Jorge
Hi Koos
I am confused about how the random sweep managed to decrease the error almost every time. Is that an after-training sort? Or is there a method to determine whether the next combination of parameters will yield a better error?
Koos van Strien
Hi Jorge,
Sure, the table I display in the post is sorted - so you could easily see what parameter combinations performed best in this case :-).