Basic tuning

So you've built a model, now what? Can you call it a day? Chances are, your model will still need some optimization. A key part of the machine learning process is optimizing our algorithms and methods. In this section, we'll cover the basic concepts of optimization, and we'll continue learning about tuning methods throughout the following chapters.

Sometimes, when our models perform poorly on new data, the cause is overfitting or underfitting. Let's cover some methods we can use to prevent this from happening. First off, let's look at the random forest classifier that we trained earlier. In your notebook, call its predict method and pass in the x_test data to receive some predictions:

predicted = rf_classifier.predict(x_test)

From this, we can evaluate the performance of our classifier through something known as a confusion matrix, which maps out misclassifications for us. pandas makes this easy for us with the crosstab function:

pd.crosstab(y_test, predicted, rownames=['Actual'], colnames=['Predicted'])
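If you'd like a single summary number alongside the confusion matrix, scikit-learn's metrics module can score the same predictions. Here's a minimal sketch, assuming the rf_classifier, x_test, and y_test objects from earlier are still in scope:

from sklearn.metrics import accuracy_score, classification_report

# Overall fraction of correct predictions
print(accuracy_score(y_test, predicted))

# Per-class precision, recall, and F1 scores
print(classification_report(y_test, predicted))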

You should see a table with the actual classes as rows and the predicted classes as columns; counts on the diagonal are correct predictions, while anything off the diagonal is a misclassification.

As you can see, our model performed fairly well on this dataset (it is a simple one after all!). What happens, however, if our model didn't perform well? Let's take a look at what could happen.
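One quick way to spot trouble is to compare the model's score on the training data against its score on the test data. Here's a rough sketch, assuming the training split is available as x_train and y_train (hypothetical names mirroring the x_test split used above):

# Mean accuracy on each split
train_accuracy = rf_classifier.score(x_train, y_train)
test_accuracy = rf_classifier.score(x_test, y_test)

# A high training score paired with a much lower test score points to overfitting;
# low scores on both splits point to underfitting
print(train_accuracy, test_accuracy)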