Introduction to Machine Learning using tidymodels.




These two packages will be mainly used.


Data Sampling

Split your dataset into training and testing.

# initial_split from rsample package which is part of tidymodels
iris_split <- initial_split(iris, prop = 0.7)

# extract training and testing sets
iris_train <- training(iris_split)
iris_test <- testing(iris_split)


Prepare your recipe

  1. recipe(): specify the formula
  2. step_xyz(): specify the pre-processing steps
iris_recipe <-
  #define your formula
  recipe(Species ~., data = iris_full) %>%
  # if you are planning to normalize your numerical values
  step_normalize(all_numeric()) %>%
  # if you are planning to knn-ly fill the missing values for categorical type

If you want to extract the pre-processed dataset itself, you can first prep() the recipe for a specific dataset and juice() the prepped recipe to extract the pre-processed data

iris_train_preprocessed <- iris_recipe %>%
  # apply the recipe to the training data
  prep(iris_train) %>%
  # extract the pre-processed training dataset

Model Training

Decide on what model you are planning to use.

knn_model <-
  # specify that the model is a k-Nearest Neigbhour (kNN)
  nearest_neighbor() %>%
  # select the package that the model coming from
  set_engine("kknn") %>%
  # choose mode


Now you can put the model and recipes together into a workflow

# set the workflow
knn_workflow <- workflow() %>%
  # add the recipe
  add_recipe(iris_recipe) %>%
  # add the model

You can now fit your model to the dataset.

knn_fit <- knn_workflow %>%
  # fit the final best model to the training set and evaluate the test set

Model Evaluation

Model evaluation is very important part in machine learning.


Here you can check on how good is your model performing

# Obtain and format results produced by tuning functions
knn_predictions <- knn_fit %>%

knn_performance <- knn_fit %>%

To generate the confusion matrix:

knn_predictions %>%
  conf_mat(truth = Species, estimate = .pred_class)

Use your final model

If you want to use your model to predict new data, you need to use the fit() function on your workflow and the dataset that you want to fit the final model. Remember: training dataset + testing dataset.

final_knnmodel <- fit(knn_workflow, iris)

For example lets create a new data:

new_iris <- tribble(~Sepal.Length, ~Sepal.Width, ~Petal.Length, ~Petal.Width, 5.1, 3, 2.7, 4)

If you wanted to predict the species of iris for new_iris, use predict() function.

predict(final_knnmodel, new_data = new_iris)

Video Tutorial

The full tutorial available in my YouTube channel. Enjoy and don’t forget to subscribe!