Introduction to Machine Learning using tidymodels.
These two packages will be mainly used.
Split your dataset into training and testing.
# initial_split from rsample package which is part of tidymodels iris_split <- initial_split(iris, prop = 0.7) # extract training and testing sets iris_train <- training(iris_split) iris_test <- testing(iris_split)
Prepare your recipe
- recipe(): specify the formula
- step_xyz(): specify the pre-processing steps
iris_recipe <- #define your formula recipe(Species ~., data = iris_full) %>% # if you are planning to normalize your numerical values step_normalize(all_numeric()) %>% # if you are planning to knn-ly fill the missing values for categorical type step_knnimpute(Species)
If you want to extract the pre-processed dataset itself, you can first prep() the recipe for a specific dataset and juice() the prepped recipe to extract the pre-processed data
iris_train_preprocessed <- iris_recipe %>% # apply the recipe to the training data prep(iris_train) %>% # extract the pre-processed training dataset juice()
Decide on what model you are planning to use.
knn_model <- # specify that the model is a k-Nearest Neigbhour (kNN) nearest_neighbor() %>% # select the package that the model coming from set_engine("kknn") %>% # choose mode set_mode("classification")
Now you can put the model and recipes together into a workflow
library(workflows) # set the workflow knn_workflow <- workflow() %>% # add the recipe add_recipe(iris_recipe) %>% # add the model add_model(knn_model)
You can now fit your model to the dataset.
knn_fit <- knn_workflow %>% # fit the final best model to the training set and evaluate the test set last_fit(iris_split)
Model evaluation is very important part in machine learning.
Here you can check on how good is your model performing
# Obtain and format results produced by tuning functions knn_predictions <- knn_fit %>% collect_predictions() knn_performance <- knn_fit %>% collect_metrics()
To generate the confusion matrix:
knn_predictions %>% conf_mat(truth = Species, estimate = .pred_class)
Use your final model
If you want to use your model to predict new data, you need to use the fit() function on your workflow and the dataset that you want to fit the final model. Remember: training dataset + testing dataset.
final_knnmodel <- fit(knn_workflow, iris)
For example lets create a new data:
new_iris <- tribble(~Sepal.Length, ~Sepal.Width, ~Petal.Length, ~Petal.Width, 5.1, 3, 2.7, 4)
If you wanted to predict the species of iris for new_iris, use predict() function.
predict(final_knnmodel, new_data = new_iris)
The full tutorial available in my YouTube channel. Enjoy and don’t forget to subscribe!