Title: | Intuitive Package for Meta-Ensemble Learning (Classification, Regression) that is Fully-Automated |
---|---|
Description: | This package significantly lowers the barrier for the practitioners to apply heterogeneous ensemble learning techniques in an amateur fashion to their everyday predictive problems. |
Authors: | Ajay Arunachalam |
Maintainer: | Ajay Arunachalam <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0 |
Built: | 2024-10-31 18:36:34 UTC |
Source: | https://github.com/ajayarunachalam/metaensembler |
This function uses the base learners, and the final top layer learner to produce an ensemble prediction. The user can input the number of base learners and specify the final learner along with the train-validation-test data partition split ratio. The predictions on the unseen new data is the resultant of the ensemble meta-learning of the heterogeneous learners aimed to reduce the generalization error in the predictive models. Functions from caret are used for training and prediction of the base learners and the final learner.
ensembler.classifier(data , outcomeVARIABLEINDEX , IndividualModels , TopLayerModel , dstr , dsv , dst , unseen_new_data )
ensembler.classifier(data , outcomeVARIABLEINDEX , IndividualModels , TopLayerModel , dstr , dsv , dst , unseen_new_data )
data |
Data to be used for training, validation, and test. |
outcomeVARIABLEINDEX |
Index of the response/outcome variable. |
IndividualModels |
Training of base learners. |
TopLayerModel |
Top layer final learner. |
dstr |
Training data split ratio. |
dsv |
Validation data split ratio. |
dst |
Testing data split ratio. |
unseen_new_data |
Prediction on unseen new data. |
data |
Complete data or the indexed data. |
outcomeVARIABLEINDEX |
Integer value of the response/outcome variable index. |
IndividualModels |
A vector of base learners or standalone individual learner. |
TopLayerModel |
Name of the Top layer learner. |
dstr |
fraction integer. Not greater than 1. Total sum of dstr, dsv, dst should be equal to 1. |
dsv |
fraction integer. Not greater than 1. Total sum of dstr, dsv, dst should be equal to 1. |
dst |
fraction integer. Not greater than 1. Total sum of dstr, dsv, dst should be equal to 1. |
unseen_new_data |
Unseen new data provided either as a csv file or imported dataframe. |
The function ensembler.classifier
), returns a list with the following elements:
ensembler_return[1]Test data predictions.
ensembler_return[2]Prediction labels.
ensembler_return[3]Model Result.
ensembler_return[4]Unseen data predictions.
Ajay Arunachalam
library("metaEnsembleR") attach(iris) data("iris") unseen_new_data_testing <- iris[130:150,] write.csv(unseen_new_data_testing , 'unseen_check.csv' , fileEncoding = 'UTF-8' , row.names = FALSE) ensembler_return <- ensembler.classifier(iris[1:130,] , 5 , c('treebag','rpart') , 'gbm' , 0.60 , 0.20 , 0.20 , unseen_new_data_testing) # or ensembler_return <- ensembler.classifier(iris[1:130,] , 5 , c('treebag','rpart') , 'gbm' , 0.60 , 0.20 , 0.20 , read.csv('./unseen_check.csv')) testpreddata <- data.frame(ensembler_return[1]) table(testpreddata$actual_label) table(ensembler_return[2]) #### Performance comparison ##### modelresult <- ensembler_return[3] modelresult act_mybar <- qplot(testpreddata$actual_label, geom="bar") act_mybar pred_mybar <- qplot(testpreddata$predictions, geom='bar') pred_mybar act_tbl <- tableGrob(t(summary(testpreddata$actual_label))) pred_tbl <- tableGrob(t(summary(testpreddata$predictions))) ggsave("testdata_actual_vs_predicted_chart.pdf",grid.arrange(act_tbl, pred_tbl)) ggsave("testdata_actual_vs_predicted_plot.pdf",grid.arrange(act_mybar, pred_mybar)) #### unseen data ### unseenpreddata <- data.frame(ensembler_return[4]) table(unseenpreddata$unseenpreddata)
library("metaEnsembleR") attach(iris) data("iris") unseen_new_data_testing <- iris[130:150,] write.csv(unseen_new_data_testing , 'unseen_check.csv' , fileEncoding = 'UTF-8' , row.names = FALSE) ensembler_return <- ensembler.classifier(iris[1:130,] , 5 , c('treebag','rpart') , 'gbm' , 0.60 , 0.20 , 0.20 , unseen_new_data_testing) # or ensembler_return <- ensembler.classifier(iris[1:130,] , 5 , c('treebag','rpart') , 'gbm' , 0.60 , 0.20 , 0.20 , read.csv('./unseen_check.csv')) testpreddata <- data.frame(ensembler_return[1]) table(testpreddata$actual_label) table(ensembler_return[2]) #### Performance comparison ##### modelresult <- ensembler_return[3] modelresult act_mybar <- qplot(testpreddata$actual_label, geom="bar") act_mybar pred_mybar <- qplot(testpreddata$predictions, geom='bar') pred_mybar act_tbl <- tableGrob(t(summary(testpreddata$actual_label))) pred_tbl <- tableGrob(t(summary(testpreddata$predictions))) ggsave("testdata_actual_vs_predicted_chart.pdf",grid.arrange(act_tbl, pred_tbl)) ggsave("testdata_actual_vs_predicted_plot.pdf",grid.arrange(act_mybar, pred_mybar)) #### unseen data ### unseenpreddata <- data.frame(ensembler_return[4]) table(unseenpreddata$unseenpreddata)
This function uses the base learners, and the final top layer learner to produce an ensemble prediction. The user can input the number of base learners and specify the final learner along with the train-validation-test data partition split ratio. The predictions on the unseen new data is the resultant of the ensemble meta-learning of the heterogeneous learners aimed to reduce the generalization error in the predictive models. Functions from caret are used for training and prediction of the base learners and the final learner.
ensembler.regression(data , outcomeVARIABLEINDEX , IndividualModels , TopLayerModel , dstr , dsv , dst , unseen_new_data )
ensembler.regression(data , outcomeVARIABLEINDEX , IndividualModels , TopLayerModel , dstr , dsv , dst , unseen_new_data )
data |
Data to be used for training, validation, and test. |
outcomeVARIABLEINDEX |
Index of the response/outcome variable. |
IndividualModels |
Training of base learners. |
TopLayerModel |
Top layer final learner. |
dstr |
Training data split ratio. |
dsv |
Validation data split ratio. |
dst |
Testing data split ratio. |
unseen_new_data |
Prediction on unseen new data. |
data |
Complete data or the indexed data. |
outcomeVARIABLEINDEX |
Integer value of the response/outcome variable index. |
IndividualModels |
A vector of base learners or standalone individual learner. |
TopLayerModel |
Name of the Top layer learner. |
dstr |
fraction integer. Not greater than 1. Total sum of dstr, dsv, dst should be equal to 1. |
dsv |
fraction integer. Not greater than 1. Total sum of dstr, dsv, dst should be equal to 1. |
dst |
fraction integer. Not greater than 1. Total sum of dstr, dsv, dst should be equal to 1. |
unseen_new_data |
Unseen new data provided either as a csv file or imported dataframe. |
The function ensembler.regression
), returns a list with the following elements:
ensembler_return[1]Test data predictions.
ensembler_return[2]Prediction values.
ensembler_return[3]Model Result.
ensembler_return[4]Unseen data predictions.
Ajay Arunachalam
library("metaEnsembleR") data("rock") unseen_rock_data <- rock[30:48,] ensembler_return <- ensembler.regression(rock[1:30,] , 4 ,c('lm') , 'rf' , 0.40 , 0.30 , 0.30 , unseen_rock_data) testpreddata <- data.frame(ensembler_return[1]) ####Performance comparison##### modelresult <- ensembler_return[3] modelresult write.csv(modelresult[[1]], "performance_chart.csv") ####unseen data### unseenpreddata <- data.frame(ensembler_return[4])
library("metaEnsembleR") data("rock") unseen_rock_data <- rock[30:48,] ensembler_return <- ensembler.regression(rock[1:30,] , 4 ,c('lm') , 'rf' , 0.40 , 0.30 , 0.30 , unseen_rock_data) testpreddata <- data.frame(ensembler_return[1]) ####Performance comparison##### modelresult <- ensembler_return[3] modelresult write.csv(modelresult[[1]], "performance_chart.csv") ####unseen data### unseenpreddata <- data.frame(ensembler_return[4])