Deep stacked ensemble

From ISLAB/CAISR
Title Deep stacked ensemble
Summary This project aims at training multiple parallel deep networks so that each learns a different representation of the data, making these networks suitable as base models in a stacked ensemble framework.
Keywords Deep learning, Stacked ensemble, Statistics
TimeFrame Fall 2022
References 1. David H. Wolpert, "Stacked generalization", Neural Networks, 1992. https://doi.org/10.1016/S0893-6080(05)80023-1

2. Jason Brownlee, "How to Develop a Stacking Ensemble for Deep Learning Neural Networks in Python With Keras", https://machinelearningmastery.com/stacking-ensemble-for-deep-learning-neural-networks/

3. P. S. Mashhadi, S. Nowaczyk, S. Pashami, "Parallel orthogonal deep neural network", Neural Networks 140, 167-183.

Prerequisites Deep learning, data mining, and programming knowledge of at least one deep learning framework such as TensorFlow or PyTorch (or at least their APIs)

Author
Supervisor Sławomir Nowaczyk, Peyman Mashhadi
Level Master
Status Open


Stacking is a form of ensemble learning that combines multiple models through a meta model. In its basic form it consists of two layers: a base layer and a meta layer. The base-layer models are trained on the original features of the dataset, while the meta model consumes the predictions of the base models to generate the final prediction. Stacking has been used to win many prestigious machine learning competitions.
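
To make the two-layer idea concrete, below is a minimal sketch of classical stacking using scikit-learn's StackingClassifier. The dataset, the choice of base models, and all hyperparameters are illustrative assumptions, not part of the project specification.

# Minimal stacking sketch: base models trained on the original features,
# a meta model (logistic regression) trained on their predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Base layer: models trained on the original features.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", KNeighborsClassifier()),
]

# Meta layer: trained on the base models' (cross-validated) predictions.
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("held-out accuracy:", stack.score(X_test, y_test))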

An important observation is that the meta model performs well when the base models each achieve acceptable performance and, at the same time, produce predictions with low correlation to each other. Currently, there is no automatic approach for selecting the base models' structures; this is typically done by trial and error, guided by prior experience and knowledge.
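
The low-correlation condition can be checked empirically. The short sketch below, with arbitrary example models and synthetic data (my assumptions, not prescribed by the project), measures the Pearson correlation between two base models' predicted probabilities on a held-out set; low values indicate the kind of diversity the meta model can exploit.

# Measuring correlation between base-model predictions on a validation set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

p_rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]
p_knn = KNeighborsClassifier().fit(X_tr, y_tr).predict_proba(X_val)[:, 1]

# Lower correlation between base-model predictions generally leaves
# more room for the meta model to improve on either base model alone.
print("Pearson correlation:", np.corrcoef(p_rf, p_knn)[0, 1])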

The aim of this project is to provide an integrated automatic stacking model in a deep learning fashion. This integrated stacked deep net is comprised of multiple parallel deep nets (with the exact same structure) which are followed by another network. The first part (parallel part) plays the role of base model, and the rest of the structure after parallel part plays the role of meta model. The idea is to train this structure in a way that each parallel network learn different representations of the data at the level of parallel part so that the meta model can take advantage of their low correlated predictions.
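
One possible starting point for such an architecture is sketched below with the Keras functional API (in line with reference 2). The input dimension, number of parallel branches, layer sizes, and loss are assumptions chosen only to illustrate the structure: identical parallel branches (base part) whose outputs feed a shared head (meta part), trained end to end.

# Integrated stacked deep net sketch: parallel branches + shared meta head.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_branch(inputs, name):
    # Each branch has the exact same structure but its own weights,
    # so it is free to learn a different representation of the data.
    x = layers.Dense(64, activation="relu", name=f"{name}_dense1")(inputs)
    x = layers.Dense(32, activation="relu", name=f"{name}_dense2")(x)
    return layers.Dense(8, activation="relu", name=f"{name}_repr")(x)

inputs = layers.Input(shape=(10,))                                  # assumed input dimension
branches = [build_branch(inputs, f"branch{i}") for i in range(3)]   # parallel "base" part

# Meta part: consumes the concatenated branch representations.
merged = layers.Concatenate()(branches)
meta = layers.Dense(16, activation="relu")(merged)
outputs = layers.Dense(1, activation="sigmoid")(meta)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

A central open question of the project, beyond this sketch, is how to design the training procedure (e.g., the loss or regularization) so that the branch representations actually end up diverse rather than redundant.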