Deep stacked ensemble
Title | Deep stacked ensemble |
---|---|
Summary | This project aims at training multiple parallel deep networks so that each learns a different representation of the data, making the networks suitable as base models in a stacked ensemble framework. |
Keywords | Deep learning, Stacked ensemble, Statistics |
TimeFrame | Fall 2019 |
References | 1. David H. Wolpert, "Stacked Generalization", Neural Networks, 1992, https://doi.org/10.1016/S0893-6080(05)80023-1 <br> 2. Jason Brownlee, "How to Develop a Stacking Ensemble for Deep Learning Neural Networks in Python With Keras", https://machinelearningmastery.com/stacking-ensemble-for-deep-learning-neural-networks/ |
Prerequisites | Deep learning, data mining, and programming knowledge of a deep learning framework such as TensorFlow or PyTorch (or at least their APIs) |
Author | |
Supervisor | Sławomir Nowaczyk, Peyman Mashhadi |
Level | Master |
Status | Open |
Stacking is a form of ensemble learning that combines multiple models through a meta model. In its basic form it consists of two layers: a base layer and a meta layer. The base-layer models are trained on the original features of the dataset, while the meta model consumes the base models' predictions to produce the final prediction. Stacking has won many prestigious machine learning competitions.
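As a concrete illustration, below is a minimal sketch of this two-layer setup in the spirit of reference 2. The dataset, layer sizes, and number of base models are illustrative assumptions: the base models are small Keras networks trained on a toy three-class problem, and the meta model is a logistic regression fit on their concatenated predicted probabilities.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from tensorflow import keras

# Toy three-class problem standing in for a real dataset (an assumption).
X, y = make_blobs(n_samples=1500, centers=3, n_features=2, random_state=2)
X_train, y_train = X[:500], y[:500]        # fits the base models
X_val, y_val = X[500:1000], y[500:1000]    # fits the meta model
X_test, y_test = X[1000:], y[1000:]        # held out for evaluation

def fit_base_model(X, y):
    """Train one base-layer model on the original features."""
    model = keras.Sequential([
        keras.Input(shape=(2,)),
        keras.layers.Dense(25, activation="relu"),
        keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=50, verbose=0)
    return model

base_models = [fit_base_model(X_train, y_train) for _ in range(5)]

def meta_features(models, X):
    """Meta-level features: concatenated base-model class probabilities."""
    return np.hstack([m.predict(X, verbose=0) for m in models])

# The meta model learns how to combine the base models' predictions.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_features(base_models, X_val), y_val)
print("stacked accuracy:",
      meta_model.score(meta_features(base_models, X_test), y_test))
```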
Importantly, the meta model performs well only when the base models achieve acceptable performance individually and, at the same time, have low correlation with each other. Currently, there is no automatic approach for selecting the base models' structures; in practice this is done by trial and error, guided by prior experience and knowledge.
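One way to make the low-correlation criterion concrete, continuing the sketch above, is to measure the pairwise Pearson correlation of the base models' predicted probabilities on held-out data; lower values indicate more diverse base models.

```python
import itertools
import numpy as np

# Pairwise Pearson correlation between the base models' flattened
# probability outputs on the validation set (from the sketch above).
probs = [m.predict(X_val, verbose=0).ravel() for m in base_models]
for i, j in itertools.combinations(range(len(base_models)), 2):
    r = np.corrcoef(probs[i], probs[j])[0, 1]
    print(f"corr(base {i}, base {j}) = {r:.3f}")
```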
The aim of this project is to provide an integrated, automatic stacking model in a deep learning fashion. This integrated stacked deep network consists of multiple parallel deep networks (with exactly the same structure) followed by another network. The parallel part plays the role of the base models, and the part after it plays the role of the meta model. The idea is to train this structure in such a way that each parallel network learns a different representation of the data, so that the meta model can take advantage of their weakly correlated predictions.
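The following functional-API sketch shows one possible shape of such an integrated network; the branch count and layer sizes are illustrative assumptions, and how to train the parallel branches toward genuinely different representations is the open research question of the project.

```python
from tensorflow import keras

n_branches, n_features, n_classes = 5, 2, 3

inputs = keras.Input(shape=(n_features,))

# Parallel part: identically structured branches acting as base models.
branch_outputs = []
for i in range(n_branches):
    h = keras.layers.Dense(25, activation="relu",
                           name=f"branch{i}_dense")(inputs)
    branch_outputs.append(
        keras.layers.Dense(n_classes, activation="softmax",
                           name=f"branch{i}_out")(h))

# Meta part: a small network on the concatenated branch predictions.
merged = keras.layers.Concatenate()(branch_outputs)
h = keras.layers.Dense(16, activation="relu")(merged)
outputs = keras.layers.Dense(n_classes, activation="softmax")(h)

# The whole structure is one network that can be trained end to end.
model = keras.Model(inputs, outputs)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.summary()
```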