Consensus clustering for categorizing orthogonal vehicle operations

From ISLAB/CAISR
Title Consensus clustering for categorizing orthogonal vehicle operations
Summary Discover multiple clustering solutions, compare them, and determine whether there is a single best (consensus) clustering or multiple consistent clustering solutions.
Keywords Data Mining, Machine Learning, Clustering, Unsupervised Learning, Knowledge Discovery
TimeFrame
References - Some slides: https://www.siam.org/meetings/sdm11/clustering.pdf

- Muller, E., Gunnemann, S., Farber, I., & Seidl, T. (2012, April). Discovering multiple clustering solutions: Grouping objects in different views of the data. In Data Engineering (ICDE), 2012 IEEE 28th International Conference on (pp. 1207-1210). IEEE.

- Hu, J., & Pei, J. (2017). Subspace multi-clustering: a review. Knowledge and Information Systems, 1-28.

- Yang, S., & Zhang, L. (2017). Non-redundant multiple clustering by nonnegative matrix factorization. Machine Learning, 106(5), 695-712.

- Dang, X. H., & Bailey, J. (2015). A framework to uncover multiple alternative clusterings. Machine Learning, 98(1-2), 7-30.

- Gionis, A., Mannila, H., & Tsaparas, P. (2007). Clustering aggregation. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 4.

- Qi, Z., & Davidson, I. (2009, June). A principled and flexible framework for finding alternative clusterings. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 717-726). ACM.

- Cui, Y., Fern, X. Z., & Dy, J. G. (2007). Non-redundant multi-view clustering via orthogonalization. In IEEE International Conference on Data Mining (ICDM), (pp. 133-142).

- Strehl, A., & Ghosh, J. (2002). Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3, 583-617.

Prerequisites Data mining course.
Author Dirar Sweidan
Supervisor Mohamed-Rafik Bouguelia, Sławomir Nowaczyk
Level Master
Status Finished


With the rapid growth of interconnected devices, more physical systems are being integrated with computer-based systems; for example, sensors can be read and actuators controlled remotely over the network. This makes it attractive to analyze on-board sensor data streamed from such devices (e.g. vehicle speed, engine torque) in order to discover interesting patterns and knowledge. We have collected such data from Volvo buses in normal operation. The aim is to analyze this data from a usage point of view, in order to discover and categorize various vehicle operations in an unsupervised way (using clustering).

Clustering is the task of grouping data in such a way that objects in the same group (i.e. cluster) are more similar to each other than to those in other groups (i.e. other clusters). Typical clustering algorithms output a single clustering (i.e. grouping) of the data. However, in real-world applications (such as vehicle operation analysis), data can be interpreted in many different ways, leading to different groupings that are each reasonable and interesting from a different perspective.

The goal of the thesis is to propose a method that makes it possible to discover multiple clustering solutions, compare them, and determine whether there is a single best (consensus) clustering or multiple consistent clustering solutions. In the latter case, each data object would be assigned to multiple clusters, one per clustering, representing different perspectives on the data (i.e. orthogonal, or independent, clusterings).
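As a rough illustration of what "compare them" and "consensus" can mean in practice, the sketch below uses two standard building blocks: the Rand index for comparing two clusterings, and a co-association (evidence accumulation) matrix for merging several clusterings. It is not the method developed in the thesis. The function names, the 0.5 threshold and the toy label vectors are all assumptions made for this example.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def co_association(labelings):
    """Matrix whose (i, j) entry is the fraction of labelings
    that place objects i and j in the same cluster."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(l[i] == l[j] for l in labelings) / m for j in range(n)]
            for i in range(n)]


def consensus(labelings, threshold=0.5):
    """Link objects co-clustered in more than `threshold` of the labelings
    and return the connected components as the consensus clustering.
    The threshold value is an assumption of this sketch."""
    co = co_association(labelings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical clusterings of six objects (made-up labels, not thesis data).
c1 = [0, 0, 0, 1, 1, 1]
c2 = [1, 1, 1, 0, 0, 0]   # same partition as c1, cluster names swapped
c3 = [0, 0, 1, 1, 2, 2]   # a finer, partly overlapping partition
c4 = [0, 1, 0, 1, 0, 1]   # a partition that cuts across c1

same_partition_score = rand_index(c1, c2)
cross_cutting_score = rand_index(c1, c4)
merged = consensus([c1, c2, c3])
```

Here `c1` and `c2` get a Rand index of 1.0 because the measure ignores cluster names, while `c1` and `c4` agree on fewer than half of the pairs. A low score like that suggests two different perspectives on the data rather than noisy versions of one consensus clustering.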

Clustering solutions that differ in a significant but consistent way can be obtained by constructing different views of the data, for example:

- Using different combinations of features, which may reveal different structures in the data.

- Using different similarity/distance measures.

- Using various data sources (different sources of the same data).

- Varying the hyperparameters of the clustering algorithm.

- Combining various clustering algorithms, etc.
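The first kind of view, different feature combinations, can be sketched as follows. This is a minimal illustration, assuming a tiny deterministic k-means written for the example and made-up (speed, torque) samples. Neither the helper names nor the numbers come from the Volvo bus data or from the thesis.

```python
from itertools import combinations


def kmeans(points, k, iters=20):
    """Tiny Lloyd's k-means. Initial centroids are k points taken at even
    intervals through the input, so the result is deterministic."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2
                                  for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


def cluster_views(data, k):
    """Cluster every non-empty feature subset ("view") of the data and
    return a dict mapping the subset (tuple of column indices) to labels."""
    n_features = len(data[0])
    views = {}
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            projected = [tuple(row[f] for f in subset) for row in data]
            views[subset] = kmeans(projected, k)
    return views


# Hypothetical samples: (vehicle speed in km/h, engine torque in Nm).
data = [(10, 100), (12, 400), (11, 110), (80, 390), (82, 105), (81, 410)]
views = cluster_views(data, k=2)
speed_view = views[(0,)]    # groups samples into slow vs. fast
torque_view = views[(1,)]   # groups samples into low vs. high torque
```

On this toy data the speed-only view and the torque-only view produce two different, equally sensible groupings of the same six samples. The features are left unscaled here, so the combined view would be dominated by whichever feature has the larger numeric range. A real pipeline would normally standardize the features first.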

While the main application focuses on grouping vehicle operations, the proposed method could be general and applicable to any data that exhibits such orthogonal clusterings.

References: see the list of references above.

Contact:

- Mohamed-Rafik Bouguelia ( mohbou@hh.se )

- Sławomir Nowaczyk ( slawomir.nowaczyk@hh.se )