Revision as of 15:24, 27 September 2017
Title | Consensus clustering for categorizing orthogonal vehicle operations |
---|---|
Summary | Discover multiple clustering solutions, compare them, and determine whether there is a single best (consensus) clustering or several consistent clustering solutions. |
Keywords | Data Mining, Machine Learning, Clustering, Unsupervised Learning, Knowledge Discovery |
TimeFrame | |
Prerequisites | Data mining course. |
Author | |
Supervisor | Mohamed-Rafik Bouguelia, Sławomir Nowaczyk |
Level | Master |
Status | Open |
With the rapid growth of interconnected devices, more physical systems are being integrated with computer-based systems; for example, sensors can be read and actuators controlled remotely over a network. This makes it attractive to analyze on-board sensor data streamed from such devices (e.g. vehicle speed, engine torque) in order to discover interesting patterns and knowledge. We have collected such data from Volvo buses in normal operation. The aim is to analyze this data from a usage point of view, discovering and categorizing the various vehicle operations in an unsupervised way (using clustering).
Clustering is the task of grouping data so that objects in the same group (i.e. cluster) are more similar to each other than to objects in other groups. Typical clustering algorithms output a single clustering (i.e. grouping) of the data. However, in real-world applications such as vehicle operation analysis, data can be interpreted in many different ways, leading to different groupings that are each reasonable and interesting from a different perspective.
The goal of the thesis is to propose a method that discovers multiple clustering solutions, compares them, and determines whether there is a single best (consensus) clustering or several consistent clustering solutions. In the latter case, each data object would belong to multiple clusters, each representing a different perspective on the data (i.e. orthogonal clusters).
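To make the idea of a consensus clustering concrete, here is a minimal sketch of one common heuristic: build a co-association matrix (how often two objects are clustered together) and merge objects that co-occur in a majority of the input clusterings. This is only an illustration, not the method the thesis is expected to propose; the function names, the `threshold` parameter, and the toy label vectors are all assumptions made for the example.

```python
from itertools import combinations


def co_association(clusterings):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    input clusterings that place objects i and j in the same cluster."""
    n = len(clusterings[0])
    m = len(clusterings)
    return [
        [sum(labels[i] == labels[j] for labels in clusterings) / m
         for j in range(n)]
        for i in range(n)
    ]


def consensus_clustering(clusterings, threshold=0.5):
    """Merge objects that are co-clustered in more than `threshold` of the
    input clusterings (connected components of the thresholded matrix).
    `threshold` is a hypothetical parameter chosen for this sketch."""
    n = len(clusterings[0])
    coassoc = co_association(clusterings)
    parent = list(range(n))  # union-find forest over the objects

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three toy clusterings of six objects that largely agree.
agreeing = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, labels permuted
    [0, 0, 1, 1, 2, 2],  # a finer partition
]
print(consensus_clustering(agreeing))

# Two toy clusterings that are orthogonal to each other.
orthogonal = [
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
print(consensus_clustering(orthogonal))
```

In the first case the majority vote recovers a single two-cluster consensus. In the second case no pair of objects is co-clustered in more than half of the inputs, so the consensus degenerates into singletons; this is the situation in which keeping several clustering solutions side by side is more informative than forcing one consensus.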
Clustering solutions that differ in a significant but consistent way can be obtained by constructing different views of the data, for example:
- Using different combinations of features, which may reveal different structures in the data.
- Using different similarity/distance measures.
- Using various data sources (different sources of the same data).
- Varying the hyperparameters of the clustering algorithm.
- Combining various clustering algorithms, etc.
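The sketch below illustrates three of these options (feature subsets, distance measures, and a hyperparameter) on invented data. It assumes a very simple threshold-based clustering (connected components of the graph that links points closer than `eps`) purely to keep the example short, and it compares the resulting clusterings with the Rand index. The sample values, view names, and thresholds are hypothetical and are not taken from the Volvo bus data.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def threshold_clusters(points, eps, metric=dist):
    """Group points into connected components of the graph that links
    every pair of points closer than `eps` under `metric`."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if metric(points[i], points[j]) < eps:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def manhattan(p, q):
    """City-block distance, as an alternative distance measure."""
    return sum(abs(a - b) for a, b in zip(p, q))


def rand_index(a, b):
    """Fraction of object pairs on which two clusterings agree
    (both group the pair together, or both separate it)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical samples: (speed in km/h, engine torque in Nm).
samples = [(5, 100), (8, 400), (6, 110), (7, 420),
           (80, 120), (82, 410), (79, 105), (81, 400)]

# Each view: (feature indices, distance measure, threshold eps).
views = {
    "speed only": ((0,), dist, 10),
    "torque only": ((1,), dist, 50),
    "speed + torque (Manhattan)": ((0, 1), manhattan, 50),
}

clusterings = {}
for name, (features, metric, eps) in views.items():
    projected = [tuple(s[f] for f in features) for s in samples]
    clusterings[name] = threshold_clusters(projected, eps, metric)
    print(f"{name}: {clusterings[name]}")

for (name_a, a), (name_b, b) in combinations(clusterings.items(), 2):
    print(f"{name_a} vs {name_b}: Rand index = {rand_index(a, b):.2f}")
```

On this toy data the speed view and the torque view each yield two clusters, but they split the samples along different axes, so their pairwise agreement is low even though both groupings are meaningful. A real study would replace the threshold clustering with proper algorithms and would normally scale the features before combining them.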
While the main application focuses on grouping vehicle operations, the proposed method could be general and applicable to any data containing such orthogonal clusters.
More details to come ....
Some initial references:
- Gionis, A., Mannila, H., & Tsaparas, P. (2007). Clustering aggregation. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 4.
- Müller, E., Günnemann, S., Färber, I., & Seidl, T. (2012). Discovering multiple clustering solutions: Grouping objects in different views of the data. In IEEE 28th International Conference on Data Engineering (ICDE), (pp. 1207-1210).
- Cui, Y., Fern, X. Z., & Dy, J. G. (2007). Non-redundant multi-view clustering via orthogonalization. In IEEE International Conference on Data Mining (ICDM), (pp. 133-142).
- Strehl, A., & Ghosh, J. (2002). Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, pp. 583-617.
Contact:
- Mohamed-Rafik Bouguelia ( mohbou@hh.se )
- Sławomir Nowaczyk ( slawomir.nowaczyk@hh.se )