Citation: Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 15, pages 245–248 (2018)
Abstract. Selecting relevant features in large databases is one of the most important and challenging problems in data mining. The samples forming a database are generally described by a predefined set of features, and real applications very often face the situation where not all of these features can be used for classification. This situation is typical when the database relates to a phenomenon whose characteristics are not well known. In this context, extracting the relevant features can also provide additional information about the phenomenon under study. We tackle the feature selection problem from an optimization point of view, by reducing it to the problem of finding a maximal consistent "clustering" that groups together the samples and the features of the database. In this work, we extend this approach to dynamical databases, where each feature is represented not by a single real value but by a sequence of a predefined number of real values. Our main contribution is an alternative representation of the database that fits a tridimensional matrix with no missing entries, from which a consistent triclustering can be obtained. We present preliminary computational experiments in which we apply the extended approach to human motions.
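To make the notion of a consistent grouping of samples and features more concrete, the following is a minimal sketch of the consistency check for the static, two-dimensional case only, as it is commonly described in the consistent biclustering literature. It does not reproduce the tridimensional extension proposed in this work. The function names, the row-per-feature data layout, and the tie-breaking rule are assumptions made for illustration.

```python
def classify_features(A, sample_class, k):
    """Assign each feature (row of A) to the class whose samples
    give it the highest mean value. Assumes every class has >= 1 sample."""
    labels = []
    for row in A:
        means = []
        for r in range(k):
            vals = [row[j] for j, c in enumerate(sample_class) if c == r]
            means.append(sum(vals) / len(vals))
        # Ties are broken by the lowest class index (a simplification).
        labels.append(max(range(k), key=lambda r: means[r]))
    return labels


def classify_samples(A, feature_class, k):
    """Assign each sample (column of A) to the class whose features
    give it the highest mean value."""
    labels = []
    for j in range(len(A[0])):
        means = []
        for r in range(k):
            vals = [A[i][j] for i, c in enumerate(feature_class) if c == r]
            # A class with no features can never win the argmax.
            means.append(sum(vals) / len(vals) if vals else float("-inf"))
        labels.append(max(range(k), key=lambda r: means[r]))
    return labels


def is_consistent(A, sample_class, k):
    """True if the feature classes induced by the sample classes
    reproduce the original sample classes."""
    feature_class = classify_features(A, sample_class, k)
    return classify_samples(A, feature_class, k) == list(sample_class)


# Hypothetical toy data: 4 features (rows) x 4 samples (columns), 2 classes.
A = [[5, 6, 1, 0],
     [4, 5, 0, 1],
     [0, 1, 6, 5],
     [1, 0, 5, 4]]
print(is_consistent(A, [0, 0, 1, 1], 2))
```

In this setting, feature selection amounts to choosing a subset of rows, ideally as large as possible, for which the check succeeds. In the dynamical case each entry would instead be a sequence of real values, which is what motivates the tridimensional representation.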