OFS Technique with DE Algorithm Based on Correlation and Clustering Method and Its Application
P R Vhansure, A A Phatak, A S Shimpi
DOI: http://dx.doi.org/10.15439/2017R8
Citation: Proceedings of the Second International Conference on Research in Intelligent and Computing in Engineering, Vijender Kumar Solanki, Vijay Bhasker Semwal, Rubén González Crespo, Vishwanath Bijalwan (eds). ACSIS, Vol. 10, pages 359–364 (2017)
Abstract. Feature selection is one of the most important techniques in data mining. For large-scale datasets it reduces computational cost and removes noisy features, thereby improving classification accuracy. Most existing results on feature selection, however, are restricted to batch learning. Unlike batch learning, online learning is a scalable, efficient machine-learning paradigm suited to large-scale datasets, and many established techniques are not practical at that scale: real-world applications involve data of very high dimensionality, where acquiring the entire set of attributes for every instance is expensive. Focusing on this gap, the concept of Online Feature Selection (OFS) was established. Whereas a conventional online learner must retrieve the complete set of features for every instance, in OFS the learner must maintain a classifier that uses only a small, exact number of features. The primary challenge of OFS is therefore how to make accurate predictions over a long sequence of iterations using only a fixed, small number of active features. In this article, two OFS settings are considered, both aiming to learn with a minimal number of features: in the first, the learner has access to all features and elects a subset of active features; in the second, the learner has access to only a limited number of features in each iteration. We use the Differential Evolution (DE) algorithm in this study. By combining multiclass classification, the DE algorithm, and correlation- and clustering-based methods, the system is implemented to solve many real-world problems, and we give an empirical performance analysis on large-scale datasets.
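The first OFS setting described above, where the learner sees every feature but may keep only a small number of them, is commonly realised as an online linear learner whose weight vector is truncated after each update. The sketch below is a minimal illustration of that idea, not the authors' exact procedure; the hinge-loss update, the L2-ball projection, and the parameter names (`eta`, `lam`, `num_features`) are assumptions made for the example.

```python
import numpy as np

def ofs_truncation(X, y, num_features, eta=0.2, lam=0.01):
    """Online feature selection with full inputs: a linear learner whose
    weight vector is truncated to the num_features largest-magnitude
    entries after every update, so predictions always use a small,
    fixed budget of features.

    X : (n_samples, n_dims) stream of instances
    y : labels in {-1, +1}
    """
    n, d = X.shape
    w = np.zeros(d)
    for t in range(n):
        x_t, y_t = X[t], y[t]
        # Update only when the current margin is violated (hinge loss)
        if y_t * np.dot(w, x_t) < 1:
            w = (1.0 - eta * lam) * w + eta * y_t * x_t
            # Project onto an L2 ball to keep the norm bounded
            norm = np.linalg.norm(w)
            if norm > 0:
                w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
            # Truncation: keep only the num_features largest |w_i|
            if np.count_nonzero(w) > num_features:
                keep = np.argsort(np.abs(w))[-num_features:]
                mask = np.zeros(d, dtype=bool)
                mask[keep] = True
                w[~mask] = 0.0
    return w
```

The truncation step is what enforces the budget: at any point the classifier predicts with at most `num_features` non-zero weights, which is exactly the constraint the abstract describes.

Differential Evolution, the second ingredient named in the abstract, is a population-based optimiser built from three steps per generation: mutation (v = x_r1 + F * (x_r2 - x_r3)), crossover, and greedy selection. The following is a hedged sketch of how DE can drive feature-subset search; the real-valued mask encoding, the 0.5 threshold, and the `fitness` interface are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def de_feature_select(fitness, d, pop_size=20, F=0.5, CR=0.9, gens=50, seed=0):
    """Differential Evolution over feature-subset masks.

    fitness : callable taking a boolean mask of length d and returning
              a score to maximise
    d       : total number of candidate features
    """
    rng = np.random.default_rng(seed)
    # Real-valued population in [0, 1]; feature i is selected if x_i > 0.5
    pop = rng.random((pop_size, d))
    scores = np.array([fitness(ind > 0.5) for ind in pop])
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: v = x_r1 + F * (x_r2 - x_r3) with r1, r2, r3 != i
            r1, r2, r3 = rng.choice(
                [j for j in range(pop_size) if j != i], size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            # Binomial crossover, forcing at least one mutated component
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True
            u = np.clip(np.where(cross, v, pop[i]), 0.0, 1.0)
            # Greedy selection: keep the trial if it scores at least as well
            s = fitness(u > 0.5)
            if s >= scores[i]:
                pop[i], scores[i] = u, s
    return pop[np.argmax(scores)] > 0.5
```

In this setting the `fitness` callable might train a classifier on the masked features and return held-out accuracy, or score the mask by a correlation-based merit (rewarding features correlated with the class label and penalising mutually correlated, redundant ones); the latter is one plausible way the correlation and clustering methods mentioned in the abstract could plug in.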