
Proceedings of the 18th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 35

Can Unlabelled Data Improve AI Applications? A Comparative Study on Self-Supervised Learning in Computer Vision.


DOI: http://dx.doi.org/10.15439/2023F8371

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 93–101 (2023)


Abstract. Artificial Intelligence (AI) is a highly active area of research and has already become an indispensable component of a wide range of business models and applications. One major downside of current supervised AI approaches is the need for large amounts of annotated data to train the models. Self-supervised learning (SSL) circumvents the need for annotation by creating supervision signals, such as labels, from the data itself rather than requiring experts for this task. Current approaches mainly rely on generative methods, such as autoencoders, and on joint embedding architectures. Recent works report results comparable to supervised learning in downstream scenarios such as classification after SSL pre-training, although modifications are typically required to adapt the approach to the exact downstream task. Yet, current review works have paid little attention to the practical implications of using SSL. We therefore investigated and implemented popular SSL approaches suitable for downstream tasks such as classification, drawn from an initial collection of more than 400 papers. We evaluate a selection of these approaches under real-world dataset conditions and in direct comparison to the supervised learning scenario. We conclude that SSL has the potential to catch up with supervised learning if the right training methods are identified and applied. Furthermore, we outline future directions for SSL research, as well as current limitations in real-world applications.
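As a concrete illustration of how SSL derives supervision from the data itself, the sketch below implements a SimCLR-style contrastive (NT-Xent) objective in PyTorch: two augmented views of each image act as each other's positive pair, so no human annotation enters the loss. This is a minimal, hedged sketch, not the exact training setup evaluated in the paper; the batch size, embedding dimension, and temperature value are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor,
                     temperature: float = 0.5) -> torch.Tensor:
        """z1, z2: (N, D) embeddings of two augmented views of the same N images."""
        n = z1.shape[0]
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit-norm rows
        sim = z @ z.t() / temperature                       # pairwise cosine similarities
        sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
        # The positive partner of sample i is its other view at index (i + N) mod 2N.
        targets = torch.arange(2 * n, device=z.device).roll(n)
        return F.cross_entropy(sim, targets)

    # Toy usage with random "embeddings"; in practice z1 and z2 would come from a
    # backbone (e.g. a ResNet) applied to two augmentations of the same batch --
    # labels never appear anywhere in the objective.
    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
    print(nt_xent_loss(z1, z2))

After such pre-training, the backbone's representations can be reused for a downstream task such as classification, which is the comparison scenario the study evaluates.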
