Can Unlabelled Data Improve AI Applications? A Comparative Study on Self-Supervised Learning in Computer Vision.
Markus Bauer, Christoph Augenstein
DOI: http://dx.doi.org/10.15439/2023F8371
Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 93–101 (2023)
Abstract. Artificial Intelligence (AI) is currently a highly investigated area of study and has already become an indispensable component of a wide range of business models and applications. One major downside of current supervised AI approaches is the need for large amounts of annotated data to train the models. Self-supervised learning (SSL) circumvents the need for annotation by creating supervision signals, such as labels, from the data itself rather than relying on experts for this task. Current approaches mainly rely on generative methods such as autoencoders and on joint embedding architectures. Recent works report results comparable to supervised learning in downstream scenarios such as classification after SSL pretraining; to achieve this, modifications are typically required to tailor the approach to the exact downstream task. Yet, current review works have paid little attention to the practical implications of using SSL. We therefore investigated and implemented popular SSL approaches suitable for downstream tasks such as classification, starting from an initial collection of more than 400 papers. We evaluate a selection of these approaches under real-world dataset conditions and in direct comparison to the supervised learning scenario. We conclude that SSL has the potential to keep up with supervised learning if the right training methods are identified and applied. Furthermore, we outline future directions for SSL research as well as current limitations in real-world applications.
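To make the workflow described in the abstract concrete, the sketch below illustrates one common joint-embedding setup: SimCLR-style contrastive pretraining on unlabelled images, followed by a frozen-backbone linear probe for downstream classification. This is a minimal illustration under stated assumptions, not the exact pipeline evaluated in the paper; the dataset (STL10's unlabeled split), the ResNet-18 backbone, the augmentations, and all hyperparameters are placeholders chosen for brevity.

```python
# Minimal sketch (illustrative, not the paper's exact pipeline): SimCLR-style
# joint-embedding pretraining on unlabelled images, then a frozen linear probe.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

class TwoViews:
    """Return two independently augmented views of the same image.

    The pair acts as the self-generated supervision signal: both views
    should map to nearby points in embedding space."""
    def __init__(self, base_transform):
        self.t = base_transform
    def __call__(self, x):
        return self.t(x), self.t(x)

aug = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

# STL10's "unlabeled" split stands in for a real-world unannotated dataset.
unlabelled = datasets.STL10("data", split="unlabeled", download=True,
                            transform=TwoViews(aug))
loader = DataLoader(unlabelled, batch_size=256, shuffle=True, drop_last=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()                     # expose the 512-d features
projector = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
model = nn.Sequential(backbone, projector).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss: the positive pair are the two views of one image."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = (z @ z.t()) / tau
    n = z1.size(0)
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool, device=z.device),
                          float("-inf"))        # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

for epoch in range(10):                         # pretraining: labels never touched
    for (v1, v2), _ in loader:
        loss = nt_xent(model(v1.to(device)), model(v2.to(device)))
        opt.zero_grad(); loss.backward(); opt.step()

# Downstream evaluation: freeze the pretrained backbone, train only a linear
# classifier on the small labelled split, and compare to a supervised baseline.
```

For the kind of comparison the abstract describes, the same backbone would additionally be trained fully supervised on the labelled data, so that linear-probe (or fine-tuned) accuracy after SSL pretraining can be set against the supervised baseline under identical dataset conditions.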