Machine Vision in Food Recognition: Attempts to Enhance CBVIR Tools

Andrzej Śluzek

DOI: http://dx.doi.org/10.15439/2016F579

Citation: Position Papers of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 9, pages 57–61 (2016)


Abstract. Visual identification of complex images (e.g. images of food) remains a challenging problem. In particular, content-based visual information retrieval (CBVIR) methods, which seem a natural choice for such tasks, are often constrained by the specific characteristics of the images of interest and, possibly, by other practical requirements. In this paper, a novel CBVIR approach to automatic food identification is proposed, taking into account the characteristics of existing solutions in this area. Motivated by the limitations of those solutions, we present a scheme in which co-occurrences of MSER (maximally stable extremal region) features extracted from three color channels are used to build a bag-of-words histogram. Food images are then matched by detecting similarities between these histograms. Preliminary tests on the recently published benchmark dataset UNICT-FD889 reveal certain advantages of the scheme and highlight its limitations. In particular, the need for a novel methodology for segmenting food images has been identified.
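The abstract does not say how co-occurrence is defined or which histogram similarity is used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes MSER detection and descriptor quantisation happen elsewhere (for example with OpenCV's `cv2.MSER_create`), so that each region is reduced to a centroid and a visual-word id. It further assumes, purely for illustration, that two regions from different color channels "co-occur" when their centroids lie within a hypothetical `max_dist` pixels, and it scores images with histogram intersection (in the spirit of Swain and Ballard), normalised by the smaller histogram total. `Region`, `cooccurrence_bow`, `histogram_intersection`, `rank_database` and `max_dist` are all hypothetical names, not the author's implementation.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from math import hypot


@dataclass(frozen=True)
class Region:
    """A detected region (e.g. an MSER) reduced to its centroid and the
    visual word its descriptor was quantised to (hypothetical input format)."""
    x: float
    y: float
    word: int


def cooccurrence_bow(channels, max_dist=5.0):
    """Build a bag-of-words histogram over co-occurring word pairs.

    channels: list of per-channel region lists (three for a color image).
    ASSUMPTION: regions from two different channels co-occur when their
    centroids are at most max_dist pixels apart. Each co-occurring pair adds
    one count to the bin (channel_a, word_a, channel_b, word_b).
    """
    hist = Counter()
    for a in range(len(channels)):
        for b in range(a + 1, len(channels)):
            for ra, rb in product(channels[a], channels[b]):
                if hypot(ra.x - rb.x, ra.y - rb.y) <= max_dist:
                    hist[(a, ra.word, b, rb.word)] += 1
    return hist


def histogram_intersection(h1, h2):
    """Histogram intersection normalised by the smaller total, so the
    score lies in [0, 1]; empty histograms score 0."""
    if not h1 or not h2:
        return 0.0
    # Counter returns 0 for missing keys without inserting them.
    overlap = sum(min(c, h2[k]) for k, c in h1.items())
    return overlap / min(sum(h1.values()), sum(h2.values()))


def rank_database(query_hist, database):
    """Return database keys sorted by decreasing similarity to the query."""
    return sorted(database,
                  key=lambda k: histogram_intersection(query_hist, database[k]),
                  reverse=True)


if __name__ == "__main__":
    # Toy query with made-up regions: channels 0 and 1 overlap, channel 2 does not.
    query = [[Region(10, 10, 3)], [Region(12, 11, 7)], [Region(40, 40, 1)]]
    h = cooccurrence_bow(query)
    db = {"a": Counter({(0, 3, 1, 8): 2}),
          "b": Counter({(0, 3, 1, 7): 1, (0, 2, 2, 5): 3})}
    print(rank_database(h, db))  # -> ['b', 'a']
```

Pairing words across channels makes each bin more specific than a single-channel word, which is one plausible reading of "co-occurrence of MSER features extracted from three color channels"; the actual pairing rule and similarity measure in the paper may differ.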
