What Looks Good with my Sofa: Ensemble Multimodal Search for Interior Design

Ivona Tautkute; Aleksandra Możejko; Wojciech Stokowiec; Tomasz Trzciński; Łukasz Brocki; Krzysztof Marasek

DOI: http://dx.doi.org/10.15439/2017F56

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 1275–1282 (2017)

Abstract. In this paper, we propose a multimodal search engine for interior design that combines visual and textual queries. The goal of our engine is to retrieve interior objects, e.g. furniture or wall clocks, that share visual and aesthetic similarities with the query. Our search engine allows the user to take a photo of a room and retrieve, with high recall, a list of items identical or visually similar to those present in the photo. It can also return other items that fit well together aesthetically and stylistically. To achieve this goal, our system blends the results obtained using textual and visual modalities. Thanks to this blending strategy, we increase the average style similarity score of the retrieved items by 11%. Our work is implemented as a Web-based application, and we plan to open it to the public.
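
The abstract does not spell out how the textual and visual results are blended. As a rough illustration only, the sketch below shows one common way to blend two ranked result lists: weighted late fusion of min-max normalized similarity scores. The function names, the item identifiers, the weight `visual_weight`, and the choice of fusion rule are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def blend_rankings(
    visual: Dict[str, float],
    textual: Dict[str, float],
    visual_weight: float = 0.5,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Weighted late fusion of two per-item similarity score maps.

    Hypothetical helper: the fusion rule is an assumption, not the
    blending strategy described in the paper.
    """
    v = min_max_normalize(visual)
    t = min_max_normalize(textual)
    # Items missing from one modality contribute a score of 0 for it.
    blended = {
        item: visual_weight * v.get(item, 0.0)
        + (1.0 - visual_weight) * t.get(item, 0.0)
        for item in set(v) | set(t)
    }
    # Sort by descending score, breaking ties by item id for determinism.
    ranked = sorted(blended.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_k]


# Made-up similarity scores for a handful of hypothetical catalogue items.
visual_scores = {"sofa_a": 0.9, "lamp_b": 0.4, "clock_c": 0.1}
textual_scores = {"lamp_b": 0.8, "rug_d": 0.6, "clock_c": 0.2}

for item, score in blend_rankings(visual_scores, textual_scores, top_k=3):
    print(f"{item}: {score:.3f}")
```

In this sketch, an item that scores moderately in both modalities can outrank an item that scores highly in only one, which is the general effect a blending step is meant to produce.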
