Energy-efficient FPGA Implementation of the k-Nearest Neighbors Algorithm Using OpenCL

Fahad Muslim, Alexandros Demian, Liang Ma, Luciano Lavagno, Affaq Qamar

DOI: http://dx.doi.org/10.15439/2016F327

Citation: Position Papers of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 9, pages 141–145 (2016)

Full text

Abstract. Modern SoCs are getting increasingly heterogeneous with a combination of multi-core architectures and hardware accelerators to speed up the execution of compute-intensive tasks at considerably lower power consumption. Modern FPGAs, due to their reasonable execution speed and comparatively lower power consumption, are strong competitors to the traditional GPU based accelerators. High-level Synthesis (HLS) simplifies FPGA programming by allowing designers to program FPGAs in several high-level languages e.g. C/C++, OpenCL and SystemC. This work focuses on using an HLS based methodology to implement a widely used classification algorithm i.e. k-nearest neighbor on an FPGA based platform directly from its OpenCL code. Multiple fairly different implementations of the algorithm are considered and their performance on FPGA and GPU is compared. It is concluded that the FPGA generally proves to be more power efficient as compared to the GPU. Furthermore, using an FPGA-specific OpenCL coding style and providing appropriate HLS directives can yield an FPGA implementation comparable to a GPU also in terms of execution time.

References

Mavroidis, I., Papaefstathiou, I., Lavagno, L., Nikolopoulos, D. S., Koch, D., Goodacre, J., ... & Palomino, M. (2016, March). ECOSCALE: Reconfigurable computing and runtime system for future exascale systems. In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) (pp. 696-701). IEEE.
Ovtcharov, K., Ruwase, O., Kim, J. Y., Fowers, J., Strauss, K., & Chung, E. S. (2015). Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper, 2.
Ouyang, J., Lin, S., Qi, W., Wang, Y., Yu, B., & Jiang, S. (2014, August). Sda: Software-defined accelerator for largescale dnn systems. In Hot Chips(Vol. 26).
http://www.nextplatform.com/2015/08/27/microsoft-extends-fpga-reach-from-bing-to-deep-learning/ [Accessed: 26 April 2016].
Struyf, L., De Beugher, S., Van Uytsel, D. H., Kanters, F., & Goedemé, T. (2014, January). The battle of the giants: a case study of GPU vs FPGA optimisation for real-time image processing. In Proceedings PECCS 2014(Vol. 1, pp. 112-119). VISIGRAPP.
User Guide, “SDAccel Development Environment User Guide v2015.1”, Xilinx, 2015.
Garcia, V., Debreuve, E., Nielsen, F., & Barlaud, M. (2010, September). K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching. In 2010 IEEE International Conference on Image Processing (pp. 3757-3760). IEEE, http://dx.doi.org/10.1109/ICIP.2010.5654017.
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J. W., Lee, S. H., & Skadron, K. (2009, October). Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization, 2009. IISWC 2009. IEEE International Symposium on (pp. 44-54). IEEE, http://dx.doi.org/10.1109/IISWC.2009.5306797.
User Guide, “Vivado Design Suite User Guide High-Level Synthesis v2015.1”, Xilinx, 2015.
Stone, J. E., Gohara, D., & Shi, G. (2010). OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering, 12(1-3), 66-73, http://dx.doi.org/10.1109/MCSE.2010.69.
Pu, Y., Peng, J., Huang, L., & Chen, J. (2015, May). An efficient KNN algorithm implemented on FPGA based heterogeneous computing system using OpenCL. In Field-Programmable Custom Computing Machines (FCCM), 2015 IEEE 23rd Annual International Symposium on (pp. 167-170). IEEE, http://dx.doi.org/10.1109/FCCM.2015.7.
Aydin, B. E. R. K. A. Y. (2014). Parallel algorithms on nearest neighbor search. Survey paper, Georgia State University.
Garcia, V., Debreuve, E., & Barlaud, M. (2008, June). Fast k nearest neighbor search using GPU. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. IEEE Computer Society Conference on (pp. 1-6). IEEE, http://dx.doi.org/10.1109/CVPRW.2008.4563100.
Singh, D. (2011). Implementing FPGA design with the OpenCL standard.Altera whitepaper.
http://weather.unisys.com/hurricane/atlantic/2012/index.php [Accessed: 30 June 2016].