Design and implementation of an FPGA-based convolutional neural network accelerator

Pitsis Antonios-Georgios

Full record

URI:

http://purl.tuc.gr/dl/dias/B72D4A43-A5A0-47DB-8A92-FEE788EF888A

Year

2018

Type of Item

Diploma Work

License

Details

Bibliographic Citation

Antonios-Georgios Pitsis, "Design and implementation of an FPGA-based convolutional neural network accelerator", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2018 https://doi.org/10.26233/heallink.tuc.79092

Appears in Collections

Diploma Works in Community School of Electrical and Computer Engineering

Diploma Works in Community Microprocessor and Hardware Laboratory

Relations with other Items

Is Part of the Item:

Methodology and experiments for large scale distributed computation on reconfigurable logic – based platform

Summary

In recent years Convolutional Neural Networks (CNNs) have shown extremely growth due to their effectiveness at complex image recognition problems. They are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image segmentation and classification. The continuing increasing amount of processing required by CNNs creates the field for hardware support methods. Moreover, CNN workloads have a streaming nature, well suited to reconfigurable hardware architectures such as FPGAs. The amount of research on the Machine Learning and especially on CNN (implemented on FPGA platforms) within the last 4 years demonstrates the tremendous industrial and academic interest. This study presents a CNN inference accelerator over FPGAs. The network we aim to accelerate was developed by Dr. Tsagatakis in the context of DEDALE project (Horizon 2020) for astrophysics subject. After carrying out Robustness Analysis computational workloads and memory accesses are analyzed, as well as compression methods and algorithmic optimizations to exploit FPGA parallelism. At the level of neurons, optimizations of the convolutional and fully connected layers are explained and compared. At the network level, approximate computing optimization methods are examined limited by not reducing the accuracy of the network. The platforms were used are ZCU102 and QFDB(a custom 4-FPGA platform developed at FORTH). The implemented accelerator was managed to achieve 20x latency speedup, 2.17x throughput speedup and 11.9x energy efficient over GPU NVIDIA-Quadro-K2200 in terms of EuroExa project.

Search

Browse

My Space

Design and implementation of an FPGA-based convolutional neural network accelerator

Pitsis Antonios-Georgios

Summary

Available Files

Services

Export

Share

Statistics

Metadata & Content in a METS Package:

Metadata in Format: