Accelerating TensorFlow machine learning inferences on FPGA-based edge platforms

Date

2025

Abstract

As the world becomes more interconnected and data-driven, the demand for real-time data analysis and forecasting is increasing. Time-series forecasting, which predicts future values based on historical data, is widely used in this context. Machine learning and deep learning models are effective for such tasks but are computationally intensive, posing challenges for deployment on edge devices with limited processing power and energy budgets. This work explores hardware–software co-design using FPGAs (Field Programmable Gate Arrays) to accelerate time-series inference. FPGAs offer parallel computation capabilities, reducing latency and increasing throughput for real-time applications. We implement the accelerator on AMD’s Zynq UltraScale+ MPSoC, which combines a dual-core Cortex-A53 processor with programmable logic on a single chip, enabling offloading of complex computations. The accelerator was developed using Vitis HLS and integrated with the processing system via Vivado. A PetaLinux project enabled communication with the Linux kernel, while custom C++ drivers interfaced the accelerator with the TensorFlow Lite runtime over the AXI protocol. A TensorFlow Lite delegate was developed to offload fully connected layer computations seamlessly onto the FPGA. The complete system was deployed on the Zynq UltraScale+ MPSoC, and experimental evaluation compared CPU-only and CPU–FPGA setups in terms of latency, power consumption, resource utilization, and accuracy. Results showed that inference on the FPGA accelerator was only 0.5% slower than the CPU-only baseline, as the primary focus was on establishing the hardware–software co-design pipeline rather than on optimizing the hardware for maximum performance. Model evaluation achieved a mean absolute error (MAE) of 0.01 and mean squared error (MSE) of 0.20 for the magnitude component, while the phase component obtained an MAE of 4.75 and MSE of 14.99.
These findings demonstrate that even with minimal optimization, FPGA acceleration integrated with TensorFlow Lite delegates provides a functional and extensible framework for real-time forecasting on edge devices, paving the way for more efficient and responsive edge computing solutions.
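To make the offloaded computation concrete: a fully connected layer reduces to a matrix–vector product plus a bias, which is the kind of loop nest Vitis HLS can synthesize into programmable logic with pipelining pragmas. The sketch below is illustrative only; the function name, parameter names, array bounds, and pragma placement are assumptions, not the thesis's actual kernel.

```cpp
#include <cstddef>

// Hypothetical HLS-style fully connected (dense) kernel:
//   out[i] = bias[i] + sum_j( weights[i][j] * in[j] )
// The #pragma HLS directives are meaningful to Vitis HLS and are
// ignored as unknown pragmas by an ordinary C++ compiler.
constexpr std::size_t MAX_IN  = 64;
constexpr std::size_t MAX_OUT = 64;

void fc_accel(const float in[MAX_IN],
              const float weights[MAX_OUT][MAX_IN],
              const float bias[MAX_OUT],
              float out[MAX_OUT],
              std::size_t n_in, std::size_t n_out) {
#pragma HLS INTERFACE m_axi port = in
#pragma HLS INTERFACE m_axi port = weights
#pragma HLS INTERFACE m_axi port = bias
#pragma HLS INTERFACE m_axi port = out
    for (std::size_t i = 0; i < n_out; ++i) {
        float acc = bias[i];
        for (std::size_t j = 0; j < n_in; ++j) {
#pragma HLS PIPELINE II = 1
            acc += weights[i][j] * in[j];  // multiply-accumulate inner loop
        }
        out[i] = acc;
    }
}
```

In a design like the one described, the m_axi interfaces would let the kernel read inputs and weights from shared DDR memory over AXI, which is how the custom C++ drivers on the Cortex-A53 side would hand data to the accelerator.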
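A TensorFlow Lite delegate works by claiming the subset of graph operations it can execute, leaving the rest on the CPU. The toy partitioner below mirrors that decision for the design described here, where only fully connected layers are offloaded; it is a standalone sketch with made-up names, not the TFLite delegate API itself.

```cpp
#include <vector>

// Toy stand-in for the op-selection step of a delegate: given the ops
// of a model graph in execution order, return the indices of the ops
// the FPGA accelerator would claim (here, only fully connected layers).
enum class OpType { kFullyConnected, kConv2D, kSoftmax };

std::vector<int> PartitionForFpga(const std::vector<OpType>& ops) {
    std::vector<int> claimed;
    for (int i = 0; i < static_cast<int>(ops.size()); ++i) {
        if (ops[i] == OpType::kFullyConnected) {
            claimed.push_back(i);  // this node runs on the FPGA
        }
    }
    return claimed;  // remaining nodes stay on the CPU
}
```

In the real runtime, the claimed nodes are replaced by a single delegate node whose invocation drives the accelerator, which is what makes the offload transparent to the application code.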
