heal.abstract |
In today’s digital era, the explosive growth of data and the increasing prominence of artificial intel-
ligence (AI) have made neural networks a linchpin of modern computing. These versatile AI models
serve as the foundation for diverse applications, from image recognition to natural language processing,
transforming industries and our digital landscape. As neural networks take center stage, the demand
for their efficient execution becomes increasingly critical.
Traditionally, neural networks are deployed in cloud computing environments, known for the extensive
computational resources of their data centers. While this approach offers significant computational
power, it introduces challenges related to latency and network availability, which can be particularly
limiting for applications that demand real-time responsiveness.
In this work, we deploy neural networks at the Edge (edge computing). Edge computing is an
alternative approach to neural network deployment that seeks to address the limitations of traditional
cloud environments: it brings computation closer to data sources, enabling real-time data processing.
In this way we reduce the execution latency of neural networks and enhance the responsiveness of
applications, making edge deployment ideal for scenarios where timely decision-making is critical.
However, the edge environment presents its own set of challenges. The devices operating at the edge
vary widely in computational capacity, from high-performance servers to resource-constrained IoT
devices. Managing this heterogeneity and allocating resources efficiently to ensure optimal neural
network execution is complex. For this reason, we make use of serverless computing, which abstracts
away the complexities of infrastructure management, simplifying resource scaling, reducing operational
overhead, and optimizing resource utilization. This approach aligns naturally with edge environments.
By leveraging serverless computing at the Edge, we designed and developed a complete and robust
framework for deploying neural networks in an edge cluster.
On top of our framework runs a Reinforcement Learning (RL) algorithm. Its core mission is twofold:
first, to keep neural network execution latency within the defined service-level agreements (SLAs),
meeting response-time targets; second, to optimize energy consumption by allocating tasks to
energy-efficient devices whenever feasible. This RL algorithm plays a pivotal role in enhancing the
overall efficiency and responsiveness of our system. |
en |