Decentralized AI processing
In short:
The objective of this building block is to enable decentralized AI processing in data spaces by running computations on edge nodes, without revealing individual contributions.
Timeline
Start date: T0 (expected: Q1 2024)
End date: T0 + 12 months
Duration (in months): 12
Where we are right now
Development has yet to start.
Want to join the effort? See the Working Groups.
Objectives
The building block aims to propose a sovereign, open and ethical vision of educational data and its exploitation, based on a distributed approach to storage (edge computing) and processing (e.g. federated learning). It avoids the systematic collection of traces and the centralization of the models that process them by keeping the data and the processing as close as possible to the student or to the provider.
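To make the distributed-processing idea concrete, below is a minimal sketch of federated averaging, one common federated-learning scheme: each edge node trains on its own private data and only model parameters travel to a coordinator, never the raw data. The toy linear model, function names and hyperparameters are illustrative assumptions, not part of the building block's specification.

```python
# Minimal federated averaging (FedAvg) sketch: raw data stays on each node,
# only model weights are exchanged and averaged by a coordinator.

def local_update(weights, local_data, lr=0.05):
    """One SGD pass over a node's private data (toy linear model y = w0 + w1*x)."""
    w0, w1 = weights
    for x, y in local_data:
        err = (w0 + w1 * x) - y
        w0 -= lr * err
        w1 -= lr * err * x
    return [w0, w1]

def federated_round(global_weights, node_datasets):
    """Each node trains locally; the coordinator averages the returned weights."""
    updates = [local_update(list(global_weights), data) for data in node_datasets]
    return [sum(ws) / len(updates) for ws in zip(*updates)]

# Three edge nodes, each holding private samples of y = 2x + 1.
nodes = [[(x, 2 * x + 1) for x in (0.0, 1.0, 2.0)] for _ in range(3)]
weights = [0.0, 0.0]
for _ in range(300):
    weights = federated_round(weights, nodes)
print(weights)  # converges toward [1.0, 2.0] without any node sharing raw data
```

The same pattern generalizes: the coordinator only ever sees aggregated parameters, which is what keeps individual contributions on the edge.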
The building block will let data providers easily authorize AI providers to process their data without having to exchange the data itself.
For instance:
I am a student and have identified my skills in a skills portfolio app A
from that app A, I can give my consent for an AI provider B to analyze my skills and show me which industries I could work in
once I consent, B's algorithm is sent to the dataset stored in A and executed on my data
I can see the result in my skills portfolio A
AI provider B never ingests my data into its own systems
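The flow above can be sketched as "the algorithm travels to the data": provider B ships a function, it is executed where the data lives (app A), and only the result crosses the boundary. The class and function names (`SkillsPortfolio`, `analyze_skills`) are hypothetical illustrations, not a real API of the building block.

```python
# Sketch of consent-gated, algorithm-to-data execution.

class SkillsPortfolio:
    """App A: holds the student's data; raw data never leaves this object."""
    def __init__(self, skills):
        self._skills = skills          # private to the data provider
        self._consents = set()

    def grant_consent(self, provider_id):
        self._consents.add(provider_id)

    def run_remote_algorithm(self, provider_id, algorithm):
        if provider_id not in self._consents:
            raise PermissionError(f"no consent granted to {provider_id}")
        # The algorithm executes next to the data; only its result is returned.
        return algorithm(self._skills)

# AI provider B ships its logic as a function and never sees the dataset.
def analyze_skills(skills):
    industries = {"python": "software", "statistics": "finance", "welding": "manufacturing"}
    return sorted({industries[s] for s in skills if s in industries})

portfolio = SkillsPortfolio(skills=["python", "statistics"])
portfolio.grant_consent("provider-B")
print(portfolio.run_remote_algorithm("provider-B", analyze_skills))
# → ['finance', 'software']
```

Revoking or never granting consent makes the same call fail, which is the enforcement point the example list describes.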
Infrastructure-related functionalities:
The edge cloud infrastructure, built on an open-source platform such as Kubernetes, aims to support the high-level decentralized AI applications. Dedicated APIs (Application Programming Interfaces) will be exposed to the applications to give them control over the operation. We plan to reuse the currently available features of, e.g., Kubernetes as much as possible; however, several extensions will be required due to the special characteristics of the targeted decentralized AI applications. As a result, the implemented edge cloud infrastructure will provide the following functionalities:
data gathering/management functions that allow data providers to control the flow and storage of data coming from different sources (e.g. whitelists of nodes can be defined on which specific data may be stored and processed)
automatic, optimal control of data and function placement: AI providers upload their AI processing logic as software artifacts, while the edge cloud platform deploys the functions to the optimal/closest/requested node holding the data; available hardware accelerators (e.g. nodes with predefined GPUs) can also be taken into account
dynamic placement optimization during operation according to quality constraints specified by the users or providers of the applications; e.g. data or functions can be migrated on the fly in response to changes in network characteristics or to varying load on a given physical node
a dedicated Function as a Service (FaaS) API for AI applications, supporting on-demand deployment and scaling of AI artifacts uploaded by AI providers, taking into account the current location of data, available hardware accelerators and privacy requirements
novel horizontal and vertical resource scaling methods optimized for AI applications, which can adjust the amount of allocated resources (compute, storage, network) and optimize the energy consumption of the overall infrastructure (e.g. running artifacts on CPU or GPU, trading performance against energy consumption)
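The placement functionalities above can be illustrated with a small scheduler sketch: deploy an AI artifact to the closest whitelisted node that holds the required data, honoring an accelerator constraint. The node model, field names and scoring (latency only) are simplifying assumptions; a real scheduler on e.g. Kubernetes would also weigh load, energy and migration cost.

```python
# Illustrative data-aware function placement, as described in the list above.
from dataclasses import dataclass

@dataclass
class EdgeNode:
    name: str
    datasets: set
    gpu: bool
    latency_ms: float  # network distance from the requesting application

def place_function(nodes, dataset, whitelist, needs_gpu=False):
    """Pick the node holding `dataset`, restricted to the data provider's whitelist."""
    candidates = [
        n for n in nodes
        if n.name in whitelist
        and dataset in n.datasets
        and (n.gpu or not needs_gpu)
    ]
    if not candidates:
        raise LookupError("no eligible node: data, whitelist and hardware constraints unmet")
    # Closest eligible node wins.
    return min(candidates, key=lambda n: n.latency_ms)

nodes = [
    EdgeNode("edge-a", {"skills-db"}, gpu=False, latency_ms=5),
    EdgeNode("edge-b", {"skills-db"}, gpu=True, latency_ms=20),
    EdgeNode("edge-c", {"other-db"}, gpu=True, latency_ms=1),
]
best = place_function(nodes, "skills-db", whitelist={"edge-a", "edge-b"}, needs_gpu=True)
print(best.name)  # → edge-b (edge-a lacks a GPU, edge-c lacks the data)
```

Note how the whitelist enforces the data-provider control from the first bullet, while the GPU flag stands in for the accelerator-aware placement from the FaaS bullet.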
Roles between partners:
BME will lead the task addressing the design, development and operation of the edge computing infrastructure supporting decentralized AI applications. BME will contribute to the establishment of the infrastructure, making use of open-source components such as Kubernetes. Based on BME's expertise and previous work, novel extensions and functionalities will also be proposed and implemented to enhance the edge infrastructure in the aspects relevant to enabling decentralized AI. Novel optimization algorithms will also be designed to control the energy consumption of the overall infrastructure (including compute, storage and network elements). BME will also focus on the software-related aspects and propose solutions fostering the development of upper-level AI applications, such as novel FaaS APIs and related best practices that developers can follow.
polypoly will provide and further develop a privacy-preserving data repository and execution environment (the so-called polyPod) that can gather data from different sources. Other types of data will also be collected and made available, upon user consent, for model learning and validation purposes in order to generate new knowledge and insights that can benefit users.
The polyPod is currently built by the polypoly cooperative, ownership of which is open to all European citizens, and lets users physically store and process data on their own devices in a fully distributed edge-based approach. The polyPod is an open, standardised and non-proprietary platform.
Uni Koblenz / Fraunhofer ISST
Fraunhofer ISST / Uni Koblenz will provide Gaia-X based technology (such as the Eclipse Dataspace Connector) to support federation of edge clouds with data spaces in this context.