Determining the legal qualification of AI system providers
AI system providers who intend to build training datasets containing personal data must determine their status under the GDPR: they may qualify as controllers, joint controllers or processors.
Controller
The principle
The controller is the natural or legal person who determines the purposes and means of processing, i.e. who decides on the ‘why’ and ‘how’ of the use of personal data.
The essential means of processing should be understood as those that are closely linked to the purpose and the scope of the processing, such as the type of personal data that are collected and used, the hardware and software used for the processing as well as their security, the duration of the processing, the categories of recipients and the categories of data subjects.
In practice
A provider that initiates the development of an AI system and builds the training dataset for that system from data it has selected itself, on its own account, may qualify as a controller.
Examples of controllers:
- A video-on-demand platform wants to develop an AI recommendation system. To this end, it reuses a dataset of its customers that was originally collected for the purpose of providing the service.
The video-on-demand platform that reuses the dataset to train its AI recommendation system is the controller of this new processing, since it has decided on the purpose (training an AI recommendation system) and the essential means of processing (i.e., the dataset it has already collected for another purpose and will reuse).
- The provider of a conversational agent that trains its large language model (LLM) on data publicly available on the Internet is the controller for the reuse of that publicly available personal data. Indeed, it decides both the purpose (training an AI system) and the essential means of processing (selecting the data it will reuse).
Reuse of data collected by another entity
When the AI system provider building the training dataset reuses data originally collected by another entity, a distinction must be made between:
- the data diffuser: the natural or legal person, public or private, who uploads personal data or a dataset with personal data online;
- the reuser of the data: the natural or legal person, public or private, who processes such data or datasets with the intention of using them on its own account.
The diffuser and the reuser of the data are, in principle, controllers of separate processing operations, since each determines the purposes and the essential means of its own processing.
The data diffuser is, in principle, responsible for the public dissemination, while the provider of the AI system that reuses the data is responsible for the reuse. The diffuser is not, in principle, responsible for the reuse of its data. It may, however, lay down conditions of use for the disseminated data in order to limit reuse or to set certain terms for it.
Example:
An administration makes real estate data public and freely reusable (open data). A company wants to reuse this data to create a training dataset in order to develop an AI system that predicts certain real estate developments within a designated area. The diffuser and the reuser are then controllers of separate processing operations, provided that the two operations are independent.
Learn more: Sheet 1 of the guide on the opening and reuse of publicly accessible data.
Joint controllers
The principle
Where two or more controllers jointly determine the purposes and means of processing, they are joint controllers.
This qualification may be difficult to establish when several actors influence the determination of the purposes and means of the processing. In particular, the actors must determine whether they are processing the data for their own, distinct purposes or for a common purpose.
In practice
When the training dataset of an AI system is fed by more than one controller for a jointly defined purpose, the controllers may be qualified as joint controllers.
Examples:
- Academic hospitals developing an AI system for the analysis of medical imaging data choose to use the same federated learning protocol. This protocol allows them to use data for which they are initially separate controllers, so that each benefits from the pooling of resources.
Together, they determine the purpose (training a medical imaging AI system) and the means of this processing (through the choice of protocol and the determination of the data they use): they are therefore joint controllers.
- A consortium consisting of a municipality, a company providing automated image processing software and a company providing video devices is conducting an experiment in which smart cameras are installed to record and analyze the flow and behaviour of vehicles in a traffic lane within the municipality. The contract between the municipality and the two companies provides for the use of the software by the municipality in real-time conditions and the possibility for the two companies to improve the automated image processing software by using the data collected in real time.
The municipality and the two companies will be joint controllers for the processing of the automated image processing software training dataset as long as they jointly decide on the purpose and essential means of the processing and the companies do not act solely on behalf of the municipality. Indeed, it is possible to consider that they jointly decide on the essential means of processing (by choosing to feed the AI system training dataset with real-time data collected by the smart cameras and data already collected by the company providing the automated image processing software) and the purpose of the processing (to experimentally train an AI system that detects particular vehicle behaviour and improves the automated image processing software).
In the case of joint control, the parties must ensure that the data processing is lawful (i.e. that it complies with the law), including by defining their respective obligations in a transparent manner in an agreement. The form of this agreement is not specified by the GDPR. The agreement must reflect the role of each joint controller and clearly specify “who does what” to ensure the protection of the data processed.
Please note: regardless of the terms of the agreement, the data subject may exercise his or her rights vis-à-vis each of the joint controllers.
The use of a processor
The principle
The processor is the natural or legal person who processes data on behalf of the controller, in the context of a service or provision.
In practice
The qualification of the AI system provider must be assessed on a case-by-case basis. An AI system provider may itself be a processor when it develops an AI system on behalf of one of its customers, as part of a service. The customer is the controller insofar as it determines the purpose and means of the processing. In other configurations, an AI system provider may qualify as a controller for the processing related to the AI systems it creates in order to market them.
An AI system provider creating the training dataset of the AI system that it develops on its own behalf may use a subcontractor to collect and process the data according to its documented instructions (e.g. collecting publicly available data on the Internet, reusing a specific dataset made available online, etc.). The subcontractor is then classified as a processor. It is essential for the provider of the AI system, in its capacity as controller, to ensure that its processor complies with the GDPR and limits the processing of data to its instructions, in particular by concluding an agreement.
Moreover, using the same dataset for several customers, in the context of separate services, is generally a decisive indication that the provider is the controller for the separate processing that consists of creating the dataset.