Defining a purpose

07 June 2024

The creation of a training dataset containing personal data constitutes processing of personal data which, pursuant to the GDPR, must have a purpose that is 'specified, explicit and legitimate'. The CNIL helps you define this purpose, taking into account the specificities of AI system development.

This content is a courtesy translation of the original publication in French. In the event of any inconsistencies between the French version and this English translation, please note that the French version shall prevail.

The principle

The purpose of the processing is the objective for which personal data are used. This objective must be specified, i.e. defined as soon as the project starts. It must also be explicit, i.e. known and understandable. Finally, it must be legitimate, i.e. compatible with the tasks of the organisation.

The data must not be further processed in a manner incompatible with this initial purpose: the principle of purpose limitation restricts how the controller may use or reuse these data in the future.

The requirement of a specified, explicit and legitimate purpose is particularly important, as it determines the application of other principles of the GDPR, including:

  • the principle of transparency: the purpose of the processing must be brought to the attention of the data subjects so that they know why data concerning them are collected and understand how those data will be used;
     
  • the principle of data minimisation: the data selected must be adequate, relevant and limited to what is necessary for the purposes for which they are processed;
     
  • the principle of storage limitation: the data may only be kept for a limited period, defined according to the purpose for which they were collected.

How should the purpose of the processing be defined when the operational use is identified at the development stage?


How should the purpose of the processing be defined for the development of general-purpose AI systems?


How should the purposes be defined when an AI system is developed for scientific research?