Ensuring the lawfulness of the data processing

16 October 2023

An organisation that wishes to build a training dataset containing personal data, and then use that dataset to build an AI system, must ensure that the processing is permitted by law. The CNIL helps you determine your obligations based on your responsibility and on how the data are collected or re-used.

In all cases, the controller must define a legal basis and, depending on how the data are collected or re-used, carry out certain additional checks.

There are several ways to build a dataset for training purposes:

  • the data are collected directly from individuals;
  • the data are indirectly collected from open sources on the Internet for this purpose;
  • the data were initially collected for another purpose by the controller itself (e.g. in the context of providing a service to its users) or by another controller. This case requires additional precautions.





Define a legal basis

In case of re-use of data, carry out the necessary additional tests and checks

In addition to these prior checks, and regardless of the method of collection used, re-users must fully analyse the compliance of their own processing operations, including when they re-use datasets that were compiled and shared outside the scope of French or European law (unlike their re-use by an entity established on French or European territory, which is subject to the GDPR). In particular, the re-user must meet the requirements that apply with respect to the individuals whose data are in the dataset obtained: the re-user must inform them of the processing it intends to carry out and allow them to exercise their rights.

Please note: a how-to sheet on information and management of people’s rights will be published at a later date.
