The controller must always define the purpose of the research and of the data processing. However, in the field of scientific research, it may be accepted that this purpose is defined less precisely, or that the research purposes are not specified in their entirety, given the difficulty researchers may have in identifying them fully at the start of their work. Information clarifying the objective can then be provided as the project progresses.
Possible derogations
Data processing carried out for the purpose of scientific research benefits from derogations and adjustments to certain data protection obligations (e.g. in relation to individuals’ rights or retention periods).
Reminder: what is ‘scientific research’ within the meaning of the GDPR?
The notion of ‘scientific research’ is to be understood broadly in the GDPR. In summary, the aim of the research is to produce new knowledge in all areas in which the scientific method is applicable.
To help controllers determine whether they can benefit from the provisions on scientific research, the CNIL proposes a set of criteria for assessing whether processing that pursues a research purpose falls within the scope of scientific research:
- In some cases, it will be possible to assume that the creation of training datasets for AI pursues a scientific research purpose due to the nature of the organisation (e.g. a university or a public research centre) or the type of funding (e.g. funding from the French National Research Agency).
- Otherwise, in particular for non-publicly funded private scientific research, the following criteria (based on the OECD Frascati Manual and its definition of R&D) should be examined together. As these criteria are cumulative, the controller will in principle have to demonstrate that they are all fulfilled in order for the processing to be considered scientific research within the meaning of the GDPR. Where this cannot be demonstrated, a case-by-case analysis is necessary to qualify the processing.
- Novelty: the processing should be aimed at obtaining new results (novelty may also result from a project that leads to the identification of potential discrepancies with the intended result). The purpose of the research can help in assessing whether it qualifies as scientific research. In this respect, the publication of articles in a peer-reviewed journal or the grant of a patent can help establish that the novelty criterion is met.
- Creativity: this criterion rests on original and non-obvious concepts and hypotheses, that is, on the contribution of the work to scientific knowledge or the state of the art. The development of collective knowledge that benefits more than just the legal entity supporting the research project is a strong indication that the project is scientific.
- Uncertainty: the final outcome of the processing must be uncertain.
- Systematicity: the processing must be planned and budgeted, and must implement a scientific methodology. Adherence to the relevant industry standards of methodology and ethics is a strong indication that the research is scientific.
This is, for example, the case of specific methodological requirements for processing carried out for the purposes of research, study or evaluation in the field of health, which result in particular from Articles 72 et seq. of the French Data Protection Act.
- Transferability/reproducibility: the processing should lead to results that can be replicated or transferred to a wider field. For example, publishing the study and presenting the research methodology adopted is a strong indication that the project leader(s) intend to share the results of the research.
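The cumulative logic of the criteria above can be sketched as a simple internal checklist. This is a minimal illustration only, not a CNIL tool or a legal determination: the class name `FrascatiAssessment`, the function names and the returned labels are all hypothetical, and each boolean stands for a documented, human assessment of the corresponding criterion.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FrascatiAssessment:
    """Hypothetical record of the controller's own assessment of each criterion."""
    novelty: bool
    creativity: bool
    uncertainty: bool
    systematicity: bool
    transferability: bool


def unmet_criteria(assessment: FrascatiAssessment) -> list[str]:
    """Return the names of the criteria that are not (yet) demonstrated."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


def qualification(assessment: FrascatiAssessment) -> str:
    """The criteria are cumulative: all five must be met, otherwise the
    processing calls for a case-by-case analysis (labels are illustrative)."""
    if not unmet_criteria(assessment):
        return "scientific research (all criteria met)"
    return "case-by-case analysis needed"


if __name__ == "__main__":
    project = FrascatiAssessment(
        novelty=True,
        creativity=True,
        uncertainty=True,
        systematicity=True,
        transferability=False,  # e.g. no plan to publish results or methodology
    )
    print(qualification(project))
    print(unmet_criteria(project))
```

Recording which criterion is unmet, rather than a single yes/no result, keeps the documentation useful for the case-by-case analysis that follows.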
Example:
Developing an AI system as a proof of concept, to demonstrate the robustness of machine learning that requires less training data, could be considered as pursuing a scientific research purpose where it is part of a documented scientific approach intended for publication.
Text transcript
Case 1: The operational use of the AI system during the deployment phase is identified as early as the development phase.
If the purpose in the deployment phase is specified, explicit and legitimate, it is also considered that the purpose in the development phase is specified, explicit and legitimate.
Case 2: The operational use of the AI system during the deployment phase is not clearly defined at the development phase (general purpose AI systems).
The purpose of the processing in the development phase must refer cumulatively to:
- the 'type' of system developed
- technically feasible functionalities and capabilities
It is recommended that the purpose also include:
- the most at-risk foreseeable capabilities
- functionalities excluded by design
- as far as possible, the conditions of use of the AI system
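As a rough sketch, the mandatory and recommended elements of a Case 2 purpose statement could be tracked in a small structure like the one below. Everything here is an assumption made for illustration: the `DevelopmentPurpose` class, its field names and the `review_purpose` helper are hypothetical and do not come from the CNIL, and a non-empty field says nothing about whether its content is specific enough.

```python
from dataclasses import dataclass, field


@dataclass
class DevelopmentPurpose:
    """Hypothetical purpose statement for a general purpose AI system."""
    # Elements the purpose must refer to cumulatively
    system_type: str = ""
    feasible_capabilities: list[str] = field(default_factory=list)
    # Elements it is recommended to include
    at_risk_capabilities: list[str] = field(default_factory=list)
    excluded_functionalities: list[str] = field(default_factory=list)
    conditions_of_use: str = ""


REQUIRED = ("system_type", "feasible_capabilities")
RECOMMENDED = ("at_risk_capabilities", "excluded_functionalities", "conditions_of_use")


def review_purpose(purpose: DevelopmentPurpose) -> dict[str, list[str]]:
    """List the required and recommended elements that are still empty."""
    return {
        "missing_required": [n for n in REQUIRED if not getattr(purpose, n)],
        "missing_recommended": [n for n in RECOMMENDED if not getattr(purpose, n)],
    }


if __name__ == "__main__":
    draft = DevelopmentPurpose(
        system_type="large language model",
        feasible_capabilities=["text summarisation", "question answering"],
        excluded_functionalities=["biometric identification"],
    )
    print(review_purpose(draft))
```

A check of this kind only flags empty elements; whether the resulting purpose is specified, explicit and legitimate remains a matter for the controller's own analysis.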
Case 3: Creating a training dataset for scientific research purposes
It may be acceptable for the objective to be specified with a lower degree of precision, or for the research objectives not to be specified in their entirety, given the difficulty of defining them fully from the beginning.