Developing and training an algorithm

12 September 2022

Implementing best practices during this crucial phase.

The AI system provider must carry out a series of checks and take precautions in order to guarantee the quality of the system. However, the following questions are also of interest to users of the system (where they are not the provider), who may be held liable if the processing carried out does not comply with the GDPR. Users can thus check the extent to which the provider has taken data protection issues into account in the system design.

Designing and developing a reliable algorithm

When designing the processing, particular attention should be paid to the choice of algorithm, tools and development infrastructure, so that the resulting processing is reliable and robust.

The following questions may help the data controller to assess whether a balance is maintained between the complexity of the solutions chosen and the loss of explainability.

 

- What type of algorithm is used and how does it work (supervised, unsupervised, continuous or federated learning, reinforcement learning, etc.)?
  - Why was this algorithm chosen?
  - How was it implemented (source of the code, libraries used, etc.)?

- Has the algorithm used been tested by independent third parties?
  - Has a literature review been carried out?
  - Has a comparison with similar systems been made? Which criteria were used?

- Are third-party tools used?
  - Are they considered reliable and proven?
  - Has a search for possible flaws in the tool documentation and among the developer community raised any points of interest?
  - Is any monitoring scheduled for tool updates (see the version-recording sketch after this list)?

- Is the AI algorithm available as open source?
  - Have the algorithms been sufficiently tested by third parties? If so, by whom and how?
  - Are the algorithms state of the art?
  - Is community feedback on the algorithms encouraged and taken into account?
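
To support the documentation and update-monitoring questions above, the versions of the libraries actually used can be recorded automatically. Below is a minimal sketch, assuming a standard Python environment; the library names listed are illustrative assumptions, not a recommendation.

    # Minimal sketch: record the exact versions of the third-party libraries
    # used for training, so the implementation can be documented and audited.
    # The library names below are illustrative assumptions.
    from importlib.metadata import PackageNotFoundError, version

    LIBRARIES = ["numpy", "scikit-learn", "torch"]  # illustrative list

    def record_dependencies(libraries):
        """Return a {name: version} mapping for the installed libraries."""
        report = {}
        for name in libraries:
            try:
                report[name] = version(name)
            except PackageNotFoundError:
                report[name] = "not installed"
        return report

    if __name__ == "__main__":
        for name, ver in record_dependencies(LIBRARIES).items():
            print(f"{name}=={ver}")

Keeping such a record alongside each trained model makes it easier to reproduce the training environment and to react when a flaw is reported in one of the tools.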

Applying a meticulous training protocol

By establishing a protocol for training the AI system, the provider will be able to challenge the choice of methods for validating the performance and fairness of the algorithm, while incorporating validation tests at the most critical stages. Since they provide the first opportunity to observe the algorithm's behaviour, the training and validation metrics must be chosen carefully: in a seropositivity test for a contagious disease, for example, it is more important to limit the false negative rate than the false positive rate.
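
As a minimal illustration of that point, the Python sketch below contrasts the false negative rate and the false positive rate on a toy set of labels; the data are illustrative, and scikit-learn is assumed to be available.

    # Minimal sketch: contrasting the false negative rate (FNR) and the
    # false positive rate (FPR) of a screening-style classifier.
    # The labels below are illustrative, not real data.
    from sklearn.metrics import confusion_matrix

    y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = positive case, 0 = negative case
    y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # classifier output

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    fnr = fn / (fn + tp)  # share of positive cases the system misses
    fpr = fp / (fp + tn)  # share of negative cases wrongly flagged

    # For a contagious disease, a missed case (false negative) is usually
    # more harmful than a false alarm, so the FNR is the rate to keep low.
    print(f"false negative rate: {fnr:.2f}")
    print(f"false positive rate: {fpr:.2f}")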

 

- Which learning strategies are used?
  - What is the distribution between training, test and validation datasets?
  - Are strategies such as cross-validation used (see the cross-validation sketch after this list)?

- Is the quality of the AI system's output considered sufficient?
  - Which metrics are used? Do they allow satisfactory measurement of the performance with due regard to the consequences for the data subjects?

- Have cases of error been investigated?
  - Has a correlation been sought with the value taken by any of the variables in the data (e.g. is the error rate greater for people of a particular gender)? See the disaggregated-error sketch after this list.

- Are boundary situations where system outputs are not sufficiently reliable clearly identified?
  - Are safety mechanisms added to handle these situations (by systematically handing control to a human operator, for example)? See the fallback sketch after this list.

- Continuous learning scenario: are measures taken upstream to avoid degraded performance, model drift or attacks aimed at influencing the results of the algorithm (e.g. online chatbots becoming "racist")? Which measures?
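
As a minimal illustration of the cross-validation question above, the Python sketch below estimates performance over five folds; the model, dataset and scoring metric are illustrative assumptions.

    # Minimal sketch: 5-fold cross-validation, which gives a more robust
    # performance estimate than a single train/test split.
    # Model, data and metric are illustrative assumptions.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    model = LogisticRegression(max_iter=1000)

    # Each fold is held out once for validation while training on the rest.
    scores = cross_val_score(model, X, y, cv=5, scoring="recall")
    print(f"recall per fold: {scores}")
    print(f"mean recall: {scores.mean():.2f}")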
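
The question on correlations between errors and the variables in the data can be explored with a simple disaggregated measurement. Below is a minimal sketch assuming a pandas DataFrame whose column names (gender, y_true, y_pred) and values are illustrative.

    # Minimal sketch: comparing error rates across a sensitive variable to
    # detect a possible bias. Column names and values are illustrative.
    import pandas as pd

    df = pd.DataFrame({
        "gender": ["F", "F", "F", "M", "M", "M"],
        "y_true": [1, 0, 1, 1, 0, 0],
        "y_pred": [1, 0, 0, 1, 0, 1],
    })

    df["error"] = (df["y_true"] != df["y_pred"]).astype(int)
    # A large gap between groups is a warning signal worth investigating.
    print(df.groupby("gender")["error"].mean())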
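
For the safety-mechanism question, one common pattern is to hand control to a human operator when the model is not confident enough. The sketch below assumes a classifier exposing a predict_proba method; the threshold value is an illustrative assumption to be tuned per use case.

    # Minimal sketch: route low-confidence predictions to a human operator.
    # The threshold value and the predict_proba interface are assumptions.
    import numpy as np

    CONFIDENCE_THRESHOLD = 0.9  # illustrative value

    def decide(model, x):
        """Return the model's decision, or defer to a human when unsure."""
        proba = model.predict_proba([x])[0]
        confidence = float(np.max(proba))
        if confidence < CONFIDENCE_THRESHOLD:
            return "refer_to_human_operator"
        return int(np.argmax(proba))

The same mechanism can support the systematic human control mentioned in the experimentation phase below, with the threshold relaxed once confidence in the system has been established.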

 

Checking the quality of the system in a controlled environment

The tests on the algorithm and the system as a whole must be carried out under representative conditions allowing for a comprehensive validation of the processing. The needs of the users and their expertise should be taken into account as much as possible in this phase.

 

- Has the processing been validated through experimentation?
  - Has the user of the AI system (business team, supplier's customers, etc.) been included in the experimentation process?
  - Has their opinion been taken into account in the design of the tool, to best fit their needs and to correct any flaws they may have identified?

- Is the context in which the AI system exists (number of variables to be considered, difficulty in assessing the representativeness of the data, etc.) particularly complex?
  - Has the algorithm in question previously been used in a similar context?

- In which environment was the experiment carried out (controlled or uncontrolled, closed or open, on simulated or actual cases of use)?
  - Do the conditions sufficiently represent the actual conditions that will be met during deployment?
  - Have appropriate precautions been taken, such as systematic control by a human operator, which could then be reduced in the production phase?

- Are the metrics used to validate the experiment appropriate and sufficient?

- Has an exhaustive and objective assessment of the experiment been carried out?
