Reuse of databases accessible online or provided by a third party: what checks are required to comply with the GDPR?

Publication of the French Data Protection Authority dated January 23, 2025

The reuse of databases accessible online or acquired from third parties (particularly data brokers) is a widespread practice, whether for scientific research, commercial prospecting, or the development of artificial intelligence systems.

In a publication dated January 23, 2025, the French Data Protection Authority (the “CNIL”) nevertheless points out that the apparent availability of a database does not mean that it can be reused without constraint. The authority stresses that any reuser must first verify that the creation and the making available of the database are not manifestly unlawful. Failing this, the reuser could incur liability, including criminal liability in certain cases, particularly for handling data originating from an offense.

Verifying the absence of manifest illegality

The CNIL begins by restating a simple principle: it is prohibited to reuse data originating from a leak, a theft, or, more broadly, a source whose criminal origin the reuser cannot claim to be unaware of.

In this respect, a database originating from the “dark web”, or one that has been the subject of a court decision finding an infringement of intellectual property rights (in particular those of database producers under Article L. 342-1 of the French Intellectual Property Code), is a clear warning sign.

Beyond these textbook cases, the CNIL invites reusers to examine certain indicators that can help identify a clear risk of illegality in the creation of a database:

  • Check the source and documentation of the database: the description must clearly specify the origin of the data (e.g., an identified social network). A database containing data whose source is not identified should lead to the suspension of reuse until additional information has been obtained.

  • Verify that the collection and dissemination of the data rest on an appropriate legal basis. For example, a database containing precise, non-anonymized geolocation data, which in principle requires the consent of the individuals concerned, calls for heightened precautions. Conversely, a database composed of pseudonymized data that was made public by the individuals concerned and contains no sensitive data presents, in principle, a lower risk of illegality.

The CNIL specifies that these checks do not require an exhaustive audit of the original processing, but do require a reasonable examination of the available information (description of the database, context of dissemination, possible public sanctions, etc.).
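For teams that track such checks in an internal tool, the indicators above could be recorded in a simple structure. The following Python fragment is only a minimal sketch: the class name, field names, and message wording are hypothetical and do not come from the CNIL publication, and the output is a list of points to review, not a legal conclusion.

```python
from dataclasses import dataclass


@dataclass
class DatabaseScreening:
    # Hypothetical field names, one per indicator mentioned in the text.
    known_illicit_origin: bool         # leak, theft, or similar criminal origin
    infringement_court_decision: bool  # court decision finding an IP infringement
    source_identified: bool            # documentation states where the data comes from
    legal_basis_documented: bool       # collection and dissemination rest on a stated legal basis
    precise_geolocation: bool          # precise, non-anonymized geolocation data


def warning_signs(db: DatabaseScreening) -> list[str]:
    """Return the warning signs raised by a screening record (empty if none)."""
    signs = []
    if db.known_illicit_origin:
        signs.append("illicit origin: reuse is prohibited")
    if db.infringement_court_decision:
        signs.append("court decision finding an infringement: clear warning sign")
    if not db.source_identified:
        signs.append("source not identified: suspend reuse until more information is obtained")
    if not db.legal_basis_documented:
        signs.append("legal basis of collection or dissemination not documented")
    if db.precise_geolocation:
        signs.append("precise geolocation data: heightened precautions")
    return signs
```

A record with an identified source, a documented legal basis, and none of the other indicators yields an empty list. Any non-empty result signals that the available information needs a closer look before reuse.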

Sensitive data and data relating to offenses: increased vigilance

The authority also draws attention to the possible presence of sensitive data (Article 9 of the GDPR) or data relating to offenses (Article 10). Their reuse is in principle prohibited, unless it can be based on one of the exceptions provided for by the GDPR or the French Data Protection Act.

In practice, the presence of such data should lead the reuser to carry out additional checks, in particular to establish whether explicit consent was obtained or whether the information was manifestly made public (for example, when individuals themselves have made certain information public on platforms accessible to all).

Compliance of the subsequent processing remains essential

Finally, the CNIL points out that these prior checks in no way exempt reusers from their own compliance obligations. Reuse constitutes a separate processing operation, which must rest on an appropriate legal basis, comply with the principles of data minimization and purpose limitation, meet the obligation to inform data subjects, and, where applicable, be subject to a data protection impact assessment (DPIA).

The authority also recommends that relations with the original data controller be governed by contract, documenting in particular the source of the data, the legal basis for the initial processing, the purposes pursued, and the safeguards put in place.

* * *

In a context marked by the development of AI projects and the massive circulation of datasets, this publication highlights how important it is for reusers to carry out prior checks and to document them as part of their accountability obligations.