Provenance: Regulating data gathering by generative AI providers

Provenance in the art world is a big deal. It’s the difference between a multimillion-dollar Picasso guarded in a gallery and a pleasing reproduction languishing in a spare room. Provenance is about tracing evidence for the origin and chain of ownership of a piece to show that it is authentic and that its current ownership is lawful.

The same concept of provenance can be applied to the training data underpinning AI models. Questions arise such as: Where did the training data originate? Was it lawfully acquired, and how has it changed hands? Answering these questions becomes challenging with proprietary AI models, where developers often guard the details of the scope and source of their data sets.

As financial services firms weigh the deployment of generative AI (gen AI), they will need clear guidance on how to ensure the provenance of the training data underpinning the AI models and services they source from suppliers.

The creators of AI models have a voracious appetite for training data. To satisfy this, AI firms have turned to scraping the public internet as a data source. However, in casting their net so widely, model builders risk acquiring personal information that identifies individuals. This places them within the regulatory purview of bodies responsible for data protection and guarding the privacy of citizens.

The UK’s Information Commissioner’s Office (ICO) has begun consulting on how the approach to gathering training data will be policed. In its first consultation, it offered an analysis of what it considers to be a lawful basis for acquiring training data (based on the legitimate interests provision in Article 6(1)(f) of the UK GDPR).

The ICO noted that scraping-based data gathering is ‘invisible’, taking place without notifying or gaining the consent of data subjects. The ICO considers invisible processing a high-risk activity. By categorising it as such, the ICO draws attention to the additional obligations that model builders and users have. They must show that they have carefully considered how they are balancing their business objectives with the impact their data gathering could have on the rights of private citizens.

Flawless Money responded to the ICO’s consultations on web scraping for generative AI to highlight three issues concerning the financial services sector:

How can firms demonstrate robust due diligence of AI Vendors?
How should firms monitor the output of AI services?
What must firms disclose to customers when adopting AI?

We expand on each of these points in turn below.

How can firms demonstrate robust due diligence of AI Vendors?

While firms can delegate the creation and operation of AI models to third-party suppliers, they retain a responsibility to assess and demonstrate (to regulators) that their choice of supplier is sound. Recent standards like ISO 42001 for AI management and ISO 23894 for AI risk management show promise as a way of demonstrating a supplier’s good practice, but certification against these standards is still in its infancy.

Due diligence is not only about assessing whether an AI supplier’s data gathering methods are lawful. The responsibility also includes making sure that the output controls and filters suppliers use in their AI services are enough to catch and fix inaccurate and biased behaviour.

How should firms monitor the output of AI services?

Financial services firms need clarity about any obligations regulators will impose on them for exercising oversight of the output of AI services. Firms are accountable for decisions their AI arrives at and potentially liable for information or advice that their use of AI provides to customers (for example, through customer service chatbots). Monitoring issues include:

Is continuous monitoring of AI responses required, or is sampling or periodic audits sufficient to demonstrate an AI system’s responses remain within acceptable bounds?
What is the appropriate balance between using automated filters and manual checks to monitor the truthfulness and fairness of AI responses?
How can responsibility for monitoring AI outputs be divided between the AI service provider and the fintech using the service? More concretely, who bears liability when things go wrong?

What must firms disclose to customers when adopting AI?

The Department for Science, Innovation and Technology (DSIT) issued guidance to UK regulators (including the ICO) in February 2024. The guidance sets out the principles the government sees as underpinning any regulatory efforts. Among the principles is ‘Appropriate transparency and explainability’. These properties are central to building and maintaining trust between firms and their customers over the use of AI.

Where firms are introducing AI into existing financial products and services, must they communicate this change to their customers? The DSIT guidance suggests customers should know when they ‘are affected by or engaging with an AI system’ and should have enough information to exercise any rights arising from the involvement of AI.

Clear regulatory guidance is required on how the transparency principle will be met when AI is integrated into existing services. For example, guidance should distinguish between:

Disclosing the deployment of AI so that a customer could discover its use
Actively notifying customers that AI is being used in decisions or responses
Seeking consent to use AI when offering services to customers (and consequently dealing with dissent)

A corollary of transparency, which makes the use of AI evident, is the need to tell customers of any rights they have. For example, customers may wish to seek an explanation of an AI decision or to contest an AI decision they believe to be unfair.

The analysis the ICO has provided on the lawful basis for gathering training data to build models is a valuable starting point. As financial services firms turn to AI to enhance their services, they will need clear guidance on how to conduct due diligence, monitoring, and disclosure in an AI context.

Posted: 21 March 2023

Want to comment or have questions? You can contact the AI team at Flawless via:

Disclaimer: The information provided in this blog post is for general informational purposes only and does not constitute advice, legal or otherwise.