Mastodon Kuan0

Monday, 18 November 2024

AI: legitimate interests, controller/processor questions - data protection/privacy

Under GDPR, can personal data be processed based on legitimate interests for AI-related purposes, whether for training, deployment, or beyond? That was the key focus of the EDPB stakeholder event on AI models on 5 Nov 24, which I registered for and was fortunate enough to get a place at - many thanks to the EDPB for holding this event!

The event was intended to gather cross-stakeholder views to inform the EDPB's drafting of an Art.64(2) consistency opinion on AI models (defined quite broadly) requested by the Irish DPC.  The EDPB said it will issue this opinion by the end of 2024 but, unlike EDPB guidelines, such consistency opinions can't be updated - which is concerning given how important this area is.

The specific questions were:

  1. AI models and "personal data" - technical ways to evaluate whether an AI model trained using personal data still processes personal data? Any specific tools / methods to assess risks of regurgitation and extraction of personal data from AI models trained using personal data? Which measures (upstream or downstream) can help reduce risks of extracting personal data from such AI models trained using personal data? (including effectiveness, metrics, residual risk)
  2. Can "legitimate interest" be relied on as a lawful basis for processing personal data in AI models?
    1. When training AI models - and what measures to ensure an appropriate balance of interests, considering both first-party and third-party personal data?
    2. In the post-training phase, like deployment or retraining - and what measures to ensure an appropriate balance, and what if the competent supervisory authority found the model's initial training involved unlawful processing?

There wasn't enough time for me to explain my planned input properly or to comment on some issues,  given the number of attendees, so I am doing it here. I'll take the second set first.

Training AI models - legitimate interests

I strongly believe legitimate interest should be a valid legal basis for training AI with personal data. Particularly training AI to reduce the risk of bias or discrimination against people, when the AI is used in relation to them. 

I had a negative experience with facial biometrics. The UK Passport Office's system kept insisting my eyes were shut, when they were wide open - they're just small East Asian eyes, white people's eyes are usually bigger. Others have suffered far worse from facial biometrics and facial recognition, including wrongful arrests, denial of food purchases, debanking (see my book and 23.5 of the free companion PDF under Facial recognition). 

Had the AI concerned been trained on more, and enough, non-white faces, it would be much less likely to claim facial features that didn't match typical white facial features were "inappropriate" (like eye size, hair shape), or to misidentify non-white people, leading to wrongful arrests.

The EU AI Act is aware of this risk: Art.10(5) (and see Rec.70) specifically permits providers of high-risk AI systems to process special categories of personal data, subject to appropriate safeguards and meeting certain conditions:

  1. the bias detection and correction cannot be effectively fulfilled by processing other data, including synthetic or anonymised data;
  2. the special categories of personal data are subject to technical limitations on the re-use of the personal data, and state-of-the-art security and privacy-preserving measures, including pseudonymisation;
  3. the special categories of personal data are subject to measures to ensure that the personal data processed are secured, protected, subject to suitable safeguards, including strict controls and documentation of the access, to avoid misuse and ensure that only authorised persons have access to those personal data with appropriate confidentiality obligations;
  4. the special categories of personal data are not to be transmitted, transferred or otherwise accessed by other parties;
  5. the special categories of personal data are deleted once the bias has been corrected or the personal data has reached the end of its retention period, whichever comes first;
  6. (GDPR) records of processing activities include reasons why processing of special categories of personal data was strictly necessary to detect and correct biases, and why that objective could not be achieved by processing other data.
(Aside: I know that Article also mentions "appropriate safeguards", but I'd argue that meeting those conditions would provide the minimum required safeguards - although in some cases others could be considered necessary.)

The Act confines this permission to the use of special category data in high-risk AI systems, but I'd argue that legitimate interests should permit the use of non-special category personal data through meeting the above conditions (and any other appropriate safeguards). 

Recall that personal data can be processed under GDPR's legitimate interests legal basis if "necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child." The EDPB's recent guidelines on processing based on Art.6(1)(f) note three cumulative conditions to enable processing based on legitimate interests:

  • the pursuit of a legitimate interest by the controller or by a third party;
  • the need to process personal data for the purposes of the legitimate interest(s) pursued; and
  • the interests or fundamental freedoms and rights of the concerned data subjects do not take precedence over the legitimate interest(s) of the controller or of a third party

Let's review those in turn.

Detecting and correcting bias involves the pursuit of a legitimate interest of the controller, i.e. AI developer, and third parties. I'd argue that many, many third parties, being those in relation to whom the AI is to be used, have a legitimate interest in not being discriminated against due to biased AI. I've already mentioned biased AI resulting in wrongful arrests, denial of services important to life like food buying, and debanking (see 23.5 of my free PDF under Facial recognition). 

It is indeed necessary to process personal data of people in certain groups in order to train AI models to reduce bias, as per the experiences noted above and much more.

Finally, the balancing test in the final limb must clearly consider the legitimate interests, not just of the controller, but also of "a third party" - in this case, the legitimate interest of third parties, in relation to whom the AI is to be used, not to be discriminated against. (While fairness is a core principle of the GDPR, this only concerns fairness to the individual whose personal data is being processed. Processing A's personal data to try to ensure fairness to B isn't a concept explicitly provided for in GDPR. There are mentions of "rights and freedoms of others" or other data subjects, but more in the sense of not adversely affecting their rights/freedoms, rather than positive obligations in their favour.) 

I argue that, if the conditions in Art.10(5) AI Act are implemented as a minimum when training AI using personal data, that should tilt the balancing test in favour of the controller and those third parties, and enable legitimate interests to be used as the legal basis for the training - at least in the case of non-special category data - even when training non-high-risk AI. I really hope the EDPB will agree. 

However, the problem remains of how to use special category data to train non-high-risk AI systems to detect and address bias. Some examples I mentioned could fall through the cracks. 

The UK Passport Office's AI system, designed to reject photos with "inappropriate" facial features, is probably a high-risk AI system within Annex III para.5(a) (if the Act applied in the UK). Yet, para.5 (and Annex III more generally) does not protect anyone from being refused a private bank account or being debanked as a result of biased AI being applied to them.

And, a huge hole in the AI Act is this: Annex III para.1(a) excludes "AI systems intended to be used for biometric verification the sole purpose of which is to confirm that a specific natural person is the person he or she claims to be". What if an AI biometric verification system used by a bank mistakenly says someone is not who they claim to be, because it can't verify the identity of non-white people properly due to not having been trained on the faces of enough non-white people - and therefore the bank's systems automatically debank that individual? How can such a biased AI biometric verification system be "fixed", if it can't be fully trained in this way?

Such an AI system is not classed as a high-risk AI system, because of the biometric verification exclusion. Therefore, the developer isn't allowed to train the AI using special category data, because Art.10(5) AI Act only allows this for high-risk AI systems! (Yes, I know there's the odd situation where biometric data is "special category" data only when used for the purpose of uniquely identifying someone, so it could be argued that using non-white people's facial biometrics to train AI isn't processing their special category data, because the processing purpose isn't to identify those specific people, and I'd certainly be putting that argument and pushing for being able to use legitimate interests for that training. But - really? Why should those arguments be necessary?)

It was argued that Art.9(2)(g) (necessary for reasons of substantial public interest etc) doesn’t allow processing of special category data to train AI, even though there is a substantial public interest in addressing bias. I agree there is a huge public interest there, but I also agree that, due to the wording of that provision, it can’t apply unless proportionate etc. EU or Member State law provides a basis for such processing. EU law in the form of AI Act Art.10* does provide a basis for processing special category data in high-risk AI systems, but doesn’t provide such a basis in the case of non-high-risk AI, or non-special category data - hence the need to argue that biometric data isn’t special category when used for training! I guess it’ll have to be down to national laws to provide for this clearly enough. France, Germany or Ireland, perhaps?

(Consent isn’t feasible in practice here, given the volumes involved, and issues like having to repeat AI training after removing, from the training data, any personal data where consent has been withdrawn. It was argued that financial costs or training time for AI developers shouldn’t be relevant in data protection, but equally it was argued that environmental costs etc. of repeating training are relevant. I’ll only mention briefly practical workarounds, like not removing that data but preventing it from appearing in outputs using technical measures whose efficacy is debated.)

If including my personal data in training datasets can help to reduce the risk of otherwise biased AI systems discriminating against you (should you be in the same ethnic or other grouping as me) when deployed,  personally I'd be OK with that - partly informed by my own bad experiences with AI biometrics. Shouldn't such processing of data for AI training be permitted, even encouraged? But, currently, this issue is not properly or fully addressed, as I've shown above. So, there's a big data dilemma here, that still remains to be dealt with.

AI models and personal data

Does an AI model "contain" personal data, given that strictly it's not a database per se? Or is it just something that can be used to produce personal data when used in deployment, with personal data being processed only at the usage stage? Much debate, and diametrically opposing views (and difficult questions like, can a GPAI model developer be said to control the purposes and means for which deployers of the model use it?). [Added: I meant to expand on that to clarify the question: is the model developer controlling the purposes of processing personal data, particularly with general-purpose/foundation models, or is it merely providing part of the means of processing to others, i.e. is it really a "controller"?]

Rather than pinhead-dancing around that question, personally I think that use of a deployed AI system is the most relevant processing here, because that's the main point at which the LLMs/large language models (that the event focused on, pretty much exclusively) could regurgitate accurate or inaccurate personal data - whether through prompt injection attacks or similar in the case of LLMs, or because a model's guardrails weren't strong enough. 

I feel the EDPB's query on technical ways to evaluate whether an AI model trained using personal data "still processes personal data" is really more one for technical AI experts to answer, and that what merits more attention is preventing training data's regurgitation/extraction at the deployment/use stage, whether personal data or otherwise. It's well known that attacks have successfully obtained personal data used in training from models - although with some limitations and caveats (paper & article; another article). This has been shown to be possible not only with open source models (where attackers obviously have access to more info about the model, its parameters etc., and indeed to the model itself), but even with semi-open and closed source models like ChatGPT.

Again, my view is that assessing and reducing training data regurgitation/extraction risks are essentially questions for technical AI experts. Reducing such risks mainly involves technical measures, and this is an emerging area where much research continues to be conducted, so I feel it's premature to rule on such measures at this point in time (although organisational measures are also possible, and recommended, like deployers prohibiting their users from trying to extract personal data from any AI).

AI value chain: controllers, processors

More interesting, and difficult, from a GDPR perspective are the crucial questions of: who is a controller, who is a processor, who is liable for what, and at which stages in the AI lifecycle? 

Unfortunately, these weren't really discussed at the event. To be fair, the focus of the event was meant to be legitimate interests, not the controller/processor position of AI model/system providers. 

I still tried to raise them, but wasn't allowed to speak again to clarify my points, so I'll do that below in the form of some "exam questions". But, first, I want to spell out some issues with the AI supply chain that I couldn't expand on during the event.

If a developer organisation makes its own AI model available for customers to use, depending on the business model adopted by the organisation (and the following isn't comprehensive!), the supply chain can involve several alternative options:

  • The model could be accessed via the model developer's API, and/or
  • The model could be permitted to be:
    • Downloaded by customers as a standalone model, then 
      • Embedded/integrated within an AI system developed by the customer (which the customer could use internally only, or offer to its own customers in turn), or
      • Accessed by a customer-developed AI system (which the customer could use internally only, or offer to its own customers in turn) via API, where the downloaded model is hosted
        • on-prem, or 
        • (more likely) in-cloud, using the customer's IaaS/PaaS provider, but with all AI-related operations being self-managed by the customer, or 
    • (Common nowadays) deployed and used by the customer for the customer's AI system  (which the customer could use internally only, or offer to its own customers in turn), through the customer using a provider's cloud AI management platform with the benefit of tools/services available from the cloud provider to ease AI-related operations like fine-tuning models, building AI systems, using RAG, etc.
      • Note: the model used could be one of the cloud provider's own models (i.e. where the cloud provider is the model developer), or it could be a third-party model offered through the cloud provider's own AI marketplace or similar. Exactly what licence/contract terms apply to the customer in such a scenario, particularly with third-party models, let alone what the controller/processor position is there, is still clear as mud (see below).
Note that an AI system can use or integrate more than one AI model.
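
For readers who find a structure easier to follow than nested bullets, here is a minimal Python sketch of the options above. All class, field and label names are my own hypothetical choices for illustration; the sketch only lists which parties' roles would need analysing for a given set-up, and deliberately assigns no controller/processor status to anyone.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """How a customer reaches a model, following the options listed above."""
    DEVELOPER_API = "accessed via the model developer's API"
    DOWNLOAD_EMBEDDED = "downloaded and embedded in the customer's AI system"
    DOWNLOAD_ON_PREM = "downloaded, hosted on-prem, called via API"
    DOWNLOAD_SELF_MANAGED_CLOUD = "downloaded, self-managed on IaaS/PaaS, called via API"
    CLOUD_AI_PLATFORM = "deployed via a cloud provider's AI management platform"


@dataclass(frozen=True)
class ModelUse:
    model: str
    route: Route
    # Only relevant on a cloud AI platform: is the model the platform's own,
    # or a third party's model offered through its marketplace?
    third_party_model: bool = False


@dataclass(frozen=True)
class AISystem:
    name: str
    models: tuple[ModelUse, ...]  # a system can use more than one model
    offered_to_own_customers: bool = False

    def parties_to_assess(self) -> set[str]:
        """List the actors whose role needs analysing; assigns no roles."""
        parties = {"customer (deployer)"}
        for use in self.models:
            if use.route is Route.CLOUD_AI_PLATFORM:
                parties.add("cloud AI platform provider")
                if use.third_party_model:
                    parties.add("model developer")
            else:
                parties.add("model developer")
            if use.route is Route.DOWNLOAD_SELF_MANAGED_CLOUD:
                parties.add("IaaS/PaaS provider")
        if self.offered_to_own_customers:
            parties.add("customer's own customers")
        return parties


# Hypothetical example: one system, two models reached by different routes.
chatbot = AISystem(
    name="support-chatbot",
    models=(
        ModelUse("model-a", Route.DEVELOPER_API),
        ModelUse("model-b", Route.CLOUD_AI_PLATFORM, third_party_model=True),
    ),
    offered_to_own_customers=True,
)
print(sorted(chatbot.parties_to_assess()))
```

Even this toy example ends up with four distinct parties for a single chatbot, which is rather the point: the role analysis has to be repeated for each model and each route.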

Also note that the above applies equally to how an AI system is accessed, i.e. via API, or by embedding the system within an AI product/solution/tool, or using a cloud AI management platform, and that an AI system can use or integrate more than one other AI system (i.e. rinse and repeat the above, on AI models, to AI systems). See my PDF that I'd previously uploaded to LinkedIn (with a small clarificatory update).

And I won't even mention the twists introduced by using RAG/retrieval-augmented generation in LLMs, at this point.

All that spelt out, now on to my exam questions!:

  1. After an organisation deploys a third-party model in an AI system
    1. If a user in the organisation deliberately extracts personal data from the AI without the deploying/employing organisation's authorisation
      1. Is the rogue user a controller in their own right, so that the organisation is not responsible as a controller under data protection law (as with the Morrisons case in the UK)?
      2. Does or should the AI model developer bear any responsibility or liability at the deployment and use stage as a controller in some way, if the guardrails they implemented against the extraction weren't appropriate? Or could it be a processor, particularly if the model is hosted by the model or system developer?
        1. Even if the model is considered not to "contain" any personal data, so that the model developer is not a controller of the model itself, could the model developer be considered to have some responsibility if and when personal data is extracted from the AI at the deployment and use stage?
        2. Remember, for security measures under GDPR, a security breach alone doesn't mean the security measures weren't appropriate; it's quite possible for an organisation that had implemented appropriate security measures to suffer a personal data breach nevertheless.
        3. Also to reiterate, measures to reduce the risk of extracting training data from AI models are still being developed, this is very much a nascent research area.
        4. Recall that a developer providing software for download/on-prem install is not generally considered a processor or controller, but when it offers software via the cloud as SaaS, it is at least a processor, even a controller to the extent it uses customer data for its own purposes. If a model developer makes available a model (software), but doesn't host it for customers, it seems the developer shouldn't even be a processor?

    2. If a user in the organisation deliberately extracts personal data from an AI with the deploying/employing organisation's authorisation (e.g. for research, or for the organisation's own purposes)
      1. Is the organisation a controller, responsible/liable for that extraction as "processing"? (and could the GDPR research exemption apply there, if for research?)
      2. Could the AI model developer and/or AI system developer bear any responsibility or liability for this extraction as a controller in some way, if the guardrails they implemented against the extraction weren't appropriate, as above, or as a processor? 
        1. Note the same points/queries apply as in 1.1 above!
    3. If a user in the organisation uses the AI in such a way that, without the user intending it, the AI regurgitates personal data, who is responsible as controller for the output, which is "processing"?
      1. Remember, a user could process personal data by including it in the input provided to the AI (not discussed further here), but personal data could also be processed if it is included in the AI's output
      2. Does or should the AI model developer and/or AI system developer bear any responsibility or liability as a controller in some way, if the guardrails they implemented against inadvertent regurgitation weren't appropriate, or could it be a processor, or neither?
        1. Note the same points/queries apply as with deliberate extraction, 1.1 above.
      3. What difference if any does it make if the personal data in the output is accurate, or inaccurate (e.g. defamatory of the individual concerned)? 
    4. If a person unrelated to the organisation, e.g. a third-party hacker, manages to access the deployed AI to extract training data such as personal data, is the deploying organisation responsible as controller? What about the model/system developer?

  2. Do any of the above apply, or are they relevant, when an AI developer makes its model available to customers via the developer's API only? Is the model developer/provider a processor for customers in that situation?
    1. Again see 1.1 above. In particular, it seems the AI developer hosting the model offered to customers would at least be a processor here.

  3. What if a customer uses a third-party AI model hosted by the customer's cloud provider? Is the cloud provider only a processor for the customer, or could it be a controller in any way?
    1. Does it make a difference if the model used by the customer is the cloud provider's own model, or another party's model?
    2. Does it make a difference if the model's use is completely self-managed by the customer, or if the customer is using a cloud provider's cloud AI management platform?
    3. Do the license terms, cloud agreement terms and/or other terms applicable to the customer's use of the cloud service/AI platform affect the position (under GDPR it's the factual control of purposes and means that matters, and contract terms are not determinative, but nevertheless terms could influence the factual position in some cases, especially in what they permit or prohibit...).
    4. Indeed, back to the AI Act, who is the model provider - the AI platform provider, or the model developer?

  4. Rinse and repeat for AI system developers/providers - could they be responsible/liable as controllers and/or processors especially if a model provider hosts its model or AI systems using its model for customers in-cloud?
(There are many more questions and issues, these are just the key ones that spring to mind most immediately, believe it or not!)

Answers on a postcard...?

Saturday, 16 November 2024

Cyber Security & Resilience Bill: consultation

DSIT is seeking views on some measures planned under the UK Cyber Security and Resilience Bill, to be introduced in 2025 to update The Network and Information Systems Regulations 2018. I saw this a couple of days ago on the ICO's NIS webpage, then found more info on techUK's 8 Nov webpage.

Usefully, techUK has also listed all the consultation questions in one PDF. That is really helpful: unlike EU consultations, which usually offer a downloadable PDF listing the questions, too many UK consultations sadly expect respondents to go through a form page by page before they can see what the questions are. That wastes time for those wanting to provide considered responses to all questions holistically (some webpages don't even allow going back).

The deadline is soon according to ICO: 21 Nov 24, i.e. next Sunday!

As you'll know, the intention is to expand the NIS Regulations to catch even more types of organisations, and to reduce incident reporting deadlines (with staffing/costs implications for 24 hr reporting especially at the weekend). Some proposals resemble the changes under the EU's NIS2 Directive. Managed service providers will probably be brought into scope (proposed criteria below). Note the queries on the costs of rolling out MFA, and of password resets. DSIT is also asking competent authorities (but it seems not other stakeholders) whether data centres should be regulated. Interestingly, it also asks if any Competent Authorities currently review the supplier contracts of regulated entities for visibility into their supply chain, assurance of supplier cyber security and resilience measures, and/or have audit rights - familiar from GDPR, but could this be specifically required in future under NIS too?

Key excerpts:

Managed service providers (MSPs) to be brought within scope of Relevant Digital Service Provider (RDSP)

DSIT's proposed characteristics of a Managed Service Provider have 4 criteria:
1. The service is provided by one business to another business, and
2. The service is related to the provision of IT services, such as systems, infrastructure, networks, and/or security, and
3. The service relies on the use of network and information systems, whether this is the network and information systems of the provider, their customers or third parties, and
4. The service provides regular and ongoing management support, active administration and/or monitoring of IT systems, IT infrastructure, IT network, and/or the security thereof.
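
Because the four criteria are cumulative (each is joined by "and"), a service falls outside the proposed characteristics if any one of them is unmet. A minimal Python sketch of that logic follows; the field and function names are hypothetical, and deciding whether each criterion is actually met remains a judgment call that the code simply takes as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    business_to_business: bool                # criterion 1
    it_services_related: bool                 # criterion 2
    relies_on_network_and_info_systems: bool  # criterion 3
    ongoing_management_or_monitoring: bool    # criterion 4


def unmet_criteria(service: Service) -> list[int]:
    """Return the numbers (1-4) of the proposed criteria the service fails."""
    flags = (
        service.business_to_business,
        service.it_services_related,
        service.relies_on_network_and_info_systems,
        service.ongoing_management_or_monitoring,
    )
    return [number for number, met in enumerate(flags, start=1) if not met]


def meets_proposed_msp_characteristics(service: Service) -> bool:
    """All four criteria must be met, as they are cumulative."""
    return not unmet_criteria(service)


# Hypothetical examples: an ongoing managed security service versus a
# one-off engagement with no regular, ongoing management or monitoring.
managed_security = Service(True, True, True, True)
one_off_engagement = Service(True, True, True, False)

print(meets_proposed_msp_characteristics(managed_security))
print(unmet_criteria(one_off_engagement))
```

On this reading, criterion 4 does most of the work in separating managed services from one-off IT projects.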


Incident reporting

Changes being considered "to ensure more incidents are reported and that incident information is communicated to relevant parties more quickly and clearly" include:
"1. A change to the definition of an incident under the existing NIS Regulations. To meet the current reporting threshold, an incident must have led to a significant or substantial disruption to service continuity. We are proposing to change the definition of a reportable incident to ensure that a wider range of incidents are captured, including incidents capable of resulting in a significant impact to service continuity and incidents that compromise the integrity of a network and information system.
2. A change to the amount of time an organisation has to report an incident from when it is detected. Currently, incidents must be reported without undue delay and no later than 72 hours after being made aware of the incident. We are assessing whether this time can be reduced to no later than 24 hours after being made aware of the incident.
3. New transparency requirements. We are considering introducing a transparency requirement which will ensure customers are notified of incidents which significantly compromise the integrity of a digital service upon which they rely."
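
To see why the second change has weekend staffing implications, here is a small Python sketch comparing the latest permissible reporting time under the current 72-hour limit and the 24-hour limit being assessed. The function names and the example timestamp are my own assumptions, and the sketch only computes the outer limit; it does not model "without undue delay".

```python
from datetime import datetime, timedelta

CURRENT_LIMIT = timedelta(hours=72)   # current outer limit
PROPOSED_LIMIT = timedelta(hours=24)  # limit DSIT is assessing


def reporting_deadline(aware_at: datetime, limit: timedelta) -> datetime:
    """Latest time a report could be made, counted from awareness."""
    return aware_at + limit


def falls_on_weekend(moment: datetime) -> bool:
    """True for Saturday (5) or Sunday (6)."""
    return moment.weekday() >= 5


# Hypothetical example: the organisation becomes aware at 18:00 on a Friday.
aware_at = datetime(2024, 11, 15, 18, 0)

for label, limit in (("current", CURRENT_LIMIT), ("proposed", PROPOSED_LIMIT)):
    deadline = reporting_deadline(aware_at, limit)
    print(label, deadline.strftime("%a %d %b %Y %H:%M"), falls_on_weekend(deadline))
```

With a Friday evening start, the 72-hour limit runs out on Monday evening, whereas the 24-hour limit runs out on Saturday evening, so whoever prepares the report would need to be working or on call over the weekend.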

On 24-hr reporting, DSIT wants to know:
1. Which members of staff are needed to develop and submit an NIS incident report?
2. Do you have the people required to submit an incident report already working weekend shifts?
3. Could you have staff on call as opposed to working weekend shifts in case there is the need to report an NIS incident? Could you save money by calling in members of staff when an incident is detected?
4. Is there a higher rate of pay for staff working weekends than those working during the week? If so, what overtime rate do staff get paid?

On transparency:
5. If an incident occurred which affected a service you provide, would you be able to identify which customers have been affected? (‘Customers’ in this question should be interpreted as businesses which rely on a digital service provider [cloud provider] for a service, not individual clients.) If so, how long would it take to identify which customers have been affected?
6. Do you have a plan in place for what to do if an incident occurs? [For RDSPs [i.e. cloud providers]]

MSPs:
7. [for OES] Do you use services provided by an MSP (or multiple MSPs) to deliver your essential service(s)? This would also include, for example, companies which provide IT outsourcing, BPO  (business process outsourcing) where it is provided through IT networks, or cyber security services.
    a. If yes, please provide examples of where these services provided by an MSP (or multiple MSPs) are critical to the provision of your essential service? (note: names of companies are not required)
8. [for RDSPs] Do you provide managed services? This would include, for example, providing IT outsourcing, Business Process Outsourcing (BPO) where it is provided through IT networks, or managed security services.
9. Do you provide Business Process Outsourcing (BPO) services that involve ongoing management of an IT system/ infrastructure/network and have a connection or access to the customer?
    a. If yes, please provide examples of the BPO services provided by your organisation.
10. Do you provide managed IT services that secure or manage operational technology (OT)?
    a. If yes, please provide examples. Detailed examples are welcome, particularly where these relate to critical national infrastructure (CNI).
11. Do you provide system integration?
    a. If yes, is the system integration provided as part of a managed service? Please provide examples of the system integration you provide as part of a managed service.
12. Do you provide telecommunications services (e.g. WAN, LAN)?
- If yes, please provide examples of the telecommunications services you provide.
- If yes, do you consider that any of these telecommunication services constitute a ‘managed service’?
- If yes, are these telecommunications services regulated under the Communications Act 2003?
13. Is the cyber security of the services you provide (in the UK or overseas) currently regulated? Are you currently regulated for the cyber security for any of your services offered (in the UK or overseas)?
    If yes, please provide details of these regulations.

[Questions about small and micro cloud or managed services in the supply chain]

Operational technology (OT):
15. Does your organisation use operational technology to manage any critical or essential services?
16. [if yes to 15] If you purchase operational technology (OT) from a vendor, do you maintain and operate it ‘in house’?
17. [if yes to 15] Do you outsource the management of operational technology (OT) to third party providers?
    a. If yes, are these third party providers Managed Service Providers (MSPs)? (i.e., the same company that manages your IT systems/networks/Infrastructure)
    b. If yes, please provide examples of operational technology (OT) that you outsource to third parties (note: a description of the company would suffice, names are not required)

Managing risks - costs impacts of serious incidents:
18. How much would it cost your organisation to conduct a full rollout of multi-factor authentication for all users?
19. How much would it cost your organisation to conduct a full organisation-wide reset of passwords?
20. What other actions do you anticipate you might need to take to protect your organisation in the event of a major cyber security attack or resilience incident?

[Some duplication: the next set of questions is for firms NOT regulated under NIS, including 24-hr reporting and staff costs, OT, managing risks, small/micro MSPs/cloud providers, MSPs]

25. If you purchase operational technology (OT) from a vendor, do you maintain and operate it ‘in house’?
26. Do you outsource the management of operational technology (OT) to third party providers?
    a. If yes, are these third party providers Managed Service Providers (MSPs)? (i.e., the same company that manages your IT systems/networks/Infrastructure)
    b. If yes, please provide examples of operational technology (OT) you outsource to third parties (note: a description of the company would suffice, company names are not required)    

Plus questions to competent authorities (CAs) re 24-hr reporting, staff etc., private vs. public organisations regulated and their size from micro to large, and:
38. Do any Competent Authorities currently review the supplier contracts of regulated entities to ensure that appropriate measures are being taken to manage supply chain risk? E.g. that regulated entities have visibility of their suppliers’ supply chain, have some level of assurance of the cyber security and resilience measures followed by their supplier, and/or have the right to audit their supplier? If so, please share details

Data centres
39. How many standalone data centres are owned and operated by OES/RDSP/MSP businesses under your remit in the UK?
40. Do you include standalone data centres owned and operated (enterprise data centres) by OES/RDSP businesses under your remit in your supervisory activity?
    a. If no under your current scope, have you previously considered or are you currently considering expanding your supervision to focus on your sector’s enterprise data centres?
    b. If yes, what compliance obligations are applicable to and what assurance is required in relation to OES/RDSP owned-and-operated data centres? For example, appropriate and proportionate measures + CAF.
    c. If yes, are there any measures or assurance designed for the data centre infrastructure that you apply and/or assess for your sector's data centres (or that guide your supervision) under the NIS? For example, standards designed for operational resilience of data centre infrastructure, the cyber security of operational technologies/industrial control systems, or levels of physical security of data centres.
41. To what extent do you agree with the following statements:
    a. It would be beneficial to have standardised guidance on “appropriate and proportionate” measures in relation to the security and resilience of data centres / data centre infrastructure
(Strongly agree/Agree/Neither agree nor disagree/Disagree/Strongly disagree)
    b. UK third-party operated data centres should be brought into the scope of the NIS under dedicated supervision with a view to protecting them as CNI and OES/RDSP supply chains?
(Same range from Strongly agree to Strongly disagree)

Saturday, 19 October 2024

Things AI, Oct 2024

AI tool for meeting recordings, taking notes, creating draft documents: ICO says if not used for new purpose, can rely on previous legal basis. For any new processing activity/purpose, identify lawful basis! NB. update privacy notice, accuracy, ADM, profiling, consider any data sharing with tool provider (ICO last updated date still says April but this Q&A is new since Sept).

EU AI Act contractual clauses drafted by SCL (I've not reviewed them myself). And, the Commission seeks feedback on a draft implementing regulation for scientific panel of AI experts to assist the AI Office.

EU algorithms regulation: don't forget the EU Platform Work Directive, just approved by the Council; 2-year transposition deadline. This aims to improve working conditions and protection of  personal data in platform work (i.e. gig economy workers like drivers) by, among other things, promoting transparency, fairness, human oversight, safety and accountability in algorithmic management in "platform work". It will require measures on algorithmic management of people performing platform work in the EU, including those with no employment contract/relationship. Chapter III on algorithmic management limits certain processing of personal data by means of automated monitoring systems or automated decision-making systems, such as personal data on emotional or psychological state. Similarly where "digital labour platforms" use automated systems taking or supporting decisions that affect persons performing platform work; personal data processing by a digital labour platform by means of automated monitoring systems or automated decision-making systems is deemed high risk, requiring a DPIA under GDPR, and more, as well as detailed transparency requirements on automated monitoring systems and automated decision-making systems, and obligations regarding human oversight and human review, etc. There's certainly overlap with both GDPR and the AI Act.

US EO14110: NIST 1-pg summary of progress to date & next steps.

Open source AI: a draft definition 1.0-RC1 is open for comment. FAQs; and must all training data be made available for openness?

Federated learning: scalability challenges in privacy-preserving federated learning (UK RTAU & US NIST collaboration). (For an explanation of federated learning, please see my book)

UK AI Safety events: the Nov 2023 summit cost £27.7m; plus info on the Nov 2024 event incl. criteria for invites (names of invitees were withheld for data protection reasons, but names of their organisations were also withheld, not clear why): from FOI requests.

Financial services/finance/securities:

Training data collection, not just by web scraping!: certain robot vacuums were found to collect photos and audio to train AI, so big security and privacy risks with some robotic hoovers, though reportedly the privacy notice was suitably expansive (but who reads those?!), covering wholesale data collection for research including: device-generated 2D/3D maps of users' houses, voice recordings, photos or videos! Talk about hoovering up data for AI training...😉🙄

LLMs still can't do maths or reasoning (Apple researchers)

G7 Hiroshima AI Process (recall the Code of Conduct etc.) progresses:

  • G7 ministerial declaration 
  • Overview of the OECD pilot of the Hiroshima artificial intelligence process reporting framework (for the international code of conduct for organizations developing advanced AI systems, like foundation models/GPAI) - summary by the Italian presidency (pilot phase); G7 joint statement 
  • G7 toolkit for AI in the public sector - "a comprehensive guide designed to help policymakers and public sector leaders translate principles for safe, secure, and trustworthy Artificial Intelligence (AI) into actionable policies" - of interest/use to the private sector too. And see the Ada Lovelace Institute's Buying AI: is the public sector equipped to procure technology in the public interest?

Adtech: IAB Tech Lab's AI in advertising primer.

Recommender systems: seem to be particularly targeted, e.g. under the EU Digital Services Act (DSA) (and see ICO brief consultation re using children's data for recommender systems).

AI in healthcare: increasing focus e.g. by Google, Microsoft. See below on the new UK RIO.

LinkedIn & AI: LinkedIn may have agreed not to train AI using UK users' data, but it plans in its new user agreement to put all responsibility for AI-generated content on users - even though, when a user wants to start a new post, it encourages users to "try writing with AI"!


Fairness: evaluating first-person fairness in chatbots (PDF)

AI hype, costs cf productivity (is AI making work worse?) and environmental impact (is nuclear the answer?) vs. examples of AI uses: detecting that UK family court judges used victim-blaming language in domestic abuse cases; stymying mobile phone thieves; cancer detection (UKRI, gov news); pollen & allergies; UK Royal Navy uses such as predictive maintenance; helping sustainable cities; fertilisation treatment

UK AI research programs: include wearable tech to help drug addicts; building resilience against AI risks like deepfakes, misinformation, and cyber-attacks.

UK Regulatory Innovation Office: the RIO promised in the Labour manifesto has been launched, within DSIT, "to reduce the burden of red tape and speed up access to new technologies... like AI training software for surgeons to deliver more accurate surgical treatments for patients and drones which can improve business efficiency", with the 4 initial areas including AI and digital in healthcare, and connected and autonomous technology. As for the RIO, "it will support regulators to update regulation, speeding up approvals, and ensuring different regulatory bodies work together smoothly. It will work to continuously inform the government of regulatory barriers to innovation, set priorities for regulators which align with the government’s broader ambitions and support regulators to develop the capability they need to meet them and grow the economy... The new office will also bring regulators together and working to remove obstacles and outdated regulations to the benefit of businesses and the public, unlocking the power of innovation". But the RIO's first Chair, who will work 4-5 days a month, has yet to be appointed (apply!). FT article (paywall).

(See also my blog on data protection & cyber security)

Data protection & cyber security, Oct 2024

Cookies: consent or pay OK in UK? ICO says it's a business decision by the organisation, it holds no info! (FOI).

EU NIS2 Directive: applies from 18 Oct 2024 (news): see Commission implementing regulation on requirements for digital services incl. cloud, CDN, online marketplaces, social networks; too few Member States have transposed it into national law (published Commission list, so far just Belgium, Croatia, Italy, Latvia, Lithuania). Not listed doesn't mean "not implemented": a country might not have notified the Commission yet, or the Commission might not have added it to that list yet. But it's clear some Member States have missed the deadline, like Ireland (draft law heads of Bill). Microsoft has been quick off the mark to tout how Azure can help NIS2 compliance.

EU Cyber Resilience Act (CRA): adopted by the Council in Oct 24, on security requirements for "products with digital elements" (software or hardware products and their remote data processing solutions, including software or hardware components being placed on the market separately). NB "remote data processing" as defined could catch some cloud services. Applicable 36 months after CRA becomes effective (should be published in OJ in a few weeks), with some transitional provisions. Views that the CRA is an "accidental European alien torts statute"! Separately, the US CISA/FBI have published for consultation draft guidance on product security bad practices.

Revised EU Product Liability Directive: adopted by the Council in Oct 24, see some previous blog commentary on software/SaaS being caught, and defects including cybersecurity issues. Liability on repairers, compensation claims easier for claimants, importers/EU representatives can be liable for products of non-EU manufacturers. 2-year transposition period after it becomes effective (should be published in the OJ soon).

EU CSAM Regulation: recently revived by the Council's Hungarian presidency which suggested the amended compromise text. Remember, this would catch online service providers, such as providers of hosting services and interpersonal communications services. Currently this would apply 24 months from its effective date. (The previous temporary derogation from the ePrivacy Directive to allow scanning for CSAM was extended to 3 Apr 2026, in Apr 24.)

UK Product & Metrology Bill: the Delegated Powers and Regulatory Reform Committee has reservations, see my previous comments on LinkedIn including that things are mostly left to delegated legislation.

Backdoors?: but, note that any encryption/other backdoors into apps/products/networks, or special keys "only" for government access, will threaten everyone's security (as noted regarding Global Encryption Day, 21 Oct 2024!). Example: it seems Chinese hackers got into US broadband providers' networks and acquired information "from systems the federal government uses for court-authorized wiretapping".

Passkeys: more secure than passwords (see my book free PDF!), it's great that this "passwordless" option is increasingly being adopted, and increasingly interoperable cross-platform: see passkeys on Windows, and Google's passkey syncing.

Ransomware, sanctions: individuals with links to Russian state and other prolific ransomware groups, including LockBit, have been found and sanctioned. NCA news; history of Evil Corp (not on technical matters)

Software bill of materials (SBOM): more from the US NIST e.g. on framing software component transparency (what's SBOM? CISA FAQ, resources, SBOM in SaaS/cloud, SBOM for assembled group of products. SBOM is explained in my book). I do feel contracts should include SBOM provisions.

IoT:

UK NCSC guidance:

Microsoft, Cybersecurity and Infrastructure Security Agency (CISA) and the National Cybersecurity Alliance (NCA) Be Cybersmart Kit for Cybersecurity Awareness Month (which is October) also focuses on the basics: use strong passwords and consider a password manager; turn on MFA; learn to recognize and report phishing; keep software updated.

Quantum tech: ICO views; UK government response on regulating quantum applications; cybersecurity risks from quantum computing and steps for financial authorities and institutions (see the G7 Cyber Expert Group statement on planning for the opportunities and risks of quantum computing)

US & transfers: Commission's report on the first periodic review of the functioning of the adequacy decision on the EU-US Data Privacy Framework (DPF). Separately, industry body CCIA's comments on digital trade barriers affecting US companies include, for the EU (detailed PDF), data and infrastructure localization mandates and restrictions on cloud services (citing e.g. the EUCS, NIS2, Data Act), and restrictions on cross-border data flows (under not just GDPR but also the Data Act and Data Governance Act)

Other ICO:

  • Levales solicitors reprimand: "A threat actor accessed Levales’ cloud-based server using legitimate credentials and subsequently published data on the dark web". Levales "did not have Multi-Factor Authentication (MFA) in place for the affected domain account. Levales relied on computer prompts for the management and strength of password and did not have a password policy in place at the time of the incident. The threat actor was able to gain access to the administrator level account via compromised account credentials. Levales Solicitors LLP have not been able to confirm how these were obtained." And see above, NCSC and cybersecurity awareness month guidance reiterating the importance of using MFA, especially for cloud!
  • New data protection audit framework launched, including toolkits (on areas like security, personal data breach detection/prevention, AI), framework trackers (similar areas), resources, case studies
  • From 11 Oct 24, businesses must try online resources "Instead of first calling our phone line..." - will the expected increase in the data protection fee change this?
  • Children's data: ICO's further short consultation on its Children's Code (on use of children’s personal information in recommender systems, use of PD of children <13) has closed, sorry I didn't have time to blog it earlier this month
  • Cyber investigations/incidents: latest datasets, for Q1 24/25 published
  • ICO DPIA for its use of Canva - interestingly, here as in some other FOI responses, the ICO redacted internal tech info like, in this case, detailed links: "The disclosure of extended links reveals the ‘make up’ of our SharePoint system. Due to the nature of information this reveals, this information increases our vulnerability to cyber attacks."
    • Is security by obscurity really the best approach here? Previously, when asked for a "list of all the variable names in the database, together with any descriptive/user guides of the variable names in the database" for the ICO's database of data security incident trends, the ICO refused, saying "if disclosed, such information could be used by malicious actors seeking criminal access to our information and systems". It even took the view that "The size of our internal security team is exempt from disclosure to you under section 31(1)(a) of the FOIA, as it could make the ICO more vulnerable to crime".
  • Facial recognition:
  • One court order for winding-up (liquidation) on ICO petition in Q2 24/25, wonder who?

Cyber Security Breaches Survey (UK, annual): how could this be developed and improved? DSIT call for views (survey questions), deadline 23:59, 4 Nov 24. 

Cloud: NIST's A Data Protection Approach for Cloud-Native Applications (note: here "data protection" means protecting all types of data, not just personal data), and see NCSC on MFA and cloud

UN Cybercrime Convention: concerns continue to be raised (see other critiques summarised in my book and free PDF).

Adtech: the IAB has published its Repository of European IAB’s Initiatives for Responsible Digital Advertising with helpful links to its key docs on data protection, DSA etc. It also published, for consultation, a proposed privacy-centric Attribution Data Matching Protocol (ADMaP), a data clean room interoperability protocol for attribution measurement (tech specs) "that enables advertisers and publishers to measure attributions using Privacy Enhancing Technologies (PETs) in a  Data Clean Room (DCR) and protecting their user’s Personal Identifiable Information". 

GDPR non-material damage: CJEU case, reiterating that mere GDPR infringement isn't damage, but an apology could be sufficient compensation if previous position can't be restored, as long as it's full compensation; controller attitude/motivation irrelevant in awarding smaller compensation than the damage suffered. (I'd add, an apology is not full compensation without a binding promise not to do something similar again in future!)

GDPR Procedural Regulation: EDPB statement; the Council's Data Protection Working Party will be discussing the draft Regulation on 24 Oct 24.

Digital identity:

Other EDPB:

  • Adopted a raft of docs including
    • Opinion 22/2024 on certain obligations following from the reliance on processor(s) and sub-processor(s), produced on the Danish SA's request (industry association BSA has raised concerns that these requirements are at odds with market practice, supply chain relationships, etc.)
    • For consultation, Guidelines 1/2024 on processing of personal data based on Article 6(1)(f) GDPR, deadline 20 Nov 24
      • Note: I've not read properly but there's at least one oddity. The cases the EDPB relied on to argue that personalised advertising is "direct marketing" don't actually say that. "However, CJEU case law suggests that personalised advertising could be considered a form of direct marketing" - well no, the para referenced stated processing for direct marketing may be for legitimate interests, not that personalised ads are direct marketing! Similarly, arguments about "communications" being for direct marketing skate over the case cited clearly being about "electronic mail" as defined in the ePrivacy Directive. I think we'd all agree that ads in emails are direct marketing, but the EDPB seems to be arguing that, under that case, all commercial communications like personalised ads are direct marketing. This can't follow from that case, which is clearly confined to "communications covered by Article 13(1)" of the ePrivacy Directive such as email.
    • Work programme 24-25
    • Granting Kosovan Information and Privacy Agency observer status for the EDPB's activities (contrast the polite No post-Brexit to the UK's then Information Commissioner, in a letter whose reference, coincidentally or not, was "OUT2020-0110"!)
    • Next coordinated enforcement action in 2025 will be on erasure (right to be forgotten, RTBF)
  • Final Guidelines 2/2023 on Technical Scope of Art. 5(3) of ePrivacy Directive i.e. "cookie" consent but much more; local processing, like on-device processing for AI/machine learning, is still caught according to the EDPB, if anything is sent to the "entity producing the client-side code". Small AI models that can "fit" on user devices are emerging, and may represent the only way forward for users who want AI applications on their phones, at this rate!
  • Response to the European Commission concerning the EDPB work on the interplay between EU data protection and competition law (DMA etc.: still working on it!)

For amusement value only: ICO FOI response, non!

(See also blog on AI and, just because, UK Attorney-General's speech on the rule of law in an age of populism, Commission webinars on development of model or standard contractual terms for data sharing and switching between data processing services i.e. cloud services under the EU Data Act, and EU Digital Services Act DSA transparency database researchers' workshop)

Sunday, 6 October 2024

Things data protection / privacy (some AI), Sept/Oct 2024

GDPR Procedural Regulation: the Council seems to be progressing this, in October 2024.

CJEU cases: there have been several lately that others have covered, such as on commercial interests possibly being legitimate interests, so I won't for now. I just want to highlight a case from a few months back, which is relevant to employee policies and training/awareness-raising, and possible strict liability to pay compensation to data subjects, at least for infringements arising from employee action/inaction.

Adtech: IAB Tech Lab has launched, for public consultation, its PAIR protocol 1.0 for a "privacy-centric approach for advertisers and publishers to match and activate their first-party audiences for advertising use cases without relying on third-party cookies". Initially donated by Google, PAIR has been developed into "an open standard that enables interoperability between data clean rooms and allows all DSPs to adopt the protocol for enhanced privacy-safe audience targeting".

Equality, AI: The public sector equality duty and data protection, Sept 2024, UK EHRC guidance (with ICO input), including helpful examples of proxy data for protected characteristics under the UK Equality Act 2010, and a short section on proxy analysis of AI models, with a case study on the Dutch benefit fraud scandal that led to unlawful discrimination (from using biased predictive algorithms).

Open-source AI: from UK ICO's previously-asked questions, this Q&A was added recently even though currently the "Last updated" date indicates 11 April 2024.
Q: We want to develop a speech transcription service for use in our organisation, using an open-source artificial intelligence (AI) model. Can we do this even though we don’t have detailed information about how the model was trained? (see the answer! It seems call transcription is a popular use of AI, see other Q&A on that webpage on that topic, e.g. this and this. Also, compare a Danish SA decision from June 2024 on the use of AI to analyse recordings of phone calls.)

Oral disclosures?: talking of contrasting approaches, compare a Polish SA decision holding that oral disclosure of personal data during a press conference was not in breach of GDPR, whereas an Icelandic SA decision ruled that oral disclosures by police under the Law Enforcement Directive infringed that Directive. Yes, different laws, but they ought to be interpreted consistently. And I don't get how oral statements amount to "processing" wholly or partly by automated means under EU data protection laws, just as I don't get how there have been so many fines in the EU/UK regarding paper records without first holding that they form part of a "filing system" as defined.

ICO big PSNI fine: well-known by now (news release, MPN), but it underlines the point that many surnames can be unique, and indicate religion and/or ethnicity (see Equality above on proxy data).

ICO: selected recent ICO disclosures, that the ICO decided to publish following FOI requests to it:

  • How the ICO assesses incidents / possible personal data breaches: ICO internal guidance (request, PDB assessment methodology as of June 2023); seems to be based on ENISA's risk assessment for PDBs, which is unsurprising as that has been endorsed by both EDPB and ICO
  • Territorial scope under UK GDPR, DPA 2018: ICO internal guidance (request, copy)
  • What's a restricted transfer outside the UK: ICO internal guidance (request, copy); taking the outdated and misguided view that "transfer" is based on transfer of personal data's physical location, which is at odds with the ICO's own public guidance on transfers!
  • How does ICO decide whether to publicise its intention to fine (request, emails on decision, more info)? This was on one concrete situation, but it's helpful to know the factors, again unsurprising, which I summarise below:
    • ICO default posture of transparency, although it considers each circumstance.
    • This is consistent and fair with other similar cases where it has publicised the information at this stage.
    • For deterrence regarding perceived central provider issues: "We are seeing a pattern of central providers having security issues with consequences for patients, publishing this will act as a learning/ deterrent for other processors with large central contracts, including the provisional fine will help clarify the seriousness of these issues".
    • "The case has been extremely well reported and is well known, so this reduces the potential additional impact on the organisation and there is limited dispute about the facts of the attack."
    • "Publishing the NOI [notice of intention to fine] and the provisional fine will help improve information rights practice and compliance among those we regulate."
    • While it is possible that the fine value will change, as it is "provisional and subject to reps", this was balanced: "the possible criticism of the ICO for changing the fine amount as the process concludes vs. the benefit of being transparent about the process... demonstrating that, if it does change, that is proof that the ICO does consider reps carefully and takes action based upon reps. This can serve to increase confidence in and awareness of our processes. I am comfortable that, subject to including suitable language to make clear it is provisional, that this risk is managed and the benefit is greater."
    • "in this case, I have decided that publicity at this point allows for improved public protection from threat and hence is overridingly in the public interest. It is also already in the public domain."

DRCF: UK regulators the Digital Regulation Cooperation Forum are seeking input on their 2025/26 workplan by 8 Nov 2024. Unsurprisingly, the work includes AI, but also bilateral work on data protection and online safety, competition and data protection and illegal online financial promotions, and risks and opportunities of emerging technologies like digital identity, digital assets and synthetic media.

Data protection fee: The consultation on increasing the UK data protection fee has closed. The ICO's own response supported the increase, but didn't advocate for any change in the bases for charging the fee, although the government was open to views on that, so it seems there will just be an increase in fee levels but no substantive changes to the bases.

Dark patterns: while not limited to data protection, see OECD dark patterns on online shopping: countdown timers, hidden information, nagging, subscription traps, forced registration and privacy intrusions, cancellation hurdles. Not dissimilar to the issues previously raised by UK regulators ICO and CMA on online choice architecture, control over personal data and harmful designs in digital markets.

Data transfers under the UN Digital Compact ("a comprehensive framework for global governance of digital technology and artificial intelligence"): the text is a bit vague and general on cross-border data flows, and 2030 is not exactly near-term!:

46. Cross-border data flows are a critical driver of the digital economy. We recognize the potential social, economic and development benefits of secure and trusted cross-border data flows, in particular for micro-, small and medium-sized enterprises. We will identify innovative, interoperable and inclusive mechanisms to enable data to flow with trust within and between countries to mutual benefit, while respecting relevant data protection and privacy safeguards and applicable legal frameworks (SDG 17).

47. We commit, by 2030, to advance consultations among all relevant stakeholders to better understand commonalities, complementarities, convergence and divergence between regulatory approaches on how to facilitate cross-border data flows with trust so as to develop publicly available knowledge and best practices (SDG 17)...

...We encourage the working group to report on its progress to the General Assembly, by no later than the eighty-first session, including on follow-up recommendations towards equitable and interoperable data governance arrangements, which may include fundamental principles of data governance at all levels as relevant for development; proposals to support interoperability between national, regional and international data systems; considerations of sharing the benefits of data; and options to facilitate safe, secure and trusted data flows, including cross-border data flows as relevant for development (all SDGs).

But on data protection more broadly, Objective 4. Advance responsible, equitable and interoperable data governance approaches, data privacy and security:

"We recognize that responsible and interoperable data governance is essential to advance development objectives, protect human rights, foster innovation and promote economic growth. The increasing collection, sharing and processing of data, including in artificial intelligence systems, may amplify risks in the absence of effective personal data protection and privacy norms...

...We commit, by 2030, to: (a) Draw on existing international and regional guidelines on the protection of privacy in the development of data governance frameworks (all SDGs); (b) Strengthen support to all countries to develop effective and interoperable national data governance frameworks (all SDGs); (c) Empower individuals and groups with the ability to consider, give and withdraw their consent to the use of their data and the ability to choose how those data are used, including through legally mandated protections for data privacy and intellectual property (SDGs 10 and 16); (d) Ensure that data collection, access, sharing, transfer, storage and processing practices are safe, secure and proportionate for necessary, explicit and legitimate purposes, in compliance with international law (all SDGs); (e) Develop skilled workforces capable of collecting, processing, analysing, storing and transferring data safely in ways that protect privacy (SDGs 8 and 9).

Survey on attitudes and awareness of emerging technologies, data protection, and digital products: There was a recent government survey of the UK public on the level of adoption and awareness of blockchain and immersive virtual worlds, attitudes towards pricing on digital platforms and behaviours regarding personal data control. But I can't yet find a summary of its outcomes, just the raw data.

Hungary: the Commission's decision to refer Hungary to the CJEU argues that Hungary's national law on the Defence of Sovereignty is in breach of EU law, including the e-Commerce Directive, the Services Directive, as well as EU Data protection legislation.

Canada: if an attacker accesses and encrypts data for ransom purposes without exfiltrating it, that is still considered a breach that must be notified to affected individuals under Ontario’s Personal Health Information Protection Act (PHIPA), and the Child, Youth and Family Services Act (CYFSA).

Facial recognition & privacy / personal data: interesting and scary, students managed to adapt smart glasses to look up info on strangers in real-time, including parents' names!

(Also please see my blogs last week on security and AI: both have also been updated with more Sept links.)