Monday, 28 June 2021

Bias – AI vs. name fields in databases / forms

AI discrimination, due to past biases built into training data, is touted as a massive problem, notably when it reflects bias based on racial or ethnic origin. This is Art.9 special category data, as all GDPR practitioners know. A famous example is car insurance quotes being about £900 higher for people named Mohammed, compared with quotes for those named John, even when the other details were identical (although it's unclear whether any artificial intelligence was involved there).

What's in a name?

However, there's an even more basic concern about names. This arises, not from emerging technologies like machine learning, but just from everyday life progressively going digital and online, no doubt accelerated by the Covid-19 pandemic.

People and, crucially, the organisations they have to interact with, must increasingly rely on electronic records or digital databases to store personal data and other information.

More and more, we are forced to fill in online (or other electronic) registration forms to obtain services or goods. Those form fields are often completed by someone other than the individual seeking to obtain services or goods, e.g. an organisation's staff member may input details of new clients or customers.

People, and those organisations, rely on these records and databases to be accurate, especially as, more and more, online transactions rely on correct identification and authentication. That's Art.5(1)(d) accuracy, to drop in another GDPR provision.

However, too many electronic forms for the input of people's data are coded based on unconscious biases: namely, that people's names are always Western in format, typically Anglo-Saxon, with a single one-word first name and single one-word surname, and maybe sometimes a one-word middle name.

This isn't a new problem. A W3C document from a decade ago, 2011, urged an internationalised approach to names when designing forms, databases, ontologies, etc. for the Web – but, I'd now say, a global approach must be taken much more generally, towards the design of all databases and forms - not just those used "for the Web".

It's not simply a technical issue of "forms validation" (where an electronic system refuses to accept the name you're trying to enter, the computer says No, even though it actually is your real name!). It's an organisational issue: database/forms design, staff awareness and training are vital too. All staff must be conscious of this issue when taking down and entering customers' names into systems.

What triggered this blog?

I have a Chinese first name. I don't have a middle name. I have a two-word first name, "Wai Kuan". I go by "Kuan" for ease, as that's the short form of my first name – it's like going by "Liz", when your name is "Elizabeth". Also, going by "Wai Kuan" risks "witty" quips like "Why not Kuan?", so I don't!

Some organisations have entered my "first name" on their systems as "Wai", others as "Kuan". All this without asking me how my "first name" should appear on their systems, or on my payment cards, etc etc – although, to be fair, I think one bank did actually ask me once. (And btw, with Chinese names the "first" name is the surname, the personal name usually appears second, not first – but I've just given up and anglicised the order of my real name for ease.) If someone's name was Philip, would you automatically enter his first name on your system as "lip" without asking him, or query his identity if one source said he was "Phil" and another source said he was "Philip"? I don't think so.

I spend far too much of my life trying to sort out problems arising from organisational mismatches in my name, or mis-spellings of my name. Recently, I received a rejection because one organisation's receipt had my first name down as "Wai", whereas the other had noted my name as "Kuan". You might think an explanation of the reason for the rejection would have been merited but, no, the standard message they sent me just implied that I hadn't filled in all the (other) details correctly – whereas in fact the problem was due to the first name mismatch, even though my surname was clearly the same! I had to waste my time, and theirs, calling to find out the real reason, i.e. the "first name" mismatch.

Now, if one organisation had put my name down as Liz and the other as Beth, do you think they'd have automatically rejected my request - or let it go through? Or do you think they might have, at least, sent me a message properly explaining that it was the first name discrepancy that was of concern? If that isn't indirect ethnic discrimination, I don't know what is. I keep, continually, having to ask for receipts to be issued just to "W K", yet some organisations still get it wrong, or are huffy when I ask them to reissue their receipts, or both.
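For the more technically minded, here's a minimal sketch of what "properly explaining" a mismatch could look like in code. It's purely illustrative: the record shape ({ givenName, surname }), the function name compareNameRecords and the outcome labels are my assumptions, not how any real organisation's system works. The idea is simply to state which field differs and, where the surname matches, to send the case for human review rather than rejecting it automatically.

```javascript
// Hypothetical sketch: the record shape and outcome labels are assumptions,
// not any real organisation's matching rules.
function compareNameRecords(a, b) {
  // Compare case-insensitively, ignoring only leading/trailing whitespace.
  const norm = (s) => String(s).normalize("NFC").trim().toLowerCase();
  const reasons = [];

  const surnameMatches = norm(a.surname) === norm(b.surname);
  if (!surnameMatches) {
    reasons.push(`Surname differs: "${a.surname}" vs "${b.surname}"`);
  }
  if (norm(a.givenName) !== norm(b.givenName)) {
    reasons.push(`Given name differs: "${a.givenName}" vs "${b.givenName}"`);
  }

  if (reasons.length === 0) {
    return { outcome: "match", reasons };
  }
  // Same surname but a different given name: ask a person, don't auto-reject.
  return { outcome: surnameMatches ? "review" : "mismatch", reasons };
}
```

With two receipts showing "Wai" and "Kuan" against the same surname, this would return a "review" outcome plus a reason naming the given name field, which is the explanation I'd have liked to receive.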

First name, middle name, surname, hyphens, apostrophes...

Even people of Western origin can be affected by this problem, particularly those from Southern USA. The actor Billy Bob Thornton has a two-word first name. His first name is "Billy Bob". It's not "Billy", and "Bob" is not his middle name. Same with actor John David Washington, whose first name is "John David". As female examples, there's tennis player Billie Jean King, and singer Sarah Jane Morris. I've also seen two-word first names with no space or hyphen in between, just a capitalisation of the second name, like MaryAnn. Other people may have more than one middle name.

So, please don't always assume the first word is a "first name" and the second word is a "middle name"! (yes, I get "Kuan" entered as my middle name, even though I constantly stress that I have a two-word first name). I also know English people with double-barrelled surnames. Some with hyphens, some without. Name fields must also allow two-word surnames! (and hyphens in first names, as some people have hyphenated first names - e.g. actor Mary-Louise Parker). Allowing apostrophes in names would also help people of e.g. Irish descent, and yes please preserve the way people capitalise their names and don’t "auto correct" to perdition. If someone spells their name without a hyphen, please train staff not to hyphenate it when entering it on your systems. I don't know how many times I've had to say it's not "Wai-kuan" or even "Wai-Kuan", when someone has unthinkingly added the hyphen without my actually using the word "hyphen", and always without asking me. If I spell it as "space", that means there's a space between the two words, not a "hyphen" – there's a difference between a space and a hyphen, you know! (At least no one has ever tried to call me "Wai Space Kuan" - yet.)

Minimum and maximum lengths for name fields

Finally, don't assume that names must always have a certain minimum or maximum length. It's tough enough for me, having a 3-letter surname (on spelling my surname out over the phone, I once got asked, "Is that all?!"). Take pity on people like politician Cédric O, and actors Maggie Q and Jet Li. Or Thanita Phuvanatnaranubala or Bhadajarabhakinai Dhanarpitivongsavadhadhana (from Thailand), and Dr Tedros Adhanom Ghebreyesus (WHO's current Director-General). A well known data protection-related website, that I won't name, rejects attempts to register for its events if you enter a single letter in either first name or surname - that's not considered valid, so pity Mr O if he tries to attend one of their events! (At least they accept 2 letters so Mr Li will be just fine, luckily for him.)
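To make the points about spaces, hyphens, apostrophes, capitalisation and length concrete, here's a rough sketch of a deliberately permissive check for a single name field. It's only an illustration under my own assumptions: the function name checkNameField and its return shape are hypothetical, and a real system would also need to think about storage limits and display. The point is what it does not do: no minimum length beyond "not empty", no banned punctuation, no "auto correction" of capitalisation.

```javascript
// Hypothetical helper: a deliberately permissive check for one name field.
// It does not try to decide what a "real" name looks like.
function checkNameField(raw) {
  // Normalise Unicode composition and trim outer whitespace only.
  // Inner spaces, hyphens, apostrophes and capitalisation are left untouched.
  const value = String(raw).normalize("NFC").trim();

  if (value.length === 0) {
    return { ok: false, value, reason: "Name is empty" };
  }
  // Reject only control characters (e.g. an embedded tab or newline).
  if (/\p{Cc}/u.test(value)) {
    return { ok: false, value, reason: "Name contains control characters" };
  }
  return { ok: true, value, reason: null };
}
```

Under this sketch, "O", "Wai Kuan", "Mary-Louise", "O'Brien" and "MaryAnn" would all be accepted and stored exactly as typed.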



"The computer says no"

There also seems to be a mentality of "the computer is always right", "what's on the system is always right", which completely ignores the possibility that the staff member who first input someone's details onto the system might have misheard or misspelled the name, unilaterally added a hyphen for no reason, etc. etc. etc. I won't give details of the hoops organisations have made me jump through to get them to correct my name on their systems.

Again, they always assume that the staff member who first entered my name must always be correct, more correct than the person whose name it actually is! Even when they first got my name from a third party source, and not from me. Or even when their staff made errors when inputting it, although my name was perfectly correct on the paper form I had sent in. (I'll mention just one hoop – sometimes I have had to make them go check against the name on the paper form, or the name my bank has recorded for me, before they're willing to correct my name on their systems.)

GDPR to the rescue?

I've had to resort to sending DSARs, more specifically Art.16 data subject rectification requests, to the data protection contact details set out in privacy notices, in order to get organisations to correct my name on their systems. Often, that's after repeated fruitless calls to customer "service" "help" lines - who haven't been of much service, or any help. I don't want to waste the time of DPOs or privacy teams, who I feel have much better things to do with their limited time and resources, but I haven't had any other choice. Thank goodness for GDPR!

Obviously, there is an Art.5(1)(d) accuracy issue in relation to wrongly-input names. There's also an issue regarding Art.25 data protection by design and by default, particularly in relation to database and web form fields, as controllers are supposed to take account of "risks of varying likelihood and severity for rights and freedoms of natural persons" - not just data protection rights, but also the right not to be discriminated against or "singled out" based on racial or ethnic reasons. And a broader Art.5(2) and Art.24 accountability issue, including in relation to staff training. (It could involve Art.22 automated decision-making too, if someone can't access certain services, online or otherwise, because their name is "too short" or "too long" for the system (as designed) to accept it, or it "doesn't match the system" because staff entered their name wrongly!)

What to do?

The W3C document says it all – in real life, ethnic or racial discrimination doesn't arise only from AI bias. I wish all organisations would read that document, train staff on those issues, and apply its guidance fully when designing name fields for databases and web forms, and when their staff enter data into name fields. That's the only solution.

Otherwise, we'll risk facing a very Kafkaesque future, where what services or goods we can obtain, and with what degree of difficulty, will depend entirely on how organisations (often wrongly) first decided to enter our names on their systems.

From my experiences, we're already halfway there. Although my name is correctly spelled within the email address from which I send emails to organisations, or indeed is correct on organisational systems, I still keep receiving email replies or other correspondence addressed to me with a Q or a Kw, etc. I'm often called "Kuon". Even "Juan", although I'm not actually of Spanish origin – my photo might provide a bit of a clue about that.

I also feel sorry for people with names like "Null", given that we no longer have any choice about the computerisation of our names. But, that's a different problem…

Tuesday, 11 May 2021

Make EDPB webpages readable again - howto

The recent EDPB website redesign kills usability and ergonomics for those with widescreen monitors. Maybe they were trying to make the site user-friendly for mobiles/tablets, but the result is that it's user-unfriendly for desktop PCs/laptops.

Viewed on a computer with a widescreen monitor, the left sidebar or margin's passed on, it's no more, it has ceased to be... it's an ex-margin! (hi Monty Python fans).

This means that the main webpage text is no longer centered onscreen.


Cue neckache or crick from having to twist or turn the head too far to the left (and hold it there) just to read the main webpage text! What to say, that's certainly one way to deter desktop/laptop website users.

This seems to be an EDPB website matter, as the general Europa website is still fine. But it can be a pain in the neck for EDPB website visitors, literally!

To center EDPB webpages on your widescreen monitor again and save your neck, three options to try:

1. Un-maximise ("restore") your browser window, move the window (or drag the left edge) to the right till the main text is centered onscreen, and read EDPB webpages only from the restored window. 

2. Use the Liquid Page bookmarklet (instructions are on that webpage), if you prefer to keep your browser window maximised. Then, you can drag the "Latest news" column on the right (and beige box behind it, and the flag thing) even further to the right, out of the way. Then drag the main text column to the right, and scroll down as usual. (Links won't work till you refresh the page, but you can rightclick the link and open in new tab). More troublesome maybe than 1., but I do like outside the box creative solutions - have fun dragging stuff around!

3. Simplest solution (tested in Chrome and Edge on a Windows 10 PC) - use my bookmarklet or favelet: /Fix EDPB. Instructions: ensure your browser bookmarks or favourites toolbar is visible, drag that link to the toolbar e.g. between other bookmarks then, when you're on a no-margin EDPB webpage, just click that bookmark. Or, if you prefer, follow the bookmarklet creation instructions under Solutions but, in step 4, name the bookmark whatever name you wish and, in step 6, instead of pasting the code shown there, paste the following code:
javascript:(function(){document.body.style.marginLeft = "500px";})();
All fixed, main text is centered on screen! This also narrows the text column so it's easier to read scrolling down.

  • Keyboard shortcut fans: hotkeys to run this in Chrome are Alt-e, b, then type the 1st letter of the bookmarklet name (I made that the / symbol here so as not to clash with other bookmarklets, but feel free to edit the bookmarklet's name yourself), then Enter if necessary.
  • Per webpage only: if you navigate to another EDPB webpage, you'll need to click the bookmarklet or use the hotkey again. It's a per page rather than permanent fix, as it adds a margin to the current page after it's been downloaded to your browser. Unfortunately it can't modify the original pages on the EDPB website, only the EDPB can do that.
  • Margin width:- a 500 pixel left margin works for me. If it's too narrow/wide for you, rightclick the bookmarklet in the toolbar, Edit, under URL just change 500 to 400 or 600 as you wish (but obviously don't change the rest of the code) and Save. 
My neck feels better already! (and this works on BAILII too, BTW).
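If you'd rather not hand-edit the number in the bookmark's URL, here's a small sketch of a helper that generates the same bookmarklet code for whatever margin width you want. The function name makeMarginBookmarklet is my own invention for illustration; you could run it in your browser's developer console and paste the printed result into the bookmark's URL field.

```javascript
// Hypothetical helper (not part of the bookmarklet itself): builds the
// bookmarklet URL for a given left margin in pixels.
function makeMarginBookmarklet(pixels) {
  // Guard against values that would produce broken CSS, e.g. "abcpx".
  if (!Number.isInteger(pixels) || pixels < 0) {
    throw new RangeError("pixels must be a non-negative whole number");
  }
  return 'javascript:(function(){document.body.style.marginLeft = "' +
    pixels + 'px";})();';
}

// Print a 400 pixel version, ready to paste into a bookmark's URL field.
console.log(makeMarginBookmarklet(400));
```

Calling it with 500 reproduces the code given above exactly; only the number changes for other widths.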

I hope this helps other EDPB website visitors too.

Saturday, 10 April 2021

Security / identity theft risks - reporting Covid-19 home test results

It's laudable that free Covid-19 lateral flow home test kits became available in England yesterday, e.g. from pharmacies.

You're meant to report results even if negative (though that could be made clearer), by phone/online. But - then you get an email from Gov.uk Notify with your result, advocating continued social distancing etc - with your name, date of birth and NHS number, right at the top of the email! Full marks for promptness, but - for security/privacy...?

As is well known, email is insecure. If your email or the NHS's gets hacked, or intercepted, or shoulder surfed, bad guys can use your name, DoB and NHS no. for fraud and/or identity theft. I guard my DoB jealously, not just because some women don't like revealing their age (yes, I am over 30!), but because of this risk of crime. I only ever give my real DoB to government, health and financial organisations (perturbation anyone? 😁).

Too many organisations use just name and DoB to identify customers who contact them, sometimes combined with address/postcode, which usually aren't difficult for criminals to discover. (Recall that in Germany, for using just name and DoB for authentication, 1&1 got fined €9.55m, reduced by the court to €0.9m – which is still substantial.)

I'm OK with the UK DHSC requesting my DoB and NHS number (as long as they store it securely and share it securely and only on a need to know basis). But, I already know my own DoB and NHS no., wouldja believe it, and, with this type of home test kit, I do actually already know my result! There's absolutely no need to email any of that info to me.

Even if they'd adapted a previous standard form of email designed to go to people who didn't already know their results, again there's no need to include DoB or NHS number. (It's not just the DHSC - other organisations are guilty of emailing people with their DoB too, including an optician I was unfortunate enough to try using.)
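As a rough illustration of how little such an email needs, here's a sketch of a followup email builder that simply never reads the DoB or NHS number. Everything in it is my assumption for illustration: the function name buildResultEmail, the field names and the wording have nothing to do with how Gov.uk Notify or the DHSC actually build their emails.

```javascript
// Hypothetical sketch: field names and wording are assumptions for
// illustration, not the real system's template.
function buildResultEmail(report) {
  // Pick only what the message needs. dateOfBirth and nhsNumber may exist
  // on the report object, but they are never read here.
  const { displayName, testDate, result } = report;
  return [
    `Dear ${displayName},`,
    `Thank you for reporting your test taken on ${testDate}.`,
    `Result recorded: ${result}.`,
  ].join("\n");
}
```

Even if the underlying record holds the DoB and NHS number for matching purposes, the outgoing message built this way can't leak them, because the template never touches those fields.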

I suspect that if I didn't give my DoB/NHS no. they wouldn't take my report, or if I asked for that info not to be automatically included in their followup email, they'd reply "The computer says no, the system hasn't been designed that way, we can't tell it to omit that info!"

Let's count the UK GDPR issues here:

  • Art.5(1)(f) integrity and confidentiality, and the related Art.32 security.
  • Art.5(1)(c) data minimisation, most definitely. 
  • (Not to forget Art.25 data protection by design & by default of course. And Art.35 on data protection impact assessments aka DPIAs.) 
Also, the UK NIS Regulations, which implemented the EU NIS Directive, require operators of essential services or OESs (critical infrastructure, including the healthcare sector) to take appropriate and proportionate technical and organisational measures to manage risks to the security of their network and information systems. (Ironically, the DHSC doesn't seem to be caught under those Regs, although NHS Trusts are.)

The worst consequence of the DHSC's approach is that it might cause privacy/security-conscious people (like data protection professionals!) to decide not to report their test results (at least if negative) while it's not legally-required, in order to avoid the risk of fraud and identity theft. Meaning that the NHS may not receive fully comprehensive data...

Because, in connection with Covid-19, it handles sensitive, special category data like health data, the DHSC might be expected to be more careful about security and privacy than most. Our NHS heroes of course deserve our greatest respect and gratitude. But real security and privacy risks to individuals can be created unless everything is thought through carefully when conducting the DPIA (I hope there was one?) - even supposedly minor process issues like the content of standard followup emails after home test reports.

I've emailed the DHSC's data protection officer (at the email address in the privacy notice linked to from the test results reporting webpage), and I really hope the DHSC will change this risky practice ASAP.

Thursday, 11 February 2021

Digital Services Act infographic summary

Here's my infographic summarising the key liability and due diligence rules under the EU Digital Services Act, proposed in December 2020.