Kuan0: Bias – AI vs. name fields in databases / forms

Monday 28 June 2021

Bias – AI vs. name fields in databases / forms

AI discrimination, due to past biases built into training data, is touted as a massive problem, notably when it reflects bias based on racial or ethnic origin. Data revealing racial or ethnic origin is Art.9 special category data, as all GDPR practitioners know. A famous example is car insurance quotes being about £900 higher for people named Mohammed, compared with quotes for those named John, even when the other details were identical (although it's unclear whether any artificial intelligence was involved there).

What's in a name?

However, there's an even more basic concern about names. This arises, not from emerging technologies like machine learning, but just from everyday life progressively going digital and online, no doubt accelerated by the Covid-19 pandemic.

People and, crucially, the organisations they have to interact with, must increasingly rely on electronic records or digital databases to store personal data and other information.

More and more, we are forced to fill in online (or other electronic) registration forms to obtain services or goods. Those form fields are often completed by someone other than the individual seeking to obtain services or goods, e.g. an organisation's staff member may input details of new clients or customers.

People, and those organisations, rely on these records and databases to be accurate, especially as, more and more, online transactions rely on correct identification and authentication. That's the Art.5(1)(d) accuracy principle, to drop in another GDPR provision.

However, too many electronic forms for the input of people's data are coded based on unconscious biases: namely, that people's names are always Western in format, typically Anglo-Saxon, with a single one-word first name and single one-word surname, and maybe sometimes a one-word middle name.

This isn't a new problem. A W3C document from a decade ago, 2011, urged an internationalised approach to names when designing forms, databases, ontologies, etc. for the Web – but, I'd now say, a global approach must be taken much more generally, towards the design of all databases and forms - not just those used "for the Web".
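To make the design point concrete, here is a minimal sketch of a name record that avoids the "one word per field" assumption. It is only an illustration: the class `PersonName`, its field names and the helper functions are hypothetical, not taken from the W3C document or from any real system.

```python
from dataclasses import dataclass
from typing import Optional


def naive_first_name(full_name: str) -> str:
    """The biased shortcut: assume the first word is the whole first name."""
    return full_name.split()[0]


@dataclass(frozen=True)
class PersonName:
    # Hypothetical fields. Each value is stored exactly as the person gave it,
    # so a given name or family name may contain several words.
    given_names: str
    family_name: str
    # What the person asked to be called; None means "not asked yet".
    preferred_name: Optional[str] = None

    def form_of_address(self) -> str:
        """Use the preferred name if one was given, else the full given names."""
        return self.preferred_name or self.given_names


# The shortcut silently truncates a two-word first name.
truncated = naive_first_name("Billy Bob Thornton")

# The record keeps the two-word first name intact.
actor = PersonName(given_names="Billy Bob", family_name="Thornton")
```

The design choice here is that the system never derives one name part from another: a short form is stored only if the person has been asked for it.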

It's not simply a technical issue of "forms validation" (where an electronic system refuses to accept the name you're trying to enter, the computer says No, even though it actually is your real name!). It's an organisational issue: database/forms design, staff awareness and training are vital too. All staff must be conscious of this issue when taking down and entering customers' names into systems.

What triggered this blog?

I have a Chinese first name. I don't have a middle name. I have a two-word first name, "Wai Kuan". I go by "Kuan" for ease, as that's the short form of my first name – it's like going by "Liz", when your name is "Elizabeth". Also, going by "Wai Kuan" risks "witty" quips like "Why not Kuan?", so I don't!

Some organisations have entered my "first name" on their systems as "Wai", others as "Kuan". All this without asking me how my "first name" should appear on their systems, or on my payment cards, etc etc – although, to be fair, I think one bank did actually ask me once. (And btw, with Chinese names the "first" name is the surname, the personal name usually appears second, not first – but I've just given up and anglicised the order of my real name for ease.) If someone's name was Philip, would you automatically enter his first name on your system as "lip" without asking him, or query his identity if one source said he was "Phil" and another source said he was "Philip"? I don't think so.

I spend far too much of my life trying to sort out problems arising from organisational mismatches in my name, or mis-spellings of my name. Recently, I received a rejection because one organisation's receipt had my first name down as "Wai", whereas the other had noted my name as "Kuan". You might think an explanation of the reason for the rejection would have been merited but, no, the standard message they sent me just implied that I hadn't filled in all the (other) details correctly – whereas in fact the problem was due to the first name mismatch, even though my surname was clearly the same! I had to waste my time, and theirs, calling to find out the real reason, i.e. the "first name" mismatch.

Now, if one organisation had put my name down as Liz and the other as Beth, do you think they'd have automatically rejected my request - or let it go through? Or do you think they might have, at least, sent me a message properly explaining that it was the first name discrepancy that was of concern? If that isn't indirect ethnic discrimination, I don't know what is. I keep, continually, having to ask for receipts to be issued just to "W K", yet some organisations still get it wrong, or are huffy when I ask them to reissue their receipts, or both.
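For what it's worth, a proper explanation is not hard to produce. The sketch below is purely hypothetical (the function name, the messages and the placeholder surname "Example" are all mine, not any organisation's actual process), but it shows a comparison that tells the customer which part of the name differs instead of sending a generic rejection.

```python
def explain_name_mismatch(first_a: str, surname_a: str,
                          first_b: str, surname_b: str) -> str:
    """Compare two recorded names and say specifically what differs."""
    # Compare case-insensitively and ignore stray leading/trailing spaces.
    same_first = first_a.strip().casefold() == first_b.strip().casefold()
    same_surname = surname_a.strip().casefold() == surname_b.strip().casefold()

    if same_first and same_surname:
        return "match"
    if same_surname:
        return (
            f"First names differ ('{first_a}' vs '{first_b}') but the surname "
            "matches; please confirm your full first name so we can correct "
            "our records."
        )
    return (
        f"Surnames differ ('{surname_a}' vs '{surname_b}'); "
        "please check both documents."
    )


# Two receipts for the same person, one recorded as "Wai", the other as "Kuan".
message = explain_name_mismatch("Wai", "Example", "Kuan", "Example")
```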

First name, middle name, surname, hyphens, apostrophes...

Even people of Western origin can be affected by this problem, particularly those from the Southern USA. The actor Billy Bob Thornton has a two-word first name. His first name is "Billy Bob". It's not "Billy", and "Bob" is not his middle name. Same with actor John David Washington, whose first name is "John David". As female examples, there's tennis player Billie Jean King, and singer Sarah Jane Morris. I've also seen two-word first names with no space or hyphen in between, just a capitalisation of the second name, like MaryAnn. Other people may have more than one middle name.

So, please don't always assume the first word is a "first name" and the second word is a "middle name"! (Yes, I get "Kuan" entered as my middle name, even though I constantly stress that I have a two-word first name.) I also know English people with double-barrelled surnames. Some with hyphens, some without. Name fields must also allow two-word surnames! (And hyphens in first names, as some people have hyphenated first names - e.g. actor Mary-Louise Parker.) Allowing apostrophes in names would also help people of e.g. Irish descent, and yes, please preserve the way people capitalise their names and don't "auto correct" to perdition. If someone spells their name without a hyphen, please train staff not to hyphenate it when entering it on your systems. I don't know how many times I've had to say it's not "Wai-kuan" or even "Wai-Kuan", when someone has unthinkingly added the hyphen without my actually using the word "hyphen", and always without asking me. If I spell it as "space", that means there's a space between the two words, not a "hyphen" – there's a difference between a space and a hyphen, you know! (At least no one has ever tried to call me "Wai Space Kuan" - yet.)
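As a rough illustration of the points above, here is a sketch of a name check that accepts spaces, hyphens and apostrophes, and stores the name without changing its capitalisation or punctuation. The function names and the exact set of allowed characters are my assumptions for the example; a real system might well need to allow more.

```python
import unicodedata

# Assumed set of punctuation to allow inside names: space, hyphen,
# straight apostrophe and typographic apostrophe.
ALLOWED_PUNCTUATION = {" ", "-", "'", "\u2019"}


def is_acceptable_name(raw: str) -> bool:
    """Accept letters from any script, combining marks and a little punctuation."""
    name = raw.strip()
    if not name:
        return False
    for ch in name:
        category = unicodedata.category(ch)
        # "L*" categories are letters, "M*" categories are combining marks.
        if category.startswith(("L", "M")) or ch in ALLOWED_PUNCTUATION:
            continue
        return False
    return True


def store_name(raw: str) -> str:
    """Trim surrounding whitespace only: no re-capitalising, no added hyphens."""
    return raw.strip()


examples = ["Wai Kuan", "Mary-Louise", "O'Brien", "MaryAnn", "Cédric"]
results = [is_acceptable_name(name) for name in examples]
```

Note that `store_name` deliberately does nothing clever: "Wai Kuan" stays "Wai Kuan", and "MaryAnn" is not "corrected" to "Maryann".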

Minimum and maximum lengths for name fields

Finally, don't assume that names must always have a certain minimum or maximum length. It's tough enough for me, having a 3-letter surname (on spelling my surname out over the phone, I once got asked, "Is that all?!"). Take pity on people like politician Cédric O, and actors Maggie Q and Jet Li. Or Thanita Phuvanatnaranubala or Bhadajarabhakinai Dhanarpitivongsavadhadhana (from Thailand), and Dr Tedros Adhanom Ghebreyesus (WHO's current Director-General). A well-known data protection-related website, which I won't name, rejects attempts to register for its events if you enter a single letter in either first name or surname - that's not considered valid, so pity Mr O if he tries to attend one of their events! (At least they accept 2 letters so Mr Li will be just fine, luckily for him.)
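A length rule that copes with all of the names above could be as simple as the sketch below. The constants are assumptions for illustration: the minimum is one character, and the maximum of 200 is an arbitrary generous ceiling, not a recommended standard.

```python
MIN_NAME_LENGTH = 1    # a one-letter name such as "O" or "Q" is a real name
MAX_NAME_LENGTH = 200  # arbitrary generous ceiling, assumed for this sketch


def name_length_ok(name: str) -> bool:
    """Check length after trimming surrounding whitespace."""
    length = len(name.strip())
    return MIN_NAME_LENGTH <= length <= MAX_NAME_LENGTH


names = [
    "O",
    "Q",
    "Li",
    "Bhadajarabhakinai Dhanarpitivongsavadhadhana",
]
checks = [name_length_ok(name) for name in names]
```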



"The computer says no"

There also seems to be a mentality of "the computer is always right", "what's on the system is always right", which completely ignores the possibility that the staff member who first input someone's details onto the system might have misheard or misspelled the name, unilaterally added a hyphen for no reason, etc. etc. etc. I won't give details of the hoops organisations have made me jump through to get them to correct my name on their systems.

Again, they always assume that the staff member who first entered my name must always be correct, more correct than the person whose name it actually is! Even when they first got my name from a third party source, and not from me. Or even when their staff made errors when inputting it, although my name was perfectly correct on the paper form I had sent in. (I'll mention just one hoop – sometimes I have had to make them go check against the name on the paper form, or the name my bank has recorded for me, before they're willing to correct my name on their systems.)

GDPR to the rescue?

I've had to resort to sending data subject requests, more specifically Art.16 rectification requests, to the data protection contact details set out in privacy notices, in order to get organisations to correct my name on their systems. Often, that's after repeated fruitless calls to customer "service" "help" lines - who haven't been of much service, or any help. I don't want to waste the time of DPOs or privacy teams, who I feel have much better things to do with their limited time and resources, but I haven't had any other choice. Thank goodness for GDPR!

Obviously, there is an Art.5(1)(d) accuracy issue in relation to wrongly-input names. There's also an issue regarding Art.25 data protection by design and by default, particularly in relation to database and web form fields, as controllers are supposed to take account of "risks of varying likelihood and severity for rights and freedoms of natural persons" - not just data protection rights, but also the right not to be discriminated against or "singled out" based on racial or ethnic reasons. And a broader Art.5(2) and Art.24 accountability issue, including in relation to staff training. (It could involve Art.22 automated decision-making too, if someone can't access certain services, online or otherwise, because their name is "too short" or "too long" for the system (as designed) to accept it, or it "doesn't match the system" because staff entered their name wrongly!)

What to do?

The W3C document says it all – in real life, ethnic or racial discrimination doesn't arise only from AI bias. I wish all organisations would read that document, train staff on those issues, and apply its guidance fully when designing name fields for databases and web forms, and when their staff enter data into name fields. That's the only solution.

Otherwise, we'll risk facing a very Kafkaesque future, where what services or goods we can obtain, and with what degree of difficulty, will depend entirely on how organisations (often wrongly) first decided to enter our names on their systems.

From my experiences, we're already halfway there. Although my name is correctly spelled within the email address from which I send emails to organisations, or indeed is correct on organisational systems, I still keep receiving email replies or other correspondence addressed to me with a Q or a Kw, etc. I'm often called "Kuon". Even "Juan", although I'm not actually of Spanish origin – my photo might provide a bit of a clue about that.

I also feel sorry for people with names like "Null", given that we no longer have any choice about the computerisation of our names. But, that's a different problem…