Ellen Broad – Medium
Facial recognition is the next big area where questions about data ownership, data accuracy and algorithmic bias will arise – and indeed are arising. Some of those questions have very close parallels with their equivalents in other areas of personal data; others are more distinctive – for example, discrimination against black people is endemic in poor algorithm design, but there are some very specific ways in which it manifests itself in facial recognition. This short, sharp post uses the example of a decision just made in Australia to pool driving licence pictures to create a national face recognition database to explore some of the issues around ownership, control and accountability which are of much wider relevance.
This is a long and detailed post, making two central points, one more radical and surprising than the other. The less surprising – though it certainly bears repeating – is that qualitative understanding, and particularly ethnographic understanding, is vitally important in understanding people and thus in designing systems and services. The more distinctive point is that qualitative and quantitative data are not independent of each other and, more particularly, that quantitative data is not neutral. Or, in the line quoted by Leisa Reichelt which led me to read the article, ‘behind every quantitative measure is a qualitative judgement imbued with a set of situated agendae’. Behind the slightly tortured language of that statement there are some important insights. One is that the interpretation of data is always something we project onto it; it is never wholly latent within it. Another – in part a corollary to the first – is that data cannot be disentangled from ethics. Taken together, that’s a reminder that the spectrum from data to knowledge is one to be traversed carefully and consciously.
This is a beguiling timeline which has won a fair bit of attention for itself. It’s challenging stuff, particularly the point around 2060 when “all human tasks” will apparently be capable of being done by machines. But drawing an apparently precise timeline such as this obscures two massive sources of uncertainty. The first is the implication that people working on artificial intelligence have expertise in predicting the future of artificial intelligence. Their track record suggests that that is far from the case: like nuclear fusion, full blown AI has been twenty years in the future for decades (and the study underlying this short article strongly implies, though without ever acknowledging it, that the results are as much driven by social context as by technical precision). The second is the implication that the nature of human tasks has been understood, and thus that we have some idea of what the automation of all human tasks might actually mean. There are some huge issues barely understood about that (though also something of a no true Scotsman argument – something is AI until it is achieved, at which point it is merely automation). Even if the details can be challenged, though, the trend looks clear: more activities will be more automated – and that has some critical implications, regardless of whether we choose to see it as beating humans.
The internet runs on personal data. It is the price we pay for apparently free services and for seamless integration. That’s a bargain most have been willing to make – or at least one which we feel we have no choice but to consent to. But the consequences of personal data powering the internet reverberate ever more widely, and much of the value has been captured by a small number of large companies.
That doesn’t just have the effect of making Google and Facebook very rich, it means that other potential approaches to managing – and getting value from – personal data are made harder, or even impossible. This post explores some of the challenges and opportunities that creates – and perhaps more importantly serves as an introduction to a much longer document – Me, my data and I: The future of the personal data economy – which does an excellent job both of surveying the current landscape and of telling a story about how the world might be in 2035 if ideas about decentralisation and personal control were to take hold – and what it might take to get there.
There is plenty of evidence that data-driven political manipulation is on the increase, with issues getting recent coverage ranging from secretively targeted Facebook ads to bulk twitterbots and wholesale data manipulation. As with so much else, what is now possible online is an amplification of political chicanery which long pre-dates the internet – but as with so much else, the extreme difference of degree becomes a difference of kind. This portmanteau article comes at the question of whether that puts democracy itself under threat from a number of directions, giving it a pretty thorough examination. But there is a slight sense of technological determinism, which leads both to some sensible suggestions about how to ensure continuing personal, social and democratic control – but also to some slightly hyperbolic ideas about the threat to jobs and the imminence of super-intelligent machines.
The first half of this paper is a slightly breathless and primarily US-focused survey of the application of AI to government – concentrating more on the present and near future than on more distant and more speculative developments.
The second half sets out six “strategies” for making it happen, starting with the admirably dry observation that, “For many systemic reasons, government has much room for improvement when it comes to technological advancement, and AI will not solve those problems.” It’s not a bad checklist of things to keep in mind and the paper as a whole is a good straightforward introduction to the subject, but is very much an overview, not a detailed exploration.
It’s a surprisingly common mistake to design things on the assumption that they will always operate in a benign environment. But the real world is messier and more hostile than that. This is an article about why self-driving vehicles will be slower to arrive and more vulnerable once they are here than it might seem from focusing just on the enabling technology.
But it’s also an example of a much more general point. ‘What might an ill-intentioned person do to break this?’ is a question which needs to be asked early enough in designing a product or service that the answers can be addressed in the design itself, rather than, for example, ending up with medical devices which lack any kind of authentication at all. Getting the balance right between making things easy for the right people and as close to impossible as can be managed for the wrong people is never straightforward. But there is no chance of getting it right if the problem isn’t acknowledged in the first place.
And as an aside, there is still the question of whether vehicles are anywhere close to becoming autonomous in the first place.
This is a long, but fast moving and very readable, essay on why AI will arrive more slowly and do less than some of its more starry-eyed proponents assert. It’s littered with thought-provoking examples and weaves together a number of themes touched on here before – the inertial power of the installed base, the risk of confusing task completion with intelligence (and still more so general intelligence), the difference between tasks and jobs, and just how long it takes to get from proof of concept to anything close to real world practicality. There are some interesting second order thoughts as well. There is a tendency, for example, to assume that technologies (particularly digital technologies) will keep improving. But though that may well be true over a period, it’s very unlikely to be true indefinitely: in the real world, S-curves are more common than exponential growth.
A straightforward and very useful post which does exactly what the title says – a step by step explanation of what AI can now do, what it might be able to do and what there is currently no prospect of its doing (for economic as well as technical reasons).
It is – or should be – well known that 82.63% of statistics are made up. Apparent precision gives an authority to numbers which is sometimes earned, but sometimes completely spurious. More generally, this short article argues that humans have long experience of detecting verbal nonsense, but are much less adept at spotting nonsense conveyed through numbers – and suggests a few rules of thumb for reducing the risk of being caught out.
Much of the advice offered isn’t specific to the big data of the title – but it does as an aside offer a neat encapsulation of one of the very real risks of processes based on algorithms, that of assuming that conclusions founded on data are objective and neutral, “machines are as fallible as the people who program them—and they can’t be shamed into better behaviour”.
Were the Beatles average? This is Matthew Taylor in good knockabout form on a spectacular failure to use data analysis to understand what takes a song to the top of the charts and, even more bravely, to construct a chart topping song. The fact that such an endeavour should fail is not surprising (though there are related questions where such analysis has been much more successful, so it’s not taste as such which is beyond the penetration of machines), but does again raise the question of whether too much attention is being given to what might be possible at the expense of making full use of what is already possible. Or as Taylor puts it, “We are currently too alarmist in thinking about technology but too timid in actually taking it up.”
This is an extract from a new book, The Mathematical Corporation: Where Machine Intelligence and Human Ingenuity Achieve the Impossible (out last month as an ebook, but not available on paper until September). The extract focuses on the ethics of data, with a simple explanation of differential privacy and some equally simple philosophical starting points for thinking about ethical questions.
There is nothing very remarkable in this extract, but perhaps worth a look for two reasons. The first is that the book from which it comes has a lot of promise; the second is a trenchant call to arms in its final line: ethical reasoning is about improving strategic decision making.
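For readers who want a feel for how differential privacy works in practice, a minimal sketch is below. It is not taken from the book: it simply illustrates the standard Laplace mechanism for a counting query, where the noise scale is set by the privacy parameter epsilon (the data and function names are invented for illustration).

```python
import math
import random

def dp_count(records, predicate, epsilon):
    """Differentially private count: the true count plus Laplace noise.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the noise scale is 1/epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    # Inverse-CDF sampling from a Laplace(0, scale) distribution
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

# Illustrative data: how many people in this list are 40 or over?
ages = [23, 35, 41, 29, 52, 60, 18, 44]
noisy_answer = dp_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Each query returns a slightly different noisy answer, so no single response reveals whether any one individual is in the data – which is the core trade-off the extract describes between usefulness and privacy.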
This is a short sharp summary of how biases affect AI design and what to do about them, reaching the conclusion that government oversight is essential (though not, of course, sufficient). There are interesting parallels with Google’s in house rules for working on AI, so worth reading the two together.
What’s the best way to arrange the nearly 3,000 names on a memorial to the victims of 9/11 to maximise the representation of real world connectedness?
Starting with that arresting example, this intriguing essay argues that collection, computation and representation of data all form part of a system, and that it is easy for things to go wrong when the parts of that system are not well integrated. Focus on algorithms and the distortions they can introduce is important – but so is understanding the weaknesses and limitations of the underlying data and the ways in which the consequences can be misunderstood and misrepresented.
If machine learning is not the same as human learning, and if machine learning can end up encoding the weaknesses of human decision making as much as its strengths, perhaps we need some smarter ways of doing AI. That’s the premise for a new Google initiative on what they are calling human centred machine learning, which seems to involve bringing in more of the insights and approaches of human-centred design together with a more sophisticated understanding of what counts as a well-functioning AI system – including recognising the importance of both Type I and Type II errors.
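The distinction between the two kinds of error is worth spelling out, since it is central to judging whether a system is well-functioning. The short sketch below (an invented example, not from the Google initiative) counts Type I errors (false positives: the system flags something that isn’t there) and Type II errors (false negatives: it misses something that is) for a binary classifier.

```python
def error_rates(y_true, y_pred):
    """Return (Type I rate, Type II rate) for binary predictions.

    Type I error: predicting 1 when the truth is 0 (false positive).
    Type II error: predicting 0 when the truth is 1 (false negative).
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives

# Illustrative labels: 1 = event present, 0 = event absent
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
type1, type2 = error_rates(y_true, y_pred)
```

Which of the two rates matters more depends entirely on context – a face recognition system that wrongly flags innocent people has a very different failure profile from one that misses genuine matches – which is exactly why a single accuracy number is not enough.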
Artificial intelligence is more artificial than we like to think. The idea that computers are like very simple human brains has been dominant pretty much since the dawn of computing. But it is critically important not to be trapped by the metaphors we use: the ways in which general purpose computers are not like human brains are far more significant than the ways in which they are. It follows that machine learning is not like human learning; and we should not mistake the things such a system does as simply a faster and cheaper version of what a human would do.
Does the power of big data combined with location awareness result in our being supported by butlers or harassed by stalkers? There’s a fine line (or perhaps not such a fine line) between being helpful and being intrusive. Quite where it will be drawn is a function of commercial incentives, consumer responses and legal constraints (not least the new GDPR). In the public sector, the balance of those forces may well be different, but versions of the same factors will be in play. All of that, of course, is ultimately based on how we answer the question of whose data it is in the first place and whether we will switch much more to sharing state information rather than the underlying data.
If it’s hard to explain how the outputs of complex systems relate to the inputs in terms of sequential process steps, because of the complexity of the model, then perhaps it makes sense to come at the problem the other way round. Neural networks are very crude representations of human minds, and the way we understand human minds is through cognitive psychology – so what happens if we apply approaches developed to understand the cognitive development of children to understanding black box systems?
That’s both powerful and worrying. Powerful because the approach seems to have some explanatory value and might be the dawn of a new discipline of artificial cognitive psychology. Worrying because if our most powerful neural networks learn and develop in ways which are usefully comparable with the ways humans learn and develop, then they may mirror elements of human frailties as well as our strengths.
This is a neat summary of questions and issues around the explicability of algorithms, in the form of an account of a recent academic conference. The author sums up his own contribution to the debate pithily and slightly alarmingly:
Modern machine learning: We train the wrong models on the wrong data to solve the wrong problems & feed the results into the wrong software
There is a positive conclusion that there is growing recognition of the need to study the social impacts of machine learning – which is clearly essential from a public policy perspective – but with concern expressed that multidisciplinary research in this area lacks a clear home.