I picked up "Big Data" by Timandra Harkness solely on the strength of the testimonials from Hannah Fry and Matt Parker on the front and back covers. The 2017 printing that I read is a 300-odd-page general-interest book about recent advances in big data.

"Big Data" starts off with fairly boilerplate material for the topic, including a lot of definitions of what makes data "big": volume, variety, velocity, and the like. It also gives some historical context about the growth of data over time, from early censuses through primitive computers to today. The rest of the book is the result of interviews around the world with people working on different big data projects.

Some of these have received a lot of news coverage before, like the Large Hadron Collider and consumer profiling programs like those made by Dunnhumby. Others haven't had as much attention, at least in the Western world, like the field device that identifies, and selectively kills, insects based on the sound they make as they fly by.

There are two things that separate Harkness's book from the rest of the literature on big data I've read: its overall approachability and the extra attention paid to data ethics.

Regarding approachability, this book keeps both technical and business jargon to an absolute minimum. Someone who has only read about big data in an article or two could pick up this book and follow along with ease. That means some things are not exactly right (life expectancy is the mean years of life, but the median is described), but the descriptions are much, MUCH easier to understand as a result.

If you've read anything by Malcolm Gladwell (Outliers, The Tipping Point) or Stephen Baker (Final Jeopardy: The Story of Watson), you have a good idea of the difficulty and scope of Timandra Harkness's writing. If you've read any of the Freakonomics series by Dubner and Levitt, then you have an idea of the tone of "Big Data".

Data ethics is the focus of the third and final part of "Big Data". Harkness discusses the rise, pushback, and implications of mass surveillance. The cases she covers here include Stingray, a technology that locates a particular cellphone and can remotely collect a startling amount of metadata, and even message content, from that phone. She writes about a technology implemented in Oakland to listen for gunshots, which has also been used to listen in on conversations.

She also writes about the limits of confidentiality in the face of ever-better profiling, specifically the ability to identify individual people by combining disparate datasets. This isn't always a bad thing; it can be used to find someone's medical history and other pertinent medical information when they show up at an emergency room and aren't in a state to provide that information.

She talks about the perpetuation of inequalities through algorithmic decision-making (e.g., for bank loans, criminal punishment and parole, job applications, and school admissions), even in cases where the algorithm was designed to avoid discrimination. Ethical issues like this get rolled into a concept called 'algorithmic accountability', as in 'who is to blame when unfair discrimination happens because of a machine's decision?'

Other literature has discussed data ethics before, but not usually in such an applied and relevant manner. Most other work on the topic that I've read either puts too much focus on ethical issues that existed before big data, like abortion, or requires a background in philosophy to understand. The "Big ideas" part of Harkness's "Big Data" does neither; it gets straight to the newly emergent issues and does a good job of summarizing the problems, potential benefits, and complications surrounding them. Large excerpts of these last few chapters belong in, say, a statistics or computing science course on data ethics.

- Jack Davis