A couple of weeks ago, I attended a workshop on streaming data
organised by the Turing Gateway to Mathematics at the University of Cambridge[1].
The meeting brought together some formidable mathematical brains with the sorts
of people that might want to use those brains. People with data.
I was one of those visitors. I gave a talk about the BGS’s
new UKGEOS project[2], which
aims to collect data from a whole range of new sensors at two sites: one in England
near Chester, and one in Scotland on the east side of Glasgow. The sites will
collect data from boreholes on groundwater, seismic activity, ground motion,
and a range of other variables. They will be observatories to match
the ambition and scientific presence of some of our more famous observatories, such
as Jodrell Bank and the Royal Observatory at Herstmonceux – only the UKGEOS
observatories will point down into the ground rather than up into the sky.
UKGEOS - instrumenting the Earth
The reason I wanted to talk at the meeting was that I was
sure that the UKGEOS data and the way it will be collected will be of interest
to statisticians – mainly because it represents a new realm for data. Oil
companies collect data on subsurface reservoirs, and volcanologists and
seismologists collect data to assess hazards, but nobody collects comprehensive
data on the subsurface as a whole. It’s the last great frontier – we have ‘macroscopes’
for the atmosphere and oceans, but so far nothing for the subsurface.
Two trends have prompted this. One is the need to understand
the subsurface better – because we build on the ground and tunnel through it,
and because we extract things from it and store things in it.
It’s clear that doing all of this sustainably demands that better
understanding.
The other trend is technological. A ‘geological
macroscope’[3] is
now within our grasp: more sensors
capable of withstanding the challenging conditions deep
underground, better visualisation tools and, most importantly, bigger
computing capability.
In the end, we’ll want to use the UKGEOS sites to understand
the subsurface – the fluids that flow through it, and the way it changes
day to day and hour to hour. Of course, like meteorologists, we geologists would
also like to be able to predict what’s going
to happen – not in the atmosphere but underground. How does
groundwater quality change from day to day? How do the shallow geothermal resources
change with the seasons? How sustainable is shallow geothermal for the UK? How
do we know when sinkholes or landslides are going to happen?
So a big draw of collecting all this subsurface data is the
ability for geologists to forecast change. And that was the main reason I
attended the Cambridge meeting: to meet lots of clever people who are used to
looking at data and who are interested in looking for trends that presage
change.
At the beginning of the event, the audience was
treated to a cautionary tale exactly on those lines – the
perils of prediction. The story concerned a bank that almost went out of business
because its computer systems couldn’t tell anomalies from
so-called ‘change points’. The latter are structural shifts in the behaviour of
a data stream, while the former are essentially transient ‘blips’. The statistical
algorithms that controlled an automatic buying and selling strategy at the bank
misidentified a change point as an anomaly and carried on trading on highly
unfavourable terms, so the bank took a big hit.
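To make that distinction concrete, here’s a minimal sketch – Python on synthetic data, nothing to do with the bank’s actual system – of the two behaviours: a rolling z-score test that flags transient blips, and a simple CUSUM detector that only trips when a deviation persists. The window, drift and threshold values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series: noise, one transient 'blip' (anomaly) at t = 150,
# then a persistent level shift (change point) from t = 300 onwards.
x = rng.normal(0.0, 1.0, 500)
x[150] += 8.0       # anomaly: a one-sample spike
x[300:] += 3.0      # change point: the mean shifts and stays shifted

def anomaly_flags(x, window=50, z_thresh=6.0):
    """Flag samples far outside a trailing window: catches transient
    blips, but largely ignores sustained shifts."""
    flags = []
    for t in range(window, len(x)):
        mu = x[t - window:t].mean()
        sd = x[t - window:t].std() + 1e-9
        if abs(x[t] - mu) / sd > z_thresh:
            flags.append(t)
    return flags

def cusum_change_point(x, drift=0.5, threshold=15.0):
    """One-sided CUSUM against a baseline from the first 50 samples:
    a lone spike decays away, but a persistent shift accumulates
    until the statistic crosses the threshold."""
    baseline = x[:50].mean()
    s = 0.0
    for t, v in enumerate(x):
        s = max(0.0, s + (v - baseline) - drift)
        if s > threshold:
            return t
    return None

print(anomaly_flags(x))        # the blip at t = 150
print(cusum_change_point(x))   # a few samples after t = 300
```

The spike at t = 150 decays out of the CUSUM statistic, while the level shift at t = 300 keeps feeding it until it crosses the threshold – exactly the distinction the bank’s algorithms got wrong.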
Discussions at the event also considered how
personal medicine – streamed data on individual health – could be used to discern
dangerous trends and predict catastrophic health failures in individuals.
The implications for subsurface data are obvious. Complex statistics
are already used in earthquake seismology, aiming at the holy grail of earthquake
prediction or at the less difficult game of forecasting aftershocks – but how
could some of the techniques being discussed be applied to lower-key
subsurface natural processes such as subsidence or groundwater drought?
Clearly better process understanding is needed, but perhaps some of the statistics
that distinguish change points from anomalies could be useful for
forecasting too – or, more mundanely, for spotting imminent sensor
failure, as the sketch below suggests.
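On that last point, here’s a toy illustration – again Python on synthetic data, with hypothetical values rather than any real BGS telemetry: one common failure signature in a streamed sensor feed is the rolling variance collapsing when a transducer freezes at a single value.

```python
import numpy as np
from collections import deque

def stuck_sensor_monitor(stream, window=60, var_floor=1e-6):
    """Yield the time step at which the rolling variance of the last
    `window` readings collapses: the classic signature of a sensor
    frozen at one value. Thresholds here are illustrative, not
    calibrated to any real instrument."""
    buf = deque(maxlen=window)
    for t, reading in enumerate(stream):
        buf.append(reading)
        if len(buf) == window and np.var(buf) < var_floor:
            yield t

# Synthetic groundwater-level feed (metres) that flatlines at t = 400.
rng = np.random.default_rng(1)
levels = rng.normal(12.0, 0.05, 600)   # normal operation: small noise
levels[400:] = levels[399]             # the sensor freezes

alarms = stuck_sensor_monitor(levels)
print(next(alarms))   # fires once the window is all-identical (t ≈ 458)
```

A complementary drift check could reuse the CUSUM statistic from the earlier sketch; the point is simply that the same change-point machinery that serves forecasting can also flag a dying instrument.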
The Cambridge event ended with a talk from Jeremy Bradley of
the Royal Mail Data Science Group, which took a different tack. It seems that
some of our long-standing institutions are beginning to realise the value of their
infrastructure in the new world of environmental sensors. The physical
infrastructure that the Royal Mail controls in order to deliver its parcels and
letters is huge: the service delivers 50,000 letters per day to 24 million
addresses, and it has 115,000 postboxes that are visited regularly, 40,000 post
vans and 20,000 hand trolleys. Most of these follow the same route every day. The
Royal Mail Data Science Group wonders whether sensors could be mounted on this
physical infrastructure – for monitoring air quality, for example, or traffic. The
Royal Mail’s infrastructure can’t offer a clear geological angle, but there is
a lot of other subsurface infrastructure. I wonder what geological uses sensors
in our subsurface water pipeline network might be put to. What other subsurface
physical infrastructure could be used for gathering underground data? Time will tell!