Sensing the Earth: UKGEOS, statistics and streaming data by Mike Stephenson

A couple of weeks ago, I attended a workshop on streaming data organised by the Turing Gateway to Mathematics, at Cambridge University[1]. The meeting brought together some formidable mathematical brains with the sorts of people that might want to use those brains. People with data.

I was one of those visitors. I gave a talk about the BGS’ new UKGEOS project[2], which aims to collect data from a whole range of new sensors at two sites: in England near Chester, and in Scotland, on the east side of Glasgow. The sites will collect data from boreholes on groundwater, seismic activity, ground motion, and a range of other variables. The sites will be observatories that will match the ambition and science presence of some of our more famous observatories such as Jodrell Bank and the Royal Observatory at Herstmonceux – only the UKGEOS observatories will point down into the ground rather than up into the sky.

UKGEOS - instrumenting the Earth

The reason I wanted to talk at the meeting was that I was sure that the UKGEOS data, and the way it will be collected, will be of interest to statisticians – mainly because it represents a new realm for data. Oil companies collect data on subsurface reservoirs, and volcanologists and seismologists collect data to assess hazards, but comprehensive data on the subsurface isn’t collected. It’s the last great frontier – we have ‘macroscopes’ for the atmosphere and oceans, but so far nothing for the subsurface.

Two trends have prompted this. One is the need to understand the subsurface better – because we build on the ground and tunnel through it, and because we extract things from the ground and store things in the ground. It’s clear that to do this sustainably we need to understand the subsurface better.

The other trend has been in technology. A ‘geological macroscope’[3] is now within our grasp because of the technology that’s available – more sensors that are capable of withstanding the challenging conditions deep underground, better visualisation technology and, most importantly, bigger computing capability.

In the end we’ll want to use the UKGEOS sites to understand the subsurface – the fluids that flow through it, and the way that it changes day to day and hour to hour. Of course, like meteorologists, we geologists would also like to be able to predict what’s going to happen – not in the atmosphere but underground. How does groundwater quality change from day to day? How do the shallow geothermal resources change with the seasons? How sustainable is shallow geothermal for the UK? How do we know when sinkholes or landslides are going to happen?

So a big draw of collecting all this subsurface data is the ability for geologists to forecast change. This was the main reason I attended the Cambridge meeting: to meet lots of clever people who are used to looking at data and who are interested in looking for trends that presage change.

At the beginning of the event, the audience was treated to a cautionary tale on exactly those lines – the perils of prediction. The story concerned a bank that almost went out of business because its computer systems could not tell the difference between anomalies and so-called ‘change points’. The latter are more structural changes, while the former are essentially ‘blips’. Statistical algorithms that controlled an automatic buying and selling strategy at the bank misidentified a change point as an anomaly and continued trading on highly unfavourable terms. So the bank took a big hit.
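
To make the distinction concrete, here is a minimal sketch of how a ‘blip’ might be separated from a structural shift in a stream of readings. It is not the bank’s algorithm, nor any method presented at the workshop; the function name, the thresholds and the idea of using the length of a departure as the deciding rule are all assumptions chosen for illustration. A reading counts as a departure when it sits far from a baseline median (measured in median absolute deviations); a short run of departures is labelled an anomaly, a long one a change point.

```python
from statistics import median


def classify_departures(values, baseline_len=20, threshold=5.0, min_run=5):
    """Label departures from a baseline as 'anomaly' or 'change_point'.

    Hypothetical helper for illustration only. Returns a list of
    (start_index, run_length, label) tuples.
    """
    baseline = values[:baseline_len]
    centre = median(baseline)
    # Median absolute deviation: a spread estimate that resists outliers.
    # The tiny fallback avoids dividing by zero on a perfectly flat baseline.
    mad = median([abs(v - centre) for v in baseline]) or 1e-9

    def departs(v):
        return abs(v - centre) / mad > threshold

    events = []
    i = baseline_len
    while i < len(values):
        if departs(values[i]):
            start = i
            # Walk forward while readings stay away from the baseline.
            while i < len(values) and departs(values[i]):
                i += 1
            run = i - start
            # Assumed rule: short excursions are blips, long ones are shifts.
            label = "change_point" if run >= min_run else "anomaly"
            events.append((start, run, label))
        else:
            i += 1
    return events


# Made-up readings: a steady signal, one spike, then a lasting step up.
steady = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8] * 5
series = steady[:]
series[24] = 14.0
series += [12.0, 12.1, 11.9, 12.0, 12.2, 11.8, 12.0, 12.1]

for start, run, label in classify_departures(series):
    print(start, run, label)
```

The weakness of such a rule is the one the bank’s story illustrates: a change point looks like an anomaly until enough readings have arrived, so anything acting on the stream in the meantime is working from the wrong label.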

Discussions at the event also considered the ways that personal medicine – streamed data on personal health – could be used to discern dangerous trends and predict catastrophic health failures in individuals.

The implications for subsurface data are obvious. Complex statistics are used in earthquake seismology, aiming for the holy grail of earthquake prediction or at the less difficult game of predicting aftershocks – but how could some of the techniques being discussed be used for more low-key subsurface natural processes, such as predicting subsidence or groundwater drought? Clearly better process understanding is needed – but perhaps some of the statistics that aim to distinguish change points from anomalies could be useful too, both for forecasting and, more mundanely, for spotting imminent sensor failure.

The Cambridge event ended with a talk from Jeremy Bradley of the Royal Mail Data Science Group – which took a different tack. It seems that some of our long-standing institutions are beginning to realise the value of their infrastructure in the new world of environmental sensors. The physical infrastructure that the Royal Mail controls in order to deliver its parcels and letters is huge. The service delivers 50,000 letters per day to 24 million addresses. It has 115,000 postboxes, visited regularly, and 40,000 post vans, as well as 20,000 hand trolleys. Most of these follow the same route every day. The Royal Mail Data Science Group wonders if sensors could be mounted on this physical infrastructure – for air quality monitoring, for example, or traffic. The Royal Mail’s infrastructure can’t offer a clear geological angle, but there is a lot of other subsurface infrastructure. I wonder what geological use sensors in our subsurface water pipeline network might be put to? What other subsurface physical infrastructure could be used for gathering underground data? Time will tell!