Wednesday, 24 August 2016

PropBase: the world's first true geo-data warehouse...by Andy Kingdon

The British Geological Survey has led the world in using 3D geological models, rather than 2D maps, for improving scientific understanding or communicating science. Such models often need to be created very quickly to visualise and understand the issues and then explain the implications. But producing such models requires easy access to complex geological information; accurate models require multiple datasets to be supplied and interrogated in common formats.

Yet of the huge amount of geological data that BGS holds from boreholes and from surface and subsurface sampling, we actually acquire only a relatively small proportion ourselves; the rest is deposited with BGS for statutory reasons. A new geological model might therefore incorporate data collected by any (or even all) of the water, energy, minerals and construction industries, each delivered in its own format and to its own standards.

For the last 25 years, BGS has held its data in normalised, structured databases. Normalising data means storing every element of your records separately and only once, with keys that link these fields together. This is highly efficient: when you move house, your credit card company only needs to change one address field rather than every transaction record. The disadvantages of this approach can be slow response times and complex data structures for scientists to interrogate. Creating a 3D model might involve interrogating 17 major databases, with 50 datatypes containing millions of records. Until now, each dataset had to be searched separately using different tools and then laboriously reformatted, so making a “first-look” 3D model could take several days' work before any interpretation could begin.
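To make the trade-off concrete, here is a minimal sketch of a normalised layout in Python. The tables, field names and values (`Borehole`, `Sample`, `property_types`, the borehole "BH1") are hypothetical simplifications, not BGS schemas. Each fact is held once, so a correction is a single update, but rebuilding a simple 3D point for every sample already needs lookups across three tables.

```python
from dataclasses import dataclass

# Hypothetical, simplified tables for illustration only: NOT BGS schemas.
@dataclass
class Borehole:
    borehole_id: str
    easting: float       # grid coordinates, metres
    northing: float
    ground_level: float  # metres above datum

@dataclass
class Sample:
    sample_id: str
    borehole_id: str     # key into `boreholes`
    depth: float         # metres below ground level
    property_code: str   # key into `property_types`
    value: float

# Each fact is stored once; other tables refer to it by key.
boreholes = {"BH1": Borehole("BH1", 450000.0, 250000.0, 100.0)}
property_types = {"DEN": ("bulk density", "Mg/m3"), "POR": ("porosity", "%")}
samples = [
    Sample("S1", "BH1", 10.0, "DEN", 2.1),
    Sample("S2", "BH1", 25.0, "POR", 18.0),
]

def correct_location(borehole_id, easting, northing):
    """One update fixes the location for every sample that references the borehole."""
    bh = boreholes[borehole_id]
    bh.easting, bh.northing = easting, northing

def sample_points():
    """Reassemble 3D points: every sample needs lookups (joins) into two other tables."""
    rows = []
    for s in samples:
        bh = boreholes[s.borehole_id]
        name, unit = property_types[s.property_code]
        elevation = bh.ground_level - s.depth
        rows.append((bh.easting, bh.northing, elevation, name, s.value, unit))
    return rows
```

With three tables this is trivial; spread across 17 databases and 50 datatypes, the equivalent joins are what made assembling model input so slow.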

To solve this, we’ve adapted another idea from the finance and insurance industries and built PropBase, the world’s first true geo-data warehouse. A data warehouse takes a copy of the original data (thereby ensuring the integrity of the original), reassembles the data into a standardised structure and outputs them in common formats. Because all data used in 3D models share a broadly common structure (a location in 3D, then the datatype, its value and any qualifiers), they can be imported in a common way. PropBase can therefore output every record in any of a common set of standard formats (e.g. a GIS shapefile, a CSV file for importing into modelling packages, or web services for machine-to-machine interrogation) simply by flicking a “switch” to toggle between them. The key advantages are massively improved query response times and standardised outputs that modelling software can import much more quickly. These same ideas are now being used as templates for more complex datatypes, such as real-time streaming of sensor outputs.
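The common record structure and the output “switch” can be sketched as follows. This is an illustration under assumptions, not PropBase's implementation: the `PropertyRecord` fields and the `export` function are hypothetical. It shows one flat, denormalised row type (3D location, datatype, value, qualifiers) and a lookup table of writers, so changing the output format means changing only the key passed in. A CSV writer and a JSON payload (standing in for a web-service response) use only the standard library; a shapefile writer would need a GIS library and is left out.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields

@dataclass(frozen=True)
class PropertyRecord:
    """One denormalised row: a 3D location, the datatype, its value and any qualifiers."""
    easting: float
    northing: float
    elevation: float        # metres relative to datum
    datatype: str           # e.g. "porosity"
    value: float
    unit: str
    source: str             # originating dataset, kept so sources remain separable
    qualifiers: tuple = ()  # free-text qualifiers, e.g. the test method

COLUMNS = [f.name for f in fields(PropertyRecord)]

def to_csv(records):
    """Flat table for import into modelling packages."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for r in records:
        row = asdict(r)
        row["qualifiers"] = ";".join(r.qualifiers)
        writer.writerow([row[c] for c in COLUMNS])
    return buf.getvalue()

def to_json(records):
    """JSON payload of the kind a web service might return."""
    return json.dumps([asdict(r) for r in records])

# The "switch": the same records, a different writer.
WRITERS = {"csv": to_csv, "json": to_json}

def export(records, fmt):
    return WRITERS[fmt](records)
```

Because every datatype fits the same row shape, adding an output format means writing one new function rather than one per source database.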

PropBase Explorer tool showing spatial search for physical property data within the area of interest defined by a mapped rectangle
So now, regardless of whether your density or porosity data come from field tests by geotechnical companies, rock tests on borehole core from the oil or water industries, or even geophysical logs, they can all be exported in a common format from a single interface with a few mouse-clicks. Because the outputs are identically formatted, they can be imported into your modelling platform using a single input routine, while each dataset is still kept separate so that any systematic mismatches between them can be identified. The time taken to do this has fallen from many hours, or even days, to a few minutes. Now modelling geologists can spend their time modelling, not writing database calls.
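The search shown in the figure, plus the check for systematic mismatches, might look something like the sketch below. It assumes records have already been flattened into one common shape; the `Record` type, the `(min_e, min_n, max_e, max_n)` rectangle convention and the per-source summary are hypothetical choices for illustration, not the PropBase Explorer's actual logic. Results from every source are returned together but labelled by source, so a consistent offset between, say, geotechnical tests and oil-industry core stands out.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple

class Record(NamedTuple):
    easting: float
    northing: float
    elevation: float
    datatype: str
    value: float
    source: str  # e.g. "geotechnical", "oil core", "geophysical log"

def in_rectangle(r, bbox):
    """True if the record lies inside the (min_e, min_n, max_e, max_n) rectangle."""
    min_e, min_n, max_e, max_n = bbox
    return min_e <= r.easting <= max_e and min_n <= r.northing <= max_n

def search(records, datatype, bbox):
    """Spatial search: one datatype, all sources, inside the area of interest."""
    return [r for r in records if r.datatype == datatype and in_rectangle(r, bbox)]

def summarise_by_source(hits):
    """Count and mean value per source; a consistent offset hints at a systematic mismatch."""
    by_source = defaultdict(list)
    for r in hits:
        by_source[r.source].append(r.value)
    return {src: (len(vals), mean(vals)) for src, vals in sorted(by_source.items())}
```

A plain rectangle test is enough for a sketch; a real warehouse would normally push this filter into a spatially indexed database query rather than scanning records in Python.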

Our new publication in Computers & Geosciences defines this new data structure and sets out a model for how scientists can effectively access and serve multiple, complex, spatially enabled data structures. If you find it useful, please cite it. (Please note that it is behind a paywall; researchers who cannot access it should contact me.)
