COVID-19 Hackathon / by Alex Hall and Vyron Christodoulou

Alex Hall and Vyron Christodoulou are data scientists working to extract useful information from the BGS’s vast and varied data. They specialise in Machine Learning, which involves training computers to learn how to perform tasks by themselves – often tasks that would be impractical for humans to do by hand. Last week they took part in a hackathon with other data scientists and medical specialists to address some of the issues arising from the current COVID-19 pandemic. Here Alex explains the problem they tried to solve, and how diagnosing health conditions can be surprisingly similar to analysing geological data...

As a data scientist, for the past few weeks I have been asking myself ‘how can my skills be used to save lives during the COVID-19 pandemic?’ Like many others in the data science field, I strongly suspected I had the capability to help out, but was unsure exactly how. However, last week the Coronahack virtual hackathon, hosted by Mindstream-AI, tackled exactly that question: by gathering together data scientists and biomedical experts and presenting us with a number of urgent issues.

Alex working from home. Note the hastily thrown-together PC to the right, which was assembled specifically to train up some of our algorithms.

Vyron and I signed up to take part and formed a team with six other data scientists and a healthcare worker, located around the UK. We were one of over 20 teams taking part. Although the event was originally planned to be hosted in London, the current circumstances forced it to be held virtually, which turned out to be a huge benefit, as it is unlikely Vyron and I would have been able to attend otherwise. Additionally, working remotely allowed us and other attendees to commit our time more freely, since most of us were working around our day jobs. Needless to say, this made for an intense week of seemingly nonstop coding.

Within 30 minutes of getting a team together, we had agreed on a problem and begun working on our solutions. We chose to tackle the problem of ‘First-Diagnosis’, with the goal of training an AI to assist with early diagnosis of COVID-19. Much of the work I carry out at BGS involves using existing image data to predict outcomes. In this case, we planned to use x-ray images to predict whether a patient exhibited symptoms of COVID-19. Existing lab tests are time-consuming, in short supply and have a surprisingly high false-negative rate. So, by using the same techniques we use at BGS to automatically categorise geological images, we hoped to produce a tool to aid in the diagnosis of COVID-19 from x-ray images.
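The classifiers we actually used were deep image models, but the underlying idea – learn a mapping from pixel values to a diagnosis label – can be sketched in a few lines. The example below is a deliberately tiny, hypothetical stand-in: toy 8×8 ‘x-rays’ where the ‘infected’ class gets a bright patch, and a plain logistic-regression classifier trained with gradient descent. None of the data or parameters here come from our real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_images(n, infected):
    """Toy 8x8 'x-rays': infected images get a bright patch (a crude
    stand-in for the opacities a radiologist would look for)."""
    imgs = rng.normal(0.0, 1.0, size=(n, 8, 8))
    if infected:
        imgs[:, 2:5, 2:5] += 2.0  # synthetic bright region
    return imgs.reshape(n, -1)    # flatten each image to a 64-vector

# Labelled training set: 100 'healthy' and 100 'infected' toy images
X = np.vstack([make_images(100, False), make_images(100, True)])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Logistic-regression classifier trained by plain gradient descent
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(infected)
    grad = p - y                            # cross-entropy gradient on the logits
    w -= 0.01 * (X.T @ grad) / len(y)
    b -= 0.01 * grad.mean()

# Evaluate on fresh, held-out toy images
X_test = np.vstack([make_images(50, False), make_images(50, True)])
y_test = np.concatenate([np.zeros(50), np.ones(50)])
pred = (1.0 / (1.0 + np.exp(-(X_test @ w + b)))) > 0.5
accuracy = (pred == y_test).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

On this easy synthetic task the classifier separates the two classes almost perfectly; real chest x-rays are far harder, which is why convolutional networks (and far more data) are needed in practice.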

The main hurdle to this approach is the lack of existing data. Unfortunately, hospitals are generally underfunded and understaffed when it comes to data management. As a result, the majority of recently generated data (be it x-rays or other clinical data) are simply unavailable for wider use. There is no doubt that thousands of publications will be generated in the next few years from the data that is slowly released after the crisis has passed. However, that is of no use to us now; it is frustrating to know that the data exists right now but isn’t available for immediate use to save lives.

Our team’s major innovation was to use the limited existing x-ray data to generate new, ‘computer-generated’ x-rays. This artificial data could then be used to train AI to automatically recognise the features of COVID-19 in chest x-rays of actual patients. Vyron worked on a model called a Generative Adversarial Network (GAN), which was able to build the new, artificial x-rays. In parallel, I worked with two other data scientists on the team to feed this data into our image classifiers, which were able to predict whether a patient had the disease with an accuracy of 97%. It should be stated that this figure comes with several caveats, and with more (and more varied) test data the measured accuracy would likely drop, but as a proof of concept it was fascinating to see our code actually working after just five days of work.
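Vyron’s actual model was a StyleGAN trained on chest x-rays, which is far too large to show here. To illustrate the adversarial idea itself, here is a hypothetical, minimal numpy sketch: a generator (an affine map from noise) tries to produce scalar samples that look like ‘real’ data drawn from N(4, 1), while a logistic discriminator tries to tell them apart. All names and numbers are illustrative, not from our entry.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def real_batch(n):
    """'Real' data: scalars from N(4, 1), standing in for real x-ray statistics."""
    return rng.normal(4.0, 1.0, size=n)

w_g, b_g = 1.0, 0.0   # generator: fake = w_g * z + b_g
w_d, b_d = 0.1, 0.0   # discriminator: D(x) = sigmoid(w_d * x + b_d)
lr, n = 0.05, 64

for step in range(2000):
    z = rng.uniform(-1, 1, size=n)
    real, fake = real_batch(n), w_g * z + b_g

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0
    p_r, p_f = sigmoid(w_d * real + b_d), sigmoid(w_d * fake + b_d)
    w_d -= lr * (np.mean((p_r - 1) * real) + np.mean(p_f * fake))
    b_d -= lr * (np.mean(p_r - 1) + np.mean(p_f))

    # Generator step: push D(fake) -> 1, i.e. fool the discriminator
    z = rng.uniform(-1, 1, size=n)
    fake = w_g * z + b_g
    p_f = sigmoid(w_d * fake + b_d)
    w_g -= lr * np.mean((p_f - 1) * w_d * z)
    b_g -= lr * np.mean((p_f - 1) * w_d)

samples = w_g * rng.uniform(-1, 1, size=1000) + b_g
print(f"generated mean: {samples.mean():.2f} (real mean is 4.0)")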

Some outputs from the GAN. For each pair of images, the left is an original x-ray and the right is artificially generated by the algorithm to simulate a case with COVID-19. 

Unfortunately, we did not take away the prize of £1000 funding for further research and a state-of-the-art GPU, which was awarded to the winning team. However, our findings are open source and available for anyone wishing to conduct further research in the field. Furthermore, I will be taking some of the lessons learned back to BGS. I have already been able to use some of the methods to speed up machine learning scripts used at BGS, and I anticipate that Vyron’s work on GANs will soon prove valuable for extracting value from some of our more limited datasets.

So, to summarise: it was an extremely intensive week, but absolutely worth it, and I probably enjoyed it even more than I would have done in person.

Open source material: 
Coronahack Stylegan
Alex Hall data

Open data: 
Kaggle chest x-rays
Kaggle radiography database