When discussing complicated topics like artificial intelligence (AI), it’s often easier to understand the issues in a sector like clean technology by comparing them with what’s happening in other sectors, such as manufacturing or healthcare. So, this month, let’s talk about building useful AI tools, the need for expertise in adapting these technologies, and how our experience using AI in other sectors mirrors what we experience in clean technology.
Since we’re still in a pandemic, let’s take a look at healthcare and clean technology.
To start with, what do these two seemingly very different sectors have to do with each other? Quite a lot, as it happens. First, the environment has a huge impact on public health - think wildfire smoke or poor air quality and their impacts on cardiovascular health and asthma. Second, monitoring the environment allows us to better understand the health of communities - for example, monitoring wastewater for SARS-CoV-2, the virus causing the current pandemic, allows us to contain outbreaks as they occur and prevent wide-scale disruption to communities.
Finally, new tools and technologies using data science and machine learning are being introduced in both healthcare and clean technology - and in both cases, practitioners and users have discovered that you need to understand what’s happening in the system in order to build useful tools.
An interesting article published last month in MIT Technology Review explored the use of artificial intelligence (AI) tools in the current pandemic and their effectiveness in improving patient outcomes and resource management in hospitals. It looked at tools to diagnose patients faster, to predict the severity of the disease in patients and to better manage hospital resources. The algorithms involved included deep learning, standard image recognition tools and a wide range of predictive models such as clustering. The data used included chest X-ray images, hospital records on patient health (patient age, patient position during imaging, body temperature, etc.) and datasets on resource use by hospitals.
There were 232 tools developed to improve patient diagnosis and predict how sick patients with Covid-19 might get. Of those 232, only 2 were found to be fit for further clinical trials. In an even more shocking study published in Nature Machine Intelligence, 415 tools were developed using deep learning to diagnose Covid from medical images such as X-rays and CT scans - and none of those 415 were considered fit for clinical use.
So, in other words - a lot of resources were thrown at using AI to aid healthcare professionals during the pandemic - but the results were not useful.
Why did that happen?
According to the authors of the studies, the problems in using AI in healthcare were present earlier and have become more visible as a result of the pandemic. While there’s been a lot of hype about AI and what it can do - fundamentally, AI algorithms, like any other algorithms, depend on the data they are trained on and the way the practitioner builds the model. And that’s true in all fields - healthcare, clean technology and others.
Let’s start with the data that were used to train algorithms like neural networks or image recognition models in the studies discussed above. In healthcare, the data are often fragmented, poorly labeled and incorporate the bias of the physician doing the work.
As an example of incorporation bias, patient X-ray images were labeled as “having Covid” or “not having Covid” based on the opinion of the radiologist reading the X-ray. If instead the images were labeled using the patient’s PCR test result, human error in labeling the images would be removed and the accuracy of the model would be greatly improved.
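As a rough sketch of how such a relabeling might work - the records, field names and counts below are entirely made up for illustration:

```python
# Hypothetical illustration: replacing radiologist-opinion labels
# with PCR-based labels for a handful of chest X-ray records.
records = [
    # (image_id, radiologist_label, pcr_result)
    ("img_001", "covid",    "positive"),
    ("img_002", "covid",    "negative"),  # radiologist over-called
    ("img_003", "no_covid", "positive"),  # radiologist missed it
    ("img_004", "no_covid", "negative"),
]

# Relabel every image from the PCR result instead of the human reading.
relabeled = [
    (image_id, "covid" if pcr == "positive" else "no_covid")
    for image_id, _, pcr in records
]

# Count how many training labels change under the PCR-based scheme.
flipped = sum(
    1 for (_, old, _), (_, new) in zip(records, relabeled)
    if old != new
)
print(f"{flipped} of {len(records)} labels changed")  # 2 of 4 labels changed
```

Every flipped label is a training example the original model was learning from incorrectly - which is why tying labels to an independent ground truth matters so much.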
While it’s relatively easy to see if there are additional datasets that can improve the accuracy of an AI model, other errors can creep in during the model building phase. As AI models have become more complex, they have also taken on aspects of a “black box”. For example, a deep learning model can incorporate hundreds of layers and millions of parameters - and it’s very difficult for the practitioner to tease out the relationships between all the parameters that the model is using and the final result. That’s where understanding the system, understanding the question being solved and understanding the algorithm all come into play.
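To get a feel for the scale, here’s a back-of-the-envelope parameter count for a toy fully-connected classifier - the layer widths are arbitrary assumptions for illustration, and real image models are convolutional and typically far larger:

```python
# Back-of-the-envelope parameter count for a toy fully-connected network
# that maps a 224x224 image to 2 classes. Layer widths are made up.
layer_sizes = [224 * 224, 512, 256, 64, 2]

# Each dense layer has (n_in * n_out) weights plus n_out biases.
total_params = sum(
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)
print(f"{total_params:,} trainable parameters")  # 25,838,530 trainable parameters
```

Even this toy network has roughly 26 million parameters - none of which has an obvious, human-readable meaning on its own, which is what makes auditing these models so hard.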
In the study published in Nature Machine Intelligence, the authors found that in some cases, the AI model was trained to detect patients without Covid using data from children who didn’t have Covid. That meant the model was very accurate at detecting children, but not really accurate at detecting patients without Covid! Similarly, a number of X-rays were taken while the patient was lying down, and in general, the sicker patients were the ones lying down. As a result, the AI algorithm was very successful at detecting from the image whether the patient was lying down, and less accurate at detecting whether the patient actually had Covid or not!
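A minimal sketch of that failure mode, with entirely made-up numbers: on confounded training data, a rule that keys on patient position can look better than a rule that keys on a genuine disease signal - and then collapse as soon as the confound is broken.

```python
# Made-up illustration of shortcut learning.
# Each record: (has_covid, lying_down, disease_signal)

def build(n, covid, lying_frac, signal_frac):
    """Build n records where the given fractions are lying down / show the signal."""
    lying = int(n * lying_frac)
    signal = int(n * signal_frac)
    return [(covid, i < lying, i < signal) for i in range(n)]

# Training data: sicker (Covid) patients were mostly imaged lying down,
# so position is strongly confounded with the label.
train = build(50, True, 0.9, 0.8) + build(50, False, 0.1, 0.2)

# Test data: the confound is broken - position no longer tracks Covid.
test = build(50, True, 0.5, 0.8) + build(50, False, 0.5, 0.2)

def accuracy(data, rule):
    return sum(rule(lying, sig) == covid for covid, lying, sig in data) / len(data)

position_rule = lambda lying, sig: lying  # the shortcut
signal_rule = lambda lying, sig: sig      # the real disease signal

print(accuracy(train, position_rule))  # 0.9 - the shortcut wins in training
print(accuracy(train, signal_rule))    # 0.8
print(accuracy(test, position_rule))   # 0.5 - the shortcut collapses
print(accuracy(test, signal_rule))     # 0.8 - the real signal holds up
```

A model selected purely on training accuracy would pick the position shortcut here - which is exactly why evaluating on data where the confound is absent is essential.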
When errors like these are identified, they are relatively easy to fix - the challenge is identifying what a complicated algorithm with millions of parameters is doing in the first place! That requires a combination of expertise in the subject matter and sufficient mathematical skill to build the algorithm and correct for the problems. Or as the author of the article in MIT Technology Review puts it: “But many tools were developed either by AI researchers who lacked the medical expertise to spot flaws in the data or by medical researchers who lacked the mathematical skills to compensate for those flaws.”
And that is the critical issue facing scientists and practitioners who are using or want to use AI in their fields - the lack of expertise in both the subject matter and data science!
Next week, we’ll take a look at similar challenges in clean technology and talk about how we can adapt AI algorithms to solve problems in the environment and earth sciences.