05.06.2025

Dr Michael Pocock discusses a promising approach to combine data from remote sensing and citizen science through machine learning to provide monitoring of habitat quality...

For effective environmental stewardship we need information across large areas to assess environmental quality, and ideally to get this in near real-time. Data from satellites are one source – but while we have masses of data constantly beamed down from satellites doing remote sensing, we often struggle to make sense of it.

In our new paper, published in Ecological Solutions and Evidence, we provided a perspective on how three areas of science – remote sensing, citizen science, and machine learning (a form of AI) – could be brought together to meet the need for comprehensive, high-resolution, real-time data on habitat quality. It was a collaboration between the UK Centre for Ecology & Hydrology (UKCEH) and The Alan Turing Institute (for research in AI), with on-the-ground perspectives from the Peak District National Park.

Combining multiple data from a protected area into analysis of local habitat condition
Our perspective on how remote sensing and citizen science can be combined through machine learning to provide high quality, near real-time reporting on habitat condition. This could be transformative for environmental reporting.

Remotely sensed data from satellites are a tremendous source of information – indeed UKCEH uses such data to create its widely used Land Cover Map products. They are rich in information: pixel-by-pixel values in multiple colour bands from regular satellite passes.

But we need the tools to extract the information from the raw data, and we need on-the-ground data to train the models. AI can help us extract the information using ‘machine learning’ models, and citizen science can provide the on-the-ground data to train the machine learning.

Layers of remotely sensed data
Can the richness of remotely sensed data be harnessed even more by using machine learning techniques?

Citizen science, such as that supported through the Biological Records Centre at UKCEH and our partners, is an incredible source of data. Using tools such as our iRecord system, millions of species records are submitted by tens of thousands of volunteers every year. Crucially, records of ‘indicator species’ can provide information on habitat quality, and we could harness this information to train the AI algorithms.

Dingy skipper butterfly
The diminutive Dingy Skipper butterfly. If a butterfly recorder sees this, it indicates they are in ‘nice’ habitat. Records like this could potentially train AI to extract more information about habitat quality. (Photo: Michael Pocock)

Extracting the information from remotely sensed data with citizen science data is challenging. AI can make the tasks more computationally feasible using ‘feature embeddings’ to compress the satellite data and using ‘pre-train/fine-tune’ models to make the most of the patchily distributed citizen science data.
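To make the idea concrete, here is a minimal sketch of the two steps on purely synthetic data. It is not the method from our paper: PCA stands in for a learned feature embedding, a small logistic regression stands in for fine-tuning, and every name and number (the 12 bands, the 50 labelled pixels, the function names) is an assumption chosen for illustration.

```python
import numpy as np


def fit_pca(pixels, n_components):
    """Learn a linear embedding from unlabelled pixels (stand-in for pre-training)."""
    mean = pixels.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(pixels - mean, full_matrices=False)
    return mean, vt[:n_components]


def embed(pixels, mean, components):
    """Compress each pixel's band values into a short embedding vector."""
    return (pixels - mean) @ components.T


def fit_logistic(x, y, lr=0.1, steps=2000):
    """Fit a logistic-regression head on a few labelled embeddings
    (stand-in for fine-tuning), using plain gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(x, w, b):
    """Predicted probability that each pixel is 'good' habitat."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


def demo(seed=0):
    rng = np.random.default_rng(seed)
    n_pixels, n_bands = 5000, 12  # hypothetical image size and band count

    # Synthetic ground truth: a hidden habitat-quality flag (0 or 1) shifts
    # the band values along one spectral signature, plus sensor noise.
    quality = rng.integers(0, 2, size=n_pixels)
    signature = rng.normal(size=n_bands)
    pixels = 2.0 * np.outer(quality - 0.5, signature)
    pixels += rng.normal(scale=0.5, size=(n_pixels, n_bands))

    # Step 1: compress all pixels using only the unlabelled satellite data.
    mean, components = fit_pca(pixels, n_components=3)
    z = embed(pixels, mean, components)

    # Step 2: train on a small, patchy set of labelled pixels, standing in
    # for places where volunteers recorded an indicator species.
    labelled = rng.choice(n_pixels, size=50, replace=False)
    w, b = fit_logistic(z[labelled], quality[labelled])

    # Score every pixel and compare with the hidden truth.
    accuracy = ((predict(z, w, b) > 0.5) == quality).mean()
    return z.shape, accuracy


if __name__ == "__main__":
    shape, accuracy = demo()
    print("embedding shape:", shape, "accuracy:", round(float(accuracy), 3))
```

The point of the split is that the expensive step (learning the embedding) uses only the plentiful satellite data, while the scarce citizen science labels are needed only for the small final model. Real satellite scenes and real species records are far messier than this toy example.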

The challenges aren’t solved yet (more technical detail on our approach is available in this preprint article), but this was a productive collaboration – bringing together a computer scientist, ecological scientist and on-the-ground practitioner to explore the transformative next steps for environmental monitoring.