By Zoë Leanza
At a recent conference, bioinformatician Colin Birkenbihl showcased his work on the AddNeuroMed (ANM) dataset. Birkenbihl, a PhD student at the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), painstakingly curated this extensive dataset to improve its utility.
Launched in 2005, the ANM study produced a vast collection of data from more than 1,700 participants, who enrolled at six sites across Europe and provided blood and other samples to help develop and validate biomarkers for Alzheimer’s disease (AD). The research team used these specimens to generate proteomic, metabolomic, genomic, transcriptomic, and other data types for the cohort. The volume of data was remarkable. It was also disorganized.
Once described as a “data dump” with mixed modalities and missing documentation, the dataset was complex and disparate. Recognizing the importance of good data organization, Birkenbihl meticulously pre-processed the ANM dataset with the support of Sir Simon Lovestone of the University of Oxford.
The new version of the dataset includes comprehensive multimodal data, with identifiers mapped to public resources and metadata standardized to follow the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Standardizing the dataset in this way enables researchers to focus on discovery and validation rather than cleanup.
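To give a flavor of what mapping identifiers to public resources involves, the short Python sketch below resolves platform-specific probe IDs to public Ensembl gene IDs via a lookup table. It is a minimal illustration of the general pattern only; the probe names and the lookup table are hypothetical placeholders, not Birkenbihl's actual pipeline or the real ANM data.

```python
# Minimal sketch of an identifier-mapping step. All probe IDs and the
# lookup table below are hypothetical placeholders, not ANM values.

# Lookup from platform-specific probe IDs to public Ensembl gene IDs.
PROBE_TO_ENSEMBL = {
    "PROBE_0001": "ENSG00000142192",  # hypothetical probe-to-gene pair
    "PROBE_0002": "ENSG00000251562",  # hypothetical probe-to-gene pair
}

def map_identifiers(probes):
    """Resolve local probe IDs to public gene IDs, tracking failures."""
    mapped = {p: PROBE_TO_ENSEMBL[p] for p in probes if p in PROBE_TO_ENSEMBL}
    unmapped = [p for p in probes if p not in PROBE_TO_ENSEMBL]
    return mapped, unmapped

mapped, unmapped = map_identifiers(["PROBE_0001", "PROBE_9999"])
print(mapped)    # {'PROBE_0001': 'ENSG00000142192'}
print(unmapped)  # ['PROBE_9999']
```

Keeping the unmapped identifiers, rather than silently dropping them, is what lets curators document exactly which records could not be harmonized.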