By Zoë Leanza
At a recent conference, bioinformatician Colin Birkenbihl showcased his work on the AddNeuroMed (ANM) dataset. Birkenbihl, a PhD student at the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), painstakingly curated this extensive dataset to improve its utility.
Launched in 2005, the ANM study produced a vast collection of data from more than 1,700 participants, who enrolled at six sites across Europe and provided blood and other samples to help develop and validate biomarkers for Alzheimer’s disease (AD). The research team used these specimens to generate proteomic, metabolomic, genomic, transcriptomic, and other data types for the cohort. The volume of data was remarkable. It was also disorganized.
Once described as a “data dump” with mixed modalities and missing documentation, the dataset was complex and disparate. Recognizing the importance of good data organization, Birkenbihl meticulously pre-processed the ANM dataset with the support of Sir Simon Lovestone of the University of Oxford.
The new version of the dataset includes comprehensive multimodal data, with identifiers mapped to public resources and metadata standardized to follow the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Standardizing the dataset in this way enables researchers to focus on discovery and validation rather than cleanup.
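To give a flavor of what mapping identifiers to public resources involves, the short Python sketch below resolves platform-specific probe IDs to public Ensembl gene IDs via a lookup table. It is a minimal illustration of the general pattern only; the probe names and the lookup table are hypothetical placeholders, not Birkenbihl's actual pipeline or the real ANM data.

```python
# Minimal sketch of an identifier-mapping step. All probe IDs and the
# lookup table below are hypothetical placeholders, not ANM values.

# Lookup from platform-specific probe IDs to public Ensembl gene IDs.
PROBE_TO_ENSEMBL = {
    "PROBE_0001": "ENSG00000142192",  # hypothetical probe-to-gene pair
    "PROBE_0002": "ENSG00000251562",  # hypothetical probe-to-gene pair
}

def map_identifiers(probes):
    """Resolve local probe IDs to public gene IDs, tracking failures."""
    mapped = {p: PROBE_TO_ENSEMBL[p] for p in probes if p in PROBE_TO_ENSEMBL}
    unmapped = [p for p in probes if p not in PROBE_TO_ENSEMBL]
    return mapped, unmapped

mapped, unmapped = map_identifiers(["PROBE_0001", "PROBE_9999"])
print(mapped)    # {'PROBE_0001': 'ENSG00000142192'}
print(unmapped)  # ['PROBE_9999']
```

Keeping the unmapped identifiers, rather than silently dropping them, is what lets curators document exactly which records could not be harmonized.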