Open access data from the SolACE project

SolACE aims to ensure the data produced in the project is open, transparent and has a practical legacy after the end of the project. How can we achieve this? What are our aspirations when producing an open dataset?

The long journey of data in SolACE

One of the key objectives of SolACE is to ensure that the flow of data within the project and to the broader stakeholder community is open, transparent and effective. To achieve this, we have produced a Data Management Plan and delivered a Handbook of Protocols and Methodology and a unified Data Template. These outputs ensure that all experiments follow a standardised protocol and that the data is captured and labelled in the same way by all partners.

As in many collaborative projects of its size, SolACE faces potential issues in data collection and data flow. These include datasets being stored and distributed on many computer hard drives across Europe; datasets using names, units and methods that only make sense to their creator; genotype and variable names being written or coded differently; metadata being absent or unharmonised; and raw data being processed in different ways by different operators. Common protocols for storage and metadata, together with common vocabularies, are therefore essential.
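To make the naming and unit problems described above concrete, here is a minimal sketch in Python of how partner-specific column names and genotype spellings might be mapped onto one shared vocabulary. Everything in it is an assumption made for illustration: the variable names, the genotype aliases and the `harmonise_record` helper are hypothetical and are not taken from the ICASA dictionary or the SolACE Data Template.

```python
# Illustrative only: these names and conversion factors are hypothetical
# and are NOT taken from the ICASA dictionary or the SolACE Data Template.

# Maps each partner-specific column name to (common name, factor to common unit).
VARIABLE_VOCABULARY = {
    "grain yield (t/ha)": ("grain_yield_kg_ha", 1000.0),  # t/ha -> kg/ha
    "Yield_kg_ha": ("grain_yield_kg_ha", 1.0),            # already kg/ha
    "GY [g/m2]": ("grain_yield_kg_ha", 10.0),             # g/m2 -> kg/ha
}

# Maps lower-cased spellings of a genotype name to one agreed identifier.
GENOTYPE_ALIASES = {
    "example 1": "EXAMPLE_1",
    "example-1": "EXAMPLE_1",
    "cv. example 1": "EXAMPLE_1",
}


def harmonise_record(record):
    """Return (harmonised record, list of column names with no mapping)."""
    harmonised = {}
    unmapped = []
    for name, value in record.items():
        if name == "genotype":
            key = value.strip().lower()
            # Keep the original spelling if no alias is known.
            harmonised["genotype"] = GENOTYPE_ALIASES.get(key, value)
        elif name in VARIABLE_VOCABULARY:
            common_name, factor = VARIABLE_VOCABULARY[name]
            harmonised[common_name] = float(value) * factor
        else:
            # Flag the column for review instead of guessing its meaning.
            unmapped.append(name)
    return harmonised, unmapped


# Two partners reporting the same measurement in different ways.
partner_a = {"genotype": "Example-1", "grain yield (t/ha)": 6.5}
partner_b = {"genotype": "cv. Example 1", "GY [g/m2]": 650, "plot note": "lodged"}

for record in (partner_a, partner_b):
    print(harmonise_record(record))
```

In this sketch both records end up with the same genotype identifier and the same yield in the same unit, while the column with no agreed name is reported back rather than silently dropped. An agreed template removes the need for this kind of after-the-fact mapping, because the names and units are fixed before the data is recorded.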

The Data Template provides the standardised data format and collection process that this requires. All experimental data produced by SolACE partners should now be formatted and entered into the shared Data Template. The template is a multi-sheet Excel file with space for the metadata and raw data captured in experiments. It uses the ICASA data standard and dictionary, can be automatically translated to ACE format and can be uploaded into other internationally recognised databases.

Issues with using the Data Template were addressed at the recent annual meeting, where many of the SolACE partners attended a virtual Data Template training course.

Why is it so important that we collect a harmonised dataset from the project? In short, it facilitates the modelling and simulation work being performed within SolACE, enables joint data analyses and allows SolACE to share data in a harmonised format compatible with international initiatives. This is critical for the delivery of the SolACE Data Management Plan.

More importantly, harmonised datasets will give SolACE a legacy beyond the lifetime of the project. This legacy will allow SolACE data to be included in future meta-analyses with non-SolACE datasets, used to parametrise models and re-analysed with new software to address questions not foreseen at present. This will ensure that data produced by SolACE, using public money, will contribute to solving long-term issues of global significance.

If you would like to know more about the approach SolACE is taking to providing open and useful datasets, then please contact Tim George.