Agile scientific data curation initiative

UPData is a scientific data curation experiment under development at the University of Porto that aims to determine the main digital preservation needs of several research groups at the university. In the course of the experiment, eight datasets have been collected from diverse scientific domains. Interviews with researchers working at U.Porto led us to conclude that, from their point of view, flexible data access is the most valued capability of a preservation solution, and that offering such access is the best way to involve them in the preservation workflow. We propose an extension to the DSpace repository platform that adds data curation capabilities. In the proposed solution, the system ingests Excel spreadsheets containing scientific data and translates them into XML documents, which can then be queried via automatically generated XQuery statements. Researchers use a search web page that displays the deposited data and lets them apply filters to it, retrieving the parts they need without having to scan each file. The collected datasets will be used as test cases for data deposit and to evaluate the effort required by the curation procedure.
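The spreadsheet-to-XML translation and the automatic query generation can be illustrated with a minimal Python sketch. It is not the UPData implementation: the function names (`rows_to_xml`, `build_xquery`), the element names (`dataset`, `record`), and the sample columns are hypothetical, and the sketch assumes the spreadsheet has already been read into a header row plus data rows whose column names are valid XML element names. The generated query is only a string; running it would require an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Comparison operators the query generator accepts (an illustrative subset).
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def rows_to_xml(header, rows, root_tag="dataset", row_tag="record"):
    """Build an XML tree with one element per row and one child per column.

    Assumes every header cell is a valid XML element name.
    """
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(header, row):
            cell = ET.SubElement(record, name)
            cell.text = str(value)
    return root


def build_xquery(doc_name, filters, root_tag="dataset", row_tag="record"):
    """Generate an XQuery FLWOR expression from (column, operator, value) filters."""
    conditions = []
    for column, op, value in filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        if isinstance(value, (int, float)):
            # Compare numerically so that "20" > "5" behaves as expected.
            conditions.append(f"number($r/{column}) {op} {value}")
        else:
            # XQuery escapes a double quote inside a string by doubling it.
            escaped = str(value).replace('"', '""')
            conditions.append(f'$r/{column} {op} "{escaped}"')
    lines = [f'for $r in doc("{doc_name}")/{root_tag}/{row_tag}']
    if conditions:
        lines.append("where " + " and ".join(conditions))
    lines.append("return $r")
    return "\n".join(lines)


# Made-up sample data standing in for one deposited spreadsheet.
header = ["site", "depth_m", "temperature_c"]
rows = [["A", 5, 14.2], ["B", 20, 11.7]]

root = rows_to_xml(header, rows)
query = build_xquery("samples.xml", [("depth_m", ">", 10), ("site", "=", "B")])

print(ET.tostring(root, encoding="unicode"))
print(query)
```

In a setup like this, the filters chosen on the search page would map to the `(column, operator, value)` tuples, so researchers never write XQuery themselves.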
Source code repository (FEUP VPN required)