DataNotes

A platform to describe research data, based on Semantic MediaWiki

FEBRUARY 2011 – SEPTEMBER 2011

Scientific research increasingly relies on collecting and using large amounts of data, which has led researchers to consider depositing their datasets in data repositories. The goal of these repositories is the storage and preservation of datasets. One problem that often arises is the difficulty of interpreting the data. To overcome this, it is necessary not only to store the data but also to describe them. Descriptions can be specialised to varying degrees: they can be based only on generic descriptors such as title, date and creator, or they can include domain-specific features. Creating rich descriptions for the datasets in a repository requires the collaboration of an information-management specialist, who is responsible for writing those descriptions. In this scenario, the researcher has little control over the data descriptions and the process tends to be time-consuming. In universities, there are projects where the uploading and description of data depends on such specialised staff, called “curators”.

This project aims to design and develop a collaborative annotation system to be used by researchers at the University of Porto. Using this system, they will be able to upload and describe their datasets themselves, with a set of tools to assist them in the process. They will also be able to describe, in free text, other observed facts that may be hard to fit into the available descriptors. Researchers can describe and update not only their own data but also data belonging to other researchers, provided they are authorised to do so. A new extension, built on the Semantic MediaWiki platform and one of its extensions, Semantic Forms, was developed to provide the functionality of DataNotes. The platform was also integrated with another project, UPBox, so that both can be part of a complete research data curation process. The initial objectives have been met, but there is plenty of opportunity for future expansion. Possible improvements include a new Semantic MediaWiki extension that would allow metadata schemas to be imported automatically into DataNotes, as well as semi-automatic annotation that uses the content of the datasets being annotated.
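
To make the description model above more concrete, the following is a minimal sketch in Python, not the actual DataNotes code. It assumes a dataset description holds the generic descriptors mentioned earlier (title, date, creator), a set of domain-specific descriptors, free-text notes, and a list of other researchers authorised to edit it. The class name, field names, property names and sample values are all hypothetical; only the `[[Property::value]]` inline annotation form is taken from Semantic MediaWiki.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDescription:
    # Generic descriptors named in the text above.
    title: str
    date: str
    creator: str
    # Domain-specific descriptors (hypothetical key/value pairs).
    domain: dict[str, str] = field(default_factory=dict)
    # Free text for observations that fit no available descriptor.
    notes: str = ""
    # Researchers other than the creator who are authorised to edit (assumption).
    editors: set[str] = field(default_factory=set)

    def can_edit(self, user: str) -> bool:
        """The creator and any authorised editor may change the description."""
        return user == self.creator or user in self.editors

    def update(self, user: str, key: str, value: str) -> None:
        """Set a domain descriptor, refusing users who are not authorised."""
        if not self.can_edit(user):
            raise PermissionError(f"{user} may not edit '{self.title}'")
        self.domain[key] = value

    def to_annotations(self) -> str:
        """Render descriptors as Semantic MediaWiki-style inline annotations."""
        pairs = {
            "Title": self.title,
            "Date": self.date,
            "Creator": self.creator,
            **self.domain,
        }
        return "\n".join(f"[[{name}::{value}]]" for name, value in pairs.items())


# Hypothetical usage: bob is authorised by alice, carol is not.
description = DatasetDescription("Soil samples", "2011-03-01", "alice")
description.editors.add("bob")
description.update("bob", "Depth", "30 cm")
print(description.to_annotations())
```

Keeping the domain descriptors in an open dictionary rather than fixed fields reflects the idea that descriptions can be specialised to varying degrees; a real system would more likely validate them against a metadata schema.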