A data journey pt 2: from collecting to publishing

In the second part of the journey we are taking with our data, we will look at the efforts to make the data ready for publication.

Collect data from IHHD

After the initial contact from the team in the Institute for Health and Human Development (IHHD), we met them in person to collect the data on a CD.

We were talked through the data, shown it open in SPSS, and given information on how to get the documentation. What struck us was how useful the documentation could be. We were eager to get access to the field manual, as it sounded like it could be useful as a standalone document.

Examine data and check for identifiers

Once we had the data files back in the office, we opened each one in SPSS to check it was valid. We also checked to make sure we were not publishing any identifiers which would compromise the anonymity of the participants.
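Part of the identifier check can be given a first pass in code. The sketch below is an illustration only, not the process we followed: it assumes the variable names from each file are available as a plain list (however they were exported from SPSS), and both the patterns and the example variable names are hypothetical. A match is only a prompt for a manual look, since a name like "filename" would also be flagged.

```python
import re

# Hypothetical patterns for variable names that often hold direct identifiers.
# This list is an assumption for illustration, not a checklist used at UEL.
IDENTIFIER_PATTERNS = [
    "name", "address", "phone", "mobile", "email",
    "dob", "birth", "postcode", "gps", "latitude", "longitude",
]


def flag_possible_identifiers(variable_names):
    """Return the variable names that match any identifier pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in IDENTIFIER_PATTERNS]
    return [v for v in variable_names if any(c.search(v) for c in compiled)]


# Made-up variable names, as they might be listed from one data file.
example_variables = ["hh_id", "resp_name", "age_years", "village_gps", "q12_income"]
print(flag_possible_identifiers(example_variables))
```

Anything flagged still needs a person to open the file and decide; indirect identifiers (for example a rare occupation in a small village) will not show up in a name-based scan at all.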

This can be a complicated process with SPSS files. The data was split into many different files for different types of processing. Printing off a check sheet for every file in the collection (data and documentation) helps.
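A check sheet like this could also be generated rather than typed up by hand. The following is a minimal sketch under stated assumptions: the three checks and the file names in the demo are made up for illustration, and the function simply lists every file under a folder with an empty tick box per check.

```python
import tempfile
from pathlib import Path

# Hypothetical checks; adjust to whatever your own ingest process requires.
CHECKS = ["Opens without errors", "No direct identifiers", "Documentation matched"]


def build_check_sheet(folder):
    """List every file under `folder` with an empty tick box for each check."""
    lines = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # Show the path relative to the collection folder.
            lines.append(path.relative_to(folder).as_posix())
            for check in CHECKS:
                lines.append(f"    [ ] {check}")
    return "\n".join(lines)


# Demo on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "household.sav").write_bytes(b"")
    (Path(tmp) / "field_manual.doc").write_bytes(b"")
    print(build_check_sheet(tmp))
```

The output is plain text, so it can be printed as is and ticked off file by file.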

At this stage we created a record in our test EPrints repository. This allowed us to show the project team what was needed and how it would be arranged.

Translate/Get access to correct documentation

The documentation we were given, while very full, was written in two local Indian languages (both spoken by many millions of people). These languages do not use the Latin alphabet that English uses, so additional font files were needed to read them in Word format.

As our PCs are locked down by IT (and installing fonts requires admin privileges) this was not a trivial task. It was something we did not think of at all when planning how the ingest process would work. If others are going to be working with international data, language documentation/data issues could be a real problem.

Confirm metadata and other details are correct with all contributors

We had to contact all of the contributors to confirm that they were happy with the metadata, the files uploaded, and the wording of various parts of the abstract and metadata. This was again not something we had planned for. While the PI might be happy with what they’ve given us, there were many people to go through to make sure we were doing everything correctly. This could be the case for all larger teams: different members have different responsibilities and will need to be contacted to make sure we are describing their contribution correctly.

This took time, as they are not all in the same time zone and some are no longer working on the project.

Publish and mint DOI

Once everyone in the project team was happy with the data, we could move the record from our test server to our live server.

We made the record live on the 23rd of March 2015 and minted a DOI at the time (http://dx.doi.org/10.15123/DATA.4). 


When depositing data like this in a repository there are bound to be issues you can’t predict as you go along. We’ve learned from this experience, and will use it in future as an example for others within UEL.

Our experience also highlights the need for good relationships with the researchers, and how much self-depositing could smooth the process.


Research Data Management Workshop

Firstly I should introduce myself: I’m UEL’s new Research Data Management Officer. I performed a similar role at the University of Glasgow, working on the C4D project. The outcome for Glasgow was a live data repository built on the EPrints platform. I am excited to now be part of the team at UEL, where we have an excellent opportunity to provide a fantastic new RDM infrastructure and service to our staff and students.


Stephen and I ran a Research Data Management Workshop yesterday at our Stratford Campus. We had 11 participants from a variety of backgrounds. We aimed to give a broad outline of the importance of good RDM and the services we offer in the library.

12.00     Welcome and Introductions

12.15     Presentation on managing your research data

13.00     Briefing on exercise using a simple Data Management Plan template

13.30     Feedback and discussion on exercise, and next steps

14.00     Close

Stephen led the introduction and invited the participants to tell us and each other about the sort of research they do, and their relationship with data. There was a very good variety of research data being created and reused, from sensitive patient data, foreign government data, and interviews, to large quantitative datasets.

Stephen then started the presentation on managing your research data. I took over at one point and gave information on backing up and securing data. Once I had finished, Stephen closed by talking about Data Management Plans.

We then took a short break and encouraged everyone to have a go at completing a sample data management plan we provided, based on work by the DCC. The feedback at the time suggests that this was very helpful. Some said that it helped clarify what they need to do for their research.

We gathered the DMPs and plan to provide feedback to those who left their email addresses.

Our feedback forms show that overall the workshop was very well received. It has also given us ideas on how we can improve the flow in future. We are very pleased with the level of interest shown by the participants, reinforcing our view on the importance of providing good support for research at UEL.