Tuesday, 16 July 2013

Gold Standard Project completed

My humblest apologies to readers of this blog for neglecting to provide an earlier update on the status of this project. The Gold Standard project officially closed in March this year as announced at that time on the ANDS-general and ANDS-partner e-lists. 

The key achievements of this project were:
1. Development of processes and workflows for improving the record quality, connectivity and richness of research data metadata records in Research Data Australia.
2. Analysis of ANDS Tools and costs and benefits of improving Research Data Australia records.
3. A selection of 15 collection records described and contributed to Research Data Australia providing greater exposure to research data through the Research Data Australia portal.

The project findings were presented at an ANDS webinar on 28 November 2012. The webinar recording is available here.

A description of Griffith’s Gold Standard collection records can be viewed on Research Data Australia:

The code and user documentation developed as part of this project are available at https://github.com/gu-eresearch/

A description of the project and contact information can be found on the ANDS website: http://projects.ands.org.au/id/SC36

As the Gold Project has closed, this will be the last blog post on this site. However, you may be interested in a related blog which recorded the experience and findings of the Data Citation Project at Griffith. The Data Citation Project built on the earlier work Griffith had put in to support minting of DOIs (as discussed frequently on this blog). As per Karen Visser’s email to the ANDS-partner e-list on 15 July:

Griffith University has successfully completed the ANDS-funded project: Data Citation Infrastructure Establishment Program.  
The project has contributed to the roadmaps for Griffith research content repositories and discovery services, which now include suggestions for citation-related enhancements such as:
  • a ‘citation builder’ that helps depositors to see how the metadata they enter leads to a well-formed citation
  • automated depositor notifications that include details of how to cite the deposited collection
  • well-formed citation statements displayed on repository landing pages for users to cut and paste into note-taking and word-processing tools
  • embedded machine-readable metadata that will help search engines index scholarly content more effectively, and facilitate citation export to web-based reference managers like Zotero or Mendeley, and
  • citations in formats that can be easily downloaded to offline reference managers like Endnote.
The project blog is available at: http://data-citation-griffith.blogspot.com.au/
The PHP script which Griffith staff developed for minting DOIs using the ANDS Cite My Data service, and the MOAI Python script used for providing records to RDA via OAI-PMH, have been uploaded to GitHub and are freely available: https://github.com/gu-eresearch/ANDSDOIScripts.
The project team shared the project with the research data community in a webinar hosted by ANDS on 4 June 2013. The recording is on the ANDS YouTube channel at:  http://goo.gl/LRISq 

Project information is available on the ANDS website at: https://projects.ands.org.au/id/F&C013 

Finally, thanks so much for staying tuned to this blog. There are quite a lot of exciting projects happening at Griffith in 2013 and beyond. We are continuing to develop the Griffith Research Hub, which won the VALA Award 2012 and a Commendation of Merit in the Stanford Prize for Innovation in Research Libraries 2013. We are building a Climate Change Adaptation Hub and a Biodiversity and Climate Change Virtual Laboratory. We’re also redeveloping our data repository and creating a method for researchers to self-submit their data collections to the repository, which will then feed into the Research Hub. Installation of Symplectic is also in the mix. If you are interested in keeping in touch with these developments, contact me directly, or you can follow my personal blog. I have neglected the blog for quite some time and shall endeavour to post more regularly.


Tuesday, 27 November 2012

Griffith Gold Standard Project Webinar

We will be presenting a 'Griffith Gold Standard Records project' webinar later today: 

Wednesday 28 November from 2-3pm AEDT (Australian Eastern Daylight Time)

Register here:  http://griffithgoldstandard.eventbrite.com

If you can't make it, the webinar will be recorded so that you can view it later in your own time.

Note that the project is not yet completed. We are in the final stages of the records review process and will put up a blog post when our records have been published to the Research Data Australia production service. 
The webinar will cover:

Project overview
Strategy utilised
Manually created records using ANDS Online Services
Records created via an automated feed to Research Data Australia
Technical architecture and challenges
List of Gold Standard records
Enriched record connections
Enriched record components
Lessons learned
Return on investment

Thursday, 25 October 2012

Nearing the finish line!

It has been a while since my last update about the Gold Project. I’m not sure what it’s like at your institution but here at Griffith we are showing no signs of slowing up as we come steaming toward the holiday season. The Gold Project final deadline is fast approaching and we have been very busy with our Research Hub (metadata store) project which is running in parallel to the Gold project.

Our draft Gold records are going through the review process with ANDS staff. They reviewed one round of records, we had a Skype call to clarify a few issues, and we are currently putting in the final changes for records in this round. We are trying to limit this to just one more round in order to finish in a few short weeks’ time, but we will see how we go. We are down to the nitty-gritty detail, but the automated approach – which is about building capacity for Gold standard metadata records within our systems – has required us to work within our existing systems (the data repository and the metadata store) and technical architecture to enhance the records and then feed them from the metadata store to RDA. This can be time-consuming but will pay off in the future.

We have organised a webinar to discuss our experiences with the Gold project. QUT had theirs recently and the recording is available at: http://www.youtube.com/watch?v=rOL2vb--9G8&feature=youtu.be

Our Gold Project webinar is tentatively scheduled for Wednesday 28 November. An announcement will be made on the ANDS e-list and included in the ANDS events calendar closer to the date.

I’ll be at the eResearch Australasia conference next week in Sydney and plan to blog about it here: http://natashajsimons.blogspot.com.au/. Griffith will be having a booth at the conference and we will be demonstrating a few of our projects including our new climate change hub project and our metadata store solution, the Griffith Research Hub. Quite a lot has been happening with our Research Hub/metadata store project in particular – please come and talk to me about it at the conference if you’re interested.

Friday, 24 August 2012

VIVO 2012 conference blog

As per the previous blog post, Griffith's Senior Developer on the Gold Project, Arve Solland, is attending the VIVO 2012 conference in Florida. Check out his conference blog.

Monday, 20 August 2012

Gold record challenges

I came back from OR2012 in Edinburgh last month and it already feels like a lifetime ago. The conference was terrific and generated lots of good ideas and reflections (see my conference blog here). This month Arve Solland, the Senior Programmer on the ANDS-funded Gold and Metadata Store projects here at Griffith, is presenting a paper at the VIVO conference in Miami, Florida. His paper has the title of Profiling Research @ GU through the use of VIVO. I’ll post a link to his blog when it’s available as I’m sure it will be of interest to some of you.

However the big news is that we have submitted all of our Gold standard records for review by ANDS staff. We feel this is quite an achievement and it was very fiddly work in the end to get the records looking the way we wanted them to from the automated feed.  The ANDS manual interface allows for the creation of rich records and you have more control to tailor each record to the subject of the collection. An automated solution has to fit across all collection records so is less tailored but is preferable because it is more sustainable as a method.

We completed a report on enhancing RIF-CS records to ‘Gold Standard’ and submitted it to ANDS. The report details our strategy, challenges and resolution of issues in addition to a cost-benefits analysis. The latter was really quite difficult and I feel a round of usability testing with researchers would be most helpful. Next up there will be a process in which we receive feedback on the report, and on the records, followed by a bit of to-and-fro before both ANDS and ourselves are satisfied with the results. Then the records will be published in RDA and we will share our experiences at a public event, most likely an online forum so that more of the ANDS-partner community can join in the conversation. In the meantime, I’m going to use this blog to summarise some specific challenges we faced in creating the Gold records:

Where to update fields?
An important issue arose at various points during the Gold project in relation to the capacity for storage and display of various metadata elements used to describe collections and parties [records describing a person or group] in different systems. Griffith has a Research Data Repository and the Research Hub harvests data from the repository and from other Griffith systems (e.g. Research Administration Database) and then feeds the (selected) records to ANDS Research Data Australia. [For more on our technical architecture, check out this diagram]. However, the repository was not able to store a number of metadata elements in the way we wanted them stored and displayed (e.g. DOIs, NLA party identifiers, biographical information, spatial information for some collection records etc.). So, a decision was made to store this data in the Research Hub. As a result, our systems are slightly out of sync in terms of metadata describing the same record types – an issue which may come back to bite us at some point in the future (should we not resolve it before that time).

Updating DOIs
We encountered some problems when we tried to update some previously minted DOIs, because a recent upgrade of the ANDS minting service had changed some parameters. While this issue was swiftly resolved by ANDS services, an update to the documentation would be most helpful.

Citation information for data collections
Once we had minted DOIs for each data collection, we created a citation element. This was a citation to be used for the record about the data collection itself, not for its associated publications. In the ANDS online manual interface, the addition of a citation element is easy once you have identified what information you want to include in it. However, the automated feed was far more challenging.

In RIF-CS you can provide the citationInfo element either as a single block or in separate parts. Because we use VIVO, and this is based on RDF triples, it is more logical for us to map to separate parts. However, we found that some mandatory elements seemed to be ported directly from citation information for publications, and not adjusted to suit collection metadata. Specific examples are edition, and maybe also placePublished, as these fields may not have special meaning for collections. A better model may be to follow the citation guidelines from DataCite: http://schema.datacite.org/meta/kernel-2.2/index.html making only 5 vital properties required: Creator (PublicationYear): Title. Publisher. Identifier. The rest of the properties could then be optional.
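
To make the five-property pattern concrete, here is a minimal Python sketch that assembles a citation string in the "Creator (PublicationYear): Title. Publisher. Identifier." form quoted above. It is only an illustration, not the code used in our feed: the class and field names are made up for this example, the sample values are invented, and joining multiple creators with "; " is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class CollectionCitation:
    """The five properties named above; field names are illustrative only."""
    creators: list[str]
    publication_year: int
    title: str
    publisher: str
    identifier: str  # for example, a DOI


def render_citation(c: CollectionCitation) -> str:
    """Render as: Creator (PublicationYear): Title. Publisher. Identifier."""
    creators = "; ".join(c.creators)  # separator is an assumption
    return (
        f"{creators} ({c.publication_year}): "
        f"{c.title}. {c.publisher}. {c.identifier}"
    )


# Invented sample values, not a real Griffith collection or DOI.
example = CollectionCitation(
    creators=["Example, A.", "Sample, B."],
    publication_year=2012,
    title="Hypothetical survey data collection",
    publisher="Griffith University",
    identifier="doi:10.0000/example",
)
print(render_citation(example))
```

Keeping the parts separate until the last moment is what makes the mapping from a triple store straightforward: each property comes from its own statement, and only the display layer decides on punctuation.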

The absence of a guide or working example made it difficult to determine how the information required should be constructed, and how citations would be rendered based on individual citation elements. However, I believe that Griffith (and also QUT as part of their Gold standard project) were the first to supply the citationInfo element as separate parts. I also heard an excellent talk about this exact issue from Michelle Teis, Metadata Analyst for the TERN project, at the Data Citation Best Practice Roundtable organised by ANDS in Brisbane a couple of weeks ago. I hope that our feedback as early users of this element helps to improve this option for other institutions.

Expanding subjects
Our records already contained ANZSRC Field of Research codes that were ingested by the Hub from the Griffith Research Administration Database. We looked at extending the subjects to include different types: ‘local’ and ‘Library of Congress’ but only where these types provided additional context or granularity to the existing ANZSRC codes.

While the ‘local’ type involved addition of free text content, the Library of Congress (LoC) subject headings come from a controlled vocabulary with text and a hyperlink to the term on the LoC website. Addition of the LoC subject headings to the manual record was straightforward, but for the automatic feed we ran into problems. We decided to load the LoC subject headings as an ontology into VIVO using the Hub development environment to begin with. The result was that VIVO crashed and wiped all of the contents of the Hub Dev environment. The problem was that the file size was too large and we had to abandon the idea and resort to ‘local’ subject types instead - at least for the moment.

Temporal and spatial metadata
Adding spatial information required use of Google Earth so that we could supply both the human-readable text and the actual co-ordinates. The latter was tricky at first and we obtained assistance from a specialist in spatial co-ordinates who works within the Griffith eResearch Services team to ensure the correct coordinates were used.
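
As an illustration of the "text plus co-ordinates" pairing, the sketch below formats a bounding box as a KML-style coordinate string, where each point is written longitude first, then latitude. This is a hedged example rather than our actual workflow: the function names, the place name and the numbers are all invented, and whether a given feed expects exactly this string format is an assumption to check against the relevant schema. The range check is there because swapping latitude and longitude is an easy mistake to make.

```python
def to_kml_coords(points):
    """Format (longitude, latitude) pairs as 'lon,lat lon,lat ...'."""
    for lon, lat in points:
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError(f"out-of-range coordinate: {(lon, lat)}")
    return " ".join(f"{lon:.6f},{lat:.6f}" for lon, lat in points)


def bounding_box(west, south, east, north):
    """Return the box corners as a closed ring (first point repeated last)."""
    return [
        (west, north),
        (east, north),
        (east, south),
        (west, south),
        (west, north),
    ]


# Invented values for illustration only.
spatial = {
    "text": "Example region (hypothetical place name)",
    "coords": to_kml_coords(bounding_box(152.9, -27.6, 153.2, -27.3)),
}
print(spatial["text"])
print(spatial["coords"])
```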

Related publications for party records
For related publications, we decided to script an insertion into each Gold party record that contained a link to the publications listed in the researcher’s profile page in the Hub, with an accompanying text title explaining this link. We felt this was preferable to adding each publication individually as in some cases there were hundreds of publications.
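
The scripted insertion described above might look something like the following Python sketch. It is not the script we ran: the Hub URL pattern, the function names and the title wording are hypothetical, XML namespaces are omitted for brevity, and the element and attribute names (relatedInfo, identifier, title, type values) reflect my reading of RIF-CS and should be checked against the schema version in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical URL pattern; the real Hub profile URLs are not shown here.
HUB_PROFILE_URL = "https://hub.example.edu/individual/{person_id}?tab=publications"


def build_related_info(person_id, display_name):
    """Build one relatedInfo element linking to a publications page."""
    related = ET.Element("relatedInfo", {"type": "publication"})
    identifier = ET.SubElement(related, "identifier", {"type": "uri"})
    identifier.text = HUB_PROFILE_URL.format(person_id=person_id)
    title = ET.SubElement(related, "title")
    title.text = f"Publications by {display_name} in the Griffith Research Hub"
    return related


def add_publications_link(party, person_id, display_name):
    """Append the link to a party element unless it is already present."""
    url = HUB_PROFILE_URL.format(person_id=person_id)
    for existing in party.findall("relatedInfo"):
        if existing.findtext("identifier") == url:
            return False  # already linked, so re-running the script is safe
    party.append(build_related_info(person_id, display_name))
    return True


party = ET.Element("party", {"type": "person"})
add_publications_link(party, "n1234", "Dr A. Example")
print(ET.tostring(party, encoding="unicode"))
```

A single link per party record keeps the record small however many publications the researcher has, and the duplicate check means the script can be re-run after each harvest without piling up repeated links.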

As mentioned, once the record review process has been completed and the records published in RDA, we will share our experiences of the Gold project at a public forum (date yet to be determined).

Community events
There were two ANDS events held recently in Brisbane: an ANDS QLD Community Day and a Data Citation Best Practice Roundtable. At the former, we heard updates on projects and progress, as well as some useful information about ANDS direction, its international connections strategy, and (new) national collections program. There were opportunities to talk informally and to provide ANDS with feedback. At the Roundtable, which was organised by ANDS, we heard from a range of presenters about data citation and the implementation of DOIs for research data collections.
As mentioned, Michelle Teis gave a talk that analysed the citation element in RIF-CS and compared it to other citation styles such as that recommended by DataCite. Gerry Ryder from the CSIRO went through some enviable workflows for DOI minting using the ANDS service and Siddeswara Guru demonstrated the really neat interface he and Wing-Fai have developed for minting and managing DOIs at TERN. At Griffith we have used a simple PHP script but over the next few months we’ll be looking at how we can implement the type of interface Guru has developed, as well as developing our workflows. I was given two bites of the cherry at the Roundtable and spoke about our Gold project, our DOI implementation and some data citation best practice initiatives that I’ve learnt about from the UK Data Archive (versioning) and the Dryad data repository folks in the USA (workflows for data citation). There were a number of very good talks at the Roundtable and I noticed a maturing of the discussion around this important global initiative. I’m looking forward to continuing the discussion – which has been brilliantly facilitated by Karen Visser at ANDS – and perhaps moving towards some best practice guidelines. Finally, if you haven’t checked it out lately, there has been quite a lot added to the ANDS Data Citation Resources page.

Wednesday, 11 July 2012


I'm currently attending the Open Repositories 2012 conference in Edinburgh. If you're interested, I am blogging about it at http://natashajsimons.blogspot.com/

Wednesday, 27 June 2012

Data is the new Gold

‘Data is the new gold in the research world’. This is a phrase I took away from one of the data citation ‘roundtables’ I attended recently. In this post, I want to give an update of the progress we have made in the Gold project, some DOI events and resources, and a glimpse of what we have ahead.

Progress on Gold record enhancements
We’ve progressed in enhancing a selection of our collection records, and their related party, activity and service records, to make them “Gold Standard”. This work has to be completed in the next month and then ANDS will assess their quality before they are made available in Research Data Australia (so you can’t see them yet, sorry!). This is a snippet of what we’ve focussed on more recently in the project:
  • Reviewing our feed of RIF-CS 1.3 records to RDA as part of our migration to MOAI.
  • Adding spatial information as a RIF element where this information is provided in a collection record. This is being added as both text and co-ordinates. ANDS have clarified that the Gazetteer service is not quite ready yet for us to use.
  • Clarifying university policy on provision of birth dates for party records to ‘third parties’ such as the NLA and RDA. The policy is that we can’t disclose birth dates. The policy that covers this is both internal to the institution and external in the form of federal government policy.
  • Building in automated support for the RIF-CS relatedInfo element in party records to point to the publications tab of the researcher’s profile in the Griffith Research Hub.
  • Working out how best to include citations for our data collections (as distinct from publications). According to the section on citations in the ANDS Content Providers Guide - http://www.ands.org.au/guides/cpguide/cpgcitation.html - we can either supply the citation in full or split into component parts. It would be better to split into component parts as it works better as a mapping from our RDF triple store. But we want to make sure this will display nicely in RDA and we’ve not yet found any examples of those who have contributed records with the citation in component parts.

Knights of the (data citation) round table!
Karen Visser, on behalf of ANDS, organised and skilfully facilitated a series of excellent virtual roundtables on the topic of DOIs and data citation. One was with Jan Brase (DataCite), one with Bob Cook (ORNL DAAC) and one with Nigel Robinson (Thomson Reuters).  Each presenter was excellent and is to be commended, not least for getting out of bed so early to present from the other side of the world! I really got a lot out of these:

There is a lot of work being done by DataCite, DAAC and Thomson Reuters to make data citable, to link publications with the underlying data using the DOIs, and to track data citation statistics as a way of promoting data citation to researchers. And there is also a lot of sophisticated communication from organisations such as these regarding the benefits of data citations e.g. data preservation, access, re-use, discoverability, impact, credit.

However, most of the citation tracking is still done manually. In fact, Bob Cook estimated it takes around 80 hours of a librarian’s time once a year to review and track data citations and referrals. This is quite a remarkable effort. It sounds like a truly onerous task, even if the benefits are realised. I think if an automated way of doing this isn’t found, the wider adoption of citation tracking is not likely to happen.

In between the virtual roundtables, I also had some face-to-face meetings with DOI implementers from other institutions in Queensland. There is some great work being done by the Terrestrial Ecosystem Research Network (TERN) team and I was grateful they came to Griffith for the meetings. Guru and Wing-Fai demonstrated their progress and I was impressed that they had built such a nice clean interface for minting DOIs using the ANDS service and that it supports the full DataCite metadata schema.  

Here are some recent DOI resources you may not have seen:
DOI becomes an ISO standard (ISO 26324:2012)
Implementing DOIs for Research Data (my article in D-Lib Magazine)
Joint statement from STM and DataCite (to encourage publishers and data centers to link articles and underlying data)
Total Impact (A website that makes it quick and easy to view the impact of a wide range of research output, including data.)

Coming up next
Here’s a glimpse of what’s ahead for us over the next couple of months:
  • Complete the current project milestone by submitting all of the Gold records to RDA for ANDS to review.
  • Move onto a cost benefits analysis and prepare to share our Gold project experiences at a public forum (e.g. a GoToMeeting session).
  • I will be attending the Open Repositories conference in Edinburgh in July and presenting on the Griffith Research Hub and the NLA party infrastructure. The latter is a Pecha Kucha which presents a real challenge in terms of complex content vs. short amount of time. I expect to take away a bunch of new ideas and inspiration that I can share with others back home.

Bye for now,