academia


Peter Suber reports that:

The Scientific Council of the European Research Council has released its Guidelines for Open Access [pdf]

Here is the text:

  1. Scientific research is generating vast, ever increasing quantities of information, including primary data, data structured and integrated into databases, and scientific publications. In the age of the Internet, free and efficient access to information, including scientific publications and original data, will be the key for sustained progress.
  2. Peer-review is of fundamental importance in ensuring the certification and dissemination of high-quality scientific research. Policies towards access to peer-reviewed scientific publications must guarantee the ability of the system to continue to deliver high-quality certification services based on scientific integrity.
  3. Access to unprocessed data is needed not only for independent verification of results but, more importantly, for secure preservation and fresh analysis and utilisation of the data.
  4. A number of freely accessible repositories and curated databases for publications and data already exist serving researchers in the EU. Over 400 research repositories are run by European research institutions and several fields of scientific research have their own international discipline-specific repositories. These include for example PubMed Central for peer-reviewed publications in the life sciences and medicine, the arXiv Internet preprint archive for physics and mathematics, the DDBJ/EMBL/GenBank nucleotide sequence database and the RSCB-PDB/MSD-EBI/PDBj protein structure database.
  5. With few exceptions, the social sciences & humanities (SSH) do not yet have the benefit of public central repositories for their recent journal publications. The importance of open access to primary data, old manuscripts, collections and archives is even more acute for SSH. In the social sciences many primary or secondary data, such as social survey data and statistical data, exist in the public domain, but usually at national level. In the case of the humanities, open access to primary sources (such as archives, manuscripts and collections) is often hindered by private (or even public or nation-state) ownership which permits access either on a highly selective basis or not at all.

Based on these considerations, and following up on its earlier Statement on Open Access (Appendix 1) the ERC Scientific Council has established the following interim position on open access:

  1. The ERC requires that all peer-reviewed publications from ERC-funded research projects be deposited on publication into an appropriate research repository where available, such as PubMed Central, ArXiv or an institutional repository, and subsequently made Open Access within 6 months of publication.
  2. The ERC considers essential that primary data – which in the life sciences for example could comprise data such as nucleotide/protein sequences, macromolecular atomic coordinates and anonymized epidemiological data – are deposited to the relevant databases as soon as possible, preferably immediately after publication and in any case not later than 6 months after the date of publication.

The ERC is keenly aware of the desirability to shorten the period between publication and open access beyond the currently accepted standard of 6 months.

Peter has some good analysis.

What is the NRC’s policy on Open Access?

Open Access: the New World of Research Communication

The University of Ottawa Library, in association with the Canadian Association of Research Libraries (CARL), hosted a public seminar entitled Open Access: the New World of Research Communication on Wednesday, October 10, 2007.

See here for the video.

Oh la la! If anyone wants to purchase a present for me – well, this is it!

Database Aesthetics:

examines the database as cultural and aesthetic form, explaining how artists have participated in network culture by creating data art. The essays in this collection look at how an aesthetic emerges when artists use the vast amounts of available information as their medium. Here, the ways information is ordered and organized become artistic choices, and artists have an essential role in influencing and critiquing the digitization of daily life.

Contributors: Sharon Daniel, Steve Deitz, Lynn Hershman Leeson, George Legrady, Eduardo Kac, Norman Klein, John Klima, Lev Manovich, Robert F. Nideffer, Nancy Paterson, Christiane Paul, Marko Peljhan, Warren Sack, Bill Seaman, Grahame Weinbren.

Victoria Vesna is a media artist, and professor and chair of the Department of Design and Media Arts at the University of California, Los Angeles.

It is a small world!  Especially in Ottawa where the degree of separation is about .0005 degrees!  Feels like that anyway!

Zzzoot is definitely a kindred blog to datalibre.ca. The September posts are excellent reviews of international data access and preservation technologies, RFPs and initiatives. All my faves are discussed there – Cyberinfrastructure in the US, the Joint Information Systems Committee (JISC) UK DataShare project, the CODATA Open Data for Open Science Journal and a whole bunch more.

I think that, like me, he wishes there would be some uptake of these innovations and initiatives here!

Imagine a cyberinfrastructure that builds a data archive! Well, the National Science Foundation (NSF) in the US has a massive call for proposals to build just that: the Sustainable Digital Data Preservation and Access Network Partners (DataNet). I am so jealous of those folks! Canada has no equivalent to the NSF and does not invest in the future accessibility of data at all. The Canadian Digital Information Strategy document will be released for public consultation in October, but it is nowhere near as comprehensive as the Cyberinfrastructure work. The Cyberinfrastructure Vision for 21st Century Discovery is well worth reading.

There is an interesting article in the Globe today by Eric Sager, a professor of history at the University of Victoria, about access to the names of respondents in censuses gone by and in those to come.

I consider the privacy aspects of the census to be sacred, and so does StatCan. I fill it out because I know I am anonymous and that the data will be aggregated and therefore not traced back to my personal address. Many people feel the same way; recall the Lockheed Martin online census debacle. Fortunately for Canadians, we do not live in Nazi Germany, Stalin’s Ukraine or Idi Amin’s Uganda, where censuses were explicitly used to target, kill or expel ‘undesirable’ populations or to mask the death toll of massive mistakes. Censuses can be and have been used to trace and target people on the basis of ethnicity, religion, sexual orientation or race. The 2006 Census included a question asking whether or not we would be willing to consent to the sharing of our private information 92 years from now. I responded with an educated no.

Historians and genealogists argue that past census respondents’ names should be made available and that we should have future access to current censuses:

The census is the only complete inventory of our population, an indispensable historical record of the Canadian people. It’s critical to genealogy, our most popular form of history. Of all visitors to our national archives today, half are doing genealogical research. If you had ancestors in Canada in 1901 or 1911, you can find them in the censuses of those years, online from Library and Archives Canada. Your children will also be able to find their grandparents and great-grandparents in the censuses of the past century — but only after a legally mandated delay of 92 years.

Seems like our friends to the south are sharing their census information, as U.S. census information is released

through their National Archives after a delay of 72 years. They apply the principle of “implied consent” — a principle well known to privacy experts. When completing their census forms, Americans are consenting to the present-day use of their information by the Census Bureau, and to its use by other researchers in the distant future. Americans do not complain about the future use of their information, and there is no evidence that public release after 72 years has made them reluctant to participate.

Spammers and telemarketers have been using “implied consent” when they send me unsolicited email garbage, drop popups on my computer or call my home to sell me stuff. I have to say there are dubious elements to this concept. I do, however, like the concept of informed consent, and I think the Census had it right by leaving it up to census respondents to decide if they wish to share their personal information with future generations of researchers or with potentially less progressive political regimes (see the question and your options). StatCan even provided a very extensive section on its historical and genealogical position. See the informed consent Question 8 on the short form and Question 53 on the long form. These are perfectly legitimate questions supported by a ton of explanatory text, and they are a perfect compromise in this debate.

Prof. Sager makes a compelling argument for access to this private information, but he believes we should give up our right to informed consent and that we are not smart enough to understand on our own the importance of historical and genealogical research. I vehemently disagree with these points. He does, however, correctly point out the importance of the census for research and decision making.

I would like to have free – as in no cost – access to the non-private Census data and maps in the same way we have free access to the forms and the methodological guides. Now that, along with informed consent, is what a democracy looks like!

  1. Accessing literature,
  2. obtaining materials,
  3. and sharing data.

Science is a collaborative endeavour, and these three roadblocks are impeding scientific discovery, according to John Wilbanks, executive director of the Science Commons initiative and founder of the Semantic Web for Life Sciences project and the Neurocommons.

I met with Wendy Watkins at the Carleton University Data Library yesterday. She is one of the founders and current co-chair of DLI and CAPDU (Canadian Association of Public Data Users), a member of the governing council of the International Association of Social Science Information Service and Technology (IASSIST) and a great advocate for data accessibility and whatever else you can think of in relation to data.

Wendy introduced me to a very interesting project that is happening between and among university libraries in Ontario called the Ontario Data Documentation, Extraction Service Infrastructure Initiative (ODESI). ODESI will make discovery, access and integration of social science data from a variety of databases much easier.

Administration of the Project:

The project is administered by the Carleton University Data Library in cooperation with the University of Guelph. The portal will be hosted on Scholars Portal at the University of Toronto, which makes online journal discovery and access a dream. The project is partially funded by the Ontario Council of University Libraries (OCUL) and by OntarioBuys, operated out of the Ontario Ministry of Finance. It is a three-year project with $1,040,000 in funding.

How it works:

ODESI operates on a distributed data access model, where servers that host data from a variety of organizations will be accessed via Scholars Portal. The metadata are written in the DDI standard, which produces XML. DDI is the

Data Documentation Initiative [which] is an international effort to establish a standard for technical documentation describing social science data. A membership-based Alliance is developing the DDI specification, which is written in XML.

The standard has been adopted by several international organizations, such as IASSIST, the Interuniversity Consortium for Political and Social Research (ICPSR) and the Council of European Social Science Data Archives (CESSDA), and by several government departments, including Statistics Canada, Health Canada and HRSDC.
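Under the hood, a DDI record is structured XML describing a study and its variables, so even a few lines of code can pull out the fields a portal would index. Below is a minimal sketch in Python: the element names follow the general DDI Codebook style, but the exact tags and the study content are invented for illustration and are not taken from ODESI or Scholars Portal.

```python
# Minimal sketch: extracting searchable fields from a DDI-style XML record.
# The record below is invented; real DDI files use formal namespaces and
# far richer structure.
import xml.etree.ElementTree as ET

ddi_record = """
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Example Social Survey, 2006</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var name="age"><labl>Age of respondent</labl></var>
    <var name="incgrp"><labl>Household income group</labl></var>
  </dataDscr>
</codeBook>
"""

root = ET.fromstring(ddi_record)

# Study-level metadata: the title a portal would show in search results.
title = root.findtext("./stdyDscr/citation/titlStmt/titl")
print("Study:", title)

# Variable-level metadata: names and labels, the fields users search on.
for var in root.findall("./dataDscr/var"):
    print(var.get("name"), "-", var.findtext("labl"))
```

Because every participating archive writes its documentation to the same specification, a portal can index these fields the same way no matter which server the data actually live on.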

Collaboration:

This project will integrate with and is based on the existing and fully operational Council of European Social Science Data Archives (CESSDA), which is a cross-boundary data initiative. CESSDA

promotes the acquisition, archiving and distribution of electronic data for social science teaching and research in Europe. It encourages the exchange of data and technology and fosters the development of new organisations in sympathy with its aims. It associates and cooperates with other international organisations sharing similar objectives.

The CESSDA Trans-Border Agreement and Constitution are very interesting models of collaboration. CESSDA is the governing body of a group of national European social science data archives. The CESSDA data portal is accompanied by a multilingual thesaurus; currently, 13 nations and 20 organizations are involved, and data from thousands of studies are made available to students, faculty and researchers at participating institutions. The portal search mechanism is quite effective, although not pretty!

In addition, CESSDA is associated with a series of national data archives. Wow! Canada does not have a data archive!

Users:

Users would come to the portal, search across the various servers on the metadata fields and access the data. Additionally, users will be provided with tools to integrate myriad data sets and conduct analyses using statistical tools that are part of the service. For some of the data, basic thematic maps can also be made.
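To make the distributed model concrete, here is a toy sketch of what a federated keyword search over several repositories' metadata might look like. It is purely illustrative: the repository names, records and search function are invented, and the real ODESI/Scholars Portal service queries remote DDI-indexed servers rather than in-memory lists.

```python
# Toy sketch of a federated metadata search across several data servers.
# All repository names and study records here are hypothetical.
from dataclasses import dataclass

@dataclass
class StudyRecord:
    repository: str   # which server holds the data
    title: str        # study title from the metadata
    keywords: list    # indexed metadata fields

# Stand-ins for the distributed servers a portal would query.
REPOSITORIES = {
    "University A Data Library": [
        StudyRecord("University A Data Library",
                    "National Household Survey (example)",
                    ["income", "housing", "census"]),
    ],
    "University B Data Library": [
        StudyRecord("University B Data Library",
                    "Youth Employment Study (example)",
                    ["employment", "youth", "income"]),
    ],
}

def federated_search(term: str):
    """Search every repository's metadata and merge the hits."""
    term = term.lower()
    hits = []
    for records in REPOSITORIES.values():
        for record in records:
            if term in record.title.lower() or term in record.keywords:
                hits.append(record)
    return hits

for hit in federated_search("income"):
    print(f"{hit.title}  [{hit.repository}]")
```

The point of the design is that the user sees one search box while the metadata, and eventually the data themselves, stay with the institutions that curate them.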

Eventually the discovery tools will be integrated with the journal search tools of Scholars Portal. You will be able to search for data and find the journals that have used those data, or vice versa: find the journal and then the data. This will hugely simplify the search and integration side of data analysis. At the moment, any data-intensive research endeavour or data-based project needs to dedicate 80-95% of the job to finding the data in a bunch of different databases, navigating complex licensing and access regimes, perhaps paying a large sum of money, and organizing the data in such a way that they are statistically sound before making any comparisons. Eventually one gets to talk about results!

Data Access:

Both the CESSDA data portal project and ODESI are groundbreaking initiatives that are making data accessible to the research community. These data, however, will only be available to students, faculty and researchers at participating institutions. Citizens who do not fall into those categories can search the metadata elements and see what is available, but will not get access to the data.

Comment:

It is promising that a social and physical infrastructure exists to make data discoverable and accessible between and among national and international institutions. What is needed is a massive cultural shift in the institutions that create and manage our social science data, one that would make them amenable to creating policies to unlock these same public data assets, along with some private-sector data assets (polls, etc.), and to make them freely (as in no cost) available to all citizens.

Quality Repositories is a website that comes out of a stats (?) course at the University of Maryland. It aims to evaluate the usefulness and availability of various sources of public data, from US government, non-US government, academic, and sports-related (?) data sets. Evaluations are based on criteria such as online availability, browsability, searchability, retrievable formats, etc. From the about text:

Data repositories provide a valuable resource for the public; however, the lack of standards in terminology, presentation, and access of this data across repositories reduces the accessibility and usability of these important data sets. This problem is complex and likely requires a community effort to identify what makes a “good” repository, both in technical and information terms. This site provides a starting point for this discussion….

This site suggests criteria for evaluating repositories and applies them to a list of statistical repositories. We’ve selected statistical data because it is one of the simplest data types to access and describe. Since our purpose is partly to encourage visualization tools, statistical data is also one of the easiest to visualize. The list is not comprehensive but should grow over time. By “repositories” we mean a site that provides access to multiple tables of data that they have collected. We did not include sites that linked to other site’s data sources.

The site was created by Rachael Bradley, Samah Ramadan and Ben Shneiderman.

(Tip to Jon Udell and http://del.icio.us/tag/publicdata)

One of the great data myths is that cost-recovery policies are synonymous with higher data quality. Often the myth-making stems from effective communications from nations with heavy cost-recovery policies, such as the UK, which often argue that their data are of better quality than those of the US, which has open access policies. Canada, depending on the data and the agencies they come from, is at either end of this spectrum and often somewhere in between.

I just read an interesting study that examined open access versus cost recovery for two framework datasets. The researchers looked at the technical characteristics and use of datasets from jurisdictions of similar socio-economic status, size, population density and government type (the Netherlands, Denmark, the German state of North Rhine-Westphalia, the US state of Massachusetts and the US metropolitan region of Minneapolis-St. Paul). The study compared parcel and large-scale topographic datasets typically found as framework datasets in geospatial data infrastructures (see SDI def. page 8). Some of these datasets were free, some were extremely expensive, and all were under different licensing regimes that defined use. The researchers examined both technical characteristics (e.g. data quality, metadata, coverage) and non-technical characteristics (e.g. legal access, financial access, acquisition procedures).

For parcel datasets, the study found that datasets assembled by a centralized authority were judged technically most advanced; those assembled from multiple jurisdictions using standards, or with a central institution integrating them, were of higher quality; and those from multiple jurisdictions without standards were of poor quality, as the sets were not harmonized and/or coverage was inconsistent. Regarding non-technical characteristics, many datasets came at a high cost, most were not easy to access from one location, and there were a variety of access and use restrictions on the data.

For topographic information, the technical averages were less than ideal. On the non-technical criteria, access was impeded in some cases by the involvement of utilities (which tend toward cost recovery); in other cases, multiple jurisdictions – over 50 for some – needed to be contacted to acquire complete coverage, and in some cases coverage is simply not complete.

The study’s hypothesis was:

that technically excellent datasets have restrictive-access policies and technically poor datasets have open access policies.

General conclusion:

All five jurisdictions had significant levels of primary and secondary uses but few value-adding activities, possibly because of restrictive-access and cost-recovery policies.

Specific Results:

The case studies yielded conflicting findings. We identified several technically advanced datasets with less advanced non-technical characteristics…We also identified technically insufficient datasets with restrictive-access policies…Thus cost recovery does not necessarily signify excellent quality.

Although the links between access policy and use and between quality and use are apparent, we did not find convincing evidence for a direct relation between the access policy and the quality of a dataset.

Conclusion:

The institutional setting of a jurisdiction affects the way data collection is organized (e.g. centralized versus decentralized control), the extent to which data collection and processing are incorporated in legislation, and the extent to which legislation requires use within government.

…We found a direct link between institutional setting and the characteristics of the datasets.

In jurisdictions where information collection was centralized in a single public organization, datasets (and access policies) were more homogenous than datasets that were not controlled centrally (such as those of local governments). Ensuring that data are prepared to a single consistent specification is more easily done by one organization than by many.

…The institutional setting can affect access policy, accessibility, technical quality, and consequently, the type and number of users.

My Observations:
It is really difficult to find solid studies like this one that systematically look at both technical and access issues related to data. It is easy to find off-the-cuff statements without sufficient backing proof, though! While these studies are a bit of a dry read, they demonstrate the complexity of the issues, try to tease out the truth, and reveal that there is no one-stop shopping for data at any given scale in any country. In other words, there is merit in pushing for some sort of centralized, standardized and interoperable way – which could also mean distributed – to discover and access public data assets. In addition, there is an argument to be made for making those data freely (no cost) accessible in formats we can readily use and reuse. This, of course, includes standardizing licensing policies!

Reference: Institutions Matter: The Impact of Institutional Choices Relative to Access Policy and Data Quality on the Development of Geographic Information Infrastructures, by Van Loenen and De Jong, in Research and Theory in Advancing Data Infrastructure Concepts, edited by Harlan Onsrud, ESRI Press, 2007.

If you have references to more studies send them along!
