Guest blog post: Data dissemination initiatives from Statistics Canada, by Aspi Balsara, government documents librarian.

Canadians working with statistical and research data include the government documents librarians found in most university research libraries.  Many government documents librarians and their colleagues, the data librarians, participate in the Data Liberation Initiative (DLI), which I introduced in a post honouring one of its founders.  They are also often members of the Canadian Association of Public Data Users (CAPDU), among many other important data-related organizations.  The DLI also does much capacity building for research, data and map librarians through yearly face-to-face meetings and online discussion, developing expertise which is then shared among colleagues in their home institutions.

Aspi Balsara is one of the government documents librarians at the Queen Elizabeth II Library, Memorial University of Newfoundland.  He is a CAPDU member and has been kind enough to share his latest FAQ about various initiatives to disseminate Statistics Canada data.  The post is technical and specific in nature, but demonstrates quite nicely the kind of expertise we have across Canada in this area, a knowledge base that is often overlooked.  The FAQ introduces many databases and formats while also answering new dissemination policy questions.

Finally, this post also introduces a data community of practice whose experts collaborate nationally to benefit their local users using LISTSERV technology, which ain’t fancy, but sure is effective in a place like Canada with its smart people scattered all over a big geographical expanse.  Twitter does some things well, but these lists and their archives are invaluable in fostering deep collaboration in near real time.  People get to meet face to face once a year thanks to the DLI, so the relationships are quite strong.

***********************

FAQ on various dissemination initiatives from Statistics Canada

1.     Are all Public Use Microdata Files (PUMFs) available to the public, or only some of them?  All PUMFs are available, free of charge.  This has been the case for the past year and a half.

2.     How does the public access and order a PUMF?   The public may order it directly from the Statistics Canada homepage, using the Search the site feature.  After filling out the order form, the customer will be contacted by Statistics Canada to sign a licence agreement.  Upon receipt of the signed agreement, the data is put on CD-ROM and shipped.

3.     Do these freely available PUMFs include SPSS and SAS command files (as they do for DLI subscribers)?  The codes are generally available in SAS which is what Statistics Canada (SC) uses.  SPSS is used mainly by the academic sector. SPSS may be available sometimes, but when derived from SAS, its quality is questionable as it does not include “missing values”.  Eventually, through the Common Tool for Social Surveys, SC will have more standardized output, including good SPSS codes.

4.     Since PUMFs are now publicly accessible, what value do DLI subscribers get for their subscription?  Revenue from DLI subscriptions pays for the infrastructure, regional and national training for the DLI contacts, prompt support through the listserv, and other initiatives.   No money goes toward paying for the data. This has been emphasized at the DLI training “bootcamps” where DLI contacts are asked to convey to their library administration the value of the training and support available from the DLI.  This is also pointed out in the DLI annual reports.

DLI subscribing institutions can share a PUMF in a classroom or lab environment.  Otherwise, a professor would have to obtain a licence from Statistics Canada that each student would be required to sign before using the PUMF.

Through the DLI, member institutions also have access to the Discharge Abstract Database (DAD) Research Analytic Files from the Canadian Institute for Health Information (CIHI).  The Discharge Abstract Database is only available to DLI members (see no. 11 below).

5.     In November 2010, Statistics Canada announced its intention to launch a subscription service to all its PUMFs.   This service was targeted to non-Canadian subscribers for an annual fee of $5000.00.   Is there any information about it?  This service is aimed at national and international organizations outside the DLI who wish to access SC’s complete PUMF collection, be informed of new releases, and avail themselves of a service that answers their queries.  This service is called the “Public Use Microdata File (PUMF) Collection”.

See: http://www.statcan.gc.ca/bsolc/olc-cel/olc-cel?catno=11-625-XWE&lang=eng

6.     With free CANSIM access beginning February 1, 2012, will the CANSIM component in E-Stat continue to be provided?   If so, will it be updated more than just once a year? In April 2012, Statistics Canada announced that E-Stat would be archived on June 30, 2012. It was last updated in July 2011 and will remain in that state until it is removed permanently on June 30, 2013. (In the meantime, E-Stat can be accessed by clicking Students and teachers in the left menu bar of Statistics Canada’s homepage.)  Hence, there is no point in using the E-Stat version of CANSIM anymore.

Other resources on E-Stat, such as the 1996, 2001 and 2006 censuses, can be accessed from: http://www12.statcan.gc.ca/census-recensement/index-eng.cfm as well as from the library webpages at:  http://guides.library.mun.ca/canadianstatistics  and http://guides.library.mun.ca/content.php?pid=207197&sid=1734802

A new web location has yet to be determined for Census years 1665-1871, 1986 and 1991, as well as environment and elections data (currently accessible via E-Stat).

7.     Is it just the CANSIM data that will be freely available as of February 1, 2012, or all of Statistics Canada’s data? In addition to CANSIM, select census data products for 2011 will be freely available.  Statistics Canada will maintain current pricing practices for print publications, maps, CD-ROMs and custom products and services.

See: http://www42.statcan.gc.ca/smr09/smr09_035-eng.htm

8.     Are the geography products also freely available? As of November 29, 2011, geography data from the 2006 and 2011 censuses are available free of charge, except for postal code products, since they are provided by Canada Post. As it stands, DLI member institutions have access to postal code information products that can only be used for research and teaching purposes and cannot be shared with non-DLI institutions.  While these products are freely available to DLI subscribers, it should be noted that Statistics Canada is presently negotiating with Canada Post for continued access to postal code products.  If and when an agreement is concluded, it will be added as an appendix to the DLI licence agreement.

9.     Does the public have to pay for DA (Dissemination Area), Block-level data (basic population and dwelling counts) and FSA (Forward Sortation Area) data?   Data for DAs, for 2011 and previous census years, are now available for free upon request.  This is why you will see a “contact us” link for census tables at the DA level (whereas previously there was a $ sign since these tables were not freely available).  Block-level data is available at the population and dwelling count level from GeoSearch or GeoSuite, and there is no charge.  FSAs come under postal code data covered above in no. 8.

10.  The new DLI licence (sent to subscribing institutions in September 2012) no longer states explicitly that data are restricted to research and educational purposes only.  Does this mean that commercial use of the data is now permitted?   Firstly, the majority of Statistics Canada’s standard and custom products will be disseminated under the terms and conditions of the Statistics Canada Open Licence Agreement.   See:  http://www.statcan.gc.ca/reference/licence-eng.html   It grants a worldwide, royalty-free, non-exclusive licence to use, reproduce, publish, freely distribute, or sell the information. This means that standard data products once distributed by the DLI, such as Intercorporate Ownership and SABAL (Small Area Business and Labour Database), can now be made accessible to the general public and not just the university community. However, lifting the restriction on such data is left to the discretion of the DLI member institution, since it is then obliged to shoulder responsibility for providing support to outside clients. Institutions that prefer to maintain the restriction may refer non-university users to Statistics Canada for assistance.

Postal products are still restricted to DLI members (as explained in no. 8 above).

Public Use Microdata Files (PUMFs) are covered in an appendix to the new DLI Licence Agreement. Basically, bona fide members of a DLI member institution may use a PUMF for commercial purposes but cannot provide the file to outside clients.  For instance, a professor may publish the findings from a PUMF in a textbook, but may not reproduce the data or share it.  Similarly, she may submit research for a client that draws upon a PUMF but cannot include the data. Should the client wish to consult the PUMF, the client would follow the procedure described in no. 2 above.

11.  When will CIHI (Canadian Institute for Health Information) add its files to the DLI?  Plans are under way to make the Discharge Abstract Database (DAD) available as a DLI file (see no. 4).  The DAD focuses on inpatient acute care discharges in Canada (excluding Quebec).  The files will be available to the DLI community through the DLI FTP site once all members have signed and returned the licence agreement distributed last September.

Aspi Balsara

Feb 14, 2012

Revised: April 17, 2012; May 4, 2012; October 15, 2012