What We Don’t Know

A modest proposal for fixing Canada’s data deficit

Data is back in fashion in Ottawa. As Prime Minister Justin Trudeau brings back Laurier’s “sunny ways,” one of the promised effects will be to let light into the dark age of data under the Harper government, which suppressed data and information. The Harper government’s killing of the long-form census was its most obvious and egregious attack on data, but there were numerous others. One was putting the National Council of Welfare out of business, and with it the council's statistical reports, which were critical to any understanding of poverty in Canada. The Harper government justified its actions as protecting privacy and saving money. Its critics dismissed these excuses, saying the data vandalism was an attempt to keep facts from spoiling the government’s storylines.

Is data truly back in Ottawa? Certainly it is rhetorically, with pledges to restore the long-form census, fortify Statistics Canada and commit the government to evidence-based policy making. Some wags were fond of characterizing the Harper government approach as policy-based evidence making, raising images of data sleuths scurrying around for statistics to support whatever the government was intent on doing.

Data also seems to be back in practice. Much has been made of the Privy Council Office’s initiative around “deliverology,” an idea borrowed from Tony Blair’s government in the United Kingdom, which deliberately sets out to measure how well government is doing in meeting its commitments. The PCO “results and delivery” unit is up and running. It has created a process for assessing how the government is doing across its twelve broad issue areas (for example, stronger diversity, better relations with and outcomes for the indigenous community, and growing the middle class), for defining what constitutes success, and for determining what needs to be observed and measured along the way.

It is clear that data needs to be a big part of the process. So an early part of deliverology is finding out how adequate our data is in Canada. We may have a good idea of what we need to know in assessing progress toward the government’s goals, but do those data sets exist? Are they accessible? Are they useful? More broadly, do we have a plan for data? Is it just a matter of StatsCan returning to business as usual? Or do we need a more ambitious plan to meet Canada’s data challenge, a plan that might envision new data protocols and even new agencies?

Some complain that criticism of the Harper government’s treatment of data is overcooked, and that the government was merely tidying things up and eliminating intrusions on the privacy of Canadians. They claim that the criticism was just so much partisan hacking.

This was not the view of the international statistics community, which, unconcerned with the ins and outs of Canadian politics, wondered how a modern nation could handicap itself in this way. In Canada, objections came not just from policy wonks starved of their oxygen, but also from many commercial and economic organizations that needed the data to understand markets, from institutions that needed to understand trends affecting their clients, and from governments at all levels faced with constructing effective policy and programs.

The 2010 termination of the long-form census was the biggest blow, and it created an uproar in the media and in the research community. Long-form data fed other crucial studies such as the Labour Force Survey and the Survey of Household Spending, on which the Consumer Price Index is based.

Beyond the long-form census, the biggest losses between 2010 and 2012 were in social data:

The Participation and Activity Limitation Survey, known in the research community as PALS, was interrupted and replaced with an inferior data set. This was the major source of information on people with disabilities and the supports they need.

Social Security Statistics: Canada and the Provinces, which monitored federal/provincial/territorial/municipal programs, disappeared.

The Survey of Labour and Income Dynamics (known as SLID) was terminated in 2012, taking with it a key means of tracking individuals as they moved in and out of poverty.

Welfare Incomes and Poverty Profile were ended when the 2012 budget killed the National Council of Welfare, the social policy advisory body to the government. Those reports were indispensable sources of information in policy formation.

With the social policy community in shock as these critical data sets fell away, the Caledon Institute of Social Policy decided that Welfare Incomes and Poverty Profile had to be saved, and stepped into the breach. It helped that Caledon president Ken Battle had been the architect of both reports when he headed the National Council of Welfare, before he went on to cofound Caledon in 1992 (with the author of this article).

In the process of incorporating the two reports into Caledon, Battle looked at the state of social data in Canada generally, and Caledon launched the Canada Social Report, an effort to put key social data in one place. While the CSR is a useful and growing resource, its development has illustrated two points: there is a great need for a national resource that houses key data on Canada’s ongoing social development, and that resource must operate at scale, probably run by a stable and well-resourced agency. Caledon is a well-regarded and successful public policy think tank, but it is small and mostly privately funded. Finding the resources to run a national data agency is beyond its reach.

Another key objective of the new federal government is to begin to close the gap on our infrastructure maintenance and provision. Across the country we have crumbling and deteriorating roads, bridges, tunnels, energy distribution systems, schools, hospitals and social housing. We also need to build new infrastructure to meet the needs of a growing population and economy: more of everything listed above, as well as public transit, airports and harbours. The government has made infrastructure an early funding priority.

Because of that, questions have arisen about how well we do at building things. Are we on time and on budget?

This question intrigues Matti Siemiatycki, professor of geography and planning at the University of Toronto. He wrote a 2016 paper on “Cost Overruns on Infrastructure Projects: Patterns, Causes, and Cures” for the Institute on Municipal Finance and Governance.

Siemiatycki documents what many people sense, even if just from reading newspaper headlines: on time and on budget are rare events. Looking at a large sample of projects in Canada and internationally, both publicly and privately executed, he reports that costs run over by an average of 28 percent. Rail projects go over by 45 percent, bridges and tunnels by 34 percent, and surface roads by 20 percent. Large information technology projects go over by 27 percent, with one in six of them over by 200 percent. And the average overrun for Olympic Games? A mighty 179 percent! Faster, higher, stronger, indeed.

These are alarming numbers. Siemiatycki identifies some of the causes: scope changes and change orders once budgets have been approved; “handover” problems between governments, private contractors and subcontractors; incomplete studies of technical and engineering issues prior to approval; inflation in labour and material costs; delays from strikes, materials sourcing or coordination with utility companies; and unforeseen events such as bad weather or accidents.

He also notes that there are human factors at work, such as “optimism biases,” which lead project proponents to assume things will work out well for the most part. There is also “strategic misrepresentation” by proponents who want to keep budgets low and schedules short in order to get approval from a council or parliament, or from a corporate board of directors.

But perhaps Siemiatycki’s most significant observation is this:

The world is in the midst of a big data and analytics revolution. From professional sports to product marketing, sophisticated new methods are being developed to improve performance by collecting and statistically analyzing massive amounts of data. Yet infrastructure megaproject delivery remains a sector that has been largely untouched by this trend.

He goes on to recommend that project suppliers be subject to rigorous prequalification in bids, as a way of separating good from poor performers. But, he notes, “the strength and legitimacy of the prequalification system is predicated on the development of a data collection regime that is rigorous in capturing both the size and causes of cost overruns as well as construction quality.”

Finally, Siemiatycki recommends a systematic knowledge exchange. Citing the Major Projects Leadership Academy at Oxford University in the United Kingdom, he says that using the knowledge arising from this data collection to educate current and emerging project managers is critical in reducing inefficiencies.

In two critical areas, then, social policy and infrastructure provision, we are hearing voices calling for a much stronger data capability. On the one hand, it is a response to the ebbing of StatsCan and the withdrawal of key elements on which policy makers and analysts relied. On the other, it is a call for new capacity, collecting and analyzing data we have not had in the past.

One solution put forward is simply to reinvest in StatsCan. After all, the new government has restored the long-form census. Surely it can restore the interrupted data series, or repatriate those, such as Welfare Incomes, that are now housed elsewhere. This presumes that all we lack is a government in power committed to evidence as the basis of decision making.

There is no doubt that StatsCan is a formidable and able enterprise. Formed in 1918 as the Dominion Bureau of Statistics, renamed and reconstituted in 1971, it has been regarded as one of the very best national statistical organizations by observers such as The Economist. Over the decades, it has built up a strong body of data, much of it the result of contribution agreements with data generators across the country, principally provinces and territories. It issues many regular and useful reports across a broad range of topics.

And yet it is not the most accessible of organizations. Many researchers have stories of visiting the StatsCan facility in search of data, being subjected to a Soviet-style process of surrendering cellphones and computers, being denied copies of documents and enduring other minor privations, only to find out some time later that much of the information in question was already in the public domain. They would not characterize StatsCan as customer friendly. Some have unflatteringly compared its approach to that of the Toronto Transit Commission: TTC officials think their customers are buses and trains, not riders; StatsCan thinks its customer is the data, not the user of the data.

StatsCan and the TTC are not alone in this orientation. In fact, it lies at the heart of a broad debate in the data world. On one side of the debate are traditionalists who believe that data is proprietary, and that there are deep privacy and security issues at stake in its distribution. They believe that data falling into the wrong hands can be used against the interest of the state, or of whoever owns the data. They believe that they have the ability to pose the right questions to the data set to produce answers that will serve society. Or at least they are motivated to find the right questions. In the case of an agency like StatsCan, they have earned enough respect that they may well be right.

On the other side are those who advocate for open data, making as much data as possible available to the public, recognizing, of course, that proper attention must be paid to anonymity and security. Advocates say data should be provided in its rawest form, with as little intermediation between it and the user as possible. The benefits in transparency and accountability would be significant, they say. Of more interest to many are the potential benefits in innovation and engagement from people who would pose new queries to the data in order to develop new answers to old questions, new solutions, and new tools for governments and citizens.

This tussle over data is going on everywhere. Some governments have adopted open data protocols: municipal governments in San Francisco, New York and Edmonton; 40 U.S. states and four Canadian provinces; and many national governments, including Canada’s. For the most part these are partial commitments, with only selected data sets made available, but in most cases there is a commitment for more to come. It should be said that having an open data policy does not necessarily mean that a government has become more client- or citizen-focused in its approach. The government can still be very selective about what it makes available.

How then should the government go about achieving a more effective data strategy? Is there a better model than StatsCan? Some point to the Canadian Institute for Health Information. CIHI was formed in 1994 with the mission of disseminating quality healthcare information. A private non-profit organization, it is governed by a board with members from the federal and provincial/territorial governments. Like StatsCan it has forged a large number of data contribution arrangements across the country, and it makes an effort to engage many stakeholders in health care, including institutional and private actors.

An advantage of a dedicated agency, apart from making data widely available, is that it can push a research agenda into new frontiers, or deeper waters. Because of its relationships with contributors and users of data and information, it can develop new questions, tackle harder problems and test innovative enquiries. While there are others in society with deep knowledge of medical issues, say sleep apnea or pancreatic cancer, perhaps in university or hospital research facilities, a dedicated body with data at hand can push consistently across a range of projects. And with a distributed “ownership” represented by a diverse governance membership, no one government or interest will dominate the direction of the research agenda.

A key part of CIHI’s mandate is to make healthcare information publicly available to Canadians. It is also committed to supporting data access for graduate students as a way of developing future generations of healthcare system professionals and analysts. It balances access with a commitment to data security and privacy. Without arguing that CIHI is perfect, one can say it provides a model for other areas with data issues and concerns. StatsCan would continue as Canada’s predominant data agency, and would be a significant data supplier to agencies modelled on CIHI. Its power and expertise should not be diminished. But new agencies would permit stronger focus and deeper investigation of their areas of consideration.

What might that model look like in social policy and in infrastructure?

A Canadian institute of social information, CISI if you will, would take the idea that Caledon has developed with the Canada Social Report to scale. It would become a comprehensive data and information agency that would assemble the broadest possible range of data. It would, of course, track the condition and performance of the major social support programs: the Canada Child Benefit, the Working Income Tax Benefit, the Canada Pension Plan, along with Old Age Security and the Guaranteed Income Supplement, Employment Insurance and the various disability supports.

It would also bring together in one agency data on unemployment rates, a basic barometer of economic health; poverty rates, showing whether more or fewer Canadians are living in poverty; and the incomes of Canadians, whether rising or falling. These three are basic measures of how Canadians are doing at any one time.

Social data encompasses a broad range, some of it currently measured and some not. The state of our education system from kindergarten to post-graduate is critical to national competitiveness. So is the vitality of our cities. We need to see the data story, and be able to compare ourselves to those who are world leaders in performance and support. What is the status of our food security? Are Canadians adequately housed, in homes that are of good quality and affordable, and do they have security of tenure? Or are too many without housing at all? How are seniors faring, and women, and youth? We need not ask whether our knowledge of conditions in our indigenous community, particularly among those living on reserve, is adequate, because we know it falls far short. How do we measure up on quality-of-life measures, and how do we compare with those who are doing well? And, on all these dimensions, are we getting better or worse?

A comprehensive approach to social data would allow for deep dives. Caledon uses unemployment data as an example. The national unemployment statistic tells one story, but is it the right story? As I write this, the reported Canadian unemployment rate is 6.8 percent. But it varies by province and by region. The rate in Toronto is different from the one in northern Ontario, and Kelowna’s is different from Corner Brook’s. Young men have a rate more than twice the national average. And it differs by education level. For a social policy researcher looking to help design the best public program support for those out of the labour market, or for a private company looking to inform its human resource practice, being able to parse these differences is important. These deep dives are significant because they push the frontiers of our knowledge to produce greater insight, which can lead to better responses.

Much of this data is already available, and a CISI would have to identify sources and negotiate contribution agreements. Data would have to be anonymized, kept secure and monitored for quality. It would have to be accessible, with the CISI adopting CIHI’s mission of making information available to Canadians, in this case social information. And the institute would have to be adequately funded to operate at scale and to be comprehensive.

If we created a CISI, could we also have a CIII, a Canadian institute of infrastructure information? Fortunately, the aforementioned Professor Siemiatycki has written another paper, “Implementing a Canadian Infrastructure Investment Agency,” telling us how.

As he says, “the CIIA would be positioned as a national centre of excellence supporting rigorous project evaluation, procurement best practices and project financing under a single roof.” He envisions an arm’s-length agency acting in an advisory role to those building infrastructure on the matters of project financing, project selection and prioritization, project delivery, the maintenance of transparency and accountability in the process, and on building human capacity in the system to improve overall performance. It would focus on projects over $100 million, which is where most of the over-budget, behind-schedule problems occur.

As infrastructure data is a road less travelled than social data, such an agency would have to be designed from the ground up. In social policy, for example, Welfare Incomes already rests on a negotiated contribution agreement among the federal, provincial and territorial governments. Infrastructure reporting would have to be negotiated de novo among the many actors, across all levels of government and in the private sector. Starting such an agency would require an entrepreneurial leader in its early years to build the relationships, programs and products, as well as the advisory models. Siemiatycki stresses the importance of independence, and the entrepreneurial style required shows why it is necessary: deciding to make a call to a deputy minister or a CEO should not require approvals up and down a bureaucratic hierarchy, not to mention cross-ministry or cross-agency sign-off.

Since Siemiatycki wrote his paper, the federal government has announced the formation of the Canada Infrastructure Bank. There has been some debate about whether Canada actually needs an infrastructure bank, but almost every discussion of it concludes that we need better data and information about infrastructure. There are many sources of infrastructure financing, the argument goes, but not nearly enough data and information. So perhaps we will get the bank anyway, with a vibrant information facility as part of it.

If a CISI and a CIII are good ideas, what should be some of the basic design elements?

They should operate on an open data technology platform. Technology has brought us a long way from handwritten lists or even from PDFs. The world has moved to open data, where raw data is made available so that users can formulate their own questions and generate answers that meet their needs. Open data should be the default, and data should be withheld from the open regime only after a reasonable argument has been made for doing so. In effect, this reverses the traditional situation, in which the argument had to be made for why data should be placed on an open platform. Care has to be taken with data, of course. It must be anonymized so that the identity of individuals is protected. It must not reveal information that would place either the state or individuals at risk, as long as that risk is real. But the institutes must recognize that sometimes we do not even know the questions that need to be asked, and allowing a wide range of questions to be posed to the data may yield valuable answers on matters ranging from freeing up traffic flows to how we should treat people with autism.

They should pick up on the CIHI model of being a non-profit independent agency. CIHI has a board representing the key stakeholders and other important viewpoints, but, as in all governance, board members’ first allegiance must be to the well-being of CIHI, not the stakeholder (the province or territory, or professional group such as doctors or nurses) they come from. Such independence is vital both for broad ownership of the enterprise and for the authenticity and legitimacy of the data.

Collaboration should drive much of the data collection. All levels of government collect data: municipalities know about water, waste, transit and immigration settlement; provinces know about their big files such as health care and education, and also about land use and energy systems; and the federal government knows about income supports, airports and harbours. The First Nations know about their people and their needs, about the land, and about justice. Corporations know about jobs, markets and consumers. Our wide range of institutions such as hospitals, schools, police services and the military know their issues and how people respond to them. All can contribute data, and all can benefit from having that data made available, analyzed and turned into useful information.

These institutes should operate along the whole spectrum that runs from data to information to knowledge. They need to make data available on open platforms, but they also need to apply the analysis that produces reports showing trends and comparisons. And from time to time they need to say what it all means, to describe to Canadians the evolving story of the country.

And the institutes should be clients of StatsCan, not competitors with it. StatsCan will continue to be the world-respected leader it is. But these particular institutes will allow for deeper, more flexible, and more accessible engagement.

Is this a costly enterprise? CIHI’s annual budget is just over $100 million. Given the importance of health care to Canadians and the benefits CIHI produces, that is probably a good bargain. Siemiatycki estimates his version of a CIII would come in at $20 million or less. Given the broader range of data in a CISI, it might cost $40–50 million a year. For less than $200 million a year in total, we would have three major areas of importance to Canadians supplied with the data they need on a modern technology platform. Against an annual federal budget of about $300 billion, that is an affordable amount, well under a tenth of one percent. In fact, it may be an investment we can’t afford not to make, because it will contribute to policy and programs that are more effective and of better value.

It seems that evidence, data and information will be a strong platform for the sunny ways of Justin Trudeau’s government. After at least a decade during which Canadians got used to governments happy to operate in the dark, the lights are coming back on. Creating a set of vital information institutes would leave the current government a legacy that would be hard to undo. Based as it is on broad collaboration, and enjoying independent status, an organization like CIHI is hard to kill. With care in their construction and conduct, a CISI, a CIII and any institutes that follow would be just as durable.