The IDI and government data linking

Browsing on The Treasury’s website the other day, it was the title that caught my eye: “Talkin’ about a revolution”.   I’m rather wary of revolutions.  Even when –  not always, or perhaps even often –  good and noble ideas help inspire them, the outcomes all too often leave a great deal to be desired.   There are various, quite different, reasons for that, but one is about the failure to think through, or care about, things –  themselves initially small or seemingly unimportant – that the revolution opens the way to.

This particular “revolution” – billed as “a quiet and sedate revolution, but a revolution nonetheless” – was sparked by Statistics New Zealand’s Integrated Data Infrastructure (IDI).   Here is the Treasury author

The creation of Stats NZ’s IDI (or Integrated Data Infrastructure), a treasure trove of linked data, sparked the revolution, and its ongoing development drives it along. The IDI doesn’t collect anything new. Instead it gathers together data that is already collected, links it together at a person level, anonymises it, and makes it available to researchers in government, academia, and beyond.

The author goes on

Since 2013, its growth has been far more rapid. From a handful of users in its early years, there are now hundreds of people using IDI data to help answer thorny questions across the full range of social and economic research domains. The IDI is incredibly powerful for research, and has a number of important strengths.

  • Longitudinal – Providing a picture of people’s lives over time, crucial for understanding the effect of policies and services.
  • A full enumeration – Incorporating administrative data for almost all New Zealanders, enabling a focus on minority groups and small geographic areas.
  • Accessible – By making data available to researchers at relatively low cost, agencies are no longer gatekeepers of the data they collect, and a culture of sharing in the research community is encouraged.
  • Cross-sectoral – Allowing researchers to explore the relationships between different aspects of people’s lives that may be invisible to individual agencies.

There is a breathless enthusiasm about it all.

Stats NZ’s new online research database highlights the huge breadth of research underway for the benefit of all.

It is never made clear quite how the Treasury author gets to his conclusion that all this research benefits us all.

And here is the SNZ graphic illustrating the range of data they have put together (and linked)


I’m a bit torn about the IDI (and its business companion, the LBD).   As an economist and policy geek, I’m fascinated by some of the results researchers have been able to come up with using this new database.  A few months ago I wrote (positively) here about how Treasury staff had been able to derive new estimates on internal migration.   Here is a chart I showed then on the various databases linked together that enabled those estimates.

tsy popn
And here is a more-detailed SNZ graphic on what data are in the IDI at present (and more series are still being added).


More details are here.

Note that it is not even all government data –  for example, the Auckland City Mission is providing data on people it assists.  Specifically

Auckland City Mission data

Source: Auckland City Mission
Time: From 1996
What the data is about:  Income, expenses, housing status, and household composition of Auckland City Mission clients, and the services these clients use. Auckland City Mission is a social service provider in Auckland CBD, that helps Aucklanders in need by providing effective integrated services and advocacy. Note: data dictionary available on the IDI Wiki in the Data Lab.
Application code: ACM

Even if in 1996 those individuals gave their consent for their (anonymised) data to be used, few people in 1996 would have had any idea of the practical linking possibilities in 2018.   (And at a point of vulnerability how much ability did they have to decline consent anyway?)

It is researcher heaven.  But it is also planner’s heaven.

Statistics New Zealand sings the praises of the IDI (as does Treasury –  and any other agency that uses the database).  I gather it is regarded as world-leading, offering more linked data than is available in most (or all) other advanced democracies –  and that that is regarded as a plus.   SNZ (and Treasury) make much of the anonymised nature of the data, and here I take them at their word.  A Treasury researcher (say) cannot use the database to piece together the life of some named individual (and nor would I imagine Treasury would want to).   The system protections seem to be quite robust –  some argue too much so – and if I don’t have much confidence in Statistics New Zealand generally (people who can’t even conduct the latest Census competently), this isn’t one of the areas I have concerns about at present.

But who really wants government agencies to have all this data about them, and for them to be able to link it all up?   Perhaps privacy doesn’t count as a value in the Treasury/government Living Standards Framework, but while I don’t mind providing a limited amount of data to the local school when I enrol my child (although even they seem to collect more than they need), I don’t see why anyone should be free to connect that up to my use of the Auckland City Mission (nil), my parking ticket from the Dunedin City Council (one), or (say) my tiny handful of lifetime claims on ACC.  And I have those objections even if no individual bureaucrat can get to the full details of the Michael Reddell story.

The IDI would not be feasible, at least on anything like its current scale, if the role of central government in our lives were smaller.   Thus, the database doesn’t have life insurance data (private), but it does have ACC data.  It has data on schooling, and medical conditions, but not on (say) food purchases, since supermarkets aren’t government agencies.   I’m not opposed to ACC, or even to state schools (although I would favour full effective choice), but the fact that in some sense there is a common ultimate “owner”, the state, is no reason to allow this sort of extensive data-sharing and data-linking (even when, for research purposes, the resulting data are anonymised).   There is a mentality being created in which our lives (and the information about our lives) are not our own, and can’t even be stored in carefully segregated silos, but are the joined-up property of the state (and enthusiastic, often idealistic, researchers working for it).   We see it even in things like the Census where we are now required by law to tell the state if we have trouble “washing all over or dressing” or, in the General Social Survey, whether we take reusable bags with us when we go shopping.    And the whole point of the IDI is that it allows all this information to be joined up and used by governments –  they would argue “for us”, but governments’ views of what is for our good and our own views are not necessarily or inevitably well-aligned.

In truth my unease is less about where the project has got to so far than about the future possibilities it opens up.  What can be done is likely, eventually, to be done.   As I noted, Auckland City Mission is providing detailed data for the IDI.  We had a controversy a couple of years ago in which the then government was putting pressure on NGOs (receiving government funding) to provide detailed personal data on those they were helping –  data which, in time, would presumably have found its way into the IDI.   There was a strong pushback then, but it is not hard to imagine the bureaucrats getting their way in a few years’ time.  After all, evaluation is (in many respects rightly) an important element in what governments are looking for when public money is being spent.

Precisely because the data are anonymised at present, to the extent that policy is based on IDI research results it reflects analysis of population groups (rather than specific individuals).  But that analysis can get quite fine-grained, in ways that represent a double-edged sword: opening the way to more effective targeting, and yet opening the way to more effective targeting.  The repetition is deliberate: governments won’t (and don’t) always target for the good.  It can be a tool for facilitation, and a tool for control, and there doesn’t seem to be much serious discussion about the risks, amid the breathless enunciation of the opportunities.

Where, after all, will it end?   If NGO data can be acquired, semi-voluntarily or by standover tactics (your data or no contract), perhaps it is only a matter of time before the pressure mounts to use statutory powers to compel the inclusion of private sector data? Surely the public health zealots would love to be able to get individualised data on supermarket purchases (eg New World Club Card data), others might want Kiwisaver data, Netflix (or similar) viewing data, library borrowing (and overdue) data, or domestic air travel data (or road travel data, if and when automated tolling systems are implemented), CCTV camera footage, or even banking data.  All with (initial) promises of anonymisation –  and public benefit – of course.  And all, no doubt, with individually plausible cases about the real “public” benefits that might flow from having such data.  And supported by a “those who’ve done nothing wrong, have nothing to fear” mantra.

After all, here is the Treasury author’s concluding vision

Innovative use of a combination of survey and administrative data in the IDI will be a critical contributor to realising the current Government’s wellbeing vision, and to successfully applying the Treasury’s Living Standards Framework to practical investment decisions. Vive la révolution!

Count me rather more nervous and sceptical.  Our lives aren’t, or shouldn’t be, data for government researchers, instruments on which officials –  often with the best of intentions –  can play.

And all this is before one starts to worry about the potential for convergence with the sort of “social credit” monitoring and control system being rolled out in the People’s Republic of China.    Defenders of the PRC system sometimes argue –  probably sometimes even with a straight face –  that the broad direction of their system isn’t so different from where the West is heading (credit scores, travel watchlists and so on).   That is still, mostly, rubbish, but the bigger question is whether our societies will be able to (or will even choose to) resist the same trends.  The technological challenge was about collecting and linking all this data,  and in principle that isn’t a great deal different whether at SNZ or party-central in Beijing.   The difference –  and it is a really important difference –  is what is done with the data, but there is a relentless logic that will push erstwhile free societies in a similar direction  –  if perhaps less overtly – to China.  When something can be done, it will be hard to resist eventually being done.    And how will people compellingly object when it is shown –  by robust research –  that those households who feed their kids Cocopops and let them watch two hours of daytime TV, while never ever recycling, do all sorts of (government-defined –  perhaps even real) harm, and thus specialist targeted compulsory state interventions are made, for their sake, for the sake of the kids, and the sake of the nation?

Not everything that can be done ends up being done.  But it is hard to maintain those boundaries, and doing so requires hard conversation, solid shared values etc, not just breathless enthusiasm for the merits of more and more linked data.

As I said earlier in the post, I’m torn.  There is some genuinely useful research emerging, which probably poses no threat to anyone individually, or freedom more generally.   And those of you who are Facebook users might tell me you have already given away all this data (for joining up) anyway –  which, even if true, should be little comfort if we think about the potential uses and abuses down the track.   Others might reasonably note that in old traditional societies (peasant villages) there was little effective privacy anyway –  which might be true, but at least those to whom your life was pretty much an open book were those who shared your experience and destiny (those who lived in the same village).   But when powerful and distant governments get hold of so much data, and can link it up so readily, I’m more uneasy than many researchers (government or private, whose interests are well-aligned with citizens) about the possibilities and risks it opens up.

So while Treasury is cheering the “revolution” on, I hope somewhere people are thinking harder about where all this risks taking us and our societies.

A stuff-up by Statistics New Zealand

Many readers will recall the fiasco of the leak of an OCR announcement back in March 2016.  It turned out that the Reserve Bank’s systems had been so lax for years that people in the lock-ups they then held could simply email back to their offices (or to anyone else) news of the announcement that was supposed to be being tightly held.  This weakness only came to light because someone in Mediaworks emailed the news of this particular OCR announcement to their office, and someone in that office emailed me (from memory I was supposed to go on one of their radio shows later that morning).  I drew the matter to the Bank’s attention.

In the wake of that episode, the Bank (rightly in my view) cancelled the pre-release lock-ups for journalists and analysts.  But other government agencies went right on, relying on trust more than anything else.   One notable example was Statistics New Zealand, which produces and publishes many of the most market-moving pieces of economic data.    When asked about any possible changes to their procedures (outlined here) following the Reserve Bank leak in 2016, they responded

Statistics NZ has not undertaken any reviews or made any changes to the department’s policy for media conferences following the Official Cash Rate leak at the Reserve Bank of New Zealand and the subsequent Deloitte report into that leak released last week.


While Statistics NZ has never had a breach, if that trust is abused and an embargo is broken, offenders and their organisation would be barred from attending future media conferences.

As I noted back then

Unfortunately, that was probably the sort of discipline/incentive the Reserve Bank was implicitly relying on as well.

Unfortunately, after the confusion the Prime Minister gave rise to earlier in the week, confusing the crown accounts and GDP (which had some people abroad worried that the Prime Minister actually had had an advance briefing), there was apparently more trouble this morning.  But this time, the fault was entirely with Statistics New Zealand, and not with those in the lock-up.

The embargo for the lock-up on gross domestic product (GDP) for the June 2018 quarter, held today, 20 September 2018, was lifted about one minute earlier than the planned time of 10.45am.

The lock-up is held in Stats NZ’s Wellington offices from 10am to 10.45am, to allow key financial media, bank economists, and other government agencies to understand the information and ask questions about GDP, before the embargo is lifted. It is held under strict embargo conditions.

Stats NZ staff in the lock-up check official New Zealand time on the Measurement Standards Laboratory of New Zealand (MSL) website.

However, a computer script (JavaScript) bug meant that the official time clock website that appeared on the staff member’s phone picked up the phone’s own time setting, which was slightly fast.*

In other words, those in the embargoed lock-up had the data –  and could communicate it to their dealing rooms – a minute earlier than anyone not in the lock-up got the data.     And it seems to have mattered.  GDP was higher than expected and the exchange rate jumped.   People who were in the lock-up got the jump on that.  I’ve heard that the exchange rate moved before 10:45 (the official release time), which isn’t surprising if people in the lock-up had been told the embargo had been lifted.
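
The failure mode SNZ describes can be sketched in a few lines. This is only an illustration in Python (not the JavaScript the clock page itself would have used), and every name in it, along with the 60-second skew, is a hypothetical stand-in rather than anything taken from the SNZ statement or the MSL website. The idea is simply that a clock page is only as good as its correction from a reference server: if that correction is missing and the page falls back to the device’s own clock, a phone running a minute fast will show 10.45am when the true time is 10.44am.

```python
from datetime import datetime, timedelta

# Planned embargo lift time, as given in the SNZ statement.
RELEASE = datetime(2018, 9, 20, 10, 45, 0)


def displayed_time(true_time, device_skew, server_offset=None):
    """Time shown on the staff member's phone.

    device_skew   -- how far the phone's own clock runs ahead of true time
                     (hypothetical value, for illustration only).
    server_offset -- correction obtained from a reference time server;
                     None models the bug, where the page silently falls
                     back to the device clock.
    """
    device_now = true_time + device_skew
    if server_offset is None:
        return device_now
    return device_now + server_offset


def true_time_when_display_reads(target, device_skew, server_offset=None):
    """True time at the moment the phone display first shows `target`."""
    display_error = displayed_time(target, device_skew, server_offset) - target
    return target - display_error


skew = timedelta(seconds=60)  # assumed: phone is one minute fast

# Buggy page: the display reaches 10:45 while the true time is still 10:44.
buggy_lift = true_time_when_display_reads(RELEASE, skew)

# Working page: the server correction cancels the device skew.
correct_lift = true_time_when_display_reads(RELEASE, skew, server_offset=-skew)

print(buggy_lift.time(), correct_lift.time())
```

Nothing in the sketch depends on the details of the real website; the point is only that a check against “official time” provides no protection if the page quietly ends up reading the clock it was supposed to be checking.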

What is striking about the statement SNZ put out –  and it wasn’t exactly distributed widely (say, to all the people who got the GDP release itself) –  is that there is no mention at all of these possible early trades, which (in effect) distributed money/profits from one group of people (those not in the know) to another (those in the know).  Unlike the 2016 Reserve Bank leak, there seem to have been real financial consequences to this mistake.  And it isn’t clear that Statistics New Zealand is taking it that seriously.   When I asked about any investigation being undertaken, the implication of their reply was that there would be no further investigation or review beyond the narrow technical statement I linked to earlier. I hope that is not correct (and I hope, for example, the Reserve Bank is insisting on something more).

Writing about these data lock-ups in 2016 I noted of the SNZ situation

Is Statistics New Zealand that different?  There is, obviously, no policy message SNZ is trying to put across with its releases, and so no risks of different messages getting to different people.  But the security risks are the same.  Perhaps it is simply more efficient to have everyone in the same room, to clarify key technical points, but couldn’t the same end be achieved –  on a more competitively neutral basis (to analysts based abroad, say) –  by a dial-in (even webcast) conference call held a bit later on the day of the release?

That still seems right to me. I cannot see the case for a pre-release lock-up (and I can see a case for a technical conference call later in the day).   Mistakes will happen while they keep on with lock-ups.   The reliance on trust seems to be as strong as ever, and (as far as we know) that has been honoured.  This time, the stuff-up was by Statistics New Zealand themselves.   It was unnecessary, and it will at the margin (and especially in conjunction with the political contretemps earlier in the week) damage confidence in our statistics agency and the integrity of our data.

On our disappearing migration data

Having written here earlier in the week about the reckless and irresponsible way in which the government and Statistics New Zealand are degrading the quality of our very timely net immigration data (itself a major, and quite cyclically variable, economic indicator), I noticed a couple of comments that prompted me to dig out some numbers for this post.

The first, in a comment here, was that the self-reported intentions-based PLT measure probably couldn’t be counted on as very accurate anyway.  And the second, in someone else’s commentary, was that at least we will still (I hope) have monthly reporting of total passenger movements (tourists, business travellers etc as well as the permanent and long-term movements) from which a reasonable steer might be gleaned.

The best way of looking at whether the PLT measures are reasonable is to compare them with the new 12/16 method numbers –  available with a long lag, but which involve looking back, using passport records, and checking which people actually came (or went) for more than 12 months (the threshold for the PLT definition).   Unfortunately, SNZ is still not publishing seasonally adjusted estimates for the 12/16 method numbers, so one can only really do the comparisons using rolling annual totals.   On this chart, I’ve shown the rolling 12 month totals for (a) the 12/16 method, (b) the PLT series, and (c) total net passenger movements for almost 30 years (although the 12/16 method data are only available this century).

migration 31 Aug

All the cycles are pretty similar, at least if one takes a broad sweep of the data.  That isn’t surprising, as most short-term visitors go home again pretty quickly, leaving something like an underlying trend of permanent and long-term movements.   And it confirms that the PLT numbers have been a useful –  although not perfect –  indicator of the actual permanent and long-term movements (captured in the 12/16 numbers).  Importantly, the turning points tend to be very similar.
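
For anyone wanting to reproduce the comparison, the rolling 12 month totals in the chart are straightforward to compute from monthly net flows. The sketch below is illustrative only: the function name is my own and the numbers are made up, not SNZ data. The reason rolling annual totals get around the lack of seasonally adjusted 12/16 data is that every 12 month window contains each calendar month exactly once, so the regular seasonal pattern largely washes out.

```python
def rolling_annual_totals(monthly, window=12):
    """Rolling sum of the latest `window` monthly observations.

    Returns a list the same length as `monthly`; entries are None until a
    full window of data is available.
    """
    totals = []
    running = 0
    for i, value in enumerate(monthly):
        running += value
        if i >= window:
            # Drop the observation that has just fallen out of the window.
            running -= monthly[i - window]
        totals.append(running if i >= window - 1 else None)
    return totals


# Made-up monthly net flows: a flat year followed by one stronger month.
example = [1] * 12 + [2]
annual = rolling_annual_totals(example)
print(annual[11], annual[12])
```

The cost of the approach is timeliness: a turning point in the monthly flow only shows up gradually in a 12 month total, which is one reason the monthly seasonally adjusted PLT numbers have been so useful.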

One wouldn’t expect those two series to be the same, as they measure different things: the PLT numbers are about intentions, and if plans change so will behaviour.  If lots of people come to New Zealand (or leave for Australia) and things don’t work out and they change their mind, ideally we would want to know.    The divergence that looks to have opened up between the grey and orange lines at the end of the (grey) series might prove to have been something like that.  But in future we won’t know because (a) we won’t have the PLT data at all, and (b) the grey line will only be available with a reasonable degree of certainty with quite a long lag.   As a reminder, here is the new SNZ chart I included in the post the other day, illustrating the huge error margins around the timely estimates SNZ proposes publishing using their new (unpublished and untested) methodology.


But the other thing worth noticing is how noisy the blue line is.  There is a great deal of volatility, which makes distilling any signals (about permanent and long-term movements) very hard on a timely basis. That was why the PLT numbers have been so useful.  The blue line is thrown around in particular by big sporting events: eg the Lions tours in 2005 and 2017, and the Rugby World Cup in 2011.    There are big additional net arrivals, and then big additional net departures a month or two later, with mirror effects in the annual numbers a year later as well.  I have found the total net passenger arrivals data useful in the past –  in both 2002 and 2011 they pointed to something larger in the permanent and long-term movements than the PLT numbers themselves were reflecting, and that sense was later reflected in the 12/16 numbers (much larger net inflows in 2002/03, and somewhat larger net outflows in 2010/11).

What of the monthly seasonally adjusted data (the stuff designed for high frequency timely monitoring)?  Here is a chart of the PLT and total series, with scales set so as not to allow the flows associated with the Rugby World Cup (in particular) to dominate the chart.

migration mthyl sa

At a monthly frequency, the noise in the total passenger (orange) line totally dominates any signal, while the volatility in the monthly PLT series (that we are soon to lose altogether) is very small.    What should perhaps be more concerning –  and is a bit perplexing –  is why the volatility of the total passenger series is itself quite variable across time, even outside the months associated with major sporting events.   Right now, for example, the volatility in the monthly series is quite extreme.    Here is the same chart for just the last four years or so.

migration mthly

The Lions Tour is very evident in mid-2017, but the heightened volatility goes well beyond that.

All of which leaves me not quite sure what to make of the very first chart.   The blue line (annual net inflows of all passengers) has fallen back a long way already (down from around 80000 to around 40000), and similarly-sized falls in the past have often been coincident with, or perhaps a little ahead of, large falls in the PLT numbers (and the 12/16 numbers).  There are some reasons to think we might see something similar now.  Fortunately, for the next couple of months we will still have the PLT data

PLT mthly

But after that –  thanks to government and SNZ choices –  we will be flying blind.    We’ll have good information eventually on what actually happened, but it will be available with such a lag as to be more use to economic historians than to people trying to make sense of, and respond to, contemporaneous economic developments.  And the net total passenger movements data is sufficiently noisy that it probably won’t give us much of a steer (and even then with big error margins) before the lagging 12/16 data do.

This is simply reckless behaviour around a major set of timely economic data.

Do they expect to be taken seriously?

I don’t really have time for this today, but….

I wrote again yesterday about how getting rid of departure cards seems set to degrade the quality of our timely net migration data (currently some of the best available anywhere in the world, which we need since our net migration flows are large and volatile).  SNZ has previously promised that future PLT estimates

will be generated through a probabilistic predictive model of traveller type (ie short-term traveller, or long-term migrant), based on available characteristics of travellers. Such a model will provide a provisional estimate of migration, which we can then revise (if required) as sufficient time passes for us to apply the outcomes-based measure.

In media commentary yesterday, the Minister of Immigration was heard to suggest that under the new system the data will be better than what we’ve had now.

That seemed unlikely, but later yesterday morning SNZ put out a media release including this

Moving to the new methodology means it will be 17 months before final migration estimates are available. That’s because someone has to be in the country for 12 months out of 16 before they can be classified as a long-term migrant.

“A delay of that length would have been unacceptable to those who rely on migration data for planning and analysis, so we are developing a statistical model that will provide a provisional estimate of migration. A first look at provisional external migration estimates will be released tomorrow,” said Mrs Theyers.

In future, statistics for New Zealanders travelling overseas will be largely based on when they return. Some variables – including occupation and country of next residence – will no longer be available.

That statement itself confirmed one of my points –  some important data is going to be lost altogether (eg data on net outflows to Australia will in future have to be inferred, rather than available directly –  and while I’m sure that isn’t the motivation, that will be convenient for governments).  But there was a promise that they would reveal more today.  I was hopeful we might get a proper discussion paper, with details of their modelling techniques, and the results of backtesting, and (for example) the identification of key periods (especially around turning points –  a key focus of macroeconomic analysts) where the new procedure worked well and when it hadn’t.

But no.

What was released this morning was three charts and a page of text.  There is nothing about methodology, nothing about backtesting, nothing about the identification of turning points, in fact nothing that any serious analyst is likely to find useful.

We are told

To mitigate the impacts of such a delay, we are developing a statistical model that gives provisional estimates of migration to give a timelier statistic. The first provisional migration estimates are now available.

“Preliminary data presented today gives our customers their first glimpse of what migration statistics will look like once the outcomes-based approach becomes the official way we measure migration in New Zealand,” population insights senior manager Brooke Theyers said today.

But nothing at all about the model.

But here are results they are happy to show us


(I presume that these numbers are not seasonally adjusted, which probably accounts for some of the jumping around in the median estimates from month to month).

Recall that under the 12/16 methodology, the numbers from 17 months ago become final (and are, in many –  but not all – respects better quality than the current PLT numbers).  But the latest monthly data has huge margins of error –  even a 50 per cent confidence interval looks to be about 3000 people wide (on a monthly basis –  and bearing in mind that the average monthly inflow in recent years has been about 6000 people).
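
To put those error margins in perspective, here is a rough back-of-the-envelope calculation. The 3000 and 6000 figures are my own readings from above; the assumption that the estimation errors are normally distributed is mine alone (SNZ has published nothing about the model), and it is used only to translate a 50 per cent interval into the more familiar 95 per cent one.

```python
from statistics import NormalDist


def interval_summary(width_50, mean_level):
    """Translate the width of a 50% interval into other yardsticks.

    Assumes (purely for illustration) normally distributed errors.
    """
    half_50 = width_50 / 2
    # A central 50% interval for a normal spans +/- z50 standard deviations.
    z50 = NormalDist().inv_cdf(0.75)
    sigma = half_50 / z50
    # A central 95% interval spans +/- z95 standard deviations.
    z95 = NormalDist().inv_cdf(0.975)
    return {
        "half_width_50_share": half_50 / mean_level,
        "implied_sigma": sigma,
        "half_width_95": z95 * sigma,
    }


summary = interval_summary(width_50=3000, mean_level=6000)
print(summary)
```

On those assumptions, the 50 per cent interval alone is plus or minus a quarter of the typical monthly inflow, and a 95 per cent interval would be well over 4000 people either side of the central estimate, which is not far short of the monthly inflow itself.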

But to repeat:

  • no model,
  • no series as to how the estimates have evolved over time with the addition of more data,
  • no backtesting,
  • no analysis of turning point information

Almost nothing at all.  And none of this is being consulted on; instead the government and SNZ are simply junking one of our best high frequency sets of economic data, about a variable which adds considerable volatility to the New Zealand economy.   We should expect a lot more, especially from a notionally independent national statistics agency.


Tossing away valuable emigration data

We had confirmation yesterday that departure cards are to be scrapped.    This was flagged by the Prime Minister a few months ago, and I wrote about the issue here.   Since then it appears that there has been no proper public consultative process.

As I noted in March

I’m sure airlines and airport operators hate the cards.  There have been previous efforts to get rid of them.  They are, nonetheless, a core element of the data collections (in conjunction with arrivals cards) that give us some of the very best immigration data anywhere.  In a country with –  year in, year out – some of the very largest immigration, and emigration, flows anywhere in the advanced world.

We are told by the government that this brings us more into line with other countries

On Sunday, Lees-Galloway said the move would bring New Zealand into line with other countries, few of which had departure cards with the level of detail required by the New Zealand card.

(although even then we appear to be overshooting in scrapping the cards completely).

But the statistical and related policy issues New Zealand grapples with are different from those in many other countries, most of whom don’t have big outflows of their own citizens, or big cyclical fluctuations in those flows.     Immigration of non-citizens is managed through the administrative approvals required to get a visa.  But people don’t need government approval to leave again and New Zealanders (of course) are free to come and go without any prior approval from the New Zealand government.

So departure cards captured the intentions of people coming and going.  Those stating that they intend to have changed countries for 12 months or more make up the permanent and long-term migration data that, for decades, has been a major and very timely indicator of what is going on, in a country with some of the largest swings in net migration of any country in the world.   It isn’t a perfect indicator by any means –  very timely ones rarely are – but it has consistently contained valuable information, especially around turning points.   And now the government proposes to scrap this data collection.

The Minister of Customs reckons the cards aren’t necessary

Customs Minister Meka Whaitiri said the cards were no longer needed for their original purpose – to account for all passengers crossing the New Zealand border.

“We have smarter systems now that capture passenger identity information and travel movement records electronically,” she said.

“Information captured by the departure cards is now mainly used for statistical purposes.

“Statistics NZ has developed an alternative way to produce migration and tourism statistics, based on actual movements rather than passengers’ stated intentions on the departure cards.”

I certainly agree that departure cards aren’t needed to capture the total flows, but it is the timely breakdown of that data that has been extensively used for decades.   And the operative word there is “timely”.  The new 12/16 method data –  looking back and seeing how long people were actually here/away – is better for long-term analytical purposes, but it is available only with a 17 month lag, whereas the departure card based data is available within weeks.  That difference matters, and it is worth bearing in mind that 17 months is almost half a parliamentary term.
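
The reason for the lag is built into the rule itself, as a simplified sketch shows. This is my own illustration of a 12-months-in-16 test, working in whole months; it is not SNZ’s actual implementation (which will work with actual border-crossing records), and the function and argument names are hypothetical.

```python
def classify_arrival(months_present, window=16, threshold=12):
    """Classify an arrival under a simplified 12/16 rule.

    months_present -- one boolean per month since arrival (True = in the
                      country that month), for however many months have
                      been observed so far.
    Returns "long-term", "short-term", or "undetermined" if not enough
    time has passed to settle the question either way.
    """
    observed = months_present[:window]
    present = sum(observed)
    absent = len(observed) - present
    if present >= threshold:
        return "long-term"
    if absent > window - threshold:
        # Too many months away to reach the threshold within the window.
        return "short-term"
    return "undetermined"


# Eleven months in the country and never left: still cannot be classified.
still_open = classify_arrival([True] * 11)

# One more month settles it.
settled = classify_arrival([True] * 12)

# A visitor who left after one month is classified once five months of
# absence have been observed.
visitor = classify_arrival([True] + [False] * 5)

print(still_open, settled, visitor)
```

Short-term visitors drop out of the picture quickly, but nobody can be confirmed as a long-term migrant in under 12 months, and people who come and go may need the full 16. Hence the wait before any month’s numbers are final, during which only modelled estimates will be available.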

We are told that Statistics New Zealand has “developed an alternative way to produce” the data, but we’ve seen no details of this, and there has been no consultative document made available for comment.  In  my earlier post I included this quote from SNZ claiming that in future estimates of the PLT breakdown

will be generated through a probabilistic predictive model of traveller type (ie short-term traveller, or long-term migrant), based on available characteristics of travellers. Such a model will provide a provisional estimate of migration, which we can then revise (if required) as sufficient time passes for us to apply the outcomes-based measure.

I commented then

I hope that they plan to rigorously evaluate the accuracy of such models, including when they’ve worked well and when they haven’t, and how well they capture the effects of policy changes, and that they expose their models and evaluation to external scrutiny before scrapping such a valuable source of hard data as the departure card.

But we have seen no sign of such an evaluation at all, and yet in a few months the data that have been used for decades will be discontinued, with no ability to recreate them in future if the new models that are talked about prove not to have been very good.

Without seeing the models it is hard to comment on where they might go wrong.  But the key point is that statistical models often work fine when past behavioural patterns keep on as they were in the past, and they often fail when behaviour changes.   It is the behavioural changes that are often of most interest to the analyst, and it looks as though there will now be very long lags before we have the data to enable any such changes to be recognised.
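To make that concern concrete, here is a minimal sketch in Python. It is not SNZ’s model, which has not been published; it simply assumes a provisional estimate built by applying the historical share of departures that turned out to be long-term to the latest departure count. The function names and all the numbers are invented for illustration.

```python
def fit_long_term_share(history):
    """Share of past departures that turned out to be long-term.

    history: list of (total_departures, long_term_departures) per month.
    Both the structure and the figures used below are hypothetical.
    """
    total_departures = sum(total for total, _ in history)
    total_long_term = sum(long_term for _, long_term in history)
    return total_long_term / total_departures


def provisional_estimate(share, departures):
    """Provisional long-term count: historical share times new departures."""
    return round(share * departures)


# Invented history: behaviour is stable, 5% of departures are long-term.
history = [(100_000, 5_000), (100_000, 5_000), (100_000, 5_000)]
share = fit_long_term_share(history)

# Invented new month: same total departures, but behaviour has shifted
# and 8,000 people are in fact leaving long-term.
actual_long_term = 8_000
estimate = provisional_estimate(share, 100_000)

print("estimated:", estimate)
print("actual:   ", actual_long_term)
print("miss:     ", estimate - actual_long_term)
```

With these invented numbers the estimate stays at 5,000 while the true outflow has risen to 8,000, and the miss would only become visible once the outcomes-based measure caught up.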

I just heard Iain Lees-Galloway claiming on Radio NZ that future statistical information will be improved by scrapping the departure cards.  That seems very unlikely – essentially impossible, because you cannot really know the intentions of travellers other than by asking them, and intentions actually matter in this business.

I could add to the lament around official immigration statistics that there has still been very little progress in making available regular, timely, seasonally adjusted, accessible data from MBIE on visa approvals.  These are major economic and social data for New Zealand, which should be readily available almost instantly, including through SNZ’s Infoshare site.  There is no reason why immigration approvals data should not be at least as readily useable as, say, building approvals data. I know MBIE has a project underway to improve the situation, and make available an immigration data dashboard, but it seems to be moving very slowly –  it must be a year now since MBIE first told me about it, and it is months since the person doing the work invited me to provide comments on a prototype.  It is encouraging that something appears to be in the works, but in the meantime we limp on with inadequate, not user-friendly, administrative data, while the government simply abandons the best timely data we have on what people leaving New Zealand are planning to do.




Census day

It is census day.  I’d probably be filling in my census forms now, except that they haven’t yet arrived.  We wanted paper forms, and I rang to request them within minutes of the initial SNZ letter arriving, with the access codes.  That was 10 days ago now.  So I’m less than impressed.  Doubly so as one of my kids is off at a school camp, and whereas I’d planned to fill out most of her form for her and send it along with her, since the forms haven’t arrived some teacher will presumably be overseeing her completion of the form, including a bunch of quite sensitive information that is simply none of the teacher’s business.

There is an article in the Dominion-Post this morning (“An intrusive, insulting exercise”)  from a journalist attacking the very existence of the census.  I’m torn.  I’m a keen user of some census data.  But I can’t help wondering what business it is of the state to coerce –  under direct threat of prosecution –  much of this information out of people.  As the journalist notes, threats to government data security have become more real.  And I also wonder whether Statistics New Zealand is not increasingly an instrument of a socio-political agenda (note the several pages of defensiveness about the absence of “gender identity” questions – this time).    Glancing through the questions, I’m also struck by the imprecision of several of them (eg under “Which country were you born in?” the third option is “England”, which is barely more of a country than, say, Canterbury or Otago are –  the latter two had their own parliaments rather more recently (1876) than England did (1707?)).

The ethnicity question has been in the media in the last few days, with some  people bothered that “Pakeha” isn’t an option.  I guess they have a point.  But what bothered me was something else. Here is the question.


How many Niueans are there in the entire world?   Apparently about 25000.  At the last census there were more than 200000 people in New Zealand born in “England” –  plus others who probably identify as English.   And yet SNZ don’t even list it as one of their top 8 options.  It would be interesting to understand why.  I’d probably normally tick the form as “NZ European”, but I think that (when the forms finally arrive) this time I might write down English, Scottish, Northern Irish, and perhaps British as well.  Since SNZ tell us ethnicity is, on their reckoning, self-perceived, the answers won’t (can’t really?) be wrong –  and those places are where my ancestors come from.

There are questions that leave one wondering about the reliability of the results of the census.  Here is a language question


On a form in English, they feel the need to remind us to remember to mark English if we can have a conversation in English?  Quite how thick do they think respondents are that they need to talk down to us thus?   (And why is it any business of the government whether someone can hold a conversation in, say, Pukapukan or Polish? –  English, Maori, and Sign Language might, arguably, be a different matter.)

Then there is the religion question.


I consistently refuse to answer this question, not because I’m ashamed of my faith – Christian – but because it is the one question I’m lawfully allowed to refuse to answer.   The government and SNZ attempt to market the census on the basis of all the important public policy/spending choices it will inform, but it isn’t clear what decisions they think they will be making on the basis of individuals’ declared religion (or lack of it).   And then there is the picky point: few Presbyterians will think of “Presbyterian” as a religion, but as a denomination within Christianity.

And then there is the question that probably bothers me most.


Quite what business is any of this to the government?  Frankly, if I had difficulty washing or dressing, I’d rather take the risk of being prosecuted (or perhaps even lie) than face the humiliation/embarrassment (as many will regard it) of writing that down on some government official’s form.

There are the questions that look like some activist’s request


What marks out cigarettes, in the minds of the bureaucrats who put this together, from pipes or cigars?  What business is it of the government’s?    And if cigarettes, what about alcohol, drugs, or other things people might think of as social vices –  “have you ever requested a single-use plastic bag?” for example.  Then again, perhaps I shouldn’t encourage them.

And, to the very end, the worthy social agenda continues.  The form ends –  the sample on-line form, not yet having got my forms –  thus.


Actually, if there are blank unused forms, I’d prefer to rip them up, drop in the rubbish bin and see them off to the landfill.  But quite what I do with my rubbish shouldn’t really be any concern of Statistics New Zealand.

For much of the sort of information in the questions I’ve highlighted, it is hard to see a legitimate public policy interest in the information (coerced as you’ll recall) and also hard not to think that to the extent that there is interest in the issues in some quarters, reasonable steers could not be obtained much more cheaply, and non-coercively, through the use of well-designed voluntary surveys, undertaken at the expense of those interested in the data, and without the privacy concerns regarding the provision of so much joined-up data in one place to public servants.


Please improve immigration data, not undermine it

On her visit to Australia, the Prime Minister has been quoted as suggesting that departure cards might soon be discontinued, and that she will be pursuing her Customs and Statistics ministers on the matter.

I’m sure airlines and airport operators hate the cards.  There have been previous efforts to get rid of them.  They are, nonetheless, a core element of the data collections (in conjunction with arrivals cards) that give us some of the very best immigration data anywhere.  In a country with –  year in, year out – some of the very largest immigration, and emigration, flows anywhere in the advanced world.

I wrote about this a few months ago when, under the previous government, Statistics New Zealand publicised the possibility/likelihood of departure cards being discontinued.  At the time, SNZ suggested that

“In the near future, the outcomes-based ‘12/16-month rule’ is expected to become a key component in how we determine the number of migrants in New Zealand.”

The “12/16 data” are the new series of permanent and long-term movements derived by lining up, using passport details, people coming and going, and waiting until more than a year after the initial movement to see if the movement looks permanent or long-term.    It is all very interesting – I’ve praised SNZ for putting the collection in place –  and provides a more accurate measure of actual long-term comings and goings than the (stated intentions based) arrival and departure card.   But it is only available with a very long lag  (ie more than 16 months), whereas the existing PLT data are available monthly, with a few weeks lag (and in principle could be produced even more frequently).
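The mechanics of an outcomes-based rule of this sort can be sketched in a few lines of Python. This is an illustration only, not SNZ’s implementation: it assumes the rule means spending at least 12 of the 16 months following an arrival in the country, approximates months as days, and uses hypothetical function and variable names.

```python
from datetime import date, timedelta

WINDOW_DAYS = 487      # roughly 16 months (an approximation, assumed here)
THRESHOLD_DAYS = 365   # roughly 12 months (an approximation, assumed here)


def classify_arrival(arrival, later_crossings, as_of):
    """Classify one arrival under an assumed 12/16-style rule.

    arrival:         date the traveller entered the country
    later_crossings: date-sorted list of (date, "in" | "out") for the
                     same passport after that arrival
    as_of:           date on which the classification is attempted
    """
    window_end = arrival + timedelta(days=WINDOW_DAYS)
    if as_of < window_end:
        # The window has not yet closed, so the outcome is not yet known.
        return "pending"

    days_in = 0
    inside_since = arrival
    for when, direction in later_crossings:
        if when >= window_end:
            break
        if direction == "out" and inside_since is not None:
            days_in += (when - inside_since).days
            inside_since = None
        elif direction == "in" and inside_since is None:
            inside_since = when
    if inside_since is not None:
        # Still in the country when the window closes.
        days_in += (window_end - inside_since).days

    return "long-term" if days_in >= THRESHOLD_DAYS else "short-term"


# Hypothetical travellers, both arriving on the same day.
arrival = date(2016, 1, 10)
settler = [(date(2016, 6, 1), "out"), (date(2016, 6, 20), "in")]
tourist = [(date(2016, 1, 24), "out")]

print(classify_arrival(arrival, settler, date(2017, 6, 1)))
print(classify_arrival(arrival, tourist, date(2017, 6, 1)))
print(classify_arrival(arrival, settler, date(2016, 12, 1)))
```

The “pending” branch is where the long lag comes from: under a rule like this nobody can be classified until the full window after their movement has elapsed, whereas a card asks for the answer on the day.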

I’m reproducing here the concerns I expressed in September

I’ve explained here previously why the resulting PLT data has its limitations.   It isn’t a good basis to use to look at immigration policy itself.  Approvals data from MBIE is better for those purposes –  and would be better still if they made the information available in an accessible format on a more timely basis.     And the PLT data are based on self-reported intentions, and intentions aren’t always what people end up doing.  Some people think they are leaving permanently, and are back six months later, and vice versa.   But intentions data isn’t nothing either  (just as business surveys capture intentions/expectations and things don’t always turn out as they expect).    The patterns –  and especially the cyclical patterns, the turning points –  in the PLT data tend to match those in the (lagged) 12/16 data quite closely.

There are quite enough gaps (and long lags) in New Zealand economic data as it is –  monthly CPIs, monthly manufacturing data, quarterly income measure of GDP just for starters –  that I’m just staggered that key economic agencies are apparently willing to let SNZ/Customs go ahead and consider dropping departure (and arrival?) cards.  Where are Treasury and the Reserve Bank on this?

How, specifically, does it matter?   Without departure or arrival cards we would, of course, still have immigration approvals data for most non-citizens (other than Australians).  In principle, they could be published weekly or monthly with just a day or two’s lag, and be available in quite accessible formats.  Since approvals lead actual arrivals, there is certainly useful information in those approvals numbers (it is just that they aren’t made easily available now).

We could presumably also have data on the total number of people crossing the border (gross and net) from passport scanning.   I’m not aware that those numbers are published at present, but they could be.  And presumably they could be broken down by nationality (or at least by the passport the person happened to be travelling on).    That would be useful –  relative to having no arrivals or departures data –  but not very.   If you look at total arrivals or departures (or net) data it is enormously volatile, and thrown around by things like Lions tours –  in other words, holidaymaker and other short-term visitor numbers swamp movements of migrants.   Using that data alone, we’d have no ability to pick turning points for some considerable time after the turn had already happened.

The gaps would be particularly serious for the movement of New Zealanders, and more than half the variability in the 12/16 measure of net migration has arisen from fluctuations in the movements of New Zealanders.  We would have no secure way of knowing if someone leaving was planning to be off for a week’s holiday, or intending to stay away for ever.  The 12/16 method would eventually tell us what they did –  but there is a lag of almost 18 months on the availability of that information.    And even if the new plan involves keeping arrival cards and only getting rid of departure cards, most of the variability in New Zealanders’ migration movements is in the numbers leaving, not the numbers arriving.

Less importantly, without the departure cards we would seem likely to lose the ability to analyse migration (including reflows outwards by migrants who become NZ citizens) by the birthplace of the migrant.

Perhaps someone has done a robust cost-benefit analysis on getting rid of departure (and arrival?) cards.  If so, I would be keen to see it, and particularly keen to see how the relevant officials have factored in the loss of some of the world’s best migration data to macroeconomic monitoring and forecasting, in a country with some of the most volatile immigration flows in the advanced world (and not a great track record of getting monetary policy, or housing markets, right as it is).  And even if one sets aside the macroeconomic analysts’ interests, it is not as if net migration numbers are one of those issues of no political salience at all.  Put an 18 month lag on decent data, and you risk not silencing debate – which some might wish for – but allowing all sorts of misconceptions and concerns to flourish, which no one will be in a position to allay.  It would, frankly, seem crazy.    Immigration has an economic and political salience here which it might not have in a country with land borders and small permanent inflows/outflows.

Frankly, it looks like a pretty irresponsible proposal.   The departure cards provide the only information on what New Zealanders are doing, and the comings and goings of New Zealanders are a big part of the PLT migration story (and aren’t, of course, under government control).

And in case anyone thought the PLT numbers were simply flaky measures, with no information

…here are the total net flows on the two measures [12/16 in blue, PLT in orange]


They don’t match up perfectly –  one wouldn’t expect them to, and there is information even in the differences (eg what led people to change their plans) –  but no analyst would happily give up a series that provided a 17 month lead this (relatively) good on the 12/16 series.

Turning points matter a lot for macroeconomic analysis and monitoring, and the turning points in the two series are very similar.

The claim from Statistics New Zealand is that they can fill the gap with estimates that

will be generated through a probabilistic predictive model of traveller type (ie short-term traveller, or long-term migrant), based on available characteristics of travellers. Such a model will provide a provisional estimate of migration, which we can then revise (if required) as sufficient time passes for us to apply the outcomes-based measure.

I hope that they plan to rigorously evaluate the accuracy of such models, including when they’ve worked well and when they haven’t, and how well they capture the effects of policy changes, and that they expose their models and evaluation to external scrutiny before scrapping such a valuable source of hard data as the departure card.

And talking of data gaps, I’ve also written here before about the very long lags in MBIE making available, in readily usable form, the summary administrative data on actual immigration approvals (and estimates of the stock of migrants).   Some of the data you can get yourself, if you don’t mind manipulating spreadsheets that are hundreds of thousands of lines long, but for most people, for practical purposes, the data are really only available annually, and typically with quite a long lag.    That is really inexcusable.  Like it or not, immigration policy is a major instrument of government economic and social policy, and approvals data (and associated stock estimates) are a valuable part of informing the public debate.  Information is almost always better than no information.

[UPDATE: A reader highlights that not even the spreadsheets are currently available.]

MBIE publishes the summary results, and accessible tabular data, in their annual Migration Trends and Outlook publication. In many respects, it is a very useful publication, even if (a) the data are only annual (whereas, say, building approvals data are available monthly), and (b) the publication has a minimum lag of 4 to 5 months (in other words, data for the full year to June 2016 was only published in late November 2016).  That isn’t remotely good enough, especially for administrative data.  Neither MBIE nor SNZ has to collect the data –  it all sits in MBIE’s own systems, generated every single working day.  There is no obvious reason why the data  – all the summary data (number of approvals in each category, occupation, age, sex, country of origin etc) –  couldn’t be made accessibly available monthly within a few days of the end of each month.

I’ve made these criticisms previously.  And that was when Migration Trends and Outlook  was coming out on its normal slow timetable (a 5 to 17 month lag).   But go to the MBIE website looking for the 2016/17 publication –  in March 2018 –  and it still hasn’t been published, more than eight months after the end of the year in question, 20 months after the start of the period to which the data would relate.

Some readers might be inclined to suspect MBIE of some deliberate strategy to keep the information from the public.  I’m not.  That is not only because I’m not naturally a conspiracy theorist, and have had plenty of experience of the failures of bureaucracy. It is also because a few months ago I was invited to a meeting by an MBIE official who was part of a team working on improving the Migration Trends and Outlook publication, looking for my comments/ideas on data, immigration research etc.  The official seemed quite genuine, and enthusiastically told me of the efforts MBIE was putting in to improving the publications, and (if I recall rightly) the timeliness of the data (even while stressing that it was quite hard and there were “systems issues”).  That meeting last year would have been before the usual publication date of the Migration Trends and Outlook publication, so I came away from the meeting quite encouraged.   I’m still quite willing to believe that MBIE has the project in hand, but in the meantime……where is the 2016/17 data?  It is now March 2018.