Where to go from here

Some Observations

An ever-changing landscape

The world of data and governance seems to be in a time of rapid change, as policy makers realise the importance of evidence and data in shaping the future of society. In the course of writing this report, the UK acquired for the first time a legal framework enabling government departments to share data with researchers for research purposes (Digital Economy Act, 2017), HMRC data became available for uses other than tax policy research, and the Department for Work and Pensions released data for 13 of its core benefits and decisions (up from 2 in its beta form).

In March 2016, Scotland acquired new devolved powers on some tax and on welfare and employment support benefits (Scotland Act 2016). Scotland is expected to acquire legislative control over 11 benefits after June 2017 and executive responsibility for all devolved benefits by April 2020 (see note 18). In addition to potentially introducing another variation in the types of data collected and a different set of administration and decision policies, the devolution of these powers will mean that more organisations will be involved as data controllers for data generated after that time, and it will most likely break the continuity of data collection for the devolved benefits.

This implies that sometime in the near future the information about some of the benefits included here will change, as administrative responsibilities transition from central government to the Scottish Government.

These are some of the reasons why any attempt to produce a comprehensive audit of the data landscape is nearly impossible: this report is bound to have missed key changes, and the information in it will go out of date very quickly. See item (d) in this section, though, for a way forward.

Data (Il)literacy

In the last decade or so, there have been great changes in the burden placed upon researchers when it comes to empirical research. Not only are researchers expected to have a thorough knowledge of the methodology of analysing both quantitative and qualitative data, but they are also expected to be data literate and to know how to find their way around a complex system of governance (and data governance, no less) to get that data for their research. Shifting the culture to data-driven, accountable, empirical policymaking is a long process, and change relies not just on data owners making data available for research, but also on researchers being able and willing to use them for research that will impact society in a positive way and improve everyone’s lives. Researchers should seek every opportunity to collaborate with peers and policy makers to advance science and ensure that policy making is as well-informed as possible.

There is no consistency

There is great variety in the data researchers can use for research, in terms of quality, coverage and documentation as well as the process they have to go through to access it:

Availability, Volume and Quality

We found that the remarks of the Administrative Data Taskforce report applied in most cases, although steps are being taken to move in a more data-friendly and research-friendly direction (see note 19). During this transition phase, though, it is inevitable that there will be great inconsistency in which data, how much of it and of what quality will be made available for research.

Some government departments (like DWP) are more data-minded and have made available (and are in the process of making available) more and more data in a well-documented format, which researchers can access and analyse online, without needing to download everything, via their online tool (StatXplore). Others (like the HMCTS in England, Wales and Scotland or NI CTS/TAS in N. Ireland) publish some data in the form of Word or PDF versions of the decisions and judgements, but not in a consistent tabular format that would facilitate statistical analysis.

Most of the time, the researcher will be the one tasked with selecting what is relevant for their research and what can be achieved within the timescale that they have.

Method of Access

The process of negotiating and accessing data as well as the terms and conditions vary greatly, depending on the types of data requested, the data owners and how they have agreed with users of their service that their data will be used. Researchers are encouraged to have a look at the “How to access” sections for each of the datasets listed in this report. If negotiating access to a data source that is not already available via a clearly defined pathway, it is advisable that you enquire what is possible as early as possible in the negotiation process.

Coverage

Spatial coverage will depend on the topic and the remit of the government department or executive agency you are liaising with. A good way to find out is via their respective websites or by enquiring with their service staff directly. Temporal coverage is usually straightforward if the administrative process has not changed much over the years and the department’s record retention policy ensures records are kept for a long time. Note that there are great variations in these, though: for example, HMRC has a policy of retaining all of its tax records, whereas some records, like the dental health records held by GPs and the NHS Business Services Authority, are only kept for 5 years. Also, note that where the administrative process has changed (e.g. if a benefit was replaced with another), data might not be comparable across time.

Ownership

Identifying who is the owner of the data (the Data Controller as per the Information Commissioner’s Office definition) is not quite straightforward, as the department that collects the information might not necessarily be the one making decisions about the data. An example of this is the JobCentre data: these are collected by staff at individual centres across the country and submitted to the Department for Work and Pensions, who are the ones deciding what can be done with the data and who can access it. Quite a lot of the data collected at the Local Authority level is also submitted to central government departments, and they might have the right to let you access it. Understanding how the data flows (i.e. moves from organisation to organisation) and under what terms is crucial in understanding who you need to sign a Data Sharing Agreement with (i.e. a contract-like document giving you and/or your institution permission to acquire the data and use them for your project).

Sometimes data are collected and managed by private companies on behalf of government departments. This could prove to be a barrier if, for example, the government department cannot make the data available for research because it is not the data owner.

The future is collaborative

The current work is an organic resource – as things are changing and as we get a better understanding of the field of Administrative Justice Research and its research landscape, we rely on everyone to help keep it up to date and enriched with data sources as much as possible.

Rather than a couple of people working on the report enquiring and collecting the information, it is hoped that by making this report available to the community as an open source public book via GitBook, researchers and other stakeholders will contribute to it as they work more on the data or as government departments make more data available.

I would like UK data on [X], but this is not listed here. What should I do?

Well this is great news! You have just found a potential gap in the data landscape and we would love to hear from you. In compiling this guide, we hit this wall a lot of times, and here is what we did:

Step 1. Do an online search. Use google.com, bing.com, yahoo.com, or any other of your favourite engines. Sometimes try more than one, as results can vary.

Step 2. Search the online catalogues of Data.gov.uk, the ADRN Metadata Catalogue and the ESRC Business and Local Government Data Research Centre Data Portal by department or topic.

Step 3. Try a search on the online catalogues of some of the repositories of social and economic data such as the Office for National Statistics, the UK Data Service, Stats Wales, Scottish Government Statistics, The Northern Ireland Statistics and Research Agency. Have a look at the Administrative data as a research resource: a selected audit report by Jones and Elias and the DataNav report by Amnesty International, the Engine Room and Benetech. Have a look at the ONS Statement of Administrative Data Sources, the one from Scotland, from Wales and from other departments.

Step 4. Try to understand how the administrative process works. Usually, the relevant government department will have information on their pages about the process, to help people through the process. Try to find out what forms people must fill in and who they need to submit them to for their request to move forward. These offices are usually a good starting point for researchers to enquire with and ask more information about data availability, quality and coverage.

Step 5. Visit the government department’s webpages. Due to the Transparency agenda and the Digital Economy Act (2017), which recently received Royal Assent, most government departments make available some statistics, usually about salaries or performance data. These statistics sometimes contain the contact details of the section or group of people within the department that produced them; these are always a good starting point to enquire about the availability of data and other potential data sources. Search for their list/statement of administrative sources (see the one from NHS, for example: http://content.digital.nhs.uk/pubs/listadminsources); most departments publish these on their websites as part of the transparency agenda, too.

Step 6. Ask. Get in touch with the UK Administrative Justice Institute researchers (check the UKAJI researcher register); get in touch with colleagues and people who have published on the topic.

Step 7. Be flexible. Be prepared to change direction and be open to suggestions of alternative data sources, as you proceed and find out more about what is available and whether it can be used for statistical analysis.

Step 8. Be persistent. Do not give up. Ask again. And again.

Step 9. Let us know. Feel free to add the outcome of your findings to the community GitBook version of this guide; please see the how to contribute section or contact me if you would like more information about how to do this.

Getting the data

As mentioned above, this can vary greatly by country, department, types of data requested and level of detail of the variables that you require. Once you have identified the right data for your research question, clarify with the data owners what the process is for applying to get it. In our report we have included information on how the data sources can be accessed, so these should give you an idea of the different requirements.

None of the government departments we review in this report charge researchers for making their data available. Some others (like NHS Digital) do, though (see their current costs of managing the application, processing the data and providing access), so it is worth clarifying this as early in the process as possible, too.

Some funders issue special calls or have special funding streams that researchers in the UK can apply to if they would like to do secondary data analysis (i.e. use already existing data rather than collecting new data), such as the Economic and Social Research Council’s Secondary Data Analysis Initiative.

Analysing the data

Analysing administrative data falls within the field of Secondary Data Analysis, which involves the analysis of an existing data source which had previously been collected by another researcher or organisation, usually for different purposes (see Brewer, 2012; Elliott, 2015; Heaton, 2003; Johnston, 2017; Life, 2003; Tatsuoka, 2012). The main idea is that using data that has already been collected is highly efficient and economical: there is no extra cost for collecting the information. There is also another advantage: administrative data collections are as close as researchers could ever get to whole populations, with the opportunity to provide us with a detailed picture, leading to a greater understanding of the issues under consideration and better-informed policy making. They also often contain information on populations who are under-represented in mainstream surveys.

However, the fact that the data has already been collected, for different purposes, means that researchers will most likely not have all the information on how and why certain types of information were collected. They might also not know when data collection changed (e.g. when a variable stopped being collected, when another one started being collected, or even when the same variable came to refer to a different metric!). As a result, they might find that they have to spend some time familiarising themselves with the data, cleaning them and understanding the decisions around them. For more on the challenges and opportunities of using secondary data in research see (Heaton, 2008; Hinds, Vogel, & Clarke-Steffen, 1997; Trzesniewski, Donnellan, & Lucas, 2011; Vartanian, 2010).

Here are some useful resources on issues other than methodology that might assist you with your quest:

Data Management

A very good starting point is the ESRC Data management plan guidance. The UK Data Archive have an extensive set of resources on their pages, and they have published an authoritative guide on the topic. The Digital Curation Centre also has a dedicated website with resources on the topic and information on how to put together a data management plan for your research. JISC also make available a set of sources on how and why researchers should manage their research data, on Data protection and on Managing research data in your institution.

As with other types of data, research proposals using administrative data will undergo institutional ethics review, but researchers might find that the set of rules and criteria applied to primary data collection does not directly apply to secondary data analysis of administrative data. Nevertheless, there are ethical factors to consider when using and linking such data in a research project, most notably on the issues of ethics and consent:

  • Administrative data is not primarily collected for research, but rather by government departments and other executive agencies for operational reasons. Individuals included in the data collection might not be explicitly aware of their information being used for research purposes, as the government department might not directly ask for their permission or because they cannot be directly identified in the data. This is still a grey area, but it highlights the researchers’ ethical and moral responsibility to ensure that any use of the data is done in a way that will safeguard the confidentiality of the persons concerned, even though the data might not be directly classed as ‘personal data’ as per the Data Protection Act.
  • The more data sources we link to each other, the higher the risk of statistical disclosure, that is, of specific information from a de-identified data source accidentally being attributed to a specific individual, household or business, thus identifying them. Government departments usually have procedures in place to ensure that this does not happen, but researchers should ensure they do their bit, too, by following best practice in data handling, storing and processing.
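
The second point can be illustrated with a minimal sketch. It counts how many records become unique as more attributes are combined, which is one simple way of seeing why linking sources raises disclosure risk. The records, field names and the `unique_records` helper are all made up for illustration; real disclosure risk assessment follows the data owner's own procedures.

```python
from collections import Counter

# Made-up, de-identified records; field names are assumptions for illustration.
people = [
    {"area": "EH1", "age_band": "30-39", "benefit": "PIP"},
    {"area": "EH1", "age_band": "30-39", "benefit": "ESA"},
    {"area": "EH1", "age_band": "40-49", "benefit": "PIP"},
    {"area": "G12", "age_band": "30-39", "benefit": "PIP"},
    {"area": "G12", "age_band": "30-39", "benefit": "PIP"},
    {"area": "G12", "age_band": "40-49", "benefit": "ESA"},
]


def unique_records(rows, keys):
    """Count rows whose combination of values on `keys` appears only once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    return sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)


# Each extra attribute (e.g. one gained through linkage) isolates more records.
by_area = unique_records(people, ["area"])
by_area_age = unique_records(people, ["area", "age_band"])
by_all = unique_records(people, ["area", "age_band", "benefit"])
print(by_area, by_area_age, by_all)
```

In this toy example no record is unique on area alone, but most are once all three attributes are combined.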

Note that when applying to government departments with a request to link your own (primary) data to their administrative records, you must provide evidence that you have sought adequate informed consent from your participants and that any copyright issues have been addressed.

There are several good reads on the topic of ethics and administrative data, such as the ESRC’s Framework for Research Ethics (2016) as well as these sources: (Bishop, 2017; Burton et al., 2015; Connelly, Playford, Gayle, & Dibben, 2016; Cooper, 2004; Evans et al., 2015; Gaye et al., 2014; Metcalf et al., 2016; McGuire et al., 2011; OECD, 2013; OECD, 2016; Richard & King, 2014; Stiles & Boothroyd, 2015; UK Cabinet, 2016; Weller & Kinder-Kurlanda, 2016; Zwitter, 2014).

The National Statistician’s Data Ethics Advisory Committee was set up to assist in cases where an ethics committee is unable to assess the ethics of a project using administrative data or where an ethics process is not available for researchers. Its members have extensive experience in assessing secondary data applications. They list a set of best practice principles for data projects that might be helpful:

  1. The use of data has clear benefits for users and serves the public good.
  2. The data subject’s identity (whether person or organisation) is protected, information is kept confidential and secure, and the issue of consent is considered appropriately.
  3. The risks and limits of new technologies are considered and there is sufficient human oversight so that methods employed are consistent with recognised standards of integrity and quality.
  4. Data used and methods employed are consistent with legal requirements such as the Data Protection Act, the Human Rights Act, the Statistics and Registration Service Act and the common law duty of confidence.
  5. The views of the public are considered in light of the data used and the perceived benefits of the research.
  6. The access, use and sharing of data is transparent, and is communicated clearly and accessibly to the public.

Ensuring Confidentiality of Statistical Outputs

When acquiring or analysing an administrative dataset, researchers and data owners must ensure the confidentiality of statistical outputs. The Office for National Statistics has guidance on the dissemination of statistics, including their protocol on Release Practices. WISERD also has a useful resource on Statistical Disclosure Detection and Control in a research environment. This ADRN guide includes an introduction to one of the methods that can be employed to do that, called Output Statistical Disclosure Control, as employed within the ‘5-Safes’ Safeguarding framework.
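
As a rough illustration of one common output check, the sketch below applies a minimum cell count rule to a frequency table before release. It is not taken from the guides above: the threshold of 10, the `safe_frequency_table` helper and the records are assumptions for illustration, and the actual rules are set by each data owner.

```python
from collections import Counter

# Hypothetical minimum cell count; the real threshold is set by the data owner.
MIN_CELL_COUNT = 10


def safe_frequency_table(values, threshold=MIN_CELL_COUNT):
    """Count each category and suppress counts below the threshold."""
    counts = Counter(values)
    table = {}
    for category, count in sorted(counts.items()):
        # Small cells are replaced by None so they are not published.
        table[category] = count if count >= threshold else None
    return table


# Made-up records: the outcome of 25 fictional appeals.
outcomes = ["allowed"] * 14 + ["dismissed"] * 10 + ["withdrawn"] * 1
table = safe_frequency_table(outcomes)
for category, count in table.items():
    print(category, "suppressed" if count is None else count)
```

Note that this only covers primary suppression. If a total is published alongside the table, a single suppressed cell can be recovered by subtraction, so further (secondary) suppression or rounding may be needed.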

HMRC have published their policy on confidentiality and access in their HMRC statistics pages, and so has DWP. The National Records of Scotland have also outlined the principles around confidentiality of statistical outputs from their Census data.

Data Linkage

Sometimes a single dataset will not provide all the information a researcher needs to do their analysis; it is very likely to be most useful when linked with another administrative dataset, a survey or even data the researcher has collected themselves. Understanding the methodology around data linkage is crucial when doing research with secondary data, as linkage might introduce bias and might affect the quality of the data at hand.

An Introduction to Data Linkage (ADRN) gives readers a practical introduction to data linkage, covering data preparation, deterministic and probabilistic linkage methods and analysis of linked data, with examples relevant to health and other administrative data sources.
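
To give a flavour of the two families of methods, here is a minimal sketch on made-up records. It first tries a deterministic link on a shared identifier and then falls back to a simple fuzzy rule (same date of birth plus similar name). The fuzzy step is only a simplified stand-in for probabilistic linkage, which estimates match weights rather than applying a single similarity cut-off. The field names, the 0.8 cut-off and the `link` helper are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Made-up records; "nino" stands in for any shared unique identifier.
admin_records = [
    {"id": "A1", "nino": "QQ123456A", "name": "Jane Smith", "dob": "1980-02-14"},
    {"id": "A2", "nino": None, "name": "Jon Macdonald", "dob": "1975-11-03"},
    {"id": "A3", "nino": None, "name": "Mary Jones", "dob": "1990-06-30"},
]
survey_records = [
    {"id": "S1", "nino": "QQ123456A", "name": "J. Smith", "dob": "1980-02-14"},
    {"id": "S2", "nino": None, "name": "John McDonald", "dob": "1975-11-03"},
    {"id": "S3", "nino": None, "name": "Peter Brown", "dob": "1990-06-30"},
]

NAME_THRESHOLD = 0.8  # assumed cut-off, not a recommended value


def name_similarity(a, b):
    """Similarity between two names, from 0 (different) to 1 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(left, right, threshold=NAME_THRESHOLD):
    """Return (left id, right id, method) for every pair judged to match."""
    links = []
    for rec_a in left:
        for rec_b in right:
            if rec_a["nino"] and rec_a["nino"] == rec_b["nino"]:
                # Deterministic: exact agreement on the identifier.
                links.append((rec_a["id"], rec_b["id"], "deterministic"))
            elif (rec_a["dob"] == rec_b["dob"]
                  and name_similarity(rec_a["name"], rec_b["name"]) >= threshold):
                # Fuzzy: same date of birth and a sufficiently similar name.
                links.append((rec_a["id"], rec_b["id"], "fuzzy"))
    return links


links = link(admin_records, survey_records)
for pair in links:
    print(pair)
```

The third pair shares a date of birth but not a name, so it is left unlinked. Where the cut-off is set determines the balance between missed links and false links, which is one way linkage can introduce bias.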

Information is occasionally made available via some data controllers (see for example the data linkage pages of the Understanding Society survey, the Scottish Government Data Linkage Strategy, and this summary page from the Administrative Data Liaison Service).

These sources offer more information on data linkage, its methodologies and examples of analysis using linked data: (Harron, Goldstein, & Dibben, 2015; Harron, Wade, Gilbert, Muller-Pebody, & Goldstein, 2014; Kinner et al., 2015; Lugg-Widger et al., 2017; Morgan, Marlow, Costeloe, & Draper, 2016; Pickrell et al., 2015; Setiawan et al., 2015; Vatsalan, Sehili, Christen, & Rahm, 2017; Waugh, Anand, Anderson, & De Wit, 2017)

The ADRN guide on Legal Issues for ADRN Users sets out the legal background to data protection laws in the UK, and offers a broad explanation of the current law relating to data sharing and linkage, as well as a consideration of the implications of the impending EU General Data Protection Regulation 2016.

These sources offer more information on legal issues around administrative and related data: (Dixon-Woods et al., 2017; Kaye et al., 2016; Laurie et al., 2015; Laurie & Stevens, 2016; Mitchell et al., 2017; Stevens & Laurie, 2014; Woollard, 2014).

Of relevance to work with administrative data are also the Digital Economy Act (2017), in particular Part 5, Chapters 5 and 6, and the Data Protection Act (1998/2003/2015).

Data Quality

Common data quality concerns with administrative data include incomplete data, incorrect data formats and mistyped data. There is also sometimes inconsistency in the way staff or others contributing information record data in the data resource, particularly in those cases where data have not been collected with statistical analysis in mind. Handling the data (i.e. cleaning it, aggregating or linking it with other datasets) is also likely to introduce errors or bias into the analysis and to affect the validity of the research and insights.
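
A few of these concerns can be checked mechanically before any analysis. The sketch below flags missing dates, unexpected date formats and unrecognised category values in a made-up extract. The field names, the expected ISO date format and the list of valid outcomes are assumptions for illustration and would need to come from the data owner's documentation.

```python
from datetime import datetime

# Made-up extract; field names and values are assumptions for illustration.
records = [
    {"case_id": "001", "decision_date": "2016-03-01", "outcome": "Allowed"},
    {"case_id": "002", "decision_date": "01/04/2016", "outcome": "allowed "},
    {"case_id": "003", "decision_date": "", "outcome": "Dismissed"},
    {"case_id": "004", "decision_date": "2016-05-12", "outcome": "Dissmised"},
]

VALID_OUTCOMES = {"allowed", "dismissed", "withdrawn"}  # assumed code list


def is_iso_date(text):
    """Return True if the text parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def quality_report(rows):
    """List (case_id, problem) pairs for incomplete or malformed values."""
    issues = []
    for row in rows:
        date = row["decision_date"].strip()
        # Ignore case and stray whitespace before checking the code list.
        outcome = row["outcome"].strip().lower()
        if not date:
            issues.append((row["case_id"], "missing decision_date"))
        elif not is_iso_date(date):
            issues.append((row["case_id"], "unexpected date format"))
        if outcome not in VALID_OUTCOMES:
            issues.append((row["case_id"], "unrecognised outcome"))
    return issues


issues = quality_report(records)
for case_id, problem in issues:
    print(case_id, problem)
```

Flagging problems rather than silently correcting them keeps a record of the cleaning decisions, which matters because cleaning is itself a possible source of error.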

The National Statistician’s Office notes, published in 2014, provide some guidance on using administrative data for Government statisticians, and give a good overview of the challenges and ways to work around them.

Since then, the UK Statistics Authority, via its Office for Statistics Regulation, has issued a Regulatory Standard for the quality assurance of administrative data that are used to produce official statistics. This is available for download via the Quality Assurance of Administrative Data pages, and includes the standard and a toolkit that helps users implement it.

The Office for National Statistics have used this framework to make available notes on the quality assurance of administrative data used in the UK (see for example the UK Public Sector Finances report).


18. See for example this article
19. Such as the creation of the Administrative Data Research Network; the Government Digital Service, the Data.gov.uk data catalogue and the Digital Economy Act (2017)
