CRIS de-identification
Research type
Research Study
Full title
Assessment of the de-identification algorithm in UKCRIS
IRAS ID
260056
Contact name
Mike Denis
Contact email
Sponsor organisation
Clinical Trials Research Governance, University of Oxford
Duration of Study in the UK
0 years, 11 months, 31 days
Research summary
The widespread use of Electronic Health Records in UK Mental Health has the potential to lead to great breakthroughs in clinical research and thus improve the lives of patients. Clinical Records Interactive Search (CRIS) extracts data from NHS Trusts’ electronic health records for research purposes. A key step in making these data available to researchers is the robust de-identification of data to protect patients’ right to privacy.
The data in CRIS include both structured and unstructured fields. While de-identifying structured fields (such as a field labelled ‘patient name’) is straightforward, de-identifying free text (such as clinical notes and correspondence) is much harder. Mistakes can occur for many reasons, such as misspellings.
In UKCRIS, de-identification is currently performed by a bespoke algorithm. The algorithm finds all words it considers to be personal identifiers and masks them with “ZZZZ”. The efficacy of this algorithm has not been explicitly studied to date. This is a problem, because patients and Trusts can only be offered estimates, rather than measured evidence, of how robust the de-identification is.
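To make the masking step concrete, the following is a minimal sketch only. The UKCRIS algorithm itself is not described in this summary, so the dictionary-based matching, the function name `mask_identifiers`, and the sample note (with invented names) are all assumptions for illustration. It also shows how a misspelled name could slip through unmasked.

```python
import re

MASK = "ZZZZ"  # mask string quoted in the summary


def mask_identifiers(text, identifiers):
    """Replace whole-word occurrences of known identifiers with MASK.

    Hypothetical helper: exact, case-insensitive matching against a
    supplied list, which is NOT necessarily how UKCRIS works.
    """
    # Longest terms first so longer identifiers win over their parts.
    terms = sorted({t for t in identifiers if t}, key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(MASK, text)


# Invented example note; "Jon" is a misspelling of "John".
note = "Seen today: John Smith. Jon Smith reports low mood."
masked = mask_identifiers(note, ["John", "Smith"])
print(masked)
```

In this sketch the misspelled "Jon" is left in the output because it does not match any listed identifier, which is the kind of leak an assessment would need to count.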
The current project, UKCRIS de-identification assessment, aims to fill this knowledge gap by assessing the performance of the de-identification algorithm used in UKCRIS. This is done by comparing a sample of unmasked free-text notes with the same notes after they have been de-identified by the algorithm. We count all patient identifiers present in the text and record three kinds of outcome: identifiers the algorithm masked, identifiers it failed to mask, and words it mistakenly masked that are not identifiers. From these counts we can determine how well the algorithm de-identifies patient data.
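The counting described above can be sketched as follows. This is an illustration under stated assumptions, not the study's method: the summary does not name its metrics, so recall and precision are assumed here as the usual summaries of such counts. The sketch also assumes that the masked note aligns token for token with the original and that a reviewer has supplied the positions of the true identifiers. All names and data are hypothetical.

```python
from dataclasses import dataclass

MASK = "ZZZZ"  # mask string quoted in the summary


@dataclass
class Counts:
    masked_identifiers: int = 0      # identifiers correctly masked
    missed_identifiers: int = 0      # identifiers left unmasked
    masked_non_identifiers: int = 0  # ordinary words wrongly masked

    def recall(self) -> float:
        """Share of true identifiers that were masked."""
        total = self.masked_identifiers + self.missed_identifiers
        return self.masked_identifiers / total if total else 0.0

    def precision(self) -> float:
        """Share of masked tokens that were true identifiers."""
        total = self.masked_identifiers + self.masked_non_identifiers
        return self.masked_identifiers / total if total else 0.0


def count_outcomes(masked_tokens, identifier_positions):
    """Tally outcomes given reviewer-marked identifier positions."""
    counts = Counts()
    for i, token in enumerate(masked_tokens):
        is_identifier = i in identifier_positions
        is_masked = token == MASK
        if is_identifier and is_masked:
            counts.masked_identifiers += 1
        elif is_identifier:
            counts.missed_identifiers += 1
        elif is_masked:
            counts.masked_non_identifiers += 1
    return counts


# Invented original: John Smith saw Dr Jones on Monday
# A reviewer marks positions 0, 1 and 4 as identifiers.
masked_tokens = ["ZZZZ", "ZZZZ", "saw", "Dr", "Jones", "on", "ZZZZ"]
counts = count_outcomes(masked_tokens, identifier_positions={0, 1, 4})
print(counts, counts.recall(), counts.precision())
```

Here a low recall would indicate identifiers leaking into the research data, while a low precision would indicate clinically useful words being lost to over-masking.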
REC name
South Central - Oxford B Research Ethics Committee
REC reference
19/SC/0305
Date of REC Opinion
18 Jun 2019
REC opinion
Favourable Opinion