Dr. Johanna Choumert-Nkolo, Henry Cust, Callum Taylor
Researchers and policy-makers increasingly recognise the importance of data in the design, implementation, and evaluation of development programmes. However, relatively little research has examined the methods used to collect high-quality data.
In this short series of blog posts, we present results from our new paper, which sheds light on how to use paradata to improve survey data quality:
Choumert-Nkolo J., Cust H., Taylor C. (2019) Using paradata to collect better survey data: Evidence from a household survey in Tanzania. Review of Development Economics
In this first post, we present the concept of “paradata”. Although paradata are widely used in survey methodology research, they are still much less familiar to development economists. As a result, despite their potential as a powerful tool for improving data quality, paradata remain underused.
Survey paradata are data about the data collection process, such as survey timings, locations, and response rates. They differ from survey questionnaire data, which are the actual responses of the individuals interviewed, and from auxiliary data, which are complementary administrative or census data. Researchers typically focus on the questionnaire data used in their analysis, yet paradata are an invaluable tool for improving the quality of questionnaire data and understanding potential measurement errors. Paradata are not new, but the advent of electronic data collection (Computer Assisted Personal Interviewing, CAPI) has made it possible to collect them more systematically and to formalise their use.
Examples of paradata are provided below:
Paradata | Measure |
Timestamps | Date and time of contact |
Timestamps | Number of interviews per day, average interview length |
Timestamps | Time per question, time per section |
Timestamps | Interviewers’ performance |
Timestamps | Analysis of responses according to the day or time in the day |
Timestamps | Field teams’ workload (budgeting, human resources) |
Timestamps | Time between interviews |
Timestamps | Measurement errors (respondents or interviewers who rush / low understanding of the questionnaire resulting in a long interview) |
Timestamps | Interview interruptions (time gaps between sections / disturbing the flow of the questionnaire) |
GPS coordinates | Track the movements of interviewers during and between interviews |
GPS coordinates | Identify coverage bias, e.g. in random walk sampling |
Data correction, data entry, keystrokes | Navigation throughout the questionnaire (e.g. time, change of answers) |
Counts of household visits/contact attempts | Level of effort among interviewers |
Counts of household visits/contact attempts | Cost / response rate analysis |
Counts of household visits/contact attempts | Inform on the best time to visit respondents for future surveys and follow-up surveys |
Non-response rate | Acceptability of the survey overall or for specific populations |
Non-response rate | Interviewer trends |
Non-response rate | Non-response bias (completed interviews, reasons for refusal, interviewer’s observations, …) |
Audio recording[1] | Audio audit, number of interruptions |
Interviewers’ characteristics (gender, age, experience, etc.) | Interviewers’ trends on various outcomes |
Random number generator | Respondent selection, order of response list, order of questions |
[1] Audio-recording should be used carefully and only with informed consent of respondents.
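Several of the timestamp measures in the table can be derived from nothing more than the start and end time of each interview. The following is a minimal sketch in Python, not the procedure used in the paper: the record layout, the interviewer IDs, and the function name `timestamp_measures` are all assumptions made for illustration, and a real CAPI export will look different.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical paradata records: (interviewer_id, start, end) as ISO 8601 strings.
RECORDS = [
    ("A", "2019-03-04T09:00:00", "2019-03-04T09:45:00"),
    ("A", "2019-03-04T10:15:00", "2019-03-04T11:05:00"),
    ("B", "2019-03-04T09:10:00", "2019-03-04T09:25:00"),
]


def timestamp_measures(records):
    """Summarise interviews per interviewer and day (all durations in minutes)."""
    by_day = defaultdict(list)
    for interviewer, start, end in records:
        s, e = datetime.fromisoformat(start), datetime.fromisoformat(end)
        by_day[(interviewer, s.date())].append((s, e))

    summary = {}
    for key, visits in by_day.items():
        visits.sort()  # chronological order within the interviewer-day
        lengths = [(e - s).total_seconds() / 60 for s, e in visits]
        # Gap = start of the next interview minus end of the current one.
        gaps = [
            (nxt[0] - cur[1]).total_seconds() / 60
            for cur, nxt in zip(visits, visits[1:])
        ]
        summary[key] = {
            "interviews": len(visits),
            "mean_length_min": sum(lengths) / len(lengths),
            "gaps_min": gaps,
        }
    return summary


if __name__ == "__main__":
    for key, stats in timestamp_measures(RECORDS).items():
        print(key, stats)
```

Grouping by interviewer and day keeps the three measures (number of interviews per day, average interview length, time between interviews) in one place, which makes them easy to compare across field teams.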
Paradata are a great tool for monitoring data quality in real time and for managing data collection costs and resources. Using paradata to monitor fieldwork allows researchers to identify issues or idiosyncrasies developed by specific interviewers and to take action while fieldwork is ongoing, with minimal time lag.
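As one illustration of this kind of monitoring, a daily check could compare each interviewer's typical interview length with that of the whole team. The sketch below is only an assumption about how such a check might look: the data, the function name `flag_interviewers`, and the ratio thresholds `low` and `high` are hypothetical and are not taken from the paper.

```python
from statistics import median

# Hypothetical interview lengths in minutes, keyed by interviewer ID.
DURATIONS = {
    "A": [48, 52, 45, 50],
    "B": [15, 18, 14, 16],
    "C": [47, 55, 49, 51],
}


def flag_interviewers(durations, low=0.5, high=1.5):
    """Flag interviewers whose median length is far from the overall median.

    `low` and `high` are illustrative ratio thresholds, not recommended values.
    """
    overall = median(d for lengths in durations.values() for d in lengths)
    flags = {}
    for interviewer, lengths in durations.items():
        ratio = median(lengths) / overall
        if ratio < low:
            flags[interviewer] = "short"  # possible rushing
        elif ratio > high:
            flags[interviewer] = "long"  # possible comprehension problems
    return flags


if __name__ == "__main__":
    print(flag_interviewers(DURATIONS))
```

A flag is a prompt for follow-up with the field team rather than evidence of a problem, since short or long interviews can also reflect household size or the areas an interviewer was assigned.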
To conclude, paradata should be used to improve data quality at three stages: (i) when testing and piloting questionnaires and field protocols, to manage time and resources more effectively; (ii) when monitoring fieldwork and data quality on a day-to-day basis; and (iii) when cleaning data and assessing data quality.
In the next blog post, we will focus on timestamps and provide specific examples of how we have used them to monitor data quality for a field survey we conducted in Tanzania.