Assessing the National Surveys for its Representativeness: An Analysis of the Data Quality of the National Sample Survey (NSS)
Speakers:
Mudit Kapoor, ISI Delhi
Abstract:
This paper is a quantitative analysis of the data quality of the National Sample Survey (NSS) in terms of three estimates: (i) the proportion of the rural population, (ii) the proportion of the Scheduled Caste (SC) population, and (iii) the proportion of the working-age (15 to 59 years) population. We follow Meng (2018) and use the data defect correlation, which measures the correlation between the variable of interest and an indicator variable that takes the value 1 if a population unit is selected into the sample and 0 otherwise. We demonstrate that this correlation is high enough to warrant a reduction in the bias-adjusted effective sample size from more than 4.5 lakh observations to a range of fewer than 500 to 5,000, a reduction in statistical efficiency of 97% to 99.8%. The paper has implications for surveys that use the same sampling strategy, such as the National Family Health Survey (NFHS) and the Periodic Labour Force Survey (PLFS). We emphasise that increasing data quantity cannot address data quality issues. On the contrary, it leads to the Big Data Paradox (Meng, 2018): “The more the data, the surer we fool ourselves.”
(joint with Shamika Ravi, S V Subramanian)
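
As a rough illustration of the quantities named in the abstract, the sketch below applies the identity from Meng (2018), error = rho x sqrt((1 - f) / f) x sigma, where f = n / N is the sampling fraction and sigma is the population standard deviation of the variable of interest. It backs out the data defect correlation rho from an observed gap between a survey proportion and a benchmark proportion, then computes the large-population approximation of the effective sample size, n_eff ≈ f / ((1 - f) x rho^2). The function names and all input numbers are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
import math


def data_defect_correlation(sample_prop, true_prop, n, N):
    """Back out rho from Meng's identity for a binary variable.

    error = rho * sqrt((1 - f) / f) * sigma, with sigma = sqrt(p * (1 - p)).
    """
    f = n / N                                   # sampling fraction
    sigma = math.sqrt(true_prop * (1 - true_prop))
    return (sample_prop - true_prop) / (math.sqrt((1 - f) / f) * sigma)


def effective_sample_size(rho, n, N):
    """Approximate size of a simple random sample with the same squared error."""
    f = n / N
    return f / ((1 - f) * rho ** 2)


# Hypothetical inputs, NOT taken from the paper:
N = 1_000_000_000        # assumed population size
n = 450_000              # assumed survey sample size (4.5 lakh)
true_prop = 0.70         # assumed benchmark proportion (e.g. from a census)
sample_prop = 0.72       # assumed survey estimate of the same proportion

rho = data_defect_correlation(sample_prop, true_prop, n, N)
n_eff = effective_sample_size(rho, n, N)
efficiency_loss = 1 - n_eff / n

print(f"data defect correlation: {rho:.6f}")
print(f"effective sample size:   {n_eff:.0f}")
print(f"efficiency reduction:    {efficiency_loss:.2%}")
```

Under this approximation the effective sample size equals sigma^2 divided by the squared error, so even a gap of two percentage points in the estimate shrinks a sample of several lakh observations to the equivalent of a few hundred randomly drawn units.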