Estimation of proportion
A population of N elements contains an unknown number M of marked elements. In this article I present, in an accessible way, various problems related to estimating the fraction θ = M/N.
A population of N elements contains an unknown number M of marked units. Problems of estimating the fraction θ = M/N are discussed. The well-known standard solution is θ̂ = K/n, where K is the number of marked units in a sample of size n; it is the uniformly minimum variance unbiased estimator, the maximum likelihood estimator, and the method-of-moments estimator, and consequently it shares all the advantages of such estimators. The paper considers some versions of the estimator that are more adequate in real situations. If we know in advance that the unknown fraction lies in a given interval (t1, t2), and we regard an estimator θ̂1 as better than an estimator θ̂2 when its mean square error averaged over that interval is smaller, then the optimal estimator is given by (3). The values of this estimator for (t1, t2) = (0, 0.5) and for (t1, t2) = (0.3, 0.4), for a sample of size n = 10 containing K marked units, are given in the table, and the mean square errors of these estimators are compared with the error of the standard estimator θ̂ = K/n in Fig. 2. Averaging the mean square error with a weight function, such as the one in Fig. 3, gives the Bayesian estimator, whose mean square error is shown in Fig. 4 (for n = 10). If in a real situation we are interested in minimizing the mean square error “in the worst possible case”, the adequate choice is the minimax estimator. Another situation arises when the population can be divided into more homogeneous subpopulations, for example into two subpopulations in each of which the fraction of marked units is close to zero or close to one. Stratified sampling is then more effective, and the mean square error of estimation may be significantly reduced. The problem of randomized responses is also presented, briefly and in elementary terms. It arises when a unit in the sample cannot be recognized with certainty as “marked” or “not marked”, and the recognition can be made only with some probability.
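As an illustration of the randomized-response problem just described, here is a minimal Python sketch. It assumes Warner's classical design, which the abstract does not specify: each respondent answers the sensitive question truthfully with a known probability p and answers the complementary question otherwise, so that the probability of a "yes" is λ = pθ + (1 − p)(1 − θ). The function names and the parameter p are illustrative and are not taken from the paper.

```python
def warner_estimate(yes_count, n, p):
    """Estimate the fraction theta from randomized responses.

    Assumed design (Warner): with probability p the respondent answers the
    sensitive question, otherwise its complement, so
    P(yes) = p * theta + (1 - p) * (1 - theta).
    """
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if abs(p - 0.5) < 1e-12:
        raise ValueError("p = 1/2 makes the answers uninformative about theta")
    lam_hat = yes_count / n                      # observed share of "yes" answers
    theta_hat = (lam_hat - (1 - p)) / (2 * p - 1)
    # The raw estimate can fall outside [0, 1]; clip it to the admissible range.
    return min(1.0, max(0.0, theta_hat))


def warner_variance(theta, n, p):
    """Variance of the unclipped estimator under the same assumed design."""
    lam = p * theta + (1 - p) * (1 - theta)
    return lam * (1 - lam) / (n * (2 * p - 1) ** 2)
```

For example, with p = 0.7 and 40 "yes" answers among n = 100 respondents, `warner_estimate(40, 100, 0.7)` is approximately 0.25. The price of confidentiality is precision: for θ = 0.25 and n = 100 the variance under this design is 0.015, eight times the 0.001875 obtained when every answer is direct (p = 1).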
The randomized-response situation is typical of survey interviews: the technique allows respondents to answer questions on sensitive issues (such as criminal behavior or sexuality) while preserving confidentiality. The final section of the paper is devoted to some remarks on confidence intervals for the fraction. The exact optimal solution is well known to mathematicians, but it is probably not easy for statistical practitioners to follow all the theoretical details, so confidence intervals based on the asymptotic normal approximation to the binomial distribution are typically used. That approach is neither sufficiently exact nor correct. The proper, exact solution is given by quantiles of a suitable Beta distribution, which are easily computed in typical statistical and mathematical software packages.
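The contrast between the exact and the approximate interval can be sketched in a few lines of Python. The sketch assumes a binomial model for K (sampling with replacement, or N much larger than n) and an equal-tailed interval; the abstract does not state the Beta parameters, so this is presumably, but not certainly, the construction the paper has in mind. The endpoints are found here by bisection on the binomial tail probabilities using only the standard library; they coincide with the α/2 quantile of Beta(K, n − K + 1) and the 1 − α/2 quantile of Beta(K + 1, n − K), which in SciPy should be obtainable as `scipy.stats.beta.ppf(alpha/2, k, n - k + 1)` and `scipy.stats.beta.ppf(1 - alpha/2, k + 1, n - k)`. All function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, theta):
    """P(K <= k) for K ~ Binomial(n, theta)."""
    return sum(comb(n, i) * theta**i * (1 - theta) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(f, iters=100):
    """Bisection on [0, 1] for a function that decreases from positive to negative."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(k, n, alpha=0.05):
    """Exact equal-tailed interval; its endpoints coincide with Beta quantiles."""
    # Lower limit: theta with P(K >= k | theta) = alpha/2 (taken as 0 when k = 0).
    lower = 0.0 if k == 0 else _root_of_decreasing(
        lambda t: alpha / 2 - (1 - binom_cdf(k - 1, n, t)))
    # Upper limit: theta with P(K <= k | theta) = alpha/2 (taken as 1 when k = n).
    upper = 1.0 if k == n else _root_of_decreasing(
        lambda t: binom_cdf(k, n, t) - alpha / 2)
    return lower, upper


def normal_interval(k, n, alpha=0.05):
    """Interval from the normal approximation, shown only for comparison."""
    p = k / n
    half = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(p * (1 - p) / n)
    return p - half, p + half
```

The case K = 0, n = 10 shows why the approximation is unsatisfactory: `normal_interval(0, 10)` collapses to the single point 0, whereas `exact_interval(0, 10)` has the upper limit 1 − 0.025^(1/10), about 0.31.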