Journal of Applied Mathematics and Computation

ISSN Print: 2576-0645 ISSN Online: 2576-0653 CODEN: JAMCEZ
Frequency: quarterly

Effects of Zero Observations on Modelling Categorical Data

Tong Wang*, Wentao Li, Jing Ruan

Wenzhou Polytechnic, Wenzhou, Zhejiang, China.

*Corresponding author: Tong Wang

Published: May 6, 2023


We study maximum likelihood estimation (MLE) under Poisson sampling, computing the estimates numerically with the Newton-Raphson method. We introduce the log-linear model and the logistic model and, for each, analyze both the independence model and the saturated model. Two kinds of zero observations are considered when a model is fitted by MLE: sampling zeros and structural zeros. Taking a two-way contingency table as an example, we obtain the correspondence from the log-linear model to the logistic model when the table contains zero entries. Each term in a logistic regression model corresponds to a term in the log-linear model, and we study the relationship between these terms through their standard errors in the presence of sampling zeros. We conclude that the effect of zero observations is determined by the number and the positions of the zeros. We analyze the reasons for these problems and propose some suggestions for addressing them.
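As an illustration of the kind of computation the abstract describes, the following is a minimal sketch of a Newton-Raphson fit of the independence log-linear model to a two-way table under Poisson sampling. It is not the authors' code: the function name `fit_poisson_loglinear`, the 2 x 3 table of counts (which contains one sampling zero), and the corner-point coding of the design matrix are all assumptions made for this example.

```python
import numpy as np

def fit_poisson_loglinear(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson for a Poisson log-linear model log(mu) = X @ beta.

    Assumes the first column of X is an intercept (illustrative convention).
    Returns the estimates, their standard errors and the fitted counts.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # start from the constant model
    for _ in range(n_iter):
        mu = np.exp(X @ beta)           # current fitted counts
        score = X.T @ (y - mu)          # gradient of the log-likelihood
        info = X.T @ (mu[:, None] * X)  # Fisher information X' diag(mu) X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    info = X.T @ (mu[:, None] * X)
    se = np.sqrt(np.diag(np.linalg.inv(info)))  # asymptotic standard errors
    return beta, se, mu

# Hypothetical 2 x 3 table with one sampling zero (row 1, column 2).
counts = np.array([[10, 0, 5],
                   [6, 4, 8]])

# Independence model: intercept + row effect + two column effects,
# with the first row and first column as reference levels.
rows, cols = np.indices(counts.shape)
r, c = rows.ravel(), cols.ravel()
X = np.column_stack([np.ones(r.size), r == 1, c == 1, c == 2]).astype(float)

beta, se, mu = fit_poisson_loglinear(X, counts.ravel())
print("estimates:      ", beta)
print("standard errors:", se)
print("fitted counts:\n", mu.reshape(counts.shape))
```

In this sketch the row and column totals are all positive, so the independence model still has finite estimates despite the zero cell, and the fitted counts should agree with the closed form (row total x column total / grand total). Under the saturated model the same zero would force a fitted count of zero, so the corresponding parameter has no finite estimate and the iteration would not settle.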



How to cite this paper: Tong Wang, Wentao Li, Jing Ruan. (2023) Effects of Zero Observations on Modelling Categorical Data. Journal of Applied Mathematics and Computation, 7(1), 177-187.