Distribution of the Affinity Coefficient between Variables based on the Monte Carlo Simulation Method
Keywords:
Affinity coefficient, Pearson's correlation coefficient, Monte Carlo simulation method, probability laws
Abstract
The affinity coefficient and its extensions have been used in both hierarchical and non-hierarchical cluster analysis. This empirical study examines the distribution of the basic and generalized affinity coefficients, and of the affinity coefficient standardized by the method of Wald and Wolfowitz, under different assumptions. Its purpose is to assess how the probability distributions of the variables (columns) of the initial data matrix, and their parameters, affect the distribution of the values of these coefficients. We present some results on the asymptotic distribution of these coefficients under the assumption that the variables for which the coefficients are calculated are independent and have probability distributions specified a priori. In this distributional study, based on the Monte Carlo simulation method, we considered ten well-known probability distributions with different settings of their parameters. The simulation studies lead to the conclusion that the coefficients converge to the normal distribution quite quickly and that, in general, a good approximation is obtained for small sample sizes, that is, for sample sizes above 20 and in many cases above 10.
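The kind of Monte Carlo experiment described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure. It assumes the commonly cited form of the basic affinity coefficient between two non-negative columns, a(j, j') = Σ_i sqrt((x_ij / x_.j)(x_ij' / x_.j')), which is not stated on this page. The function names, the exponential distribution, the sample sizes, and the number of replications are all hypothetical choices made for the example, and the Wald and Wolfowitz standardization is not implemented.

```python
import math
import random
import statistics


def affinity(x, y):
    """Basic affinity coefficient between two non-negative columns.

    Assumed definition: sum over rows of sqrt(p_i * q_i), where p and q
    are the columns divided by their respective totals.
    """
    sx, sy = sum(x), sum(y)
    return sum(math.sqrt((a / sx) * (b / sy)) for a, b in zip(x, y))


def draw_exponential(rng):
    """Hypothetical choice of distribution: exponential with rate 1."""
    return rng.expovariate(1.0)


def simulate_affinity(n, draw, replications=2000, seed=0):
    """Simulate the coefficient for pairs of independent columns of length n."""
    rng = random.Random(seed)
    values = []
    for _ in range(replications):
        x = [draw(rng) for _ in range(n)]  # first variable
        y = [draw(rng) for _ in range(n)]  # second variable, drawn independently
        values.append(affinity(x, y))
    return values


# Summarize the simulated distribution for a few sample sizes.
for n in (10, 20, 50):
    sample = simulate_affinity(n, draw_exponential)
    print(n, round(statistics.mean(sample), 4), round(statistics.stdev(sample), 4))
```

A histogram of the values returned by `simulate_affinity` for each sample size could then be compared with a normal curve to judge, informally, how quickly the approximation improves.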
License
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial 4.0 International license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
- The journal/publisher is not responsible for subsequent uses of the work. It is the author's responsibility to bring an infringement action if so desired by the author.