Abstracts
Abstract
Item response theory (IRT) provides a useful and theoretically well-founded framework for educational measurement. It supports such activities as the construction of measurement instruments, the linking and equating of measurements, and the evaluation of test bias and differential item functioning. It further provides the underpinnings for item banking and for flexible test administration designs, such as multiple matrix sampling, flexi-level testing, and computerized adaptive testing. First, a concise introduction to the principles of IRT models is given. The models discussed pertain to dichotomous items (items scored as either correct or incorrect) and polytomous items (items with partial-credit scoring, such as most types of open-ended questions and performance assessments). Second, it is shown how an IRT measurement model can be enhanced with a structural model, such as an analysis of variance model, to relate data from achievement and ability tests to students’ background variables (for instance, socio-economic status, intelligence, or cultural capital), to school variables, and to features of the schooling system. Two applications are presented: the first pertains to the equating and linking of assessments, and the second to a combination of an IRT measurement model and a multilevel linear model that is useful in school effectiveness research.
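As a minimal, illustrative sketch of the model families named in the abstract and keywords, the LaTeX fragment below writes out the standard two-parameter logistic (2PL) response function, its one-parameter (Rasch) special case, a simple latent-regression extension of the ability parameter, and the linear transformation used to place two separately calibrated IRT scales on a common metric. The notation (θ, a_j, b_j, the regression coefficients, and the linking constants A and B) is generic textbook notation, not equations reproduced from the article itself.

```latex
% Generic notation for the model families named in the abstract;
% these are standard textbook forms, not equations quoted from the article.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Two-parameter logistic (2PL) model: the probability of a correct response
% to item j for a person with ability \theta, with item discrimination a_j
% and item difficulty b_j.
\begin{equation}
  P(X_j = 1 \mid \theta)
    = \frac{\exp\{a_j(\theta - b_j)\}}{1 + \exp\{a_j(\theta - b_j)\}} .
\end{equation}

% One-parameter logistic (Rasch) model: the special case with a_j = 1
% for all items.
\begin{equation}
  P(X_j = 1 \mid \theta)
    = \frac{\exp(\theta - b_j)}{1 + \exp(\theta - b_j)} .
\end{equation}

% A structural extension of the measurement model: the ability of student i
% in school k is regressed on a background covariate x (e.g., socio-economic
% status), with a school-level random effect u_k and a residual e_{ik}.
\begin{equation}
  \theta_{ik} = \beta_0 + \beta_1 x_{ik} + u_k + e_{ik} .
\end{equation}

% Linking two separately calibrated IRT scales (as in equating designs):
% abilities on scale Y are mapped to scale X by a linear transformation.
\begin{equation}
  \theta^{X} = A\,\theta^{Y} + B .
\end{equation}

\end{document}
```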
Keywords: educational assessment, educational evaluation, item response theory, one-parameter logistic model, school effectiveness research, test equating, two-parameter logistic model
Résumé
La théorie de réponse à l’item (TRI) fournit un cadre utile et théoriquement bien fondé pour la mesure en éducation. Elle soutient des activités telles que la construction d’instruments de mesure, les procédures de mise en relation et de vérification d’équivalence des mesures, l’évaluation du biais d’un test et le fonctionnement différentiel d’items. Elle prévoit la base pour des banques d’items et des designs flexibles pour l’administration d’un test, comme les méthodes d’échantillonnage multicritérié, « flexi-level testing », et la méthode du test adaptatif par ordinateur. Tout d’abord, une brève introduction aux principes de modèles TRI est donnée. Les modèles discutés concernent des items dichotomiques (items qui sont corrects ou incorrects) et des items polytomiques (items à un crédit partiel, comme la plupart des questions ouvertes et questions de l’évaluation des compétences). Deuxièmement, on montre comment un modèle de mesure TRI peut être amélioré en utilisant un modèle structurel, par exemple, un modèle d’analyse de la variance, pour établir un lien entre les données provenant de tests pour mesurer le rendement et la capacité des élèves à des variables, tels leur statut socio-économique, leur niveau d’intelligence ou leur capital culturel, et à des variables caractérisant l’école et le système scolaire. Deux applications sont présentées. La première se rapporte aux procédures de type mise en parallèle (equating et linking), et la seconde à une combinaison d’un modèle de mesure TRI et d’un modèle linéaire multiniveaux utilisé dans la recherche relative à l’efficacité de l’école.
Mots-clés : évaluation de l’éducation, théorie de réponse à l’item, modèle logistique à un paramètre, recherche sur les « écoles efficaces », test equating, modèle logistique à deux paramètres
Resumo
A teoria de resposta ao item (TRI) fornece um quadro útil e teoricamente bem fundamentado para a medida em educação. Sustenta actividades como a construção de instrumentos de medida, os procedimentos de relacionamento e de verificação de equivalência de medidas, avaliação do desvio de um teste e o funcionamento diferencial de itens. Prevê a base para os bancos de itens e desenhos flexíveis para a administração de um teste, como os métodos de amostragem multicriterial, “flexi-level testing” e o método do teste adaptativo por computador. Antes de mais, é dada uma breve introdução aos princípios dos modelos TRI. Os modelos discutidos dizem respeito aos itens dicotómicos (itens que são correctos ou incorrectos) e a itens politómicos (itens de crédito parcial, como a maior parte das perguntas abertas e das perguntas de avaliação de competências). Em segundo lugar, mostra-se como um modelo de medida pode ser melhorado utilizando um modelo estrutural, por exemplo, um modelo de análise da variância, para relacionar os dados provenientes de testes para medir o rendimento e a capacidade dos alunos com variáveis, tais como o seu estatuto socio-económico, o seu nível de inteligência ou o seu capital cultural e com variáveis que caracterizam a escola e o sistema escolar. Apresentam-se duas aplicações. A primeira está relacionada com procedimentos do tipo colocar em paralelo (equating e linking), e a segunda é uma combinação de um modelo de medida TRI com um modelo linear multinível utilizado na investigação relativa à eficácia da escola.
Palavras-chave: avaliação da educação, teoria da resposta ao item, modelo logístico de um parâmetro, investigação sobre as “escolas eficazes”, test equating, modelo de dois parâmetros