Abstracts
Abstract
In this article, we report on an empirical study conducted to evaluate the utility of analytic rubric scoring (ARS) vis-à-vis comparative judgment (CJ) as two approaches to assessing spoken-language interpreting. The primary motivation behind the study is that the potential advantages of CJ may make it a promising alternative to ARS. When conducting CJ on interpreting, judges need to compare two renditions and decide which one is of higher quality. Such binary decisions are then modeled statistically to produce a scaled rank order of the renditions from “worst” to “best.” We set up an experiment in which two groups of raters/judges of varying scoring expertise applied both CJ and ARS to assess 40 samples of English-Chinese consecutive interpreting. Our analysis of quantitative data suggests that overall ARS outperformed CJ in terms of validity, reliability, practicality and acceptability. Qualitative questionnaire data helped us obtain insights into the judges’/raters’ perceived advantages and disadvantages of CJ and ARS. Based on the findings, we tried to account for CJ’s underperformance vis-à-vis ARS, focusing on the specificities of interpreting assessment. We also propose potential avenues for future research to improve our understanding of interpreting assessment.
Keywords:
- analytic rubric scoring
- comparative judgment
- spoken-language interpreting
- rater-mediated assessment
- interpreting quality assessment
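The statistical modeling step mentioned in the abstract, in which judges' binary paired decisions are converted into a scaled rank order, is commonly handled with a Bradley-Terry-type model. The following Python sketch illustrates the idea under that assumption; the toy comparison data and the simple fixed-point fitting routine are invented for illustration and do not reproduce the study's actual analysis.

```python
def bradley_terry(n_items, comparisons, iterations=100):
    """Estimate Bradley-Terry strengths from paired judgments.

    comparisons: list of (winner, loser) index pairs, one per judgment.
    Returns strengths normalized to sum to 1; a higher value means the
    rendition was judged better more consistently.
    """
    strengths = [1.0 / n_items] * n_items
    for _ in range(iterations):
        updated = []
        for i in range(n_items):
            wins = sum(1 for winner, _ in comparisons if winner == i)
            # Sum of 1 / (p_i + p_opponent) over all comparisons involving i.
            denom = sum(1.0 / (strengths[w] + strengths[l])
                        for w, l in comparisons if i in (w, l))
            updated.append(wins / denom if denom else strengths[i])
        total = sum(updated)
        strengths = [s / total for s in updated]
    return strengths


# Toy data: judges compared renditions pairwise; (a, b) means a was
# judged better than b. Rendition 0 dominates; 2 mostly beats 3.
judgments = [(0, 1), (0, 1), (0, 2), (0, 2), (0, 3), (0, 3),
             (1, 2), (1, 2), (1, 3), (1, 3),
             (2, 3), (2, 3), (2, 3), (3, 2)]
scale = bradley_terry(4, judgments)
# Rank order of renditions from "best" to "worst" on the fitted scale.
ranking = sorted(range(4), key=lambda i: scale[i], reverse=True)
```

In practice, studies of this kind fit such models with dedicated software and diagnostics rather than a hand-rolled loop, but the sketch shows how a set of binary comparisons yields a single scaled ordering of all renditions.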
Résumé
[Translated from French] In this article, we report on an empirical study conducted to evaluate the utility of analytic rubric scoring (ARS) relative to comparative judgment (CJ) as two approaches to assessing spoken-language interpreting. The main motivation behind the study is that the potential advantages of CJ may make it a promising alternative to ARS. When conducting CJ on interpreting, judges must compare two renditions and decide which is of higher quality. These binary decisions are then modeled statistically to produce a scaled rank order of the renditions from “worst” to “best.” We set up an experiment in which two groups of raters/judges of varying scoring expertise applied both CJ and ARS to assess 40 samples of English-Chinese consecutive interpreting. Our analysis of the quantitative data suggests that, overall, ARS outperformed CJ in terms of validity, reliability, practicality and acceptability. Qualitative questionnaire data gave us insight into the advantages and disadvantages of CJ and ARS as perceived by the judges/raters. Based on the findings, we attempt to account for CJ’s underperformance relative to ARS, focusing on the specificities of interpreting assessment. We also propose avenues for future research to improve our understanding of interpreting assessment.
Mots-clés :
- analytic rubric scoring
- comparative judgment
- spoken-language interpreting
- rater-mediated assessment
- interpreting quality assessment
Resumen
[Translated from Spanish] This article reports on an empirical study of the utility of analytic rubric scoring (ARS) relative to comparative judgment (CJ) as approaches to assessing spoken-language interpreting. The study is motivated by the potential advantages of CJ compared with ARS. When conducting CJ on interpreting, judges must compare two renditions and decide which one is better. These binary decisions are then modeled statistically to generate a rank order of the renditions from “worst” to “best.” We carried out an experiment in which two groups of raters/judges of varying scoring expertise used CJ and ARS to assess 40 samples of consecutive interpreting from English into Chinese. Analysis of the quantitative data suggests that ARS outperformed CJ overall in terms of validity, reliability, practicality and acceptability. Qualitative questionnaire data shed light on the advantages and disadvantages of CJ and ARS as perceived by the raters/judges. Based on the results, we attempt to account for CJ’s weaker performance relative to ARS, focusing on the specificities of interpreting assessment. We also propose avenues for future research to improve our understanding of interpreting assessment.
Palabras clave:
- analytic rubric scoring
- comparative judgment
- spoken-language interpreting
- rater-mediated assessment
- interpreting quality assessment