Abstracts
Abstract
As corpus-based translation studies continues to expand, researchers have adopted data analytic techniques from neighbouring disciplines, such as corpus linguistics, to explore a wider variety of research questions. The field has evolved from early frequency-based approaches to include more advanced statistical analyses that help to untangle the complex web of variables involved in the translation process. Big data analytic techniques that originated in data analytics and related quantitative fields could be usefully applied to research questions in translation and interpreting studies. To assess their applicability, this article first outlines what distinguishes big data from general corpora in translation and interpreting studies, identifying how data volume, variety, and velocity should be considered in corpus-based translation and interpreting studies research. The article then presents three types of big data analysis techniques, namely crosslingual and multilingual data analysis, sentiment analysis, and visual analysis. These analyses are presented in conjunction with potential research areas that would benefit from these complementary analytical approaches. The article concludes with a discussion of the implications of big data analytics for corpus-based translation studies, while charting the trajectory of a more quantitative, corpus-based approach to translation studies.
Keywords:
- big data
- quantitative research
- multilingual data analysis
- sentiment analysis
- audiovisual analysis
Résumé
Au fur et à mesure que les études de traduction basées sur les corpus continuent de se développer, les chercheurs ont utilisé des techniques d’analyse de données de disciplines adjacentes telles que la linguistique des corpus pour explorer davantage de questions de recherche. Le domaine a évolué depuis les premières approches basées sur la fréquence jusqu’aux études de traduction basées sur des corpus pour inclure désormais des analyses statistiques plus avancées qui aident à comprendre le réseau complexe de variables qui composent le processus de traduction. Les techniques d’analyse de données massives dérivées de l’analyse des données et des domaines quantitatifs reliés pourraient être appliquées avec succès pour répondre aux questions de recherche des études de traduction et d’interprétation. Pour évaluer leur applicabilité, cet article décrit d’abord ce qui distingue les données massives des corpus généraux dans les études en traduction et en interprétation, en identifiant comment le volume, la variété et la vitesse des données sont des propriétés applicables à prendre en compte dans les études de traduction et d’interprétation basées sur le corpus. L’article présente ensuite trois types de techniques d’analyse des données massives, à savoir l’analyse des données translinguistiques et multilingues, l’analyse des sentiments et l’analyse visuelle. Ces analyses sont présentées conjointement avec les domaines de recherche potentiels qui pourraient bénéficier de ces approches analytiques complémentaires. L’article se termine par une réflexion sur les implications de l’analyse des données massives pour les études de traduction de corpus, tout en décrivant la trajectoire d’une approche plus quantitative, basée sur le corpus, pour les études de traduction.
Mots-clés :
- données massives
- recherche quantitative
- analyse de données multilingues
- analyse des sentiments
- analyse audiovisuelle
Resumen
A medida que los estudios de traducción basados en corpus siguen expandiéndose, los investigadores han empleado técnicas de análisis de datos de disciplinas adyacentes como la lingüística de corpus para explorar un mayor número de preguntas de investigación. El campo ha evolucionado desde los primeros enfoques basados en la frecuencia hasta los estudios de traducción basados en corpus para incluir ahora análisis estadísticos más avanzados que permiten comprender la compleja red de variables que integra el proceso de traducción. Las técnicas de análisis de datos masivos que derivaron de la analítica de datos y los campos cuantitativos relacionados podrían aplicarse de forma satisfactoria para dar respuesta a las preguntas de investigación de los estudios de traducción e interpretación. Para evaluar su aplicabilidad, este artículo, en primer lugar, esboza la distinción entre los datos masivos y los corpus generales en el contexto de los estudios de traducción e interpretación, centrándose en el volumen, la variedad y la velocidad de los datos como propiedades aplicables a ser consideradas en la investigación de estudios de traducción e interpretación basados en corpus. A continuación, el artículo presenta tres tipos de técnicas de análisis de datos masivos: análisis de datos translingüísticos y multilingües, análisis de sentimientos y análisis visual. Estos análisis se presentan conjuntamente con posibles áreas de investigación que se beneficiarían de estos enfoques analíticos complementarios. El artículo concluye con una reflexión sobre las implicaciones de la analítica de datos masivos para los estudios de traducción de corpus, al mismo tiempo que se perfila la trayectoria de un enfoque más cuantitativo, basado en corpus, para los estudios de traducción.
Palabras clave:
- datos masivos
- investigación cuantitativa
- análisis de datos multilingües
- análisis de sentimientos
- análisis audiovisual