Reconceptualising Fair Assessment in English-Medium Instruction: Content–Language Separation as a Validity Principle in Higher Education
Keywords:
English-medium instruction; fair assessment; content–language separation; higher education; Sustainable Development Goals

Abstract
This narrative review examines fair assessment in English-medium instruction (EMI) in higher education, aligning with Sustainable Development Goals 4 and 10 by ensuring that grades reflect disciplinary learning rather than English fluency. Drawing on searches of Scopus, Web of Science, and ERIC for studies published from 2022 to 2025, it synthesizes evidence on how assessment design, scoring, and moderation can keep content and language on analytically separate strands across three assessment families: written products, oral or performance tasks, and tests or portfolios. Guided by argument-based and socio-cognitive perspectives on validity, the review traces how dense reading passages, speeded conditions, monolingual orientations, translanguaging constraints, and rater inconsistency can pull scores toward fluency or test-wiseness rather than disciplinary knowledge. Findings indicate that fairness improves when the construct is stated explicitly at the design stage, content and language are scored separately, linguistic load is managed through readability and lexical coverage checks, and brief calibration and moderation routines stabilize judgment. These routines can be embedded in ordinary timetables, staffing, and resources, supporting more valid, interpretable, and equitable decisions. Overall, explicit content–language separation, from task design through scoring and moderation, offers a scalable organizing principle for fair assessment in EMI, and the review sets out practical routines that programs and higher education institutions can enact transparently.
https://doi.org/10.26803/ijlter.25.4.41
License
Copyright (c) 2026 Randip Kaur Valdev Singh, Harwati Hashim, Khairul Azhar Jamaludin

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.