Reconceptualising Fair Assessment in English-Medium Instruction: Content–Language Separation as a Validity Principle in Higher Education
Keywords:
English-medium instruction; fair assessment; content–language separation; higher education; Sustainable Development Goals

Abstract
This narrative review examines fair assessment in English-medium instruction (EMI) in higher education, aligning with Sustainable Development Goals 4 and 10 by ensuring that grades reflect disciplinary learning rather than English fluency. Drawing on searches of Scopus, Web of Science, and ERIC for studies published from 2022 to 2025, it synthesizes evidence on how assessment design, scoring, and moderation can keep content and language on analytically separate strands across three assessment families: written products, oral or performance tasks, and tests or portfolios. Guided by argument-based and socio-cognitive perspectives on validity, the review traces how dense reading passages, speeded conditions, monolingual orientations, translanguaging constraints, and rater inconsistency can pull scores toward fluency or test-wiseness rather than disciplinary knowledge. Findings indicate that fairness improves when the construct is stated explicitly at the design stage, content and language are scored separately, linguistic load is managed through readability and lexical coverage checks, and brief calibration and moderation routines stabilize judgment. These routines can be embedded in ordinary timetables, staffing, and resources, supporting more valid, interpretable, and equitable decisions. Overall, explicit content–language separation, from task design through scoring and moderation, offers a scalable organizing principle for fair assessment in EMI, and the review sets out practical routines that programs and higher education institutions can enact transparently.
https://doi.org/10.26803/ijlter.25.4.41
License
Copyright (c) 2026 Randip Kaur Valdev Singh, Harwati Hashim, Khairul Azhar Jamaludin

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.