A Systematic Review of AI-Based and Teacher-Based Writing Assessment
(1) Universitas Airlangga
(2) Universitas Airlangga
(*) Corresponding Author
Abstract
Establishing valid and reliable writing assessments in education remains a persistent challenge, especially with the emergence of artificial intelligence (AI) as a tool in evaluation. Although previous studies have investigated both AI-based and teacher-based writing assessments, few have addressed them through a systematic lens, leading to fragmented insights and inconsistent frameworks. This study aims to investigate the current state of the art in AI-based and teacher-based writing assessment and to identify emerging debates and research directions in the field. A total of 258 articles were collected from Scopus, ScienceDirect, JSTOR, Emerald, and EBSCOhost, and filtered using PRISMA guidelines. Eight articles met the inclusion criteria. The findings reveal that AI-based assessments offer high consistency and efficiency in evaluating surface-level language features, but struggle with assessing higher-order discourse aspects such as coherence, argumentation, and rhetorical structure. Conversely, teacher-based assessments provide richer, context-aware feedback, yet are limited by issues of subjectivity and scalability. A hybrid model that integrates AI efficiency with human insight emerges as a promising solution to balance reliability and validity in writing assessment. Nevertheless, key debates remain regarding AI's scoring authority, construct validity, ethical concerns, and implementation across diverse educational contexts. This study calls for the development of unified frameworks and teacher training to support equitable and effective AI-human collaboration in writing assessment.
References
Alsalem, M. S. (2024). EFL teachers' perceptions of the use of an AI grading tool (CoGrader) in English writing assessment at Saudi universities: An activity theory perspective. Computers and Education: Artificial Intelligence, 6, 100228. https://doi.org/10.1016/j.caeai.2024.100228
Dikli, S. (2010). The nature of automated essay scoring feedback. Computers and Composition, 27(3), 195–207. https://doi.org/10.1016/j.compcom.2010.05.002
Hand, B., & Li, M. (2024). Exploring ChatGPT-supported teacher feedback in the EFL context. Journal of Writing Research, (1), 88–106. https://doi.org/10.1016/j.jowr.2024.01.005
Jamshed, S., Ahmed, W., Sarfaraj, M., & Warda, M. (2024). The impact of ChatGPT on English language learners’ writing skills: An assessment of AI feedback on mobile. TESOL Journal, 15(2), e00489. https://doi.org/10.1002/tesj.489
Kasih, E. N. E. W., & Putra, A. V. L. (2024). Artificial intelligence for literature class: Trends and attitude. In 2024 10th International Conference on Education and Technology (ICET) (pp. 142–148). IEEE. https://doi.org/10.1109/icet60097.2024.103456
Li, A. W., Huang, Y., Wu, Y., & Whipple, M. (2024). Evaluating the role of ChatGPT in enhancing EFL writing assessments in classroom settings: A preliminary investigation. System, 122, 102878. https://doi.org/10.1016/j.system.2024.102878
Lin, T., & Crosthwaite, P. (2024). The grass is not always greener: Teacher vs. GPT-assisted written corrective feedback. Assessing Writing, 50, 100617. https://doi.org/10.1016/j.asw.2024.100617
Liu, S., Hao, J., & Wang, Y. (2020). AI in EFL writing assessment: Validity and reliability evidence. Language Testing in Asia, (5), 45–63. https://doi.org/10.1186/s40468-020-00107-3
Ma, H., & Slater, T. (2016). Connecting Criterion scores and classroom grading contexts: A systemic functional linguistic model for teaching and assessing causal language. Assessing Writing, 30, –43. https://doi.org/10.1016/j.asw.2016.07.003
Mehdaoui, A. (2024). Unveiling barriers and challenges of AI technology integration in education: Assessing teachers' perceptions, readiness and anticipated resistance. Futurity Education, 4(4), 95–108. https://doi.org/10.57125/FED.2024.12.25.06
Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., ... & Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. https://doi.org/10.1136/bmj.n71
Saleh, S., & Alshulbi, A. I. (2025). The role of techno-competence in AI-based assessments: Exploring its influence on students' boredom, self-esteem, and writing development. Language Testing in Asia, 15(6). https://doi.org/10.1186/s40468-025--1
Steiss, J., Tate, T., Graham, S., Cruz, J., Hebert, M., Wang, J., Moon, Y., Tseng, W., Warschauer, M., & Olson, C. B. (2024). Comparing the quality of human and ChatGPT feedback of students’ writing. Learning and Instruction, 91, 101894. https://doi.org/10.1016/j.learninstruc.2024.101894
Thwaites, T., Smith, J., & Browne, R. (2025). Reliability and validity issues in writing assessment: An updated perspective. Assessing Writing, 62, 100640. https://doi.org/10.1016/j.asw.2025.100640
Xu, X., Sun, F., & Hu, W. (2025). Integrating human expertise with GenAI: Insights into a collaborative feedback approach in translation education. System, 129, 103600. https://doi.org/10.1016/j.system.2025.103600
Zhang, Y., Chen, L., & Zhao, H. (2019). Validity and reliability of automated writing assessment in EFL contexts. Language Testing, 36(2), 123–142. https://doi.org/10.1177/0265532218758127
Copyright (c) 2025 English Language and Literature International Conference (ELLiC) Proceedings

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Electronic ISSN: 2579-7263
CD-ROM ISSN: 2579-7549
Published by the Faculty of Foreign Language and Culture, Universitas Muhammadiyah Semarang
Jl. Kedungmundu Raya No.18 Semarang, Central Java, Indonesia
Phone: +622476740295, email: ellic@unimus.ac.id