The KLiCKe Corpus: Keystroke Logging in Compositions for Knowledge Evaluation


  • Yu Tian, Arizona State University, USA
  • Scott Crossley, Vanderbilt University, USA
  • Luuk Van Waes, University of Antwerp, Belgium


Keywords: Corpus, keystroke logging, writing quality


Despite growing interest in the dynamics of the writing process, publicly available large-scale corpora of keystroke logs remain rare. We introduce KLiCKe, a freely available collection of keystroke logs for approximately 5,000 argumentative essays written by adults in the United States. The corpus also includes human-rated holistic scores for the essays, along with the writers' demographic information, typing skills, and vocabulary knowledge. We describe our methods for constructing the corpus and present descriptive statistics for its components. To illustrate how KLiCKe can be used, we report a study that uses a subset of the corpus to investigate whether keystroke features are associated with holistic writing quality for L1 and L2 writers. The study shows that higher writing scores are associated with shorter pauses overall, shorter between-word pauses, a lower proportion of deletions, a higher proportion of insertions, and less process variance. The KLiCKe corpus provides a robust resource for studying the dynamics of text production and revision and should help spur the development of process-oriented tools and methodologies for writing assessment and instruction.
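The process features named in the abstract have to be derived from raw keystroke events before they can be related to essay scores. The Python sketch below shows one way to compute simple versions of them from a single writer's log. It is illustrative only: the `KeyEvent` record, its field names, and the `keystroke_features` function are hypothetical and do not reflect the KLiCKe release format. The operational definitions are also assumptions: every inter-keystroke interval is treated as a pause (many studies instead apply a threshold such as 2,000 ms), an insertion is any character typed before the end of the current text, and process variance is taken as the standard deviation of characters typed across `n_intervals` equal time slices, with five slices chosen arbitrarily.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class KeyEvent:
    # Hypothetical keystroke record; the field names are illustrative
    # assumptions, not the KLiCKe release schema.
    time_ms: int  # key-down time in milliseconds since the session started
    action: str   # "input" (one character added) or "delete" (one character removed)
    char: str     # the character typed; "" for deletions
    cursor: int   # 0-based position where the action takes place


def keystroke_features(events, n_intervals=5):
    """Sketch of simple process features from one writer's keystroke log."""
    if len(events) < 2:
        raise ValueError("need at least two keystrokes")
    pairs = list(zip(events, events[1:]))

    # Inter-keystroke intervals (IKIs); here every interval counts as a pause.
    ikis = [b.time_ms - a.time_ms for a, b in pairs]

    # Between-word IKIs: the gap from a typed space to the next non-space character.
    between_word = [
        b.time_ms - a.time_ms
        for a, b in pairs
        if a.action == "input" and a.char == " "
        and b.action == "input" and b.char != " "
    ]

    # Replay the log to separate insertions (typing before the end of the
    # current text) from ordinary typing at the leading edge.
    text_len = insertions = deletions = 0
    for e in events:
        if e.action == "input":
            if e.cursor < text_len:
                insertions += 1
            text_len += 1
        elif e.action == "delete":
            deletions += 1
            text_len -= 1

    # Process variance: population SD of typed characters across equal time slices.
    start, end = events[0].time_ms, events[-1].time_ms
    width = (end - start) / n_intervals or 1.0
    counts = [0] * n_intervals
    for e in events:
        if e.action == "input":
            slot = min(int((e.time_ms - start) / width), n_intervals - 1)
            counts[slot] += 1

    n = len(events)
    return {
        "mean_iki_ms": mean(ikis),
        "mean_between_word_iki_ms": mean(between_word) if between_word else 0.0,
        "prop_deletions": deletions / n,
        "prop_insertions": insertions / n,
        "process_variance": pstdev(counts),
    }


if __name__ == "__main__":
    # Toy log: type "to be", delete and retype the final "e", then go back
    # and insert a comma after "to".
    log = [
        KeyEvent(0, "input", "t", 0),
        KeyEvent(200, "input", "o", 1),
        KeyEvent(400, "input", " ", 2),
        KeyEvent(1400, "input", "b", 3),
        KeyEvent(1600, "input", "e", 4),
        KeyEvent(1800, "delete", "", 4),
        KeyEvent(2000, "input", "e", 4),
        KeyEvent(3500, "input", ",", 2),
    ]
    print(keystroke_features(log))
```

Replaying the log, rather than simply counting keys, is what lets the sketch distinguish an insertion (the comma typed back inside the text) from ordinary typing at the leading edge, and a deletion from either.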


How to Cite

Tian, Y., Crossley, S., & Van Waes, L. (2025). The KLiCKe Corpus: Keystroke Logging in Compositions for Knowledge Evaluation. Journal of Writing Research.