The KLiCKe Corpus: Keystroke Logging in Compositions for Knowledge Evaluation

Authors

  • Yu Tian, Arizona State University | USA
  • Scott Crossley, Vanderbilt University | USA
  • Luuk Van Waes, University of Antwerp | Belgium, https://orcid.org/0000-0002-3642-9533

Keywords:

Corpus, Keystroke logging, Writing quality

Abstract

Despite growing interest in the dynamics of the writing process, publicly available large-scale corpora of keystroke logs have been rare. We introduce KLiCKe, a freely available collection of keystroke logs for around 5,000 argumentative texts written by adults in the United States. The KLiCKe corpus also includes human-rated holistic scores for the essays, as well as writers' demographic details, typing skills, and vocabulary knowledge. We describe our methods for constructing the corpus and present descriptive statistics for its components. To illustrate the use of the KLiCKe corpus, we report a study that uses a subset of the corpus to investigate whether keystroke features are associated with holistic writing quality for L1 and L2 writers. The study shows that higher writing scores are related to shorter pauses in general, shorter between-word pauses, a lower proportion of deletions, a higher proportion of insertions, and less process variance. The KLiCKe corpus provides a robust resource for studying the dynamics of text production and revision, and it should help spur the development of process-oriented tools and methodologies in writing assessment and instruction.

Published

2025-02-05

Section

Articles

How to Cite

The KLiCKe Corpus: Keystroke Logging in Compositions for Knowledge Evaluation. (2025). Journal of Writing Research. https://www.jowr.org/jowr/article/view/1556
