Sentence-centric modeling of the writing process
DOI:
https://doi.org/10.17239/jowr-2025.16.03.05
Keywords:
writing process, sentence-driven, sentence-centric, writing model, keystroke logging
Abstract
Linguistic modeling of the writing process has gained importance in recent years. Existing models, whether from linguistics (focusing on syntactic analyses as used in natural language processing) or from writing research, are insufficient to explain in linguistic terms what authors do when writing and revising. Writing is linear in time, but writers are free to move to any point in the text produced so far whenever they want, and thus produce specific parts (e.g., sentences) in a non-linear fashion. The final product, however, is a linear sequence of sentences. We can therefore interpret the writing of texts as a sentence-driven process. Within this new framework, this article proposes a model of the production of sentences during writing. This sentence-centric model builds on existing work on transforming sequences, bursts, and revisions, and takes into account aspects of linearity and non-linearity at the sentence level. We present a working implementation (available as open-source software) and show, in a small case study, what information the resulting analyses can provide.
License
Copyright (c) 2025 Malgorzata Anna Ulasik, Cerstin Mahlow, Michael Piotrowski

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 Unported License.