Sentence-centric modeling of the writing process


  • Malgorzata Anna Ulasik ZHAW, Zurich University of Applied Sciences | Switzerland
  • Cerstin Mahlow ZHAW, Zurich University of Applied Sciences | Switzerland
  • Michael Piotrowski University of Lausanne | Switzerland



Keywords: writing process, sentence-driven, sentence-centric, writing model, keystroke logging


Linguistic modeling of the writing process has gained importance in recent years. Existing models, whether from linguistics (focusing on syntactic analyses as used in natural language processing) or from writing research, are insufficient to explain in linguistic terms what authors do when writing and revising. Writing is linear in time, but writers are free to move to any point in the text produced so far at any time, and thus produce specific parts (e.g., sentences) in a non-linear fashion. The final product, however, is a linear sequence of sentences. We can therefore interpret text production as a sentence-driven process. Within this framework, this article proposes a model of how sentences are produced during writing. This sentence-centric model builds on existing work on transforming sequences, bursts, and revisions, and takes into account linearity and non-linearity at the sentence level. We present a working implementation (available as open-source software) and show, in a small case study, what information the resulting analyses provide.








How to Cite

Ulasik, M. A., Mahlow, C., & Piotrowski, M. (2025). Sentence-centric modeling of the writing process. Journal of Writing Research, 16(3), 463–498.
