Sentence-centric modeling of the writing process

Authors

  • Malgorzata Anna Ulasik ZHAW, Zurich University of Applied Sciences | Switzerland
  • Cerstin Mahlow ZHAW, Zurich University of Applied Sciences | Switzerland
  • Michael Piotrowski University of Lausanne | Switzerland

DOI:

https://doi.org/10.17239/jowr-2025.16.03.05

Keywords:

writing process, sentence-driven, sentence-centric, writing model, keystroke logging

Abstract

Linguistic modeling of the writing process has gained importance in recent years. Existing models, whether from a linguistic perspective focusing on syntactic analyses as used in natural language processing or from writing research, do not suffice to explain linguistically what authors do when writing and revising. Writing is linear in time, but writers are free to move to any point in the text produced so far whenever they want, and thus produce specific parts (e.g., sentences) in a non-linear fashion. The final product, however, is a linear sequence of sentences. We can therefore interpret the writing of texts as a sentence-driven process. Within this framework, this article proposes a model of the production of sentences during writing. This sentence-centric model builds on existing work on transforming sequences, bursts, and revisions, and takes into account aspects of linearity and non-linearity at the sentence level. We present a working implementation (available as open source software) and show in a small case study what information the resulting analyses can provide.

Published

2025-02-05

Section

Articles

How to Cite

Ulasik, M. A., Mahlow, C., & Piotrowski, M. (2025). Sentence-centric modeling of the writing process. Journal of Writing Research, 16(3), 463-498. https://doi.org/10.17239/jowr-2025.16.03.05
