How Prior Information from National Assessments can be used when Designing Experimental Studies without a Control Group
DOI:
https://doi.org/10.17239/jowr-2023.14.03.05
Keywords:
Prior information, Baseline comparison, Bayesian inference
Abstract
National assessments describe the proficiency level in a domain while accounting for differences between tasks. For instance, in writing assessments the level of proficiency is typically evaluated with a variety of topics and multiple tasks. This enables generalizations from specific tasks to a domain. In (quasi-)experimental research, however, writing skills are often evaluated with a single task. Conclusions about the effect of the treatment are therefore specific to the task administered, yet they are often generalized to the domain without any form of reservation, which is, euphemistically put, quite a stretch. This raises the question of whether the results of national assessments about differences between tasks can be used in the analyses of experimental studies. In this paper, we demonstrate how information from a baseline data set can be used as a kind of control condition in the analysis of an experimental study.
License
Copyright (c) 2022 Don Van den Bergh, Nina Vandermeulen, Marije Lesterhuis, Sven De Maeyer, Elke Van Steendam, Gert Rijlaarsdam, Huub Van den Bergh
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 Unported License.