Torgerson C (2025). When was randomisation first used in educational research? A brief historical methodological perspective

© Carole Torgerson. Email: carole.torgerson@york.ac.uk


Cite as: Torgerson C (2025). When was randomisation first used in educational research? A brief historical methodological perspective. JLL Bulletin: Commentaries on the history of treatment evaluation (https://www.jameslindlibrary.org/articles/when-was-randomisation-first-used-in-educational-research-a-brief-historical-methodological-perspective/)


Introduction

This brief note aims to add to the debate in the literature about the first use of randomisation in educational research, specifically in relation to previously published articles by Forsetlund and colleagues (2007) and Hedges and Schauer (2018), which claim that randomisation was first used in 1931 and 1919 respectively. This historical, methodological perspective analyses in detail the 1919 contender proposed by Hedges and Schauer (published in Styles and Torgerson (eds.) 2018): the monograph by Cummings, Improvement and the Distribution of Practice (Cummings 1919).

In summary, early milestones in the history of randomised trials in education include possibly the first pragmatic experiment, by Pearson (1911); possibly the first randomised experiment, by Cummings (1919); probably the first use of alternation, by Remmers (1928); and probably the first two randomised experiments, by Walters (1931 and 1932). Each of these is described below.

Pearson

In 1911, Henry C. Pearson undertook possibly the first pragmatic experiment in education (Pearson 1911). Torgerson and Torgerson uncovered this study, which used a ‘comparative’ type of experimentation, whilst researching their article on pragmatic experimentation in educational research (Torgerson and Torgerson 2007). It was a controlled experiment in the teaching and learning of the spelling of homonyms, and Pearson recognised the need for a comparative type of experimentation to reflect real-world practice. According to Pearson, the chief defect of the laboratory or explanatory experiment was that it ‘isolates from its natural setting the issue to be tested’, whereas the issue should instead be ‘surrounded by the normal accompaniments of its classroom situation’ (Pearson 1911). His experiment aimed to determine which of two methods was more effective in teaching homonyms, by pre-testing children in two classrooms from each of five grade levels (grades III to VII; ten classrooms in all) and then assigning one class in each grade to each of the two methods (teaching homonyms side by side or with an interval of three to four days), followed by post-testing of all children. The homonyms selected for the interventions were increasingly difficult as the age of the children increased. This was a cluster trial (Moberg and Kramer 2015), although it was analysed at the level of the pupil, except in the case of the data for Grade V, where the mean, average and average deviation for each class were presented. Although this analysis presented these values for only two classes (two clusters with one class in each cluster, i.e., a two-class experiment), it gave a foretaste of the appropriate analysis for cluster randomised trials by using cluster-level data. This method was described in more detail by Lindquist (1940) in his textbook discussing an appropriate analytical technique for data generated by the use of cluster randomisation, which is now largely the norm for educational experiments.

It also appears that Pearson used change scores to compare the two classes in each grade in terms of (a) the average decrease in errors between pre- and post-test, (b) average improvement sub-divided by number of initial errors, and (c) average improvement sub-divided by ‘good’ and ‘bad’ spellers. There is no evidence in this article that randomisation was used to allocate the two intervention classes to condition within each grade. Despite this limitation, the study has several strengths: Pearson acknowledged the small sample size of the experiment (although he stated that he did not think replication with a larger sample would change the results); teacher effect was acknowledged; and a replication trial was undertaken and reported in the same article.
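The distinction between pupil-level change scores and a cluster-level summary can be illustrated with a minimal sketch. The error counts, class labels and function names below are invented for illustration only; they are not Pearson's data, and the code does not claim to reproduce his calculations.

```python
from statistics import mean

# Hypothetical pre- and post-test error counts for one grade, one class per method.
# These numbers are invented for illustration; they are NOT Pearson's data.
classes = {
    "together": {"pre": [12, 9, 15, 10], "post": [7, 6, 9, 8]},
    "separate": {"pre": [11, 10, 14, 13], "post": [5, 4, 8, 7]},
}

def change_scores(pre, post):
    """Decrease in errors for each pupil (pre-test minus post-test)."""
    return [a - b for a, b in zip(pre, post)]

def average_deviation(values):
    """Mean absolute deviation of the values from their mean."""
    centre = mean(values)
    return mean(abs(v - centre) for v in values)

# Cluster-level analysis: each class is reduced to a single summary row.
summaries = {}
for name, scores in classes.items():
    changes = change_scores(scores["pre"], scores["post"])
    summaries[name] = {
        "mean_decrease": mean(changes),
        "average_deviation": average_deviation(changes),
    }
```

With only one class per method, a cluster-level analysis yields a single value per arm, so variation between clusters cannot be estimated from the experiment itself. This is one reason a two-class experiment offers only a foretaste of the approach.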

Remmers and Walters

In 2007, Forsetlund and colleagues reported their systematic search of the literature between 1867 and 1948 for the first reported use of randomisation in social and educational studies (Forsetlund et al. 2007). They identified nine potentially eligible studies, including the earliest study by Remmers (1928), and subsequently argued for Jack Walters’ counselling pilot experiment of 1931 and his replication trial of the following year as strong contenders for the first randomised trials in education. This was due to the experiment by Remmers (1928) having clearly used alternation and not randomisation as the method of allocation. Walters, on the other hand, clearly stated that he used random allocation for the pilot and replication experiments (see below).
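The difference between alternation and random allocation can be made concrete with a minimal sketch. The function names, arm labels and student identifiers are hypothetical, and the code is not a reconstruction of the procedures actually used by Remmers or Walters, which are not described in that detail here.

```python
import random

def allocate_by_alternation(units, arms=("A", "B")):
    """Assign arms in strict rotation, following the order in which units are listed."""
    return {unit: arms[i % len(arms)] for i, unit in enumerate(units)}

def allocate_at_random(units, arms=("A", "B"), seed=None):
    """Shuffle the units, then deal them out to the arms in turn (balanced randomisation)."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: arms[i % len(arms)] for i, unit in enumerate(shuffled)}

# Hypothetical student identifiers, listed in order of enrolment.
students = ["s1", "s2", "s3", "s4", "s5", "s6"]

alternated = allocate_by_alternation(students)
randomised = allocate_at_random(students, seed=1931)
```

Under alternation, anyone who knows the enrolment order can predict the next assignment, which is how selection bias can enter if the order is manipulated. Under random allocation the assignment depends only on the shuffle, not on the order of enrolment.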

Cummings

Robert Alexander Cummings’ monograph Improvement and the distribution of practice was published in 1919 (Cummings 1919). It reports a series of pragmatic experiments undertaken at Teachers College, Columbia University, USA, comparing practice periods Equal in length with practice periods Reducing in length, with children and adults in authentic settings and in various subjects, e.g., mathematics and history. One of these experiments – the Ohio experiment – was uncovered by Larry Hedges and Jake Schauer in their article Randomised trials in education in the USA (Hedges and Schauer 2018). This was one of six articles on methodological debates, questions and challenges about randomised trials that were published in the 2018 Special Issue of the journal Educational Research (Styles and Torgerson (eds.) 2018). Hedges and Schauer give a succinct overview of the history of educational experiments, from the first recorded randomised trial through five historical periods to the state of the art in the 21st century.

The first of Hedges and Schauer’s five historical periods covered Pearson’s, Cummings’ and Walters’ early experiments – all of which were pragmatic – although most controlled trials in this period were laboratory experiments. In their article, Hedges and Schauer introduced several studies undertaken by Cummings as contenders for the first randomised trial in education (in 1919), more than ten years earlier than Walters’ studies, which were published in the early 1930s.

In Cummings’ 1919 monograph, there are three pieces of evidence about the method of allocation to intervention and control groups. The first mention of random selection is on page 51, as cited in Hedges and Schauer’s article: ‘The factor of the teacher was equalized by a random selection of the classes which made up the two groups’. This could mean that the classes were randomised, or that the teachers were randomised to the classes, or that the classes in the sample were a random sample of classes from the schools. Cummings then discusses the lack of baseline balance in the factor of previous training and states that ‘the random method of selecting the pupils would tend to favour one group as much as the other’. This seems to suggest that the pupils were randomised (which seems unlikely), but it could mean that the pupils were a random sample of the pupils in the school because the classes were a random sample of the classes in the school (see previous quotation). However, the previous page contains evidence that is not cited in Hedges and Schauer’s article but which, by inference, led to their conclusion that randomisation was at the level of the school: ‘The Equal and Reducing groups were made up from the pupils of the seven villages as follows: The Equal group included all the classes at Rocky Ridge, Lakeside, and Greenwich, and grades 3, 5 and 7 from Oak Harbor. The Reducing group included grades 6 and 8 from Oak Harbor and all the classes at Elmore, Waterville, and Weston’. This could mean that randomisation was at the level of the school, although the word random is not used here. Even so, only six of the seven schools could have been randomised at this level; the seventh, Oak Harbor, was split between the two groups by grade, so any randomisation there would have been by class. Thus, we cannot be sure that this is a randomised trial.

In another of Cummings’ experiments, he uses the phrase pupils ‘taken at random as ours were’ to justify the assumption that one factor (previous practice) was equalized at baseline. However, it is not clear whether this refers to randomisation of pupils or of classes. Moreover, if it refers to classes, the fact that all but one of the classes appear in alphabetical order suggests that they were not randomised.

As an aside, in these early controlled experiments Cummings appears not to have used correct methods of analysis. On page 12, we learn that in the Lyndhurst experiments, intention-to-treat (ITT) analysis (the correct analysis) (Chalmers et al. 2023a, 2023b) was not used. Instead, Cummings took two groups that were unequal on one factor (initial ability) and removed outliers to equalize the groups on this factor, a procedure he called the ‘pairing off method’.

Summary and conclusions

It is likely that randomisation was used in educational research more widely than suggested by the references cited here. However, it is difficult to identify published research that unambiguously states that randomisation was used to produce equivalent groups. In Lindquist’s (1940) book, some of the language used implies that randomisation was widely used; however, there is little evidence to support that contention. Early trials tended to use the phrase ‘random sampling’ to mean randomisation, which may mean that some studies have been overlooked. For instance, Walters (1931) states ‘Freshman were divided into two groups by random sampling’, whereas in current terminology we might say ‘Freshmen were randomised into two groups’. It is likely that Walters did use randomisation to form the two groups, as the population of interest (‘delinquent’ Freshmen) was specified and the whole population was allocated using random sampling, thereby producing two equivalent groups. Cummings’ terminology, whilst it could have been referring to randomisation, left sufficient doubt in the description that his study is likely to have been a quasi-randomised trial. Nevertheless, the literature cited in this paper shows that the design of modern trials owes a great deal to these pioneers of controlled experiments. The use of alternation, whilst somewhat frowned upon today, is a legitimate technique to form comparable groups if properly used; however, experience shows that its implementation is often sub-optimal, which has the potential to introduce selection bias. Pearson’s desire to undertake an experiment in classrooms rather than in the confines of a laboratory paved the way for large pragmatic experiments, or field trials, which replicate the real-life educational experience of students and teachers and have become the norm for educational experiments today.

To conclude, in the first half of the 20th century many educational researchers were exploring techniques to reduce the influence of bias on the interpretation of the results of experimental research.

References

Chalmers I, Matthews R, Glasziou P, Boutron I, Armitage P (2023a). Trial analysis by treatment allocated or by treatment received? Origins of ‘the intention-to-treat principle’ to reduce allocation bias: Part 1. Journal of the Royal Society of Medicine 116(10):343-350.

Chalmers I, Matthews R, Glasziou P, Boutron I, Armitage P (2023b). Trial analysis by treatment allocated or by treatment received? Origins of ‘the intention-to-treat principle’ to reduce allocation bias: Part 2. Journal of the Royal Society of Medicine 116(11):386-394.

Cummings RA (1919). Improvement and the distribution of practice. New York: Teachers College, Columbia University. Available from: https://www.loc.gov/resource/gdcmassbookdig.improvementdistr01cumm/?st=gallery [Accessed 5th August 2025]

Forsetlund L, Chalmers I, Bjorndal A (2007). When was random allocation first used to generate comparison groups in experiments to assess the effects of social interventions? Economics of Innovation and New Technology 16(5):371-384.

Hedges L, Schauer J (2018). Randomised trials in education in the USA. In: Styles B, Torgerson C (eds.). Randomised controlled trials (RCTs) in education research – methodological debates, questions, challenges. Educational Research 60(3):265-275.

Lindquist EF (1940). Statistical Analysis in Educational Research. Boston: Houghton-Mifflin. https://www.jameslindlibrary.org/lindquist-ef-1940/

Moberg J, Kramer M (2015). A brief history of the cluster randomised trial design. Journal of the Royal Society of Medicine 108(5):192-198. https://www.jameslindlibrary.org/articles/a-brief-history-of-the-cluster-randomized-trial-design/

Pearson HC (1911). The Scientific Study of the Teaching of Spelling. Journal of Educational Psychology 2:241–252.

Remmers HH (1928). A Diagnostic and Remedial Study of Potentially and Actually Failing Students at Purdue University. Bulletin of Purdue University: Studies in Higher Education, IX, 28, No.12.

Styles B, Torgerson C (2018). Randomised controlled trials (RCTs) in education research – methodological debates, questions, challenges. Educational Research 60(3):255-264.

Torgerson CJ, Torgerson DJ (2007). The need for Pragmatic Experimentation in Educational Research. Economics of Innovation and New Technology 16(5):323-330.

Walters JE (1931). Seniors as Counselors. The Journal of Higher Education 2(8):446-448.

Walters JE (1932). Measuring effectiveness of personnel counseling. Personnel Journal 11(4):227-236.