

Virtual Comparison Groups: Understanding What Worked, How, and Why

Overhead shot of student typing on laptop while sitting on the floor

Saro Mohammed and Mike Fauteux

The Learning Accelerator, Leadership Public Schools

During the 2016-2017 school year, TLA worked with Leadership Public Schools (LPS) to help them better understand which parts of their Navigate Math personalized learning initiative were working and worth scaling to other schools. You can read more about that researcher-practitioner partnership here.

Part of this process involved LPS asking NWEA for an analysis of their Navigate Math MAP Growth scores using a virtual comparison group. Getting the analysis was easy enough, but understanding the report they got back wasn’t as straightforward. In this Insight, we outline some of the approaches we used to glean actionable insights from the data used in this report.

Can a report like this tell us if something "works"?

Mostly, yes. Research has shown that when an analysis uses a well-matched comparison group, the decisions we make based on that analysis are likely to be the same as if we had run a full randomized controlled trial (RCT). An RCT minimizes potential bias in our results, and a well-matched comparison group reduces potential bias almost as well.

Finding a comparison group that is as “focal” and “local” as possible is a critical step in ensuring it is “well-matched.” In other words, a virtual match is a good match if:

  1. The matching criteria are highly related to the outcome of interest, and
  2. The matching criteria include contextual characteristics that are likely to be highly related to the outcome of interest.

In this case, the virtual comparison group was generated by combing NWEA’s national dataset for all students who took the Math 6+ MAP Growth tests during the same semesters as the Navigate Math students. From this national set of students, a pool of students who were attending schools that had the same eligibility rate for free/reduced price lunch and the same urbanicity as LPS was generated. For each Navigate Math student, a match group of non-Navigate Math students who started off with a very similar RIT math score and who had the same amount of instructional time between their beginning and ending MAP tests was formed.


Notes: In the above fictional illustration, “length of instruction” is reported in months.
Rows highlighted grey show the LPS Navigate Math students for whom a virtual match group is being formed.
Rows highlighted green show the non-Navigate Math students who are included in the virtual match group.
Bolded numbers highlight differences between the virtual match student characteristics and the characteristics of the Navigate Math student for whom the match group is being formed.

Instead of a single average effect across all the Navigate Math students, this report allows us to see if Navigate Math is working differently for different subgroups of students. This information is contained in the flexible groups that are generated by matching students within subgroups based on different criteria -- like beginning RIT score. For example, we can see if students who were struggling at the beginning of the year (with low RIT scores) improved more or less over the instructional period compared to their matched group than those who were performing typically at the beginning of the year or those who were excelling at the beginning of the year.
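A subgroup breakdown like the one described above might be computed along these lines. The percentile bands, column names, and data layout here are hypothetical; the actual report and its subgroup definitions come from NWEA.

```python
import pandas as pd

def growth_gap_by_band(df):
    """Average (student growth minus match-group mean growth), split by
    where each student's beginning RIT score fell.

    Assumed columns: begin_pctile, growth, match_mean_growth.
    """
    bands = pd.cut(df["begin_pctile"], [0, 20, 80, 100],
                   labels=["struggling", "typical", "excelling"])
    gap = df["growth"] - df["match_mean_growth"]
    return gap.groupby(bands).mean()

# Fabricated data: low starters outgrow their matches, high starters don't.
df = pd.DataFrame({"begin_pctile": [10, 50, 90],
                   "growth": [12, 8, 5],
                   "match_mean_growth": [8, 8, 6]})
gaps = growth_gap_by_band(df)
```

A positive gap for a band means students in that band grew more than their matched comparison groups, on average.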

What are the “effects” this report documents?

There are a few ways we can define improvement in math, and each of these definitions is dependent on the measure we’re using. For this particular project, the MAP assessment was used to determine students’ math outcomes. In the report, we focused on three types of growth:

  1. Absolute or “raw” RIT score growth from the initial to the final MAP test. This is the simple difference between the ending RIT score and the beginning RIT score. For the RIT score, a difference of 1 point represents the same growth across the scale (i.e., growth of 10 points from beginning to end is comparable for low-performing, average-performing, and high-performing students). The average growth is provided both for Navigate Math students as well as for the virtual comparison groups. Thus, the difference in growth between Navigate Math students and the comparison groups can be compared across all subgroups.
  2. The proportion of Navigate Math students who grew at least as much as the average growth in the virtual comparison groups. This tells us how many of the Navigate Math students were achieving more growth than should be expected if they were not in Navigate Math.
  3. The difference in growth between Navigate Math and comparison students, accounting for differences in the variation or spread of scores within the virtual comparison group. Every time a sample of scores is drawn from a population, the resulting spread of scores can differ from the spread in the population. This can lead to calculated differences in growth that are larger or smaller than the underlying difference in the population itself. Calculating an effect size involves dividing the difference between two groups by the “typical” spread in the samples (in this case, in the virtual comparison groups), in order to standardize the effect size and enable us to compare (and, in some cases, combine) effect sizes from different samples.
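As a rough illustration, the three growth summaries above can be computed from two lists of growth scores. This is a simplified sketch with made-up numbers; the actual report uses NWEA's own methodology, and standardizing by the comparison group's standard deviation is just one common choice for the effect size.

```python
import statistics

def growth_metrics(nav_growth, comp_growth):
    """Three ways to summarize Navigate Math growth against a
    virtual comparison group (illustrative, not the LPS data)."""
    # 1. Raw RIT growth: difference between the group averages.
    raw_diff = statistics.mean(nav_growth) - statistics.mean(comp_growth)
    # 2. Proportion of Navigate Math students meeting or exceeding
    #    the comparison group's average growth.
    comp_mean = statistics.mean(comp_growth)
    prop = sum(g >= comp_mean for g in nav_growth) / len(nav_growth)
    # 3. Effect size: the raw difference standardized by the spread
    #    (here, the comparison group's sample standard deviation).
    effect_size = raw_diff / statistics.stdev(comp_growth)
    return raw_diff, prop, effect_size

# Fabricated growth scores for four matched pairs of students.
raw, prop, es = growth_metrics([10, 12, 8, 11], [8, 9, 7, 8])
```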

What did we find?

As always, care must be taken not to overinterpret results. That being said, the overall results from the LPS Navigate Math virtual comparison group analysis were exciting and aligned with trends from previous years. Overall, Navigate Math students grew more than their matched comparison counterparts. In cases where Navigate Math students did not exceed virtual comparison group students’ growth, their growth was statistically equal to the comparison group.


Looking at specific subgroups, positive growth was found largely for students whose beginning RIT scores were below average, and equal growth (statistically the same as the comparison group) was generally found for students whose beginning RIT scores were at or above average. Similarly, positive growth was found for all students of minority ethnicities with the one exception of Asian/Pacific Islander students, whose growth was equal to the virtual match group.


The numbers included in the charts above are the effect sizes, which ranged from -0.12 for the students with the highest beginning scores to 0.88 for Black students. Most of the effect sizes were clustered between 0.45 and 0.55, which we interpreted as moderate, important effects.

Across the whole group and subgroups, at least 64 percent of Navigate Math students met or exceeded the average growth in the virtual comparison groups, with one exception. Among students starting off with the highest beginning RIT scores (those in the 80th percentile or higher), 50 percent met or exceeded the average growth of their matched students, who also had the highest (80th percentile or greater) beginning RIT scores. Put another way, Navigate Math students with the highest beginning scores grew, on average, about the same as their matched virtual comparison group.

What does this mean?

In order to figure out the “how” and “why” from the findings above, we need to consider how Navigate Math itself was implemented and consider the sample of LPS students it was implemented with.

Until this year, LPS was only able to compare their own average growth to the national average on the same test (one year’s growth in one year). Historically, Navigate Math students have shown an average of 2.5 years’ growth each year, a trend which was confirmed again this year in this report.

Another interesting bit of context is that a broad range of teachers have implemented the program over the years. Navigate Math was intentionally designed to be flexible so that new teachers -- or those new to blended learning -- could access it and start to change their practice. This year, both teachers at one campus were new to teaching. Even in their first years in the classroom, they were able to implement Navigate Math in ways that produced results matching the national average.

LPS has an above-average number of students who are English learners (18%). They also have a sizeable proportion of students who are eligible for Special Education services (8%). This year at one campus, LPS also experienced an influx of newcomer English learners, as well as students who were experiencing more severe trauma than in previous years. The above report does not provide effects specifically for those subgroups (although it does provide effects by campus -- so that one campus with unique circumstances could be isolated to see if effects differed there). However, it may be the case that larger groups of those students are enrolled in Navigate Math at different campuses. If this is the case, the provided breakdown of effects by school may shed some light there.

Similarly, LPS may be interested in digging deeper into why effects are not equally positive for students with average and high beginning scores. Navigate Math was designed for students who have not yet mastered math skills typically acquired in 2nd - 7th grade and offers components around social-emotional learning and 21st-century skills. It is probable that students beginning the course with average or above-average RIT scores are exposed to less new material during the course because they have already demonstrated mastery of more math skills than students beginning the course with below-average scores. The finding reported here led LPS to consider the alignment of course content and placement with student need. Among the many options on the table, LPS may choose to alter the Navigate Math curriculum to allow students access to additional content when they have mastered the existing content, and/or change the placement criteria for Navigate Math so that students with above-average beginning RIT scores are placed in a different course altogether.

Overall, when looking at a report like this through the lens of learning to inform and improve practice, the need for a randomized controlled trial is less important than the need for accurate, timely, and relevant information. Therefore, we focused less on statistical significance and large “enough” sample sizes, and more on trends that could help us determine what we should consider doing differently and what we should consider scaling to other schools. Ideally, we would be able to look at data from multiple semesters or courses to identify trends that hold over time, but that would require the trade-off of not changing anything about the program from year to year and potentially missing opportunities to improve the learning experience for new students.

Being able to have an analysis like this one (with a comparison group that contextualizes the sizes of effects as well as subgroup comparisons) provides educators with an additional tool for understanding what is working, as well as how and why, so that we can continue to do better for all students.


Saro Mohammed is a Partner at The Learning Accelerator.

Mike Fauteux is the Director of Innovation at Leadership Public Schools, which includes LPS Richmond. There he manages personalized learning innovations and partnership development.