By: Rachel Brown, Ph.D., NCSP
To know whether a student is making progress toward specific learning goals, a comparison with some type of standard or benchmark of success is needed. The term benchmark is widely used in education to indicate grade-level learning goals for all students. These benchmarks are scores on certain assessments that have been validated through research to predict that a student will meet later learning goals. This blog post explains how benchmark scores are developed, what they predict, and how to use them to provide additional instruction for students at risk of not meeting benchmark goals.
Benchmark Goal Development
Benchmark goals are developed from data sets containing information about student learning at different grades and over time. Benchmarks are typically based on two types of validity evidence: concurrent and predictive. Concurrent validity refers to how well a certain assessment provides information that is similar to other assessments of the same skill. Predictive validity refers to how well a certain assessment predicts a student’s future performance on a different assessment of the same skill.
Concurrent Validity. To determine whether an assessment measures students’ skills as well as another assessment given at the same time, both assessments are given and the scores are compared. When students’ scores on both assessments are very similar, the two sets of scores are highly correlated. Such correlations are indicators of concurrent validity.
Predictive Validity. Another way to examine the accuracy of a new assessment is to see how well it predicts performance on a later assessment. For example, students could complete the new assessment in the fall and then the established assessment in the spring. If the fall assessment provides data that predict how students perform on the later spring assessment, the measure can be understood to have predictive validity.
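Both kinds of validity evidence come down to comparing two sets of scores from the same students. The following is a minimal sketch of that comparison using a Pearson correlation, which is one common way to quantify such a relationship; the function name and all of the scores are hypothetical and are not taken from any real assessment.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same six students on two assessments
new_assessment = [12, 18, 25, 31, 40, 47]
established_assessment = [15, 20, 24, 35, 41, 50]

r = pearson_r(new_assessment, established_assessment)
print(f"correlation: {r:.2f}")  # values near 1 indicate very similar rankings
```

For concurrent validity, the two lists would hold scores from assessments given at about the same time. For predictive validity, the first list would hold fall scores and the second would hold spring scores for the same students.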
Both concurrent and predictive validity are used as indicators of what an assessment measures. With validity evidence in hand, a test publisher can then establish norms. Norms are created from the scores of large numbers of students who have completed the assessment. Typically, norms are drawn from students from diverse backgrounds who represent the total population of students at each grade level at which the assessment can be used. The norms can be organized into score ranges and these ranges are used to develop benchmarks. First, the scores in the norms are rank ordered for each grade level of students. Then, percentile rankings are applied. Finally, the scores corresponding to specific percentile ranks are identified.
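The three steps above (rank order the scores, apply percentile ranks, identify the score at a given rank) can be sketched as follows. This is only an illustration: it assumes the simple nearest-rank definition of a percentile, which may differ from the method a test publisher actually uses, and the norm sample is made up.

```python
import math


def score_at_percentile(scores, percentile):
    """Return the score at a given percentile rank (nearest-rank method)."""
    if not scores:
        raise ValueError("norm sample is empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    # Step 1: rank order the scores from lowest to highest
    ordered = sorted(scores)
    # Step 2: convert the percentile into a 1-based position in the ordering
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    # Step 3: the score at that position is the cut score
    return ordered[rank - 1]


# Hypothetical norm sample for one grade level: 20 scores, deliberately unsorted
norm_scores = [5 * i for i in range(20, 0, -1)]

cut_40 = score_at_percentile(norm_scores, 40)
cut_15 = score_at_percentile(norm_scores, 15)
print(cut_40, cut_15)
```

A real norm sample would contain far more students, and the calculation would be repeated separately for each grade level.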
What Benchmarks Predict
The percentiles selected for specific score ranges are often based on the data about the measure’s predictive validity. For example, researchers will identify what score a student needs to earn in the fall in order to have a strong likelihood of meeting the grade level goal (e.g., proficiency) in the spring of the same school year. In the FastBridge system, scores at the 40th and 15th percentile ranks have been identified as the default benchmark levels. The resulting score ranges are described in relation to different levels of risk.
Low risk. Students whose screening score is at or above the 40th percentile on FastBridge assessments are very likely to meet later learning goals. In this regard, such students are described as being at low risk of learning problems. Keep in mind that there is no such thing as no risk because many factors influence a student’s school performance and no one assessment can predict with 100% certainty if a student will succeed in the future.
Some risk. This descriptor applies to students with scores between the 15th and 40th percentiles. Such scores are predictive of some current or later learning difficulty. Students identified as being at some risk are likely not to meet a later learning goal unless instruction is changed in some way. Such changes could include additional lesson time or a different type of lesson.
High risk. Students who score below the 15th percentile on FastBridge assessments are likely to be at high risk of current and later learning difficulties. These students are very unlikely to meet later learning goals unless intensive instructional support is provided immediately. In some cases, students whose scores are in the high-risk range will not be able to catch up and meet grade-level learning goals in the current school year, but they can catch up if intensive intervention is provided over a longer period of time.
It is worth noting that school districts using the FastBridge system can select different benchmark levels than the above defaults. Such a change must be made by the FastBridge District Manager. If changed, the benchmark levels set by the District Manager will be applied to all schools and students in the district for the entire school year.
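The risk levels described above amount to a simple lookup against two cut points. Here is a minimal sketch, assuming the default 40th and 15th percentile levels and exposing them as parameters so that district-selected levels can be substituted. The function and parameter names are hypothetical and do not come from the FastBridge system. Treating a score exactly at the 15th percentile as some risk is an inference from the wording "below the 15th percentile."

```python
def risk_level(percentile_rank, low_risk_cut=40, high_risk_cut=15):
    """Map a student's percentile rank to a risk descriptor.

    The default cut points mirror the 40th and 15th percentile defaults
    described in the text; a district could pass different values.
    """
    if high_risk_cut >= low_risk_cut:
        raise ValueError("high_risk_cut must be below low_risk_cut")
    if percentile_rank >= low_risk_cut:
        return "Low risk"   # at or above the upper cut point
    if percentile_rank >= high_risk_cut:
        return "Some risk"  # between the two cut points
    return "High risk"      # below the lower cut point


print(risk_level(62))                   # default levels
print(risk_level(45, low_risk_cut=50))  # hypothetical district-set level
```

Keeping the cut points as parameters rather than constants reflects the point that the same screening score can fall into a different risk level when a district changes its benchmark levels.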
Using Benchmarks to Help Students
Benchmark scores are most commonly used to interpret universal screening data. Indeed, in some schools, such screening is called “benchmark screening.” Once screening data for all students are available, school teams can examine the Group Screening Report and learn which students scored at each risk level: Low, Some, or High. Knowing each student’s benchmark score level is the first step in providing effective support for all students.
The next step is to confirm that the screening scores are accurate. This is most easily done by having one or more teachers who know each student review the scores and confirm whether they make sense. Most of the time, the screening scores will be consistent with the student’s classroom performance. If a student’s score does not make sense to a teacher, first check to see if there was a data entry error. If the score is confirmed to be the student’s actual score but still does not make sense, it is best to re-screen the student to learn if that score is reliable. Instructional decision-making should be done only after each student’s score has been confirmed as accurate.
Benchmark indicators are helpful for grouping students for different levels of support. Students whose scores are at or above the 40th percentile (or the district-set low-risk level) usually do not need additional instruction. They should continue with the general curriculum. Students whose scores are between the 15th and 40th percentiles are likely to benefit from some level of support. The team will need to consider factors such as what types of errors the students made in order to determine what instructional groups and lessons make sense. Ideally, students with similar learning needs will be grouped together in small groups for additional instruction.
Students whose scores are below the 15th percentile usually need more intensive support. Sometimes that support might need to be provided individually, especially if no other students require the same type of instruction. The most effective way to support students whose scores are in the high-risk range is to provide additional instruction that is direct and systematic. For example, fourth graders whose screening scores are below the 15th percentile and indicate that they do not yet know all addition facts should be provided with immediate intensive intervention to learn addition facts.
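As a rough illustration of this first-pass grouping, the sketch below sorts a class into three support groups by percentile rank. The student names, the group labels, and the function name are all hypothetical, and the cut points are the defaults described earlier. Forming the actual instructional groups would still depend on the error patterns and other information the team reviews.

```python
from collections import defaultdict


def group_by_support(percentiles, low_risk_cut=40, high_risk_cut=15):
    """Group students into first-pass support levels by percentile rank."""
    groups = defaultdict(list)
    for student, rank in percentiles.items():
        if rank >= low_risk_cut:
            groups["general curriculum"].append(student)
        elif rank >= high_risk_cut:
            groups["small-group support"].append(student)
        else:
            groups["intensive support"].append(student)
    return dict(groups)


# Hypothetical confirmed screening results: student -> percentile rank
screening = {"Ana": 72, "Ben": 38, "Cal": 9, "Dee": 22, "Eli": 40}

support_groups = group_by_support(screening)
for level, students in support_groups.items():
    print(level, students)
```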
It is very important to keep in mind that screening scores and benchmark levels should always be used as a starting point for conversations about student performance. They should never be used in isolation as the only source of information when making instructional decisions. Instead, additional indicators of students’ skills such as classroom work, homework, scores from other tests, and teacher judgment must also be taken into consideration. All data have some amount of error and the only way to mitigate error is to compare multiple sources of information.
Summary
Benchmarks are a widely used system for understanding student performance in schools. Benchmarks are score ranges that correspond to predicted student performance on later assessments. They are based on research about how accurately an assessment indicates student learning in a specific skill area. Illuminate Education has established default benchmark levels at the 15th and 40th percentile ranks based on a national database of student scores. School districts can select different benchmark levels to be applied to all students over an entire school year. Screening scores and benchmark levels should always be used as a starting point for conversations about individual student learning needs and in conjunction with other sources of information about student performance.
Dr. Rachel Brown is Illuminate Education's Senior Academic Officer. She previously served as an Associate Professor of Educational Psychology at the University of Southern Maine. Her research focuses on effective academic assessment and intervention, including multi-tier systems of support, and she has authored several books on Response to Intervention and MTSS.