Dr. Theodore J. Christ and the research and development team at FastBridge consulted with Dr. David J. Weiss about the recommended size and quality of an item bank for computer adaptive tests (CATs). Dr. Weiss is an internationally recognized expert on computer adaptive testing who consults with national and state testing programs. He was the academic advisor of the founding psychometrician for the MAP and STAR programs, and he advised and contributed to the development of aReading and aMath, which build on 30 years of his research and more than a decade of study specific to those measures. As noted by Weiss:
"That is one of the most common questions I get. It is more about quality than quantity. If you have a 30 item test, then you will use only 30 items...so if your bank is 3,000 or 30,000 it does not matter, if your bank has good quality items. Focus on the precision of measurement. Also, a three parameter model often provides more precision with fewer items as compared to the Rasch model, which assumes all items have medium discrimination. The three parameter model used in aReading and aMath ensures that your higher quality items are retained in the bank and are used. It is about the quality of the item bank and not the size of the bank." (D. Weiss, personal communication, July 12, 2016).
FastBridge Item Banks for Computer Adaptive Tests
FastBridge publishes two CATs: aReading and aMath. Each assessment has more than 2,500 items, with thousands of additional items in routine field testing. Field-test items are used to continuously update, rotate, and optimize the item banks; items are routinely deactivated and replaced with refreshed items. FastBridge Learning’s goal is not to create an excessively large item bank, but to maintain a high-quality item bank that provides an efficient and precise measure of student achievement.
Although the aReading and aMath item banks are large, students at similar grade and achievement levels may receive similar or identical items in variable sequences. This is by design: the adaptive algorithm consistently selects the items that are most informative for students at those levels.
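For readers curious about the mechanics, a standard CAT selection rule (and a common reason students at similar ability levels see overlapping items) is to administer, at each step, the unadministered item with the greatest Fisher information at the current ability estimate. Below is a minimal sketch of that maximum-information rule with a small hypothetical pool; FastBridge's actual selection, exposure-control, and content-balancing rules are not described here.

```python
import math

def item_information_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def pick_next_item(theta_hat, pool, administered_ids):
    """Maximum-information selection: the generic CAT rule sketched here."""
    candidates = [item for item in pool if item["id"] not in administered_ids]
    return max(candidates, key=lambda item: item_information_3pl(
        theta_hat, item["a"], item["b"], item["c"]))

# Hypothetical pool: nine items of equal discrimination spanning a difficulty range.
pool = [{"id": i, "a": 1.5, "b": -1.0 + 0.25 * i, "c": 0.2} for i in range(9)]

# Two students with similar ability estimates are matched to the same best item.
print(pick_next_item(0.10, pool, administered_ids=set())["id"])  # 4
print(pick_next_item(0.15, pool, administered_ids=set())["id"])  # 4
```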
Consistent with the guidance from Dr. Weiss, there are no agreed-upon criteria for the size of an item bank. However, published recommendations can help users understand why FastBridge Learning maintains aReading and aMath item banks of their current size. Recommendations in highly regarded, peer-refereed publications indicate that banks should support 5 to 12 alternate test forms with non-overlapping items at each grade level (Brennan, 2006; Davey & Pitoniak, 2006; Stocking, 1994; Way, 1998). The aReading and aMath banks each meet or exceed those standards.
Quality of aMath and aReading Item Banks
Multiple studies reporting the empirical basis for aReading and aMath have been presented at national conferences (e.g., Van Norman, Kiss, Newell, & Christ, 2014) and documented in the technical materials for each assessment. For example, subsample data from 2009 to 2013 (N = 22,800 students) were analyzed to estimate the precision of measurement. Hybrid simulations found that a 20-item CAT yielded precise estimates (SEM < .20 logits), comparable to those observed and reported by other vendors. The average standard error of measurement (SEM) for a 30-item fixed-length test was .11 for aReading and .16 for aMath on the logit IRT scale, which is common across vendors. These results were observed before the expansion of the current item banks, which at the time contained approximately 1,500 items per assessment and have since been refreshed, rotated, and expanded.
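For context on how such SEM values arise, in IRT the conditional standard error of measurement is the reciprocal square root of the test information: SEM(theta) = 1 / sqrt(sum of item informations at theta). The sketch below applies that standard relationship to a hypothetical 30-item test; the parameters are illustrative, and the resulting SEM is not intended to reproduce the figures reported above.

```python
import math

def item_information_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def conditional_sem(theta, items):
    """SEM(theta) = 1 / sqrt(total test information at theta)."""
    total_info = sum(item_information_3pl(theta, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(total_info)

# Hypothetical 30-item fixed-length test targeted near a student ability of 0.0;
# actual SEMs depend on how discriminating and well-targeted the items are.
items = [(1.5, -0.75 + 0.05 * k, 0.2) for k in range(30)]
print(round(conditional_sem(0.0, items), 3))  # smaller SEM = greater precision
```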
Item discrimination: Quality (not quantity) matters. Available evidence indicates that aReading and aMath items are highly discriminating: over 85% have item discriminations above 1.00. Vendors that use a one-parameter Rasch model constrain all items to a discrimination of 1.00. This finding indicates that the underlying measurement model used for aReading and aMath is superior to those of other vendors. Lower-quality, less-discriminating items require longer tests with more items; higher-quality items support more efficient screening.
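The arithmetic behind this claim is straightforward: Fisher information scales with the square of the discrimination parameter, so an item with a = 1.50 carries roughly 2.25 times the information of an otherwise identical a = 1.00 item near its difficulty. A minimal illustration using the two-parameter (2PL) case for simplicity, with hypothetical values:

```python
import math

def info_2pl(theta, a, b):
    """2PL Fisher information: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

# At theta = b, p = 0.5 and information = a^2 / 4:
print(info_2pl(0.0, a=1.0, b=0.0))  # 0.25   (Rasch-style fixed discrimination)
print(info_2pl(0.0, a=1.5, b=0.0))  # 0.5625 (a > 1.00: over twice the information)
```

Because each response to a highly discriminating item contributes more information, fewer items are needed to reach a target SEM, which is why item quality rather than bank size drives test efficiency.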
References
Brennan, R. L. (Ed.). (2006). Educational Measurement (4th ed.). American Council on Education/Praeger.
Davey, T., & Pitoniak, M. J. (2006). Designing computerized adaptive tests. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of Test Development (pp. 543–574). Lawrence Erlbaum Associates.
Stocking, M. L. (1994). Three practical issues for modern adaptive testing item pools (ETS RR-94-5). Educational Testing Service.
Van Norman, E. R., Kiss, A. J., Newell, K. W., & Christ, T. J. (2014, February). The effect of test length on computer adaptive test results. Poster presented at the meeting of the National Association of School Psychologists, Washington, DC.
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17(4), 17–27.
Weiss, D. J. (2011). Item banking, test development, and test delivery. In K. F. Geisinger (Ed.), The APA Handbook on Testing and Assessment. American Psychological Association.