Preface  xxi
Contributors  xxv
|
I Test Design, Item Pool, and Maintenance  1
|
1 Overview of Computerized Multistage Tests  3
1.1 Linear Tests and Computerized Adaptive Tests (CATs)  3
1.2 Multistage Tests (MSTs)  4
1.3 MST Designs for Different Purposes  7
1.4 Implementation Schemes  8
1.5.3 Number of Modules per Stage  12
1.6 Content Balance and Assembly  12
1.9 Scoring, Linking, and Equating  15
1.10 Reliability, Validity, Fairness, and Test Security  16
1.11 Current and Future Applications  17
|
2 Multistage Test Designs: Moving Research Results into Practice  21
2.1 The MST Design Structure  22
2.2 The State of Research: MST Development and Design Considerations  24
2.2.1 Design and Design Complexity  25
2.2.2 Test and Module Length  28
2.2.3 Item Banks, Statistical Targets, and Test Assembly  30
2.2.4 Routing and Scoring  32
2.2.5 Security and Exposure  34
2.3 Conclusions and Next Steps  36
|
3 Item Pool Design and Maintenance for Multistage Testing  39
3.1 Designing an Item Pool Blueprint  40
3.1.1 The Concept of a Design Space  41
3.1.2 Models for Blueprint Design  42
3.1.3 General Model for Integer Programming  43
3.1.4 Integer Programming Blueprint Design for MST  43
3.1.5 Overlapping Modules  45
3.2 Applications in Item Writing  46
3.2.2 Generating the Modules  49
|
4 Mixed-Format Multistage Tests: Issues and Methods  55
4.1 Literature Review on Design Components in Mixed-Format MST  56
4.1.3 MST Panel Structure  58
4.2 Comparing Other Testing Approaches  61
4.3 Issues and Future Research Suggestions for Mixed-Format MST  62
|
5 Design and Implementation of Large-Scale Multistage Testing Systems  69
5.1 MST Design and Implementation Considerations  71
5.1.1 Test Purpose and Measurement Information Targeting  72
5.1.2 Item Bank Inventory Issues  75
5.1.4 Exposure and Item Security Issues  77
5.1.5 Scoring and Routing  79
5.1.7 System Performance and Data Management Issues  80
5.2 Conclusions: A Research Agenda  81
5.2.1 MST Panel Design and Assembly Issues  81
5.2.2 Item Banking Issues  82
5.2.3 New MST Applications  82
|
|
II Test Assembly  85
|
6 Overview of Test Assembly Methods in Multistage Testing  87
6.3 Automated Assembly for MST  89
6.3.1 Early Test Assembly Methods  89
6.3.2 The 0-1 Programming Methods  90
6.4 Setting Difficulty Anchors and Information Targets for Modules  94
6.5 "On-the-Fly" MST (OMST) Assembly Paradigm  95
6.5.1 The On-the-Fly MST Assembly Paradigm  96
6.5.2 Future Research in On-the-Fly Test Assembly  98
6.6 MST, CAT, and Other Designs---Which Way to Go?  98
|
7 Using a Universal Shadow-Test Assembler with Multistage Testing  101
7.1 Solving Shadow-Test Assembly Problems  103
7.2 Basic Design Parameters  104
7.2.1 Alternative Objectives for the Shadow Tests  105
7.2.2 Alternative Objectives for the Selection of Items from the Shadow Tests  105
7.2.3 Number of Shadow Tests per Test Taker  106
7.2.4 Number of Test Takers per Shadow Test  107
7.3 Different Testing Formats  107
7.4 Relative Efficiency of Formats  111
7.5 Empirical Study  113
7.5.1 Test Specifications  113
7.5.2 Setup of Simulation  115
Appendix: Test-Assembly Constraints in Empirical Study  117
|
8 Multistage Testing by Shaping Modules on the Fly  119
8.2 MST-S versus MST-R versus CAT  125
8.2.3 Results for Measurement Performance  127
8.2.4 Results for Item Pool Utilization  130
8.3 Discussion and Conclusion  131
|
9 Optimizing the Test Assembly and Routing for Multistage Testing  135
9.1 Optimizing MST Assembly: A Nonexhaustive Search  135
9.1.3 Optimal Routing Module Length  145
9.2 Limited Item Pools, Two- and Three-Parameter Models  146
|
III Routing, Scoring, and Equating  151
|
10 IRT-Based Multistage Testing  153
10.1.1 Item Response Model  153
10.1.2 Likelihood Function  154
10.1.4 Information and Error  156
10.1.5 Classification Decision  156
10.2 Motivation for Tailored Testing  158
10.3.1 Static Routing Rules  163
10.3.2 Dynamic Routing Rules  165
10.3.3 Special Considerations for Routing in Classification Tests  166
10.4 Scoring and Classification Methodologies  167
|
11 A Tree-Based Approach for Multistage Testing  169
11.2 Tree-Based Computerized Adaptive Tests  170
11.3 Tree-Based Multistage Testing  171
11.4.1 Definition of Module Scores  173
11.4.2 Definition of Cut Scores  173
11.4.3 Minimizing Mean Squared Residuals  174
11.4.4 Procedure and Evaluation  175
11.7 Limitations and Future Research  187
|
12 Multistage Testing for Categorical Decisions  189
12.1 Computer-Mastery Methods  189
12.1.1 Sequential Probability Ratio Test (SPRT)  190
12.1.2 Adaptive Mastery Testing  190
12.1.3 Computer-Mastery Test  190
12.1.4 Adaptive Sequential Mastery Test  191
12.2 Information Targeted at Cut Versus at Ability  192
12.3 Influence of Multiple Cut Scores  193
12.4 Factors That Can Reduce Optimal Solutions  194
12.4.1 Cut Score Location  194
12.4.2 Satisfying Content and Statistical Specifications  194
12.4.3 Administering Blocks of Items Versus Individual Items  195
12.5 Example Based on Smith and Lewis (1995)  195
|
13 Adaptive Mastery Multistage Testing Using a Multidimensional Model  205
13.2 Definition of the Decision Problem  206
13.2.1 Multidimensional IRT Models  206
13.2.2 Compensatory Loss Models  207
13.2.3 Conjunctive Loss Models  209
13.3 Computation of Expected Loss and Risk Using Backward Induction  209
13.4 Selection of Items and Testlets  211
13.5.1 Compensatory Loss Functions  213
13.5.2 Conjunctive Loss Functions  216
13.6 Conclusions and Further Research  217
|
14 Multistage Testing Using Diagnostic Models  219
14.1 The DINA Model and the General Diagnostic Model  219
14.2 Experience with CD-CATs  222
|
15 Considerations on Parameter Estimation, Scoring, and Linking in Multistage Testing  229
15.2 The Item Response Model  232
15.2.1 The Conditional Distribution of Each Response Score  233
15.2.2 Local Independence  234
15.2.4 The Distribution of the Latent Variable  235
15.3.1 Maximum Likelihood Estimation  236
15.3.2 Expected A Posteriori Estimation  238
15.3.3 Modal A Posteriori Estimation  238
15.3.6 Routing Rules and Estimated Scores  240
15.4 Approaches to Parameter Estimation  241
15.4.1 Concurrent Calibration  242
15.4.2 Separate Calibration  243
15.4.3 Sequential Linking  244
15.4.4 Simultaneous Linking  244
|
IV Test Reliability, Validity, Fairness, and Security  249
|
16 Reliability of Multistage Tests Using Item Response Theory  251
16.1.1 Test Reliability in Classical Test Theory  252
16.1.2 Standard Error of Measurement in CTT  253
16.1.3 Test Reliability in IRT  253
16.1.4 Information Functions  256
16.2 Application: IRT Reliability for MST in NAEP  257
|
17 Multistage Test Reliability Estimated via Classical Test Theory  265
17.1 The Estimation Procedure  266
17.2 Testing the Accuracy of the Estimation Procedure  268
17.3 How Accurate Were the Estimates?  269
|
18 Evaluating Validity, Fairness, and Differential Item Functioning in Multistage Testing  271
18.2 Opportunities for Item Review and Answer Changing  272
18.4 MST Routing Algorithms  274
18.6 Comparability of Computer Platforms  277
18.7 Accommodations for Students with Disabilities and English Language Learners  278
18.8 Differential Item Functioning Analysis in MSTs  278
18.9 Application of the Empirical Bayes DIF Approach to Simulated MST Data  280
18.9.1 Root Mean Square Residuals of DIF Estimates  280
18.9.2 Bias of EB and MH Point Estimates  281
18.9.3 DIF Flagging Decisions for the EB Method  281
18.9.4 Application of CATSIB to MSTs  281
18.9.5 DIF Analysis on the GRE MST  283
|
19 Test Security and Quality Control for Multistage Tests  285
19.1 An Overview of a Three-Component Procedure  286
19.2 Tools to Evaluate Test Security and Quality Control  287
19.2.1 Short-Term Detection Methods  287
19.2.2 Long-Term Monitoring Methods  293
19.3 A Simulation Study Using CUSUM Statistics to Monitor Item Performance  296
|
V Applications in Large-Scale Assessments  301
|
20 Multistage Test Design and Scoring with Small Samples  303
20.3 Various MST Module Designs  306
20.3.2 Module Difficulty Levels  306
20.3.3 Biserial Correlation (r_bi)  307
20.3.4 Module Difficulty Ranges  307
20.3.5 Characteristics of Modules  311
20.5 Comparisons of the Six MST Designs  314
20.5.4 Cronbach's α for All Designs in the Application Sample  319
20.7 Limitations and Future Research  324
|
21 The Multistage Test Implementation of the GRE Revised General Test  325
21.2.1 Test Specifications  328
|
|
|
22 The Multistage Testing Approach to the AICPA Uniform Certified Public Accounting Examinations  343
22.1 Research on Multistage Testing  343
22.2 Item Bank Development for MST  349
22.3 Content Security Monitoring for MST  350
22.4 Inventory Exposure Planning for MST  352
|
23 Transitioning a K-12 Assessment from Linear to Multistage Tests  355
23.1 Administering CTP Items Online  356
23.2 Creating a New MST Scale Using IRT  358
23.2.1 Vertical Linking Item Sets  358
23.2.2 Evaluation of Linear Online Data  359
23.2.3 IRT Calibration and Item Fit Analysis  359
23.2.4 Vertical Linking of Grades within a Content Area  360
23.2.5 Evaluation of the Vertical Scales  361
23.3 Multistage-Adaptive Test Development  363
23.3.1 Choosing the MST Design  363
23.3.2 Assembling the MSTs  364
23.3.3 Selecting Router Cut Scores  367
|
24 A Multistage Testing Approach to Group-Score Assessments  371
24.3.1 Design, Sample, and Instrument  375
24.3.2 Routing and Item Selection  376
24.5.2 Recommendations and Further Research  389
|
25 Controlling Multistage Testing Exposure Rates in International Large-Scale Assessments  391
25.1 Item Exposure Rate Control for Multistage Adaptive Assessments  394
25.2 Method: How to Compute and Adjust the Item Exposure Rates  397
25.2.1 PIAAC Routing Diagram  397
25.2.2 Observed Score Distribution  400
25.2.3 Cutting Curves for Stage Test Booklets  401
25.4.1 Stage 1 Exposure Rates  404
25.4.2 Stage 2 Exposure Rates  405
|
26 Software Tools for Multistage Testing Simulations  411
26.1.3 Input and Output Examples  412
26.1.4 Performance, Availability, and Support  412
26.2.2 Using R for Simulating MST  417
26.2.3 Availability and Support  419
|
|
VI Closing Remarks  421
|
27 Past and Future of Multistage Testing in Educational Reform  423
27.2 A Model-Based Three-Stage Design  430
27.3 Item Generation and Automated Scoring and Broadly Accessible Test Content  433
27.3.1 Producing Items and Test Forms More Efficiently  434
27.4 Summary and Conclusions  437

Bibliography  439
Index  489