Back in the beginning of April, 8th graders all over Texas took the STAAR math and reading tests. Since I happen to be a math teacher, the following discussion will relate specifically to the 8th grade math test, but it certainly relates to other grades and subjects.
According to the TEA website (see the grade 8 mathematics blueprint: http://www.tea.state.tx.us/student.assessment/staar/blueprints/), the ‘scored’ portion of the test consisted of 56 questions. The TEA also states (‘Test Design and Setting of Student Performance Standards’, Chapter 2, pg 7) that up to 8 additional multiple choice questions were being field tested, and that “[g]riddable field-test items are embedded in mathematics and science tests, as appropriate.” What does this mean for the student? Let’s see…
– up to approximately 14% more questions (8 on top of 56) that the students had to complete in the same 4 hours.
– approximately 12.5% less time per question (~3 minutes 45 seconds per question as opposed to ~4 minutes 17 seconds per question if there were no field test questions).
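For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes exactly 56 scored questions, the maximum of 8 multiple choice field test questions, and a 240-minute session; any griddable field test items are left out because the TEA documents do not say how many there were. The function and variable names are made up for illustration.

```python
SCORED_QUESTIONS = 56      # from the TEA grade 8 math blueprint
FIELD_TEST_QUESTIONS = 8   # assumed: the stated maximum ("up to 8")
SESSION_MINUTES = 4 * 60   # the 4-hour testing window


def format_minutes(minutes):
    """Render a fractional number of minutes as 'M min SS sec'."""
    total_seconds = round(minutes * 60)
    whole, seconds = divmod(total_seconds, 60)
    return f"{whole} min {seconds:02d} sec"


total_questions = SCORED_QUESTIONS + FIELD_TEST_QUESTIONS

# Extra workload relative to the scored questions alone.
extra_pct = 100 * FIELD_TEST_QUESTIONS / SCORED_QUESTIONS

# Average time available per question, without and with field test items.
per_q_without = SESSION_MINUTES / SCORED_QUESTIONS
per_q_with = SESSION_MINUTES / total_questions

# Relative loss of time per question.
less_time_pct = 100 * (1 - per_q_with / per_q_without)

print(f"Extra questions: {extra_pct:.1f}%")                        # 14.3%
print(f"Per question without: {format_minutes(per_q_without)}")    # 4 min 17 sec
print(f"Per question with:    {format_minutes(per_q_with)}")       # 3 min 45 sec
print(f"Less time per question: {less_time_pct:.1f}%")             # 12.5%
```

Changing `FIELD_TEST_QUESTIONS` shows how quickly the time squeeze grows if griddable field test items are added on top of the 8 multiple choice ones.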
As a math guy myself, and after having spoken with a friend of mine who is nearing completion of a PhD in Educational Psychology, I understand the need to collect data and have ‘item exposure’ so that the validity of each question can be determined. So from a purely empirical stance, I am fine with the inclusion of field test questions.
But 13- and 14-year-old students should not be treated empirically.
Question 1: How can I console a kid who may not have been able to finish the last 8+ questions because they ran out of time? Granted, nobody knows which questions were the field test questions, but the fact remains that, given the high-stakes nature of these tests, those field test questions SIMPLY DID NOT MATTER.
Question 2: How can I console a kid who may have missed meeting the passing standard by one or two questions? Statistically speaking, those students would likely have answered at least a couple of those meaningless questions correctly. Not only were they not given credit for those correct answers, the time spent getting them was wasted. This in turn could easily have added to ‘test fatigue’, possibly leading them to miss a question later in the test that they normally would have answered correctly.
While “[f]ield-test data are analyzed for reliability, validity, and possible bias” (TEXAS ASSESSMENT PROGRAM TEST DEVELOPMENT PROCESS pg 2), I could find no specific methodology that could be independently vetted. To be clear, I understand the need for test security and analysis of new ‘product’. But the lack of transparency by testing companies (Pearson, in this case) is troubling to say the least.
Most other companies I can think of have a specific division devoted to the design and testing of new things. Drug companies finance trials of new medications. Fast food joints do slow roll outs of new things in test markets. Heck, we have all received those tiny boxes of cereal in the mail to try out. Those companies budget for those tests and collection of data. Pearson, on the other hand, essentially gets paid (by the taxpayers) to collect millions of data points.
If these tests were purely diagnostic in nature, I would have no issue with the inclusion of field test items. Of course, if the tests were diagnostic in nature, even having field test items would be rendered moot since the tests themselves would be released each year.
Just another reason why our current regime of ‘Test and Punish’ (with apologies to John Kuhn for the plagiarism) must be done away with.