Part of my Calculus procedure has been taking some benchmark data on my kids throughout the years. Beyond improving student attitudes about Calculus, my second big priority is making sure students have an informed opinion about how they might do on the AP Exam. Kids are always free to do what they want, but if they're going to spend the money on that thing, I want to make sure they have a shot. Our results have been creeping upwards, and we're poised for a breakthrough, or at least I hope so.
My data collection schemes have been problematic though. I think I've been a little too aggressive, giving questions that students probably aren't ready for in December. With the significant hurricane delay, we weren't even ready for what I've tried in the past, so I needed a new scheme. And with 75 students in AB, I needed something that'd be efficient to process so students could get feedback quickly.
A public version of the activity can be found here: Calculus Gauntlet Public
I wanted to test three things: Fundamentals (trig values, limits, continuity), Interpretation (curve sketching), and Skills (derivative rules, Riemann sums). Roughly 12-20 items per section. I wasn't going to belabor any skills; if you can do it once, you can do it ten times, I figure. I sketched out what I wanted in each section:
To gather all the information, I was going to use Desmos Activity Builder. I didn't want to juggle a lot of papers, and I wanted a better idea of what items were causing problems. With previous benchmarks I had a vague idea of which questions didn't go well, this time I wanted to know for sure.
I included a mix of items: entering answers, typing short answers, multiple choice, plucking data off a graph, sketching on top of a graph, and some screens where problems were presented that students would complete on little cards they'd hand in. I wanted to assess their ability to determine a limit/derivative without making math entry fluency a limiting factor.
I initially planned 3 versions with 6 codes, but the reality of sifting through all the dashboards made me reconsider. I settled on 3 versions with 3 codes, randomly distributed among my class periods. There were 44 screens total, and 25 kids on each code.
The "version numbers" are just arbitrary hexadecimal numbers (go ahead, convert them, see how dumb I am) designed to obscure the number of versions. I was giving this to a lot of kids all day long over multiple days, I knew they were going to discuss it, but I wanted to make it a tiny bit less likely that they could figure out who they were sharing versions with.
Again for data collection simplicity, kids would access the same activity across multiple days. We use Chromebooks with school Google accounts, so linking their accounts to Desmos took 2 seconds and was done earlier in the year. I used pacing to restrict them to the section of the day:
This was one of those features I knew was going to come through, but didn't totally trust until I saw it in action. There wasn't a single technical issue over the three days. Each day the Activity Builder remembered the kid had previously accessed the activity and jumped them right to the section of the day. It was really elegant. Sketching slides with a trackpad still kinda stinks, but I was not super critical of the results.
At the completion of each day, I did some right/wrong (I was pretty unforgiving here) tallying in a spreadsheet, and determined raw scores for the various sections. I also tallied up incorrect answers to see how questions performed. I would eventually throw out the worst performing questions in each section:
After three days of data collection, I set out to determine my final product. What were students going to get about their performance on this giant activity?
The intent of the activity was not to assign a grade based on their raw performance, merely to give them a snapshot of where they stood on December 11-13. Yes, this assessment would factor into their course grade somehow, but I wasn't going to cackle in delight as I failed tons of them; that's not what this was here to do.
Being able to click through dashboard screens and tally results was quicker than I thought, maybe 1 hour a day. Generating something meaningful from the data and formatting it nicely took another couple hours.
The other nice thing about this collection method is I could quickly check for version bias. Each of the three versions had some questions that were identical and others that were modified. Codes were distributed at random, and for whatever reason one version registered a higher average raw score. I curved the other two versions up, roughly 1.16x (the normal College Board curve is 1.20x), so that each group average matched the highest average. I took the resulting adjusted score and divided it by the max points available, which gave each student a percentage and quartile.
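The version-bias adjustment described above can be sketched in a few lines. This is a hypothetical reconstruction, not the actual spreadsheet formula: the version labels, raw scores, and class sizes below are invented placeholders, and `adjust_for_version_bias` is a name I made up for illustration.

```python
# Sketch of the version-bias curve: scale each version's raw scores so
# every version's mean matches the highest version mean.
# All data here is made up; the real activity had 25 kids per code.

def adjust_for_version_bias(scores_by_version, max_points=48):
    """Return adjusted scores where each version's mean equals the
    mean of the highest-scoring version (that version is left alone)."""
    means = {v: sum(s) / len(s) for v, s in scores_by_version.items()}
    target = max(means.values())
    adjusted = {}
    for version, scores in scores_by_version.items():
        factor = target / means[version]  # 1.0 for the top version
        adjusted[version] = [min(round(s * factor, 1), max_points)
                             for s in scores]
    return adjusted

scores = {
    "A": [40, 35, 30],  # highest-scoring version, untouched
    "B": [33, 30, 27],  # curved up by 35/30 ≈ 1.17x
    "C": [36, 28, 26],
}
print(adjust_for_version_bias(scores))
```

With these toy numbers, versions B and C get scaled by about 1.17x so all three group means land at 35, close to the roughly 1.16x factor mentioned above.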
From left to right: class period, fundamentals raw (max 20), curve sketching raw (max 13), skills raw (max 18), raw total (max 48, three questions were deleted), version adjusted total (if required), percentage, quartile. Average was right under 70%. Seven students earned 100%.
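The quartile column on those slips can be sketched as a simple rank-based split, assuming (as the later discussion of "where they landed in the overall population" suggests) that quartiles were assigned by standing within the class. The percentages and the `assign_quartiles` helper below are invented for illustration.

```python
# Hypothetical sketch of labeling each student with a quartile by
# their percentage rank: 1 = top quarter of the class, 4 = bottom.
# The percentages are made-up sample data, not the real class results.

def assign_quartiles(percentages):
    """Rank students by percentage (highest first) and return a
    quartile label for each, in the original order."""
    order = sorted(range(len(percentages)),
                   key=lambda i: percentages[i], reverse=True)
    n = len(percentages)
    quartiles = [0] * n
    for rank, i in enumerate(order):
        quartiles[i] = min(rank * 4 // n + 1, 4)
    return quartiles

pcts = [95, 70, 52, 88, 64, 40, 77, 59]
print(assign_quartiles(pcts))  # → [1, 2, 4, 1, 3, 4, 2, 3]
```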
I told the students about 10 times that the percentage was NOT their grade on the assessment; it was merely a tool to see where they landed in the overall population. My message was that this is one data point in a series of many and that we would be doing these again. I also wanted to communicate that 1st and 2nd quartile implied you were doing a good job, 3rd quartile meant you needed to study more, and 4th quartile should have been a little wake-up call.
After handing out the slips, I floated around and had a quick chat with each kid, affirming their work or letting the lower scorers know that this number was not a personal judgment, but that something more was required of them.
This went pretty well. The kids took it seriously, the majority of students did well, and I think all of them got useful information out of it. More importantly, this activity was easy to build, easy to manage, and easy to score.
A great experience start to finish.