Tango on Baseball Archives

© Tangotiger

Archive List

Obscure Rule Flags Students Who Sharply Improve SAT Scores (January 21, 2004)

It's not written in this story, but the statistical panel ran some "Probability of Incorrect Scores Matching Nearby" measure, or something to that effect, to indicate that the student cheated based on incorrect answers similar to those of the students sitting next to her. Rather fascinating, really.
--posted by TangoTiger at 10:04 PM EDT


Posted 10:49 p.m., January 21, 2004 (#1) - tangotiger (homepage)
  This link has the full story, along with this:

The testing service looked at Soriano's answers and those of the students sitting near her. Using a measure called the Probability of Matched Incorrect Answers, the review panel determined Soriano's incorrect answers on the test were similar enough to those of a student at a neighboring desk to indicate she had cheated.

Posted 11:41 p.m., January 21, 2004 (#2) - James Fraser
  At McGill before every multiple choice exam we are warned that "The Exam Security Monitoring Program is in effect - this program compares student seating plans with unusual matched distributions of correct and incorrect answers. Evidence from this program can be used to corroborate or initiate a charge of cheating."... although I doubt it often gets used.

Posted 8:24 a.m., January 22, 2004 (#3) - Scoriano
  Thanks for not entitling this "Soriano's Rise Attributable to Cheating," or something like that. I woulda had a heart attack.

I wonder what the error rate on the program might be. It might be highly reliable, but if it is wrong even occasionally, it causes some terrific scars. It would seem very difficult to balance protecting the integrity of the tests and fairness to each individual test taker.

Posted 9:21 a.m., January 22, 2004 (#4) - mathteamcoach
  I think that this allegation is similar to the situation surrounding Stand and Deliver, the movie highlighting the somewhat-true story of Jaime Escalante's Advanced Placement calculus students. They were accused of cheating on the AP calculus exam because most of the students' incorrect answers were the same. It turned out that Escalante taught his students various "shortcuts" for solving particular types of problems that didn't work in certain situations. So, the algorithms that they learned produced the same wrong answers for certain problems.

By the way, I am a high school teacher and have proctored the SAT at our school. From what I've seen, different forms of the SAT are given to students in the same room. The seating plan, I think, ensures that students with the same form are not seated near each other, so that if someone does try to copy answers, the copied answers would be wrong.

There is a good book indicting the SAT, and covering all of the conspiracy theories surrounding it, called None of the Above.

Posted 9:32 a.m., January 22, 2004 (#5) - tangotiger
  Having multiple tests (or even the same test with the questions in a different order) in the same room seems to be the easiest way to control this. Not sure why they wouldn't just make it a rule.

Posted 9:55 a.m., January 22, 2004 (#6) - Brad of this Nation
  This is a very old story. Not for Soriano, but for other students like her. The students who make these kinds of jumps always were sick the day they took the first test, etc.

There is a certain random element in SAT testing. Even without feedback/prep tests, a student is not going to get the exact same score every time.

So shouldn't regression to the mean be considered? Improving from 750 to 1100 would be a lot easier than going from 1100 to 1450.

My advice to Soriano: Fight the power! And stop swinging at outside sliders.

Posted 11:12 a.m., January 22, 2004 (#7) - Sam M (homepage)
  My alma mater, Bradley University, had a basketball recruit, Daniel Ruffin, declared ineligible to play as a freshman by the NCAA because it refused to accept numerous ACT scores he'd achieved. The reason? His first score was too low to qualify for NCAA eligibility, and then his score jumped seven points the next time. This from the Bradley Scout (homepage):

The Peoria Journal Star reported in October that the ACT did not certify any of the three qualifying scores that Ruffin posted on the exam.

In his first attempt at the exam, according to the Journal Star, the Peoria native received a non-qualifying score, only to jump seven points on his next attempt, which caused ACT officials to render it invalid. The testing company did the same for Ruffin’s next two scores.

Even if you can see the reason they'd be suspicious of a jump from one test to the next, it seems to me it starts to get ridiculous if the student validates it with two additional scores consistent with the increase.

Posted 11:15 a.m., January 22, 2004 (#8) - mathteamcoach (homepage)
  Apparently, ETS, the organization that administers the SAT, has lost cases like this that have been tried in court. See the homepage link for a 1995 case in NY.

Posted 11:27 a.m., January 22, 2004 (#9) - Scoriano
  I am at a standstill on how to start evaluating schools and school districts for my daughter. SAT scores might be a part of the analysis. Does Mathteamcoach/anyone know of a source for determining what the historical average SAT scores have been? What the breakdowns might be by region, school districts, etc. Lefty vs. righty splits :)

Do private schools generally publish their students' average scores? Sources?

Of course, it is ultimately a Moneyball question. One has to compare the costs (including pesky taxes) and benefits. I may be a Yankee fan, but I may have to spend my educational dollars like Oakland.

Posted 1:49 p.m., January 22, 2004 (#10) - Mark Field
  Small world dept:

My parents both went to Bradley. Sam, are you from Peoria?

Posted 2:23 p.m., January 22, 2004 (#11) - Sam M
  My parents both went to Bradley. Sam, are you from Peoria?

Nope. Born in New York, raised in Miami. Before I went to Bradley, I'd never been to Peoria!

Posted 3:46 p.m., January 22, 2004 (#12) - mathteamcoach
  Does Mathteamcoach/anyone know of a source for determining what the historical average SAT scores have been?

This is a difficult question to answer because, in the last 10-15 years, the College Board has tinkered with the SAT and its scoring system so many times that I've lost track of what is significant and what isn't. Before about 1995, the SAT itself changed very little. At the inception of the SAT, the scores were "centered" at 500 (meaning the average score on either the verbal or math section was 500 on a scale of 200 to 800). Over time, as more and more students took the exam, the average scores continued to fall, until by the late 80's they were in the low to mid 400's. In 1995 (or so), the College Board decided to re-center the scores so that the average returned to 500. So, in effect, a scaled score of, say, 450 in 1985 was equivalent to a 500 in 1995.

In addition to toying with the scaled scores, a lot of the exam's content has been changed. Writing portions have been added, certain types of multiple choice questions have been removed, etc. All of this makes it difficult to compare SAT scores across different periods of time.

There is a book that talks about this in much more depth entitled A Manufactured Crisis, published in about 1994 or 1995 (pre-re-centering of the SAT). The book is a very liberal look at education in America, but it does contain an enlightening chapter discussing the history of the SAT.

What the breakdowns might be by region, school districts, etc. Lefty vs. righty splits :)

What state are you living in? In Massachusetts, every high school publishes a school profile containing all kinds of data: SAT scores, # of National Merit Finalists, % of students who go on to college, etc. To get these profiles, you need only ask the school.

Do private schools generally publish their students' average scores? Sources?

Yes. I assume that individual private schools would be happy to give you their "profile" as well. In Massachusetts, there are many public schools that are MUCH better than some of the private and parochial schools (in my opinion -- I have taught in a private, Catholic-affiliated high school and also in a high school in an adjacent town, and I would actually prefer to send my child to a public school before a private school). So, be careful; don't always assume that a private school is a better option. The exceptions are the schools that charge college-like tuition fees (Exeter, Choate, Deerfield Acad., etc.).

Of course, it is ultimately a Moneyball question. One has to compare the costs (including pesky taxes) and benefits. I may be a Yankee fan, but I may have to spend my educational dollars like Oakland.

Yup. There is a strong correlation between the socio-economic status of a community and the quality of its education. I haven't studied the issue, but the towns in Massachusetts that have the highest average SAT scores are certainly the towns with the highest median home prices. Taxes are through the roof, and even when tax rates don't go up here, property taxes still go up because real estate in Massachusetts is like gold and the assessments are increasing like crazy.

Posted 3:47 p.m., January 22, 2004 (#13) - mathteamcoach
  are the italics gone?

Posted 3:48 p.m., January 22, 2004 (#14) - mathteamcoach
  I would like to announce that this is the first time I have screwed up the italics in any post! And on Chinese New Year to boot! Sorry guys

Posted 3:54 p.m., January 22, 2004 (#15) - Sam M
  here I am, to save the day!

That ought to do it.

Posted 4:25 p.m., January 22, 2004 (#17) - Mark Field
  Born in New York, raised in Miami. Before I went to Bradley, I'd never been to Peoria!

Must have been quite the culture shock.

Posted 4:41 p.m., January 22, 2004 (#18) - Scoriano
  Mathteamcoach, thanks. I appreciate a practitioner's insights.

Posted 4:56 p.m., January 22, 2004 (#19) - Sam M
  Must have been quite the culture shock.

Well, put it this way. Dorothy going from Kansas to Oz had nothin' on me. ;-)

Actually, I liked Peoria very much, and liked Bradley even more.

Posted 6:58 p.m., January 22, 2004 (#20) - Mark Field
  When my parents went to Bradley, Chick Hearn was the basketball announcer there (that was before the gambling scandal -- Bradley had top quality teams). Then they moved to CA and Hearn was broadcasting the Lakers. They thought he just did ALL basketball games.

And some people say Peoria's a small town.

Posted 7:12 p.m., January 22, 2004 (#21) - Sam M
  Hey, they had top quality teams after Chick Hearn left.

I will forever believe that the 1986 edition of the Braves basketball team was one of the best 10 teams in the country. They went (IIRC) 31-2 in the regular season, including going undefeated in the MVC and winning the conference tournament. They were ranked 11th in the final polls -- only to be seeded seventh by the frigging NCAA selection committee! Not only that, but they got put in the same regional with Louisville (where I now work, speaking of small worlds), the eventual national champion. Bradley smashed UTEP (by 18) in the first round, making a statement about the seeding. Then, in the second round against Louisville, they were tied with 10 minutes to go before fading down the stretch, eventually losing 82-68.

That team had a backcourt of Hersey Hawkins (two years later, the national Player of the Year) and Jim Les, who won the Naismith Award as the country's best player under 6'. Had they received the seeding they deserved (4-5), they'd have made the Sweet 16 at least, and maybe better.

I feel better now -- venting is healthy!

Posted 9:11 p.m., January 22, 2004 (#22) - Alan Jordan
  One of the problems with identifying people as cheaters is what's commonly called the false positive rate: if a test labels someone as a cheater, what are the odds that the person is not a cheater? This is a real problem when you are trying to identify rare events.

Let's say hypothetically that I have a test that identifies cheaters. It correctly classifies cheaters as cheaters 99% of the time and correctly classifies non-cheaters as non-cheaters 99% of the time. What are the odds that a person who is classified as a cheater is actually a cheater?

You need Bayes theorem and an estimate of what percentage are cheating.

If you assume that 1% of test takers cheat, then with the assumptions that I listed above, the odds are 50% that the person classified by the test is not a child molester.

Suppose its ability to correctly classify cheaters or non-cheaters were less than .99; then, with an estimated 1% of test takers cheating, it would actually be more likely that a person classified as a cheater was NOT a cheater.
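
Here is the arithmetic above as a short Python sketch; the 1% cheater rate and the 99%/99% accuracy figures are the post's hypothetical assumptions, not real testing-service numbers:

# Bayes' theorem applied to the hypothetical cheater-detection test above
prevalence = 0.01    # assumed fraction of test takers who actually cheat
sensitivity = 0.99   # P(flagged | cheater)
specificity = 0.99   # P(not flagged | non-cheater)

# P(flagged) = true positives + false positives
p_flagged = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)

# P(cheater | flagged), by Bayes' theorem
print(prevalence * sensitivity / p_flagged)  # 0.5 -- a coin flip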

Posted 3:42 a.m., January 23, 2004 (#23) - Michael
  Yeah, but Alan Jordan where did you get the "child molester" part? I don't think you mean cheater == child molester.

At school we actually had some pretty nifty cheat detection programs in computer science, which would test students' program submissions and find numerous cases where students had copied other students' work (even from previous years). Amazing that even when you warn students that this type of thing is happening, they *still* end up cheating.

Speaking of SAT-type problems, a common problem I've heard used to illustrate Bayes' theorem is the taxi accident problem. Imagine that on some mystery island, witnesses identify the color of a car that was in a hit and run correctly 80% of the time, and the other 20% of the time they are incorrect. Now imagine that you have 95 yellow taxis and 5 green taxis on the island. There is a hit and run involving a taxi, and there is a single witness who says the taxi in question was green. To the nearest percentage, what is the chance that the taxi really was green?

A. 17%
B. 20%
C. 50%
D. 63%
E. 80%

Many, many people (incorrectly) say 80%. It's almost as bad as the Let's Make a Deal problem.

Posted 9:12 a.m., January 23, 2004 (#24) - Confused
  "If you assume that 1% of testtakers cheat then with the assumptions that I listed above the odds are 50% that the person classified by the test is not a child molester."

I have read this sentence about 15 times now. Any way I slice it, the odds don't look so hot for Soriano.

Posted 9:41 a.m., January 23, 2004 (#25) - mathteamcoach
  Again, I teach high school math, but not prob. and stats. However, in every class that I teach, I try to fit in two topics: the Monty Hall problem and Bayes' theorem. Regardless of my presentation, 1/4 of the students will "get it" and think that it's the neatest thing they have ever learned about mathematics (and I'll get responses such as, "why can't every math class teach things like this!"), about 1/2 won't have any clue about the ideas, and the final 1/4 will disagree with what I have to say and try to find ways to refute it or simply just not believe it.
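
For readers who haven't seen it, the Monty Hall result (switching doors wins 2/3 of the time) is easy to verify by simulation. A minimal Python sketch, assuming the standard rules (the host always opens a goat door the player didn't pick):

import random

def monty_hall_trial(switch):
    # One round of the game; returns True if the player wins the car
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Host opens a door that hides a goat and isn't the player's pick
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
print(sum(monty_hall_trial(True) for _ in range(trials)) / trials)   # ~0.667
print(sum(monty_hall_trial(False) for _ in range(trials)) / trials)  # ~0.333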

I hate to make a blanket statement, but there really is an "innumeracy" problem in the US. In reference to the "what statistics should matter to the average fan" post in Clutch Hits, if you ask the "average" fan how to compute ERA, I bet you'll get a lot of "I dunno"s, or some response that has elements of the right method. But if you ask them whether a particular ERA is good or bad, they'll know immediately. What's my point? I guess it's not that people don't intuitively understand numbers; they just do not know how to look beyond the numbers.
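
For reference, the ERA computation in question is a one-liner; a minimal sketch:

def era(earned_runs, innings_pitched):
    # Earned run average: earned runs allowed per nine innings
    return 9 * earned_runs / innings_pitched

print(era(70, 200.0))  # 3.15 -- a figure most fans could judge on sight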

Posted 10:12 a.m., January 23, 2004 (#26) - Alan Jordan
  Michael,

"Yeah, but Alan Jordan where did you get the "child molester" part? I don't think you mean cheater == child molester"

Yes "Child Molester" should have been "cheater".

"At school we actually had some pretty nifty cheat detection programs in computer science where it would test student's computer program submissions and find numerous cases where students had copied other student's work (even from previous years)."

What were the sensitivities and specificities for this and what was the estimated percentage of cheaters?

The answer is A.

Confused,

Quite a screwup, wasn't it?

Posted 11:28 a.m., January 23, 2004 (#27) - Josh (homepage)
  See homepage for nifty little calc for Bayes (I guess this would be considered 'cheating').

Posted 12:29 p.m., January 23, 2004 (#28) - J Cross
  Using Josh's cheat program, I'll say A) 17% (17.4% to be exact).

now to figure out how it got that...

Posted 12:58 p.m., January 23, 2004 (#29) - J Cross
  OK, got it. One car and one witness gives a 4% chance that the car is green and said to be green (.8*.05), and a 19% chance that it's yellow and said to be green (.95*.20). We know that it was said to be green, so the chance it is green is 4%/(4%+19%) = 17.4%.
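
The same computation as a short Python sketch; the 95/5 split and the 80% witness accuracy are the givens from the problem:

# Taxi problem: P(green | witness says green), by Bayes' theorem
p_green, p_yellow = 0.05, 0.95   # priors: 5 green taxis, 95 yellow taxis
witness_accuracy = 0.80          # witness reports the true color 80% of the time

green_and_says_green = p_green * witness_accuracy           # 0.04
yellow_and_says_green = p_yellow * (1 - witness_accuracy)   # 0.19

print(green_and_says_green / (green_and_says_green + yellow_and_says_green))  # 0.174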

Posted 1:07 p.m., January 23, 2004 (#30) - J Cross
  I think it's interesting that if one witness says green and one witness says yellow, then you're back to your priors: a 95% chance the car is yellow. (The likelihood of that pair of reports is .8*.2 whether the car is green or yellow, so the witness factors cancel and only the priors remain.)

Posted 2:16 p.m., January 24, 2004 (#31) - Alan Jordan
  Actually, I figured out how to do that problem a couple of years ago without realizing I was using Bayes' rule or theorem. I probably didn't even know what it was at the time.

1. Start off with the number of green and yellow taxis. In other problems where they give a percent (prevalence) instead of counts, you can arbitrarily pick a number for the total, like 100 or 1,000, and then multiply the prevalence by your arbitrary total.

2. Figure out how many greens (positives) are correctly classified. We have only 5 green taxis and 80% of 5 is 4.

3. Figure out how many yellows are incorrectly classified. We have 95 yellows and 20% of 95 is 19.

4. Divide the number of correctly classified greens by the sum of correctly classified greens and incorrectly classified yellows. Remember that both the incorrectly classified yellows and the correctly classified greens will be the ones identified as green. As J. Cross pointed out above, 4/(4+19) is .174, or 17.4%.
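
The four steps above, as a direct Python sketch using the taxi numbers:

# Count-based Bayes recipe (steps 1-4 above)
greens, yellows = 5, 95        # step 1: counts of each class
true_greens = 0.80 * greens    # step 2: greens correctly called green = 4
false_greens = 0.20 * yellows  # step 3: yellows incorrectly called green = 19
# step 4: share of taxis called green that really are green
print(true_greens / (true_greens + false_greens))  # 0.174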

In this example the percentage of greens correctly classified is equal to the percentage of yellows correctly classified. In most situations that isn't true. If you are trying to figure out whether a person will get into college based on their SAT, then there are literally 1599 cut points that you could use to group people into high or low. The higher you pick your cut point, the better the correct classification for the high group, but the worse the classification for the low group. Because of that, there are two terms for the correct classification rate, depending on whether it refers to positives or negatives.

Sensitivity - the percent of positives (green taxis) correctly classified.

Specificity - the percent of negatives (yellow taxis) correctly classified.

The probability of a subject classified as a positive actually being a positive is

Prevalence*Sensitivity / (Prevalence*Sensitivity + [1-Prevalence]*[1-Specificity])

where prevalence is the percentage of positives (greens), and [1-Specificity] is the fraction of negatives misclassified as positives (the false positives).
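
That formula as a small Python function (the name positive_predictive_value is a label of convenience, not from the post), checked against both the taxi problem and the hypothetical cheater test from post #22:

def positive_predictive_value(prevalence, sensitivity, specificity):
    # Probability that a subject classified as positive is truly positive
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

print(positive_predictive_value(0.05, 0.80, 0.80))  # taxi problem: ~0.174
print(positive_predictive_value(0.01, 0.99, 0.99))  # cheater test: 0.5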