Thursday, October 20, 2011

Moneyball and Education

I read a number of education-oriented publications and blogs, but have rarely come across articles that I thought would be of much interest to folks who come here. But this one by Rick Hess in the EdWeek blog made a point worth passing on.

I haven't seen the movie Moneyball, but I understand the premise. Baseball is as much a playground for numbers geeks as it is for athletes. All kinds of statistics are kept, reported, and memorized by the most left-brained of baseball fans.

The storyline in Moneyball is that one particular numbers geek suggests to a major league general manager that the statistics that get all the attention, like batting average, home runs, and RBIs, are perhaps not the ones that are good predictors of future success in winning games. So he proposes using a different set of statistics to predict which players would be most valuable to the team, and how they could best be used.

Hess suggests that the growing infatuation with value-added measures and test scores might lead to the same kind of misguided assessment of effectiveness as do the high-visibility statistics in baseball. He's suggesting that in the effort to develop effective measurement systems for education system performance - whether we're talking kids, teachers, or schools - we need to accept that this kind of statistical analysis in the education domain is still in its infancy, and that we have a way to go before this body of research evolves to really meaningful statistics, like the ones the numbers geek derived in Moneyball.

Lots of folks would like to use the standardized test scores and other existing measures to determine all kinds of very important stuff, in particular the allocation of resources and the evaluation of teachers. SB5, if it withstands the repeal initiative, mandates merit-based evaluation of teachers, but conveniently doesn't say how it should be done.

Most teachers I've talked to don't have a problem with the theory of a merit-based system; they just don't trust that it will be administered fairly. Hess's article points out another potential flaw: using the wrong statistics to measure effectiveness.

I spent my career in cahoots with some pretty tremendous sales folks. One of the main challenges of my colleague, the Sales VP, was to come up with an annual commission plan for his sales team that motivated them to sell the right set of products for the right set of terms so as to meet the strategic goals of the company. The sales folks - as smart as they were - would most assuredly figure out how to maximize their compensation given whatever rules the Sales VP set, regardless of whether or not their efforts contributed to meeting the company's strategic goals. So the Sales VP had to put a great deal of thought into how to design the plan to get the behavior and results he was looking for.

We have to take the same kind of care if and when a merit-based system is put into use to determine teacher compensation. Otherwise there will be some unintended and expensive consequences, and we still might not achieve our strategic goals.


  1. There are many potential flaws in a merit-based system, but right now there's zero merit-basing going on so I'm content with some margin of error.

    Regarding teachers who don't trust that it will be administered fairly, they need to suck it up. In the private sector life isn't always fair, and there are innumerable examples of people being promoted in business who shouldn't be. But does that mean we should retire the whole merit-based system in the private sector? No. Does the fact that some umpires make wrong calls mean we should abolish umpires? No. The "umpires" in this case are the principals.

    What we see in Moneyball is kind of humorous. In baseball, a game in which objectivity would seem to be the absolute rule (every day we know how many hits/errors they made), we find that there really isn't "complete" objectivity even there! So how much less objectivity is there going to be in evaluating teachers? It's not going to be perfect, since it isn't even perfect for baseball players. Life and work can't always be reduced to numbers on a page, but we must make the attempt.

    So I would argue that SB5, as half-arsed as it may be, is much better than what we're doing now, which is nothing. And surely we'll be able to tweak things over time to improve the quality of the judging, assuming the rating system doesn't cost loads of money (which we don't have).

  2. Don't get me wrong - I'm a big fan of merit-based evaluation systems (and free markets for setting prices, for that matter). All I'm trying to point out is that a merit-based system which measures the wrong things, and rewards accordingly, is worse than no merit-based system at all.

    For example, what measure do we use to figure out whether a kindergarten teacher has been able to instill a love of learning in her/his pupils, or has been able to successfully identify a kid as having a learning disability vs a behavior problem? How do we figure out whether the AP Calculus teacher is actually a good teacher, or is mediocre but has really smart and motivated students?

    If we had 80% of it right, and only needed to tweak the 20%, then I'm with you - let's get going and tweak the system as we go.

    But I'm concerned that we may have only 20% right.

  3. Let's see - automatic raises versus performance-based raises. Are you kidding me????

    I'll take my chances on teachers being rewarded for reaching certain measures (even if they are sometimes the wrong measurements, or lead to unintended consequences), rather than someone being rewarded for uh.....showing up and breathing.

    Are there faults in a performance-based system? Yes. But, c'mon, where is your criticism of the automatic raise system? Why this article on the eve of Issue 2, and not a counterbalancing criticism of automatic raises?

    And to think, you are our most conservative board member. Yikes!

  4. Maybe I'm a little like OSU vs Illinois last week, or the Davidson football team for that matter - I want to win the game, but my strategy is one of ball control, not flash. Remember how Woody used to say that there's three things that can happen when you pass the ball, and two of them are bad?

    I think SB5 is a Hail Mary - everybody go deep. Odds are it won't stick. More likely it will be intercepted, flip the momentum, and make it tons harder to win the game, which is to reform the way performance is measured and rewarded.

    The purpose of this blog, and the reason I ran for the school board, is that I believe there are many level-headed, reasonable people in this school district - taxpayers and teachers alike - who, once they understand the economics at play, will help develop a sustainable solution. It won't be easy, and it won't be quick.

    Oscillating back and forth between the extremes will just waste time and money.

  5. @Paul

    You asked: "How do we figure out whether the AP Calculus teacher is actually a good teacher, or is mediocre but has really smart and motivated students?"

    You're making the same mistake the district employees are. You are assuming everything happens in a one-year vacuum.

    It's called analytics: you compare the calculus teacher to the other teachers, compare the kids to the other kids, look at the teacher's results with other classes and in other years, and compare the kids' results to previous years.


    Good analytics will give you an answer in about 30 seconds.

    Now, that doesn't mean that's the end of the process; the key is that the analytics will show you trends, and will show you if a given employee is working harder, or not, in a given year.

    And, let's face it, that's the key as to how much of a merit pay raise an employee should receive.
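The comparison sketched in this comment - measure a teacher's class against other classes and against prior years - can be illustrated in a few lines of code. This is a minimal sketch of the idea, not any district's actual method; the helper function, the data, and the numbers are all invented:

```python
# Illustrative value-added comparison: how far does a class's average
# score gain sit above or below the district-wide gain, measured in
# district standard deviations? All data here is invented.

def average(values):
    return sum(values) / len(values)

def value_added(class_gains, district_gains):
    """Return the class's mean gain relative to the district mean,
    in units of the district's standard deviation."""
    district_mean = average(district_gains)
    variance = average([(g - district_mean) ** 2 for g in district_gains])
    district_sd = variance ** 0.5
    return (average(class_gains) - district_mean) / district_sd

# Hypothetical per-student score gains (this year's score minus last year's):
district = [3, -1, 4, 0, 2, 5, -2, 1, 3, 2, 0, 4, 1, 2, 3]
teacher_a = [5, 6, 4, 7, 5]   # class gains well above the district trend
teacher_b = [0, 1, -1, 2, 0]  # class gains near the district trend

print(round(value_added(teacher_a, district), 2))  # positive: above trend
print(round(value_added(teacher_b, district), 2))  # negative: below trend
```

Even this toy version hints at the hard part: the result is only as meaningful as the choice of comparison group and the quality of the underlying scores.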

  6. I don't doubt that it's a problem that can be solved with sufficient data, but I won't presume it's as trivial as you might be painting it.

    I know from my personal experience that a bad grade in math in one year followed by a great grade in math the next year (in a harder course) might be less a reflection of the two teachers than it is about the mental development of the kid, and maybe a change in study habits.

    With a constantly changing population of students, teachers, and course contents, it's not as simple as controlling one set of variables while observing changes in others. There are virtually no controllable variables.

    And it's hard to trust an evaluation system which is so complicated that it takes a PhD in statistics or systems engineering to understand.

    Even baseball, with all its statistics, is ultimately settled by a simple number - how many runs were scored.

    It's one thing to use complex statistical analysis to develop strategy and choose tactics. It's another thing to use such analysis to define success and failure. That ignores the human element in the system, the one where performance is a function of not only "can do," but also "will do," also known as motivation.

  7. Great post and discussion. If you like baseball at all, I would suggest reading Moneyball. The movie may be good in its own right, but the book is outstanding.

    The use of statistics in evaluating educational outcomes may yet prove to be valuable, but right now I'm afraid there are still improvements to be made. Further, I don't understand the proposal to transfer what are admittedly inefficient methods used in the private sector to the public sector. Undoubtedly our current system is not working as well as it could, but to simply replace it with a system that may or may not be an improvement would be a mistake in my opinion. Fairness to teachers is part of the discussion for certain. But it's also about a greater need to identify the best in the profession, reward them for their work, and encourage others to strive to reach that level. A system that opens the door to potential nepotism and cronyism is a step backwards. (For better or worse, set salary schedules and tenure do go a long way towards protecting against those problems.)

    I'm speaking about the system as a whole here and not Hilliard in particular. I stumbled across this blog sort of randomly as I have no ties to Hilliard or Central Ohio. The discussions about educational issues here go well beyond what I regularly find in my neck of the woods, so I appreciate it. Definitely makes me look at the issues from some new angles.

    -A Reader From Northeast Ohio

  8. Thanks for weighing in, and hope you come back for future discussions. I think it's a good thing to get perspectives from outside our community.

  9. A Reader From Northeast Ohio states:

    "A system that opens the door to potential nepotism and cronyism is a steps backwards."

    First of all, from what I have seen, nepotism and cronyism already exist in the current system. How else do family members of employees fill so many open positions when there are hundreds of applicants? Look through the staff roster... numerous family members and alumni fill the ranks (including those of Board members and Administrators). Purely on talent and skills?

    I knew someone who interviewed for a Speech position in Hilliard in the 90's. She was told she was by far the most qualified candidate. However, a niece of a Secretary in the Central Office, with no experience, was also applying, and she got the job.

    Secondly, and I have tried to state this before, the current system does not contain any micro-level accountability for making a bad hire or continuing to retain a poorly performing employee. The people who make the hiring decisions are far removed from the day-to-day work of the staff member, and do not suffer any direct consequences for making a bad hire (nor reap the benefits of making a good one).

    I have five staff members and a contractor on my team. If one departs, and I let my hiring decision be influenced by nepotism or cronyism, regardless of skills and abilities, guess who pays the price? The same goes for my judgments about raises, promotions, and assigning work.

    There are always examples of cronyism at play, even in a purely merit-based system. It is human nature. But in my experience, the majority of the time it bears out the old adage that "the cream rises to the top".

  10. @Paul

    "I know from my personal experience that a bad grade in math in one year followed by a great grade in math the next year (in a harder course) might be less a reflection of the two teachers than it is about the mental development of the kid, and maybe a change in study habits."

    This is true, but you missed the point I think. We're not talking about one kid. We're talking about the whole class.

    If one or two kids show improvement then you're probably right; if the entire class shows it (or, shows a drop off) then it's more likely to be the teacher.

    I also want to state something for the record: I don't want to see a system like this used to simply fire teachers (unless the data shows they are exceptionally bad...); I want, and would expect to see, teachers with performance issues get the opportunity to become better teachers. That's what happens in the private sector -- we give employees a lot of support to become better workers. I would expect nothing less from our district for its employees as well.

  11. These things I am sure of:

    Runs and RBIs are all that matter.
    Your boss decides if you get a raise or promotion.
    Everything else is unnecessary complication.

  12. M: Yes, I gave a one-kid (personal) example. But that's the point, in a way.

    One of the criticisms of our current system is that it's designed around a post-WWII industrial model that ignores the unique abilities and interests of each kid, and at what individual rate they best absorb teaching.

    One of the hallmarks of that industrial model is the statistical process control we're talking about here. But the goal of that approach is to drive out deviations from an ideal standard. I'd hate to think we want to do that to our kids.

    Every August, the teachers get a new mix of kids. Kids move into the district, and kids move out. Some will have had physical and/or mental growth spurts. Others seem not to have developed at all in the past year, but just might develop at double the normal rate this year.

    In very large populations, these variances might yield to statistical smoothing. But in a classroom of 25 kids, the differences are in your face.

    The school district has just acquired a tool (with Race to the Top dollars) from a company called Performance Matters that might help us sort some of this stuff out. The data load is underway.
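The point above about a classroom of 25 can be put in numbers: the random wobble in a class average shrinks only with the square root of the group size, so small classes are inherently noisy. A quick sketch, where the spread of individual student gains (sigma) is an assumed value chosen purely for illustration:

```python
# Sketch: how far a class average can wander by luck of the draw.
# The standard error of a mean is sigma / sqrt(n), so a class of 25
# is far noisier than a district-sized pool of students.
import math

sigma = 15.0  # assumed standard deviation of individual student gains

for n in (25, 100, 2500):
    sem = sigma / math.sqrt(n)
    print(f"n = {n:4d}: average wanders by about +/- {sem:.1f} points")
```

With these assumed numbers, a single class of 25 can drift about three points either way on chance alone, which is why one year of one class is thin evidence about the teacher.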

  13. Ugh. How much did we spend on that? In a matter this complex, intuition trumps analytics every time.

  14. T: It was around $150K, but again was paid with Federal Race to the Top funds that we could either use or lose.

  15. @T - You're partially correct, but analytics shows trends that intuition simply never will. I want these tools to be used to help our teachers become better at what they do by identifying problem areas.

    And to address Paul's response to my point (I still think he's missing it!) -- analytics highlights outliers, which enables us to recognize the work of a teacher who's doing a really good job even if the kids in their charge that year aren't delivering results. (And to highlight the one who by all appearances is doing a fine job while in reality the whole class is underperforming...)

    A lot of this analytics stuff is very new. Today's computing power allows us to crunch data so quickly that we can get answers to questions that previously took far longer to obtain than they were worth. Until you've actually worked with this stuff, it can be difficult to understand just how much information it can provide.
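The outlier-spotting described in this comment can be sketched as a simple screen: flag any class whose average gain sits unusually far from the rest. The class labels, the gains, and the two-standard-deviation threshold below are all invented for illustration:

```python
# Illustrative outlier screen: flag classes whose average gain falls
# more than `threshold` standard deviations from the mean of all
# classes. Labels and numbers are invented.

def flag_outliers(class_averages, threshold=2.0):
    """Return labels of classes unusually far from the overall mean."""
    values = list(class_averages.values())
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [label for label, v in class_averages.items()
            if abs(v - mean) > threshold * sd]

averages = {
    "Class A": 2.1, "Class B": 1.8, "Class C": 2.4,
    "Class D": 2.0, "Class E": 9.5,  # sits well above the others
    "Class F": 1.9, "Class G": 2.2,
}
print(flag_outliers(averages))  # flags Class E
```

In practice a flag like this would be a prompt for a closer look, not a verdict - which matches the point that the analytics show trends rather than conclusions.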

  16. M: I do get your point. I'm the left-brained computer geek who delights in trying to torture meaningful correlations from the CUPP Report, the Five Year Forecast, and the CAFR.

    I'm just not so confident that we have yet figured out what data to capture, and what it means when we do.

    I remember a few years back being fascinated by the story of the search for the so-called "Top Quark," a subatomic particle that the physicists believed had to exist in order for the family of particles to be symmetrical and complete.

    Both CERN in Europe and FermiLab here in the US wanted to be the first to find it. So they designed experiments they could run on their huge particle accelerators, and an important part of the experimental design was having sensors that could collect an enormous amount of data in a tiny slice of time. Think of it as an incredibly high-speed camera, with a shutter speed of billionths of a second. And they would have to capture millions of 'frames' to have any chance of recording an event that threw off the distinctive trail a Top Quark should make.

    So both labs ran the experiment, and it was over in the blink of an eye. Now they had a mountain of data, and the trick was to find one 'frame' out of millions that had the evidence of a Top Quark.

    Both labs formed groups to devise strategies for sifting through the data. They'd say things like "We should be able to eliminate all observations in which X was true," and then debate for weeks as to whether that would throw the baby out with the bathwater.

    Eventually they got down to tens of thousands of observations, and weren't confident that they knew how to pick out the handfuls that had the most promise.

    The FermiLab guys devised an interesting approach. They designed a computer app to display on a screen a three-dimensional representation of key data points from a single frame. Then they had a group of grad students each look at a few hundred screens, trying to find something that looked like what the physicists thought a Top Quark signature should be. Meanwhile, the CERN folks kept trying to use computer-based analysis to find the evidence.

    The FermiLab team won. Their strategy to give up on the computer analysis and to use the power of the human brain to recognize 'looks like' patterns was the better approach.

    By the way, it took a year or more for them to get to this point. The experiment took milliseconds, the data analysis years - by some of the smartest and most experienced scientists on the planet - whose specialty was running exactly this kind of experiment and doing the data analysis.

    We have nothing like that in the public schools. Yes, the computing power has increased many orders of magnitude since the Top Quark experiments. IBM's Watson might well have been able to do the analysis in hours, after a few months of adapting its software to the nature of this challenge.

    The real challenge is the data gathering. We can't put big subatomic particle detectors in each classroom, able to collect vast numbers of data points over a long period of time. So how do we capture the many data points produced by the system without having the measurement process itself distort the results (i.e., "teaching to the test") - the kind of measurement effect Heisenberg made famous?

    An old friend of mine, and a pioneer in the computing industry, said in the early 1970s: "computers are devices created to allow men to make mistakes at ever increasing rates."

    That's what I fear - misusing the vast computing power available to us these days to generate cool looking, but absolutely false conclusions. We need to spend a lot more time on the experimental design before trying to draw conclusions.