I tip up front. Actually, I know a lot of people who occasionally do this, but to my knowledge I’m the only one with a managerial principle of statistical validity to back up ‘why’.
It stems, originally, from an article by W. Edwards Deming called “On Probability As a Basis For Action” (The American Statistician, November 1975, pg. 150).
Attractive title huh? Well, it’s profound, if misunderstood. In it Deming explains how data can be gathered in different ways, and used to different ends, and how people are not well enough versed in statistics to understand when they are using one type of data to show something that cannot be shown. He calls these two types of data ‘enumerative’ and ‘analytic’.
Enumerative data is the information you gain after the fact. The census is a great example. We survey people after an event and then tabulate the data to show what they said.
Analytic data is information about a process. It shows, often with math, ‘how’ something is done, how the process works, what steps cause effects, and what those effects then cause in a chain of reasoning.
In my own words I like to think of enumerative data as showing ‘what’ happened, and analytic data as showing ‘how’ and/or ‘why’ it happened.
Back to the census. To make my point I did a quick Google search for the 1999 census figures. I immediately found that 12.4 percent of Americans (33.9 million people) reported income in 1999 that was below the poverty line. I scrolled down and quickly found a chart that shows State and Regional Poverty Rates: 1989 and 1999. The poverty rate fell the most in Mississippi, -5.3%. The biggest rise in poverty was in Washington D.C., +3.3%.
Interesting numbers to be sure. But what can we glean from these numbers? Here are a few examples of silly conclusions: [These conclusions are false!]
1. Poor people from Mississippi must have moved to the D.C. area in large numbers.
2. Poor people in D.C. must be killing the wealthy people in very large numbers thus resulting in a shift in the ratio of those in poverty.
3. Mississippi’s new job creation initiative must be working for them to show such substantial gains.
Now of course, these conclusions are patently false. But what if we find some ‘other’ set of data that ALSO relates to this issue? Can we then tie the two together and show they support each other?
No. The fact that they don’t make sense is not ‘why’ the conclusions are false. Rather, they are false because they use enumerative data to show the ‘cause’ of poverty, which is an analytic conclusion.
In other words, we used data from type A, to make a conclusion of type B. This CANNOT be done. It means the conclusions above are false on their face, without any need of analysis. The same kind of error can be made in reverse. We can perform an analytic study which we then use to claim an enumerative conclusion. We can’t do that either, but we won’t discuss it here as it doesn’t relate to ‘why I tip up front’.
Okay, we’ve shown how enumerative data, collected after the fact, CANNOT be used to show the ‘cause’ of the data.
What if we collected the data more often? THEN could we use it to show a trend of causes? No. No matter how often we collect enumerative data it CANNOT show the cause of the results. It is not simply ‘difficult’, it’s actually impossible.
So, in what other arenas do we collect data after the fact and then blame the result on a cause?
Public schools, of course. We stick little Johnny in a class, ask him to perform, then we test him at the end. He gets a score, and we say “Wow, Johnny did great,” or “Oh my, Johnny is not too bright, is he?” You see? We attribute his ‘score’ to his ‘intelligence’. Directly. Without question.
In December of 2011, Valerie Strauss of the Washington Post wrote an article called ‘Revealed: School board member who took standardized test’.
It describes the experience of a school board member who took the test and scored dismally despite having 3 degrees and educational experience. At the end of the article he mentions how the questions on the test are simply not valid for today’s ‘working life’. The test is not testing skills and knowledge actually used in real life.
He may be right. But only to a point. Even if the test had been perfectly aligned with knowledge and skills kids actually need in order to function, the test is still not valid. Why? Because it is attributing ‘cause’ to data collected ‘after the fact’. It’s claiming to be ‘analytic’ when the data gathered are ‘enumerative’.
This cannot be overstated.
We ‘blame’ the kid taking the test, 100%, for his/her performance on the test.
Recall, perhaps, the same author’s article on principals and staff evaluations which tie pay to student performance.
Here is the article showing how New York educators are up in arms about a system of evaluation that ties teacher AND principal pay to student performance on standardized tests.
Hang on to your booties, here is where it gets interesting.
The staff argue, justifiably as we now know, that connecting pay to performance in this way cannot be correct. Student performance is affected by so many different factors that blaming staff and administrators for the outcome is just not reasonable in any way.
Did you get that? Student performance varies so widely, because of factors outside the control of staff and administrators, that educators cannot fairly be evaluated on how their students perform on the test.
And yet, correct me if I’m wrong, these very same teachers and administrators say nothing of subjecting these students to these tests, and then evaluating the students on their performance. Hmmm. Johnny did great, see? It says so right here! Oh my, Johnny lost some ground this year compared to last year, so he’s really going to need to buckle down this coming year to make up HIS losses.
That CAN’T be right.
They are essentially saying that the school has little or nothing to do with how well Johnny did on his test. OR, at best, they are saying that it is impossible to tell which part of Johnny’s grade had to do with the school, and which had to do with Johnny’s effort.
That last part’s true. Whether it’s A-the school, B-Johnny’s effort, C-his home life, D-the bus ride home, E-the weather, or F-any sort of learning difficulty he might have, it’s impossible to tell which factors play how much of a role in his academic success.
Why is it impossible? On the face of it, it’s easy to grasp. We are trying to determine why Johnny is successful. Let’s say his success is a result of the equation:
A+B+C+D+E+F = 100%
Now B is Johnny’s effort. No one can take a single equation with two or more unknown variables and solve for the value of each variable. Let’s simplify this some to make the point really clear.
Let’s call X Johnny’s effort, and Y the school’s effort to help Johnny. So:
X+Y+XY = 100. That is, Johnny’s effort plus the school’s effort, plus the result of Johnny working with the school’s system, equals the output, or level of Johnny’s success.
Okay, now we have one equation with two variables. How do we figure out X and Y? We can’t. It’s a basic rule of algebra: you cannot solve for two unknowns if you only have one equation, because many different pairs of values fit it equally well. No one can. (Even my 9th grader interrupted me to point this out.)
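To see the point in numbers, here is a minimal sketch in Python. The function name and the sample values for X are my own made-up illustrations, not anything measured. It rearranges X+Y+XY = 100 to get Y for any guess at X, and shows that very different splits between Johnny and the school all produce exactly the same result.

```python
# Hypothetical illustration: the function name and the sample X values
# are assumptions chosen only to make the point, not real measurements.

def school_share(x: float, total: float = 100.0) -> float:
    """Rearrange X + Y + X*Y = total to get Y for a guessed value of X."""
    return (total - x) / (1.0 + x)


for x in [0.0, 1.0, 9.0, 100.0]:
    y = school_share(x)
    # Every (X, Y) pair below satisfies the same single equation,
    # so the final score alone cannot tell us which pair is the true one.
    assert abs(x + y + x * y - 100.0) < 1e-9
    print(f"X = {x:6.2f}   Y = {y:6.2f}")
```

Each line it prints is a different story about who ‘caused’ the score, and the one number we actually have, the 100, fits all of them equally well.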
Quickly, let’s reflect. The staff in New York, then, are claiming something like this: you can’t evaluate us on Johnny’s performance because you can’t solve for Y. Yep. That’s true.
But they then turn around and pretend there is no Y. They blame Johnny for 100% of his success or failure.
Put another way. If a teacher has a year when her students do more poorly than usual, she can say to herself, ‘my, this year the kids just weren’t as smart as last year’. Almost plausible. It even sounds typical. But why couldn’t the teacher say, ‘Oh my, my system did not work as well as I’d hoped. I’d better see if I can improve the way I do things to better meet their needs’?
In one she’s attributing success or failure to the students. In the other she attributes it to herself. Two completely different invalid conclusions based on exactly the same data.
Wait! Invalid? Why invalid?
Well, remember that thing about ‘enumerative’ data being used to show analytic results? That can’t work. Here we are doing that same thing. Either way she looks at it, she’s using enumerative data to support an analytic conclusion. That CAN’T be correct.
Now, remember the school board member who suggested that new and better questions were needed in order for the testing to be valid? He obviously does not grasp the meaning of the difference between enumerative data and analytic studies. If he did, he’d say ‘Standardized tests are not valid evaluations of achievement because they are not VALID.’
It is impossible for the tests to show the ‘cause’ of the student’s success. Did the student do well because he was smart? Or because he went to a great school? Or because his parents both went to college? (This is the number one most accurate predictor of academic success by the way).
Or did the student do badly because his family’s first language is not English? Did he receive insufficient help with his disability (help that is mandated by law)? Was he improperly evaluated early in school and then passed on to grade after grade without the skills to succeed due to a broken system? Did his parents complain too loudly about a teacher who then chose not to spend the time with him he/she could have? Is he suffering from an economic or racial bias from the staff that may not be measurable but may be having an effect anyway?
The same argument can be made against performance evaluations at work. It is difficult, at the very best, to form an evaluation that has any meaning, and the time it takes to perform one could be better used providing training, coaching and leadership to help employees improve their work.
Oh? Really? People at work, who get evaluated, often in order to get a raise? Those are invalid too? Absolutely invalid.
And that brings us right round, doesn’t it?
Why do I tip up front?
Because judging my waitress cannot be valid. I cannot correctly blame my waitress for the quality of my service.
So I estimate 20% of what the bill will be, and I pay him/her that many dollars when I sit down. I tell them, ‘It doesn’t really matter ‘why’ I do it this way. If you’re really curious later you can ask me and I’ll explain.’ They smile: the quirky crazy guy on table 6 already tipped me. Then they go about their business.
I can hear it being said, (actually I’m remembering all my friends), “What if you get crappy service? Then what?”
If you’re asking this, then you didn’t really grasp the point. I don’t do it to get better service. I do it because it ‘helps’ the system. I am going to tip her. If I wait, she has the ‘threat’ that I might tip her badly hanging over her the whole meal. If I tip her up front, at least she knows she doesn’t have that to worry about. She’s then a little bit less stressed out about her tips, and everybody’s service improves, a teeny bit.
More importantly, I release her from the fear that, as her customer, ‘I’ am going to make an impossible, unfair, and mathematically flawed judgment of her performance and then punish her for it (unjustly).
Try it a few times. See what happens. Regardless, let go of the idea that you are ‘rating’ your server for her performance. Because that’s just not possible.
For more information on performance appraisals:
Or grading in schools: