Teacher Evaluations – The good, the bad and the ridiculous

On May 5, 2014, the Texas Education Agency posted details on the new Teacher Evaluation and Support System (TESS) that had been submitted to the US Department of Education. The submission was required under the terms of the waiver Texas received from many of the more extreme requirements of No Child Left Behind.

This new evaluation system is slated to replace the Professional Development and Appraisal System (PDAS), which has been used by over 80% of districts since 1997. Commissioner Michael Williams justified the change by calling PDAS ‘outdated’ and of little value in providing real feedback to educators.1 While I have no opinion on whether PDAS is ‘outdated’, my concerns, which apply to PDAS and TESS alike, are outlined below.

According to the details posted on the TEA website, 80% of TESS will be rubric-based evaluations consisting of formal observations, self-assessment and professional development across six domains. These six domains have innocuous titles, and each contains anywhere from three to six sub-categories that, in turn, have multiple bullet points.2 I do not have an issue with having some kind of rubric or instrument…indeed, I do not believe most dedicated teachers would have any concern over having an evaluation instrument. As professionals, we tend to crave constructive feedback and are eager to keep getting better at our craft. The concern, in my opinion, lies not in the what, but in the who.

The administrators who are tasked with conducting the evaluations must have adequate training – and not just in how to do the evaluation itself. Ideally, the evaluators should be knowledgeable in the subject they are observing in addition to having a strong background in pedagogy, preferably in the form of extensive experience in the classroom. In my early years as a teacher, I was observed and evaluated by my principal, who had an advanced degree in science and had spent many years as a classroom teacher. Some of those evaluations were tough to discuss, but they provided valuable feedback and helped me become the teacher I am today. In the last couple of years, I have received ‘exceeds expectations’ on formal observations from two different administrators. That must mean I have markedly improved my teaching since those first couple of years, right? Well, yes, I have…but I don’t think it tells the whole story.

I have had administrators tell me that they had no idea specifically what I was teaching that day but that it ‘sounded good’. I have had an administrator give me ‘exceeds expectations’ after observing a math lesson when the same administrator had previously asked a colleague how to find a percentage given two numbers. My apologies if that ‘exceeds expectations’ didn’t mean a whole lot to me personally. Don’t get me wrong, I’ll certainly put those evaluations on every job application I ever fill out. But I would sure like to know how an administrator who was highly qualified in math would evaluate those same lessons.

The remaining 20% of TESS ‘will be reflected in a student growth measure at the individual teacher level that will include a value-add score based on student growth as measured by state assessments.’ These value-added measures (VAMs) will apply to only about one quarter of teachers, those who teach tested subjects and grades. For all other teachers, local districts will have flexibility in determining the remaining 20% of the evaluation score.1 While the lack of consistency between districts for the 75% of teachers not under the VAM umbrella is a point of concern, the following discussion will focus on core teachers who will have part of their evaluation come from VAMs.

In assessing the validity of using VAMs to evaluate teachers, Jane David, with the Association for Supervision and Curriculum Development (ASCD), analyzed several studies.3 Ms. David addresses several concerns regarding the fairness of VAMs: whether “test score gains are biased because students are not randomly assigned to teachers”, whether different assessment instruments could lead to dramatically different effectiveness ratings, and how stable effectiveness ratings are over longer periods. Research seems to conclude that each of these concerns is valid and should be considered when deciding whether to include VAMs in teacher evaluations. Ms. David also notes two studies suggesting that VAMs do not do a substantially better job of evaluating effectiveness than standard, subjective evaluations.

On April 8, 2014, the American Statistical Association issued a statement strongly cautioning against the use of VAMs for high-stakes decisions in the education realm:

The American Statistical Association (ASA) makes the following recommendations regarding the use of VAMs:

• The ASA endorses wise use of data, statistical models, and designed experiments for improving the quality of education.

• VAMs are complex statistical models, and high-level statistical expertise is needed to develop the models and interpret their results.

• Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAMs are used for high-stakes purposes.

o VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.

o VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.

o Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.

• VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools. Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.4

The ASA statement is careful to note that research has consistently found that the portion of test-score variation that is both measurable and attributable to the classroom teacher is small. The statement also points out that VAMs have large standard errors, even when multiple years of data are taken into account, making “rankings unstable, even under the best scenarios for modeling.”4 Finally, while VAMs may show families, schools and districts general areas in need of improvement, they do not provide a way to actually make that improvement a reality.

The underlying mathematics of VAMs would be considered esoteric by the vast majority of people. Heck, I have a degree in Mathematics, consider myself a pretty intelligent guy, and I don’t fully understand the intricacies of these things. I am, however, aware of the substantial error inherent in complex models, and that error is only compounded by adding multiple levels to the model (student to classroom to school). A friend and colleague of mine, who is nearing completion of his PhD in Education Psychology with a concentration in Measurement, put it this way: “The models are sound, but, in my opinion, only in the natural sciences and perhaps some of the social sciences. We are talking about human beings as it relates to standardized testing, and it just isn’t as easy to predict.”5 Because of that, I would not feel comfortable being put in a position of making high-stakes personnel decisions based, even partially, on VAMs. Now imagine a Human Resources Director, with little to no upper-level math or statistical background, making those same decisions to retain or terminate a teacher.

I remember having a conversation with Karen Lewis, President of the Chicago Teachers Union, at the national conference of the Network for Public Education, held in Austin, Texas, in March of this year. She agreed that the vast majority of educators want constructive feedback, almost to a fault. As long as the administrator is well trained and qualified, a rubric-based evaluation should be sufficient to assess the effectiveness of a teacher. While the mathematical validity of value-added models is accepted in economics and other more concrete realms, VAMs should not make up even a small part of educator evaluations, and they certainly should not play any part in high-stakes decisions about continuing employment. It is my hope that, as Texas rolls out TESS in pilot districts in the 2014-2015 school year, serious consideration will be given to removing the VAM component completely.



1 http://www.tea.state.tx.us/index4.aspx?id=25769811000

2 http://txcc.sedl.org/our_work/tx_educator_evaluation/index.php

3 http://www.ascd.org/publications/educational_leadership/may10/vol67/num08/Using_Value-Added_Measures_to_Evaluate_Teachers.aspx

4 http://www.amstat.org/policy/pdfs/ASA_VAM_Statement.pdf

5 Text message from a colleague, May 12, 2014



  1. James, thanks for parsing all this out. I think you’ve captured some of my concerns, too. As a former administrator, I used PDAS for years, and now provide some limited training for evaluators. I’ve seen it really poorly used (just check marks, no pre- or post-conference, no critiques or constructive criticism), and I’ve seen it (more rarely) used well. What I found somewhat disingenuous in the TEA letter today was the charge that T-TESS would provide good feedback, while PDAS was “outdated,” “not focused on student learning,” and evidently incapable of supporting good feedback cycles. Outdated, perhaps–I mean, nothing policywise in Texas lasts much over 10 years for any reason. Not focused on student learning… well, most of the domains in PDAS are linked to learning and engagement; now, if the Commissioner means “learning” as synonymous with “test scores” then no, that was not a significant piece (1 of 54 indicators). If “learning” means “aligned with curriculum,” “engaged,” “giving & receiving feedback,” etc., then I’d say PDAS was at least up to the task (irrespective of whether those using it were). Feedback? That again depends on the users.

    You also make a great point about content and pedagogical knowledge. We spend relatively little time ensuring that great teachers become great administrators who can then mentor and supervise (instructionally) those in similar fields. If the evaluator really doesn’t know what she’s looking for/at, then that is really problematic. In this vein, VAM REALLY unnerves me, because someone who is ignorant of what good science instruction is like can just lean on a number–not even understanding issues like error, or stability, or whether a class is stacked in some way–and think she made a good employment decision, when the decision could have been terrible.

    Like you, I think teachers (and administrators) need good, strong evaluation. I used to kind of scoff at the ivory tower and tenure, and now that I’m in that world, I see that sure, there are some folks who get tenure then coast. But most (in my experience, which is limited) continue to work really hard. And the PROCESS–my Lord, the process. Each year I have to reflect intently in multiple domains and make my case to all my tenured colleagues (whom I work with daily). I have to provide evidence, which includes not only how I taught (plans, syllabi) but also student feedback, peer feedback from observations in my graduate courses, materials from presentations, manuscripts, etc. I actually wish I’d had something like this in EC-12 where I could know the expectations, then work toward those under both pressure and support from proven veterans. If TCU keeps me after six years, it’ll have been the most protracted dating process ever, and they’ll know exactly who they’re deciding to keep long term. Until then, I work under one-year contracts, but with little fear, because I get so MUCH feedback, and because norms have been established to make that feedback constructive. I believe they want me to succeed, and will push me to do so. I wish we had a little more of that and a little less VAM talk for my friends still working in EC-12 public schools.

    Thanks for all your work beyond the classroom. Looking forward to more idea exchanges and to learning from and with you!

  2. This school year I watched an administrator conduct a PDAS evaluation on a teacher. It included nothing but negative comments: no constructive criticism, just negativity. The odd part was that before this formal observation, the same administrator had made positive comments during walk-throughs about the very things that were criticized in the formal observation. There must be consistency and an effort to help teachers improve, not just a witch hunt to satisfy a quota requiring that a certain percentage of “bad” teachers be weeded out each year.

    It is time to help our teachers.


  3. I taught for thirty years (1975 – 2005) and to think that my teaching might have been judged by the standardized test results of the students who flowed through my classroom makes me shudder.
    For instance, a third to half of my students were usually failing the English classes I taught because they didn’t do most of the work (and some did none), and that includes classwork and homework. In addition, most of the students I worked with hated reading and never read outside of class, and some even resisted reading in class, deliberately coming to class without books, paper or pens to write with. In the street gang culture that dominated the streets around that high school, anyone who read or did schoolwork was considered a “schoolboy,” and that was an insult.
    But there were always kids in every class that I taught who did do the work and earned A’s and B’s. These few were also the ones who usually did well on the standardized tests.
    For seven of those thirty years, I ended each day with one period of journalism where every student did all the work producing a national, international and regional award-winning high school newspaper. Most of the grades earned in that class were A’s with a few B’s. The workload for those students was horrendous, with the kids showing up as early as 6 AM and leaving the campus after the alarms were turned on at 10 PM. These kids were readers and writers, and they wore the invisible insult of “schoolboy” as if it were the Medal of Honor, and almost every one of them would end up going to college. Every school has high-performing students, but every school also has a share of at-risk kids.
    And when a school has a lot of kids who live in poverty, the ratio tilts heavily toward the at-risk population, the most difficult to teach because it is this group that often refuses to learn what teachers teach.

  4. Gustavo Gonzalez-Contreras

    I’m glad I had you as a teacher, Mr. Hamric. I hope we see each other again someday. It has felt like a very long summer, but it sure has been a good one so far. I was always thinking of how I should contact you, but I could never find a way. One way was to call, but my mom washed the paper I wrote your number on with the laundry. I decided to reply on your blog so I could contact you that way instead. I haven’t gotten a new phone, but my mom did. When you called last time, my mom’s call log didn’t seem to save the number; missed-call numbers would simply disappear. So my mom told me to ask you if there was any way you could give her your number.
    NOTE: By the time you finish please text at 540-435-5039
    Gustavo Gonzalez-Contreras
    P.S. Thank you for being a great Teacher , Advice Giver , and a great FRIEND!

  5. […] A fellow blogger, James Hamric and author of Hammy’s Education Reform Blog, emailed a few weeks ago connecting me with a recent post he wrote about teacher evaluations in Texas, titling them and his blog post “The good, the bad and the ridiculous.” […]


  7. NY has burned through $46 million putting in an eval system based 60% on observations and 20% on state tests and 20% local measures (more prefab tests).

    The governor declared the system a failure after 94% of teachers ended up effective or better. Cuomo wanted the ineffective rate higher, but a study revealed districts were inflating the 60% to compensate for low scores on the other 40%.

    This is because school district reputations affect property values and no one wants their teachers to be considered poorly. One Westchester district gave all teachers 60/60 points in protest and forbade teachers from seeing their other 40%.

    Though this is technically cheating, superintendents just want a chance to discuss the inaccuracy and absurdity of the evaluation system.

    One hilarious provision is the requirement for teachers who do not teach Math or English. Those who teach Gym, Art, Music or a foreign language are told to pick Math or ELA scores for their ratings. Can anyone explain how test scores in Math demonstrate the quality of the music teacher? This is real money being spent on this…

  8. I am a fellow blogger who is just now venturing into the world of other bloggers and educational reform to see if I can contribute. Personally, I do not advocate having school administrators evaluate teachers, for a number of compelling reasons.

    Texas is one of the few states I am aware of whose payroll hierarchy compels teachers to think upward mobility means leaving teaching and moving into administration. Instead of offering pay increases that reward teachers for honing their craft while staying in the classroom, the system offers negligible raises for additional education, so teachers leave teaching and move into an office.

    At a doctor’s office, the office manager would never presume to direct patient care or advise the physician.

    There are, of course, many former teachers who are excellent administrators. However, once you are no longer in the classroom, it is hard to stay in touch with its reality. In my experience, administrators who were teachers often seem to forget what it is like.

    One argument holds that 10,000 dedicated hours, assuming sustained devotion and quality time, are required to attain professional excellence. A great many school administrators earned their advanced degree in approximately five years while teaching at the same time, and many of them did not teach the same subject or grade the entire time. At roughly 2,000 working hours a year, 10,000 hours works out to about five years of full-time work.

    I am not comfortable with an administrator who left the classroom after five years evaluating professional educators, because such administrators likely knew pretty early on that they did not want to teach. Furthermore, many were probably not in one teaching environment the entire time, which further undermines their standing as professional educators.

    My choice would be a system in which master teachers are recognized and serve as evaluators, advanced education is rewarded, peer tutoring and evaluation are the norm, and administrators are removed from educator evaluations and shifted toward being primarily facilitators.

    1. I saw the same thing in California. Most administrators only taught as long as they needed to add classroom experience to their resume and then they left the classroom as soon as possible to lead schools and districts, and most of the teachers I knew who did this also had trouble managing students and maintaining a learning environment in the classroom. Then there were the teachers who escaped the challenges of managing a classroom of children by moving up and in to administration. They didn’t start out to become administrators but changed their minds when they discovered that the challenges of the classroom teacher never go away or diminish.

      Teachers who can’t manage and control a classroom will be devoured by the most challenging, at-risk children, and those children will set the mood in the classroom to the point that it is a challenge for the good students to learn.

      I think all administrators should be required to teach one or more academic classes a day, but not just any class. They should teach a class of the most at-risk and challenging students.

      In fact, whenever I read that a candidate running for public office lists in their campaign literature that they taught in the public schools, I want to know how long they taught before they left the classroom. Five years or less and I don’t think of them as a veteran teacher. Ten years or more of full time teaching, and that usually wins my respect.
