Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes

Duckworth, A.L., & Yeager, D.S. (2015). Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44(4), 237-251.


The challenge of educational assessment is finding ways to “capture” the things we want to measure in valid, reliable, and ethical ways. That is especially challenging in education because most of the things we are trying to capture are impossible to observe directly; our assessments only provide an imperfect reflection. How do you really observe things like learning and motivation? Those things are difficult enough to capture, but what about the even more elusive types of personal qualities, such as “self-control”, that seem to relate to learning and success?

I recommend this article, not because I agree with everything in it, but because it provides a clear, thorough discussion of the limitations of commonly used assessments. The authors examine two kinds of assessments used to measure what they call “personal qualities”: self-report questionnaires (both student-completed and teacher-completed) and performance tasks. Their discussion of the potential pitfalls of both kinds is excellent. I particularly appreciated the clear treatment of the problem of reference bias, that is, the problem that different respondents to questionnaires (or possibly assessment rubrics?) may hold different conceptions of what the various response choices mean. For example, respondents might hold different concepts of “Frequently” or “Sometimes”. Likewise, different scorers of performance tasks may have different ideas of what a score at each level of a rubric means. The discussion helped me understand why some of the assessments we use at my university don’t seem to be as reliable as we would like them to be, and I will keep it in mind the next time I use those assessments.

Following the discussion of assessment limitations, there is an excellent discussion about how the purposes we use assessment data for can affect whether an assessment is a good choice or not. This discussion needs to be had in all teacher preparation programs, and this article would be a good place to start. As always, the answer is multiple assessments; when we need to capture a difficult-to-observe construct, it is best to look at it in more than one way. The combined data from several assessments may to some degree balance out the limitations of each assessment.

There are some areas of the article that gave me pause. The authors spend a good deal of time at the beginning of the article talking about the words that are often used to describe personal qualities (e.g., noncognitive factors, character skills, social and emotional learning, traits, dispositions, to name just a few). In the end, they settle upon the term “personal qualities” and, in effect, say that finding the right term is not really that important, since the “attributes of interest” are “generally accepted as beneficial to the student and to others in society” (p. 239). Though I do agree that we can get bogged down in terminology and never get to the real issues, the words we use to describe the things we measure are very important. Moreover, I am not sure that some of the personal qualities discussed here are really “generally accepted as beneficial”. The authors focus on the personal quality of self-control. Can we all really agree about what self-control is, and what kinds of behavior and attitudes are part of it? What might be a “beneficial” kind of self-control to one person might be seen as “suppression” by someone else.

What’s more, the article does not really deal with the idea that definitions of “beneficial” qualities such as self-control are highly dependent upon one’s cultural background, and perhaps on one’s social class. For example, a middle-class, white, female teacher may see the behavior of young African American males in an urban high school as lacking in self-control because that behavior differs from how that teacher conceives of self-controlled behavior. To say that we have consensus on qualities like this oversimplifies the issue, and I worry a lot about discrimination here. Sometimes I wonder whether measuring personal qualities is an area we really should be entering as educators.

Even if we set aside the cultural bias problem, it is clear that those who are being assessed in our assessment-driven society soon learn how to “game” the system and produce responses that comply with what their assessors think are “beneficial” behaviors. That issue is discussed in this article, though it does need to be discussed in more depth than it was here. When the stakes are high, people will learn to produce compliant responses, whether on a questionnaire or on a performance task. We cannot really see personal qualities like self-control; we cannot measure them like height or weight. The issue gets even more complicated when we think about whether some people may have the tools to “game the system” better than others do. The problem of discrimination once again rears its head.

Assessment is a difficult thing. There is no way that these writers could have fully discussed everything that needs to be discussed; that discussion may never be finished for good. However, they do provide us with a starting point for thinking about what we do with assessments and how we use that information.

This quote from the article’s first page summarizes the core of what Duckworth and Yeager are trying to say: “In this essay, our claim is not that everything that counts can be counted or that everything that can be counted counts” (p. 237). Their point is rather that when we do decide to count something, we need to find the best ways (note the plural) to do that. That is an ongoing quest in education.
