25 August 2011

On surrogates and assessment

I'm starting off with two observations about quite different things.

The first is this one from Ben Goldacre's brilliant "Bad Science" blog at the Guardian. His topic concerns a press release about a potential new treatment for Duchenne's Muscular Dystrophy.
"...this story is also a reminder that we should always be cautious with "surrogate" outcomes. The biological change measured was important, and good grounds for optimism, because it shows the treatment is doing what it should in the body. But things that work in theory do not always work in practice, and while a measurable biological indicator is a hint something is working, such outcomes can often be misleading."
And later...
"...improvements on surrogate biological outcomes that can be measured in the body are a strong hint that something works – and I hope this new DMD treatment does turn out to be effective – but even in the most well-established surrogate measures, and drugs, these endpoints can turn out to be misleading."
A fairly basic point, of course, but it did set me thinking about the extent to which our obsession with measurement in many fields, including teaching and learning, has led to increasing reliance on fairly dubious surrogates.

And then I came across this commentary on Standard and Poor's revision of the US credit rating:
"[Ratings agencies] ..are human enterprises, fallible institutions—and like other institutions, they have procedures, interests, and histories. Their records deserve inspection. In the scientific spirit, in the spirit of show me, they deserve scrutiny."
A credit rating is a complex construct (I presume). Since it is supposed to have predictive value (other than merely being part of a self-fulfilling prophecy), it must be put together from a raft of surrogate measures, presumably of directly observable factors which co-vary with an institution's creditworthiness. But it is only as good as the choice of those surrogates*.

Which led to some general thoughts in relation to education.

Today is the day when GCSE results are published (the exams taken at age 16 by practically all pupils in the UK). The press stories are predictable, suggesting grade inflation and the exams being dumbed-down. (Or of course if the pass-rates were not an improvement on last year, there would be jeremiads about further decline in educational standards...) The press discussion will not be sophisticated, but it will at least acknowledge what the politicians and the educational establishment will deny, namely that the examinations are not realistic proxies for educational achievement.

This is leaving aside the issue of the tail wagging the dog, of "teaching to the test" without ever asking whether the test is valid or reliable. Beyond that, the logistics and practicalities of mass assessment distort the process, and it has ever been thus. When Liam Hudson (1967) discussed convergent and divergent thinking styles, he noted that convergent thinking was privileged in school at least in part because its testing could be standardised.

But these artificial surrogate assessments are increasingly separating the formal educational system from the "real world", particularly that of communities of practice. This is not an original observation; while I'm on "golden oldies", I'll refer to Becker's wonderful 1972 paper "School is a Lousy Place to Learn Anything In", which is based inter alia on a similar argument.

It is out of an awareness of the intrinsic limitations of such surrogacy that a course on teaching with which I have long been involved has attempted to develop a more authentic assessment strategy. Teaching courses have always routinely involved direct observation of teaching, but not everything is amenable to direct observation. The traditional solution on most** other courses has been set assignments; our course moved away from that to negotiated submissions based on a learning contract. Learning outcomes are specified, and students decide (in consultation with a tutor) what evidence they will submit to demonstrate that the outcomes have been met. This is a step closer to reality, but of course only insofar as the specified learning outcomes correspond to the real world.

The course has just been internally reviewed for routine reasons, and it is apparent that the bureaucrats hate the assessment scheme. Work is not graded, for example. The scheme is not suited to anonymous submission, because the students are talking about their own practice and work setting (it is an in-service course). Not all work is suited to electronic submission via Turnitin... The list of complaints goes on.

The real problem is that validity, reliability and fairness--the traditional requirements of an assessment scheme--are now subordinated to standardisation, administrative convenience, and security***.

These are considerations for the legitimation of surrogates and proxies--the same kind of consideration as applies to the regulation of second- or third-order derived financial instruments which no longer bear any relation to buying and selling stuff which is of any actual use.

* I am not relying entirely on a single blog-post here! See also Dan Gardner's excellent and accessible Future Babble: Why expert predictions fail and why we believe them anyway (London: Virgin Books, 2011). It's a great corrective to all the doom and gloom surrounding us. Incidentally, he draws a lot on the work of Philip Tetlock, the subject of this interview by Jonah Lehrer in Wired.

** Most, but not all: our approach owes much to work at the University of Huddersfield, particularly in the early '90s.

*** Security in the sense of not being vulnerable to plagiarism, although the emphasis on discussion of one's own practice and production of examples and resources means that the approach is fairly protected in any case.

Becker H (1972) “School is a Lousy Place to Learn Anything In” American Behavioral Scientist: 85-105; reproduced in R G Burgess (ed.) (1995) Howard Becker on Education Buckingham: Open University Press
(Update later today: Many thanks to David Stone, who writes: "I was happy to discover that my institutional subscription gave me access to the original Becker article. Just in case others should be as lucky, here is the DOI link:
http://dx.doi.org/10.1177/000276427201600109")

Hudson L (1967) Contrary Imaginations: a psychological study of the English schoolboy Harmondsworth: Penguin

1 comment:

  1. Andrew Keir, 12:07 am

    validity, reliability, fairness; and in the red corner, standardisation, convenience, and security. Bring it on!
    I set questions at 'A' level, and mark them; I have some experience with the compromises that drive the 'agreed' mark schemes; and I've been a spear-carrier at some epic battles between experts (and they mostly were) on the subject matter, examiners (PBI trying to set an effective yet fair paper) and the exam board personnel, trying to avoid excoriation in tomorrow's Daily Mail. Not a pretty sight. But when push came to shove, there has usually been someone sensible to force through something honourable yet, on the whole, effective. But the system relies (as all systems do) on honourable people. Words are never enough - especially in mission statements and guidelines.

    I think I'm going to have to subscribe to your blog - you have revivified my interest in that form of research that involves real people. Thank you for that.

