My Journey to Compassionate Assessment

Thanks to funding from the QAA, Vikki Hill, Liz Bunting and I are setting up a network of colleagues interested in compassionate assessment. The aims of the network are to support each other in bringing about more compassionate assessment practices and policies in the HE sector, and to share good practice, resources and policy innovations. If you are interested in joining us, please sign up to the JISCmail list: https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=COMPASSIONATE-ASSESSMENT&A=1. As a starter, I thought I would share some thoughts on compassionate assessment and why I think it is important.

Fairness or equity

The problem with HE assessment policy is that we apply blanket, identical rules that need to cover a wide range of diverse students. When university students were largely homogeneous (many years ago now) this was fine, but this inflexible approach, based on fairness, disadvantages many. One example I have often seen in my career relates to giving feedback on draft work. There are often institutional or local policies that dictate the number of drafts and the need to apply this ‘fairly’ across the cohort. For example,

“Feedback should be constrained by a specific word limit…unit tutors must consistently apply the agreed approach.”

“Formative feedback on students’ learning is an integral part of the curriculum and its assessment, and contributes to ensuring the integrity of the assessment process. However, only one instance of feedback on any final piece of work for submission is permissible.”

Some students arrive at university with a good grounding in academic writing, and they are likely to need far less help than the many students who lack confidence and skill in academic writing. Is it really fair that everyone gets one chance at feedback regardless of actual need?

Where are our time and limited resources best spent? I think this is an example of where fairness gets in the way of equity and consistency overrides compassion. A compassionate approach to assessment would recognise difference and be flexible to the needs of students.

Stress, anxiety and wellbeing

A second concern is the stress and anxiety caused by assessments and how we might mitigate it. The purpose of awarding degrees is to certify that students have learnt certain knowledge and skills, but I feel that HE assessment regimes have lost sight of the human element of learning.

In research we conducted at the University of the Arts London, stress and anxiety were by far the most prominent features students described when talking about assessment and grading. And this at a university which didn’t have exams! Exams seem to create an extra level of stress and anxiety. The recent exam issue at Bath University gives an insight into this. Note the contrast between the students’ concerns and the university’s concerns:

Student: “I think the university needs to understand the stress and anxiety, performances are definitely going to fall”

University: “To ensure quality standards are met…to uphold the quality and integrity of their degree.”

There seems to be no acknowledgement of the human cost and distress. Given the growing concerns about the mental wellbeing of students, and data showing that university students have a higher incidence of mental health conditions than the general population, this seems callous. However, it is no surprise: assessment policies and practices are usually completely divorced from wellbeing and mental health initiatives. Surely it is about time we addressed this with a compassionate approach to assessment?

Is this too radical?

What I have just written seems fairly common sense and humane. Yet I get the sense that introducing the compassionate, human element to assessment policy and process, on an equal footing with the quality and standards element, seems a radical step too far for the sector. Please join us in trying to convince our colleagues otherwise!

Assessment Complexity

AI generated image by Dream.ai for keyword ‘complexity’

Much of the literature on assessment in higher education over the last few decades has focused on three broad strands. The first strand is a quality assurance strand that focuses on questions such as the reliability and validity of assessments, assessment standards and summative assessment. The second strand has focused on assessment design, the types of assessment used and how they impact learning. In this area there has been a shift to diversify assessment types and embrace authentic, work-relevant assessments. The third strand has focused on student assessment literacy and, latterly, feedback literacy. This relates to how students understand assessment standards and assessment types to learn and progress their learning. These last two strands have shifted the debate towards formative assessment and the interaction between formative and summative assessment.

The first and second strands have resulted in assessment regimes that are extremely complex. Students have to somehow integrate and consider learning outcomes, assessment briefs, multiple assessment types and assessment criteria / rubrics when producing their work. Then, once the work has been graded, students need to understand the grade, understand their feedback and deal with the emotional impact of assessment alongside their own expectations and motivations. Recognition of this has produced the third strand on assessment and feedback literacy, which is necessary to help students grapple with the complexities of assessment regimes.

All of these strands have emphasised the need for transparency in assessment processes so that students can understand them and demonstrate their learning. However, the complexity of the learning we expect in higher education can often be hard to make explicit and transparent. All too often, implicit criteria that are not specified in the learning outcomes, briefs or assessment criteria are used to make judgments and grade students (Bloxham, Boyd & Orr 2011). Students and staff often expect some ‘reward’ in the grading process for effort or for progress made during a unit of learning (Brookhart et al 2016). Effort is rarely specified as an explicit criterion, and rewarding it runs against the purpose of learning outcomes, which are designed to focus on what students can do, not on how much time and effort they put in. Expression is another common implicit criterion. In written work, students can be ‘punished’ for poor writing style or grammar even when neither is an explicit criterion. In art and design, if the student’s aesthetic or approach does not resonate with the marker, this can result in lower grades even when the explicit learning outcomes and criteria have been met.

It seems that the complexity of assessment regimes is increasing. As we try to be more transparent, assessment briefs get longer, rubrics become more prominent, and it can be hard to resist the proliferation of learning outcomes without writing them in such a way that they become meaningless, empty statements. (As an aside, I was once asked to convert five learning outcomes into two for each unit I taught. As you can imagine, there were only two ways to do this: very long outcomes with multiple clauses, or shorter, generic outcomes largely devoid of meaning. Neither approach was for the benefit of the students.)

One solution that seems often overlooked is pass / fail, or even ungrading. I will focus on pass / fail here as an easier first step away from the dominant letter or numerical grading. I think that pass / fail, as opposed to a letter or numerical graded system, reduces the complexity of the assessment regime and should allow students to focus more on their learning and work. Firstly, pass / fail removes the need for complex assessment criteria / rubrics and the need to help students understand what they mean. Instead students can focus on meeting the learning outcomes. Secondly, pass / fail removes the letter or numerical grade. This should free academics from having to spend time deciding and justifying fine-grained grading decisions and give them more time to focus on feedback. It could also help students deal with the emotional impact of grading, which can often be a cause of stress and anxiety. Thirdly, pass / fail should reduce, or better mitigate, the use of implicit criteria. With more focus on whether the learning outcome has been met, fine-grained judgments about expression or subjective judgments about effort should be reduced.

By increasing the use of pass / fail and reducing the amount of letter or numerical grading, there is a chance that we can reduce assessment complexity and better support student learning, whilst still maintaining authentic assessments and the rigour and validity of the assessment regime.

References:

  • Bloxham, S., Boyd, P. & Orr, S. (2011) Mark my words: the role of assessment criteria in UK higher education grading practices, Studies in Higher Education, 36(6), 655-670.
  • Brookhart, S.M. et al (2016) A Century of Grading Research: Meaning and Value in the Most Common Educational Measure, Review of Educational Research, 86(4), 803-848.

Bin assessment criteria

a line of wheelie bins

This is the third post in my reflections on pass / fail grading and the QAA funded ‘Belonging Through Assessment‘ project. In the previous posts I argued that grading in higher education is done not against objective criteria but against a socially agreed community standard. When grading, moreover, there is not a single standard but multiple standards that have to be agreed for each grade criterion. This complexity means that consistency across the UK university sector is practically unachievable. Having identified these particular problems with grading, I now want to offer a solution: bin assessment criteria, and specifically bin generic marking rubrics.

We should use learning outcomes in the way they are meant to be used, i.e. met or not met: in other words, pass / fail. I have never been able to get my head around simultaneously using learning outcomes and assessment criteria that specify performance at different grades. My epiphany came when I marked with a colleague who was mainly using the generic assessment criteria statement to grade, while I was mainly using the learning outcome statement. Although the two were notionally mapped to each other, the actual wording differs and results in different decisions. Here is an example:

Learning outcome: Critically evaluate your approach to planning, teaching and assessment using self-reflective frameworks and observations/reviews of practice.

Assessment Criteria: Process- Experiment and critically evaluate methods, results and their implications in a range of complex and emergent situations.

They are similar but not quite the same and I think this makes a complex task more complex.

KISS (Keep it simple, stupid)

As someone working in an Art & Design university, I prefer to subscribe to the KISS principle: learning outcomes are either met or not, so we only need pass / fail grading with no assessment rubrics. However, I can’t see this being a practical solution for much of the sector any time soon, so in the meantime let’s bin generic assessment criteria and use bespoke assessment rubrics based on the specific learning outcomes of that unit of study.

Addendum

After posting this I had an interesting conversation with Rachel Forsyth about a similar post she wrote called ‘Should we get rid of Grades?‘ Rachel talks about the ‘double duty’ of grades, which serve both as measures of student learning and for accreditation / quality purposes. As I have previously discussed, this accreditation purpose of grading is used by employers to make quick, easy judgments about potential employees, grades being seen as some objective measure differentiating one student from another.

Ultimately for me, I think we need to challenge those multiple purposes and say ‘No that is not the purpose of assessment’ and focus on the main educative purposes.

I was recently listening to Malcolm Gladwell’s Revisionist History podcast. Two episodes address some of the purposes of assessment. He discusses them in the context of law degrees in the US, but I think his logic applies more generally. In episode 1 (https://www.pushkin.fm/podcasts/revisionist-history/puzzle-rush) he highlights that the admissions test and subsequent assessments in law school favour a particular way of thinking that might not match up to what is required in the profession: those who get the best grades do not make the best lawyers. In episode 2 (https://www.pushkin.fm/podcasts/revisionist-history/the-tortoise-and-the-hare) he talks about how the best law firms draw only from the ‘top’ 14 law schools as a proxy for ‘the best and the brightest’, even though there is good evidence that going to the best law schools does not make you a better lawyer. Gladwell concludes that employers should adopt a ‘don’t ask, don’t tell’ policy: they should not be allowed to know where graduates went to university or how well they did on university assessments. Instead, they should base hiring decisions on other factors that do correlate with better job performance. In short, assessment in HE should not be used to differentiate performance amongst students because it does not correlate well with performance in the workplace.

Different standards, different grades

This is the second post in a series based on my work on pass / fail grading and the QAA funded ‘Belonging Through Assessment‘ project. In the first post I argued that assessment criteria are highly problematic: the assessment criteria and rubrics often used in higher education are not really criteria at all. Grading is not the act of judging student work against objective criteria, but against a shared community agreement about what constitutes a certain grade.

Within close-knit teaching and discipline communities, where grading and student work are regularly discussed, there may be close agreement about what constitutes each grade. However, when this is expanded out to the whole UK HE sector, it starts to break down. The mechanism in place to address standards across the sector is the external examining system. This system has been criticised for failing to address the issue of inconsistent standards (see, for example, Akerman 2016 for a brief history of external examining and a critique). Akerman argues that the size and diversity of the UK HE system make it almost impossible to assure standards across universities. She suggests a complete rethink of the process rather than tinkering with it.

I agree that the system needs to change, but it has nothing to do with the external examining process itself. The issue of comparability is made complex by grading. When the difference between a 59% and a 61% grade average (or a B- and a C+) is so important to the student (see my previous post on grade cliff-edges), how is it rational to expect such fine and consistent distinctions to be made at a national level across multiple communities? Or, to put it another way, Bloxham and Price (2015) argue against the assumption that “external examiners can represent community standards reliably and consistently”.

To illustrate my thinking and how I got here, I want to share a personal dilemma. I have worked on a similar PG course at six different universities. My understanding of the standards upheld within the disciplinary community in which I work has inevitably evolved over time. As a novice member of the community, it was more about absorbing the standards from more experienced colleagues. Now, as a more experienced member of the community, I feel I have (or at least should have) a good understanding of those standards. Yet my current dilemma is that I am having to apply different standards on two very similar courses at different universities (one a pre-92 university with a wide range of disciplines, and one a post-92 with a narrower range of disciplines taught). This dilemma centres on grade distinctions. I am confident that those who pass should pass and vice versa, but when it comes to allocating grades, I have never felt so unsure. You might be thinking that maybe there is a flaw in my understanding, or that my current institutional processes are flawed, but I don’t think either is true. What is true is that the institutional contexts are very different. The shared community that I graded in before (at the university with a full range of disciplines) only partially overlaps with the shared community of which I am now part (at the university with a narrower range of disciplines). Similar course, different community. These related but different communities seem not to share a common understanding of standards on the PG course I teach on. The net result is that students could get a very different grade at one university compared to what they would have received had they done the course at another. I find myself, then, in an uncomfortable ethical position of having to absorb and apply the new community’s standards even though I feel they are outwith the wider disciplinary standards.

The point of this personal story is to illustrate just how hard it is to have consistent standards when student work is graded, and how hard it is to do anything about inconsistencies between communities across the sector. The problem of consistent standards is made more pressing and complex by grading. Rather than one single threshold standard (pass or fail), there are multiple standards and grade boundaries. What is the standard for an A? What is the standard for a B? And so on.

Arbitrary Grading Cliffs

credit: Point Loma Nazarene University, @edtechsteve & Michael Fisher (CC BY-NC 2.0)

In UK education we have a range of grades available to recognise student performance at different levels. However, certain grades act as cliff-edges and receive far more attention and institutional focus than others. When I was a teacher of GCSE students, the focus was on the C grade and making sure as many students as possible got a C or above. With the new GCSE system the same has happened with grade 5. At university, that cliff-edge is the 2.1 degree classification, which often translates to a 60% average or a B grade. Why does this matter and what impact does it have on our educational practices?

Firstly, let’s be clear that these grade cliff-edges are entirely arbitrary. Someone decided that schools would be measured by the percentage of pupils who achieved a grade 5 or above. Because this became the metric, school behaviour changed in order to get the best result on it. The same applies at university: at some point in history, graduate employers decided (completely arbitrarily) to set the 2.1 degree classification as a simple metric to weed out an excess of applicants. Now universities spend an awful amount of time and effort on this grade cliff.

As a school teacher, I was told very clearly that our focus should be first and foremost on C/D grade (now grade 4/5) boundary students. They were to get the most attention. The more capable students were fine and at no risk of going below a C, so they could get on with it themselves. As for the students on an E or below, they were a lost cause: they were not going to help with the metric! (Maybe I am being overly cynical here, but you get the idea.)

At university, the 2.1 degree classification cliff has become the de facto measure of a so-called ‘good degree’: it impacts employment prospects, it determines whether students can study for a Masters degree, it has become THE marker of university success. Regardless of the fact that for many students just getting a degree represents a remarkable achievement, not getting a 2.1 carries a social stigma. This in turn creates enormous stress for students worried that they might not achieve a 2.1. It has also, of course, raised concerns about biased teaching and assessment practices in universities, particularly against students of colour: when the boundary is so important, the impact of even small biases is magnified.

What concerns me here is that an arbitrary grade boundary determined by those outside the educational system has so much impact on staff and students within it. There is no good evidence that having a 2.1 makes you a better or more successful employee than having a 2.2. Getting a C grade or a grade 5 at GCSE is hugely influenced by family circumstances (e.g. this study), not by the school or the individual pupil. Yet this arbitrary boundary has huge impacts on the individual student.

Of course, it is easy to criticise but less easy to offer a solution. What is my solution? Firstly, we should recognise the arbitrary nature of such grade cliffs and think about why and whether they should exist and what impacts they have on our educational system. At the moment, these boundaries are taken for granted and treated as if they have some sort of absolute reality that cannot be changed, rather than acknowledged as socially constructed, arbitrary boundaries that can be changed. Ultimately, I would suggest a simpler pass / fail grading system, especially at university (with its supposedly criterion-referenced system of assessment). More radically, we might even consider ‘ungrading‘!