What sounded cleverest was considered best – a gold standard method all should follow. But what sounds clever isn’t always wise.
A lot of the criticism of RCTs is actually very funny. Michael Scriven dramatically declared a decade ago that RCTs have ‘essentially zero practical application in the field of human affairs’. And earlier this year, Lant Pritchett delivered the funniest presentation you’ll ever see on evaluation, in which he declared ‘the debate about RCTs in development is over. We won. They lost.’
There are many technical advantages and disadvantages to RCTs, but the questions that really matter for accountability practitioners who want to learn and adapt are: How appropriate are they for assessing complex human behaviour? And how important is context in explaining change?
Complexity and context are probably the two most fashionable buzzwords in the governance field at the moment. The big deal in complexity is that cause and effect aren’t necessarily proportionate. And you wouldn’t expect to get the same effect from the same policy or tool in different places. Things don’t always work. One size doesn’t fit all. There are also likely to be many different pathways to the same effect. And many different actors contributing to that effect.
So, governance and accountability work is hard to measure. But, it’s not impossible.
Countering (in)appropriateness
In an earlier blog, I asked whether capturing this kind of change was more about levels of confidence than absolute statements of attribution. I also suggested that we shouldn’t get fixated on assessing the tools we use (“toolsplaining”), because the goal isn’t to explain what we did but rather what caused the change.
All of the above is therefore about appropriateness. Back in 2012, a ground-breaking paper was written for DFID which concluded that there is no “gold standard” method, only methods that are fit-for-purpose. And after much debate, we thought contribution tracing might be appropriate for examining the difference governance programmes can make.
You can find out precisely how the method works in a blog by Gavin Stedman-Bryce, using one of the two learning projects where we recently piloted the method.
The two projects were Ghana Strengthening Accountability Mechanisms (GSAM) and Journey for Advancement in Transparency, Representation and Accountability (JATRA) in Bangladesh. Both are relatively large accountability programmes focused on public spending, funded by USAID and the World Bank respectively.
When we began, both projects were good at numbers, but neither had a clear theory of change or much clarity about causal pathways to higher-level outcomes.
Better value for money
Two of the main aims of testing contribution tracing in these projects were to help the teams understand how and why change happened, and to figure out what was worth measuring and what wasn’t.
We’re forever asking staff to interview or survey more people, even when that may not be appropriate or useful for explaining the change we’re looking at. I can hold my hand up to that error in the past. And contribution tracing offers a way to help ensure we collect only the data we really need.
GSAM was subjected to an expensive RCT. It was a clever idea to compare the relative effectiveness of public audits and social audits in adjacent districts.
As in most RCTs, the results were mixed and confusing. The evaluators’ ability to detect effects was complicated by an election, which removed more than half of the officials included in the baseline. Not very shockingly, information and training weren’t spread evenly across districts. And, entirely predictably, delays in implementation meant that the evaluators ended up assessing the medium-term impacts of public audits but only the short-term impacts of social audits (because they assessed changes two years before the end of the project).
The Ghana team had a process map which described, step by step, what they did to influence district assembly responses to citizens’ concerns about infrastructure investments. It was a single pathway (like the RCT theory of change), but throughout the process of developing a causal chain, the team realised there were, in fact, four slightly different pathways to the same outcome. Getting diverse perspectives in the room to figure out what had actually happened, and how, was extremely helpful in making that chain more robust.
The Bangladesh team had an even more challenging causal chain, with six pathways to influence pro-poor budgeting. The project’s external evaluation was extremely complimentary, but it struggled to articulate precisely how the project worked and thus what the process of change really was. It took me weeks to understand it, and I’m still discovering things to this day. But once the team had defined their own contribution claim, they could explain far better what their causal chain looked like.
What the team realised was that, of the 77 pieces of evidence they identified for their causal chain, they only needed half, because some pieces were better at validating the project’s contribution claim than others. The team now know what good and bad evidence looks like based on what they are actually trying to prove (probative value).
They found, for example, that a lot of administrative data, such as invitation letters, interviews (with the wrong people), or even the contents of trainings, was often not that helpful as evidence. Testimonial evidence from the right people and audio-visual material were a great deal more compelling. Knowing this may well save them a lot of money in the future, by not gathering data they don’t need.
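To make probative value a little more concrete, here is a minimal sketch of the Bayesian updating that contribution tracing borrows from process tracing. The numbers are purely illustrative, not figures from GSAM or JATRA: the point is simply that evidence you’d expect to find whether or not your claim were true (like an invitation letter) barely moves your confidence, while evidence that is unlikely to exist unless the claim is true (like testimony from the right official) moves it a lot.

```python
# Minimal, illustrative sketch of the Bayesian updating behind "probative value".
# All numbers are made up for illustration, not the projects' actual gradings.

def update_confidence(prior, sensitivity, type_1_error):
    """Posterior confidence in a contribution claim after finding one piece of evidence.

    prior        -- confidence in the claim before seeing the evidence
    sensitivity  -- probability of finding the evidence if the claim is true
    type_1_error -- probability of finding it even if the claim is false
    """
    return (sensitivity * prior) / (
        sensitivity * prior + type_1_error * (1 - prior)
    )

prior = 0.5  # start agnostic about the claim

# Weak evidence: an invitation letter that would probably exist either way.
print(update_confidence(prior, sensitivity=0.9, type_1_error=0.8))  # ~0.53

# Strong evidence: testimony from the right official, unlikely to exist
# if the project had made no difference.
print(update_confidence(prior, sensitivity=0.8, type_1_error=0.1))  # ~0.89
```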
Context is all
Contribution tracing also enabled the teams to reach an unprecedented level of granularity. In Bangladesh, the general approach of the JATRA project is based on a decade’s worth of pretty sophisticated work, including some very interesting participatory power analysis and social mapping. Alongside developing pathways, contribution tracing asks you to grade your evidence.
One of the big questions we asked the team was about corruption. One type of corruption could be forgery of meeting records. We had meetings led by the CARE team, meetings led by citizen forums (effectively, CSOs), and meetings led by Union Parishads (the government). Some documentation of these meetings was necessary to confirm the team’s contribution claim was real (i.e. if we didn’t find it, we were toast).
However, we wanted to know how likely it was that we might find meeting records even if certain people hadn’t actually attended the meetings. Due to different social norms and incentives, the team judged that this almost never happened with CARE Bangladesh staff. It did often happen with citizen forums, however, because it was common to register people and then carry out follow-up meetings later with those who hadn’t actually attended. And in government, forgery was known to happen, but it wasn’t considered very common. In other words, local context massively affected how the team graded the quality of evidence.
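To illustrate what that grading means in practice, here is a short sketch (again with made-up numbers, not the team’s actual judgements) showing how the very same meeting record shifts confidence by very different amounts in the three settings, simply because the chance of finding a record for someone who never attended differs in each context.

```python
# Illustrative only: the same meeting record carries different probative value
# in different contexts. The forgery likelihoods below are hypothetical placeholders.

def update_confidence(prior, sensitivity, type_1_error):
    # Same Bayes' rule update as the sketch above.
    return (sensitivity * prior) / (
        sensitivity * prior + type_1_error * (1 - prior)
    )

prior, sensitivity = 0.5, 0.9  # records are almost always kept when meetings really happen

# Chance of finding a record for someone who did NOT attend (hypothetical values)
forgery_risk = {
    "CARE-led meeting": 0.05,       # judged almost never forged
    "Citizen forum meeting": 0.5,   # attendance often registered regardless
    "Union Parishad meeting": 0.2,  # forgery known, but not common
}

for context, risk in forgery_risk.items():
    print(context, round(update_confidence(prior, sensitivity, risk), 2))
# CARE-led meeting ~0.95, Citizen forum meeting ~0.64, Union Parishad meeting ~0.82:
# the same document is far stronger evidence in some contexts than in others.
```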
Overall, there were four key take-aways that I think are relevant for a much wider audience:
- Choose tools fit-for-purpose: We should clearly link tools and methods with what we actually want to measure. We should first identify the causal mechanisms that will lead to change; then look at what evidence would be necessary to prove the validity of those mechanisms; and only at that point decide which tools and methods to use.
- Clearly define concepts, or you can’t measure them: Governance concepts like transparency, accountability and responsiveness need to be clearly defined to be measured and evaluated. Yet, they are commonly misunderstood and often conflated. Throughout the process, the teams learned that these concepts may require quite different data.
- Construct progressively testable theories of change: Without theories of change clearly articulated, we can’t test our assumptions about what works, what doesn’t, and why. The experience showed how important it is to ensure that these are (progressively) testable in order to make claims about impact for influencing work.
- Don’t just gather more evidence, find better quality evidence: One team member from Bangladesh said: “I learnt that it is possible to collect targeted data even with a small sample, as long as you are very specific…You can also save a lot of resources with a targeted limited scope, which is very valuable. This will help to save a lot of money and resources to demonstrate performance on the ground.”
We’ll be putting some of these lessons into practice more broadly over the coming months: crafting sharper definitions, updating our theory of change guidance, and adapting evidence-grading tools so that more staff can make the most of our learning going forward.