The Case for Evidence-Based Implementation

How and why to collect data to better understand your work and its impact

Evidence-based, evidence-based, evidence-based. I keep reading that nonprofits and the institutions that fund them have entered a new era of developing programs and making decisions based on data and proof of “what works”. In theory, great! But in reality, these encomiums to evidence push a rather simplistic notion of how organizations and their funders should ask questions and collect data to better understand their work and its impact.

Here’s the latest. In late December the New York Times ran an op-ed lauding the Obama administration's embrace of evidence-based social programs - and bemoaning possible Republican efforts to move away from this focus. The author, Ron Haskins, argues that after years of the federal government giving money to social programs that had no long-term impact on recipients, Barack Obama, building on work done by George W. Bush, ushered in an era of increasingly rigorous and accountable federal spending. Money is now funneled to programs that have been properly evaluated by a reputable outside organization and shown to work, allowing them to expand, while fewer dollars are spent on programs that don’t work.

I’ve spent years building evidence-based cultures and practices within nonprofits, which makes me skeptical of Haskins’ argument and conclusions, particularly his sweeping assertion that evidence of success is all we need to make decisions about how to fund social programs.*

It is important that funders make a more concerted effort to identify and fund programs that are committed to honestly evaluating their work. Focusing primarily on the end result, though, ignores the crucial implementation process. It’s one thing to compile a list of “what works” based on positive outcome data in a limited number of sites, and another to take these interventions and implement them in a diverse range of settings with consistent, positive results. Then there’s going to scale with these interventions, which is harder by yet another order of magnitude.

We need more nuanced definitions and less blunt applications of terms and processes like “evidence-based” and “what works.” Ones that take into consideration that interventions are deployed in complex settings with diverse populations. Ones that acknowledge that a single intervention doesn’t work in isolation, but is embedded in systems where a lot is happening, including other interventions. And finally, ones that articulate that in between developing a program and writing up its outcomes is a hugely variable and important process – implementation – where how the program is deployed shapes results. Unless we understand how a program is implemented, we can't talk in any real way about its outcomes and impact.

In what I’m calling an “evidence-based implementation” approach, data would be used to clearly demonstrate fidelity and quality of program implementation – enabling quality improvement cycles and setting up future evaluations. Funders would encourage and help resource such work, which would increase learning, accountability and return on investment.  

 

What should "evidence-based" look like?

Margery Turner at the Urban Institute and Tony Bryk and Lisbeth Schorr in the Huffington Post have recently (2013 and 2015, respectively) made very convincing arguments for why it’s important to develop a more nuanced understanding of terms like “evidence-based” and “what works”, as well as how data might be collected to create such understanding.

Turner notes:

the conversation about “evidence-based policy” focuses too narrowly on a single question and a single step in the policymaking process: [the argument is that] if an initiative or program hasn’t been proven effective, it’s not “evidence based” and shouldn’t be implemented.
[But] in reality, policy development occurs in multiple stages and extends over time. New policies emerge in response to problems and needs, possible approaches are advanced and debated, new policies are adopted and implemented, and established policies are critiqued and refined. Evidence can add value at every stage, but the questions decision-makers need to answer differ from one stage to the next.
These questions go beyond “does the intervention work”. A far-from-exhaustive list includes: 

* (How) does this intervention meet the needs of the target population? Other populations?

* What is the cost of this intervention?

* Under what conditions does the work succeed? Under what conditions is it a struggle to implement? 

* What is needed to implement the program properly? How much does it cost? 

* Why this approach? What else has been tried in this space? What were the outcomes of the work?  

Turner goes on to suggest that to remedy this situation:

Instead of relying on a single tool [RCT, randomized controlled trials], policymakers and practitioners should draw from a “portfolio” of tools to effectively advance evidence-based policy. Using the wrong tool may produce misleading information or fail to answer the questions that are most relevant when a decision is being made. Applying the right tool to the policy question at hand can inform public debate, help decision-makers allocate scarce resources more effectively, and improve outcomes for people and communities.
These tools might include a randomized controlled trial, but could also include micro-simulation models, administrative data, or qualitative methods like focus groups, interviews and observations. Qualitative information such as in-person observations and one-on-one or group interviews, for example, can be used to break down a complex problem and pinpoint the core issue, develop ways to address it, and better understand implementation.

Funders, evaluators and organizations, in other words, shouldn’t focus solely on demonstrating that a program “works”, but should explore a variety of questions along the program continuum, from inception to implementation to outputs to outcomes. Doing so will help stakeholders better understand, and hence make better decisions about, program development, implementation and outcomes.

Lisbeth Schorr and Tony Bryk (in response to Haskins' op-ed in the Times) also call for moving:

beyond our current preoccupation with evidence from "what works" in the small units that can be experimentally assessed. Achieving quality outcomes reliably, at scale, requires that we supplement carefully controlled, after-the-fact program evaluations with continuous real-time learning to improve the quality and effectiveness of both systems and programs.

Why?

Because there is enormous variability in the impact of social interventions across different populations, different organizational contexts, and different community settings. We must learn not only whether an intervention can work (which is what randomized control trials tell us), but how, why, and for whom -- and also how we can do better. We must draw on a half-century of work on quality improvement to complement what experimental evidence can tell us. And, importantly, the learning must be done not alone by dispassionate experts, but must involve the people actually doing the work, as well as those whose lives the interventions are trying to enrich.

A strong summative evaluation, Bryk and Schorr argue, is just one piece of figuring out how to create positive social change. If you truly want to make evidence-based interventions work at scale, you cannot deploy widely based solely on the results of RCTs. Instead, you must look at the intervention in context: with the people who are participating in the work, and within the wider systems in which the intervention is embedded.

Acknowledging complexity addresses the fact that a successful intervention may not be appropriate in every setting, and that interventions interact not only with the systems in which they are being deployed, but also with other work taking place there. This is analogous to the understanding that has emerged in medicine that prescribing a drug needs to take into account not only what else the patient is taking, but also who the patient is and the context in which they have gotten sick and require care. It also acknowledges that the end result of care (whether it’s “successful” or not) can depend heavily on what went on for years before a course of treatment was prescribed.

 

The reality of collecting nuanced evidence

How much do organizations and their funders engage in and support data collection that goes beyond “does it work?” to explore questions of implementation, systems and context? In academic settings (i.e., for interventions developed within universities), such work does sometimes take place, though there are still many calls for the “black box of implementation” to be opened and explored (see, for example, the work of Kimberly Hoagwood and Marc Atkins). 

For your average nonprofit or community-based organization (CBO) implementing a school-based intervention, however, in-depth data collection focused on implementation is rare. They - and the organizations that fund them - continue to think about evaluation in the reductive “what works” way rather than in the way Turner and Bryk/Schorr advocate.

A key reason for this focus on outcomes and impact over implementation is that organizations currently have very limited data collection resources and capacity. Collecting nuanced evidence (i.e., evidence demonstrating not only that a program produces positive outcomes, but under what conditions and through what mechanisms) is difficult to prioritize given resource constraints. Limited dollars are going to be put towards acquiring evidence of success, not evidence around how the work is being done.

Education organizations, as a result, report to funders about attendance, grades, test scores, teacher retention, student engagement, suspensions, and other variables without drawing a clear through-line between their work and what they claim as outcomes. This gives reporting a fragmentary quality, in which what the program does and how it leads to positive outcomes remains elusive. For both organizations and funders this is frustrating: enormous effort goes into collecting, analyzing and writing up data, yet in the end what’s reported doesn’t give anyone a clear sense of the work done and its impact.

 

Evidence-Based Implementation as a Supplement to “What Works”

Through an "evidence-based implementation" approach, in contrast, funders would encourage programs - and provide them with the resources - to look rigorously at their program implementation, systematically document their interventions, and think about whether and how the work could be carried out at larger scale. Data collection would focus on the fidelity and quality of implementation; what adaptations have been made and why; who is receiving services; and what participants want from, and are getting out of, the work.
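
To make the idea of implementation data concrete, here’s a minimal sketch, in Python, of what a single fidelity record for a school-based program might look like. Everything in it (the field names, the program component, the numbers) is hypothetical, meant only to illustrate the kind of routine information an evidence-based implementation approach would capture.

# A minimal, hypothetical sketch of an implementation-fidelity record.
# Field names and values are illustrative, not drawn from any real program.
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class FidelityRecord:
    school: str                    # site where the session took place
    session_date: date             # when the program component was delivered
    component: str                 # which piece of the model was delivered
    delivered_as_designed: bool    # fidelity flag: was the component run as planned?
    adaptation_note: str = ""      # if adapted, what changed and why
    participants_reached: int = 0  # a simple dosage/reach measure

# Example: logging two sessions and summarizing fidelity across them
log: List[FidelityRecord] = [
    FidelityRecord("School A", date(2015, 3, 2), "after-school tutoring", True, "", 18),
    FidelityRecord("School A", date(2015, 3, 9), "after-school tutoring", False,
                   "shortened to 30 minutes because of a testing schedule", 12),
]

fidelity_rate = sum(r.delivered_as_designed for r in log) / len(log)
print(f"Sessions delivered as designed: {fidelity_rate:.0%}")

Even a log this simple, kept consistently, provides something most outside evaluators never get: a record of what was actually delivered, to whom, and how it was adapted along the way.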

This doesn't mean, of course, that outcomes and impact would be ignored; rather, they’d be part of a continuum of evidence starting with program development, continuing through implementation and outputs, and ending with outcomes and impact. Implementation data would shed light on the work being done, and outcomes could then be tied more tightly to a program’s work.

Most organizations cannot evaluate a program well on their own, and for a decent look at outcomes they use an outside evaluator. But outside evaluators don’t have much of a connection to the program or community, and coming in without a good record of what’s been done makes it less likely they’ll be able to draw strong conclusions about the work and its potential effectiveness. Every program claims an impact on the same set of variables, so being able to explain how you might be affecting those variables, and what the more intermediate outcomes might be, is what can set you apart from other programs. More importantly, being able to explain your program gives you a good sense of fidelity: how to train, build capacity, scale and adapt, and who your target audience is. In other words, how to do your work with quality.

 

Evidence-Based Implementation and Community Schools

So how might this look in the field? Take NYC's recent school reform initiative focused on community schools. Given the multiple service providers, the large number of participating schools, the complicated intervention model and the diversity of school and community settings, it will be critical to track implementation carefully in order to have any hope of understanding the impact of the work. It would be a mistake to jump straight to an outcome- and impact-focused evaluation. Funders should want to know a great deal about what the model looks like, variations in implementation, where it’s working and not working, what’s working and not working, and so on. Looking primarily at outcomes isn’t going to tell you much about the model’s role in producing those outcomes.

A recent Children’s Aid Society (CAS) report on how NYC might scale up its community schools work contains evaluation recommendations from the Center for Innovation Through Data Intelligence (CIDI). On the outcomes end, CIDI suggests a nested design that would enable evaluators to look at outcomes at the student, school and community levels. Prior to looking at impact, however, they recommend conducting an implementation evaluation with a focus on i) the early phases of implementation and ii) fidelity to the model. The goals of this work would be to take “corrective action when necessary” and “be able to define the prototype of a successful community school”. [CIDI notes that because an implementation evaluation is “resource intensive”, the work might only be conducted in a sample of community schools.]
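
Purely as an illustration of what a nested design can mean in practice, here’s a sketch that fits a mixed-effects model with students grouped within schools, using Python’s statsmodels library. The data, variable names (test_score, community_school, school_id) and effect sizes are all invented for the example; none of this reflects CIDI’s actual recommendations or any real NYC data.

# Toy sketch of a nested (student-within-school) outcome model.
# Data and variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a toy nested dataset: 50 students in each of 20 schools,
# some of which are flagged (at random) as community schools.
rng = np.random.default_rng(0)
n_schools, n_students = 20, 50
df = pd.DataFrame({
    "school_id": np.repeat(np.arange(n_schools), n_students),
    "community_school": np.repeat(rng.integers(0, 2, n_schools), n_students),
})
school_effect = np.repeat(rng.normal(0, 5, n_schools), n_students)
df["test_score"] = 70 + 3 * df["community_school"] + school_effect + rng.normal(0, 10, len(df))

# Random intercept for each school; students are the level-1 units.
model = smf.mixedlm("test_score ~ community_school", data=df, groups=df["school_id"])
print(model.fit().summary())

A design like this only becomes interpretable, though, if implementation data can say what “community school” actually meant in each building.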

With the generous funding that’s going to the Community Schools Initiative, I’m hopeful a thorough, thoughtful process of developing and collecting metrics and looking carefully at implementation will take place. [For an example of what this might look like, here’s an evaluation proposal I made to a community schools lead partner in 2013.] 

As with impact, it’s important to look at implementation from a multi-level perspective: students, schools, CBOs/nonprofits, and families/communities. Collecting this data will require methodological diversity and creativity. As an example, and to wrap up, I want to flesh out an idea given to me by Mary McKay that I think is fantastic and have long advocated for.

Years ago she suggested the best way for evaluators (both internal and external) to collect complex, school-based intervention data would be through a dedicated research assistant, one per 1-2 schools. This individual would be responsible for gathering the quotidian implementation data that is impossible to get unless you’re a regular presence in the school – but that if you’re actually doing the work you don’t have the capacity to focus on. This data is absolutely essential to understanding the quality and fidelity of implementation, as well as what kind of modifications and adaptations to the intervention are happening in the field. 

This research assistant would receive training and then follow set protocols to gather the nuanced data that CBOs will never obtain if they go the tried-and-true school data collection route – relying on school staff to collect the data for them. When you do that you end up with the incomprehensible, incomplete notes of busy social workers; support staff occasionally filling out basic checklists during meetings; constant negotiations with schools about accessing data; and still no idea of how well the work is going in the school. Not to mention the biggest issue: school staff weary of yet another demand placed on them by the organizations that came into their buildings with promises of lightening their burdens. Need I state the obvious – school staff don’t want more paperwork!

I’d strongly recommend this approach to the Community Schools Initiative. It’s a line item that can be written into grants, and it might be a great way to utilize AmeriCorps/VISTA staff while giving service-learning opportunities to recent college graduates.

 

[* Haskins’ examples of programs that work include two education interventions: Success for All (SFA) and Reading Partners (RP), reading programs at two very different stages of development and evaluation. These programs in fact demonstrate some of the flaws behind the “what works” approach, but I’ll get to that in another post. This post is focused on the need to create more nuanced, evidence-based approaches.]