III. Research Questions, Methods, Data Sources and Format for Reporting Results

    A. Research Question: What Accounts for Variation in the Amount of Serious, Reversible Capital Error from One State, County, Year and Case to the Next?

Some states have higher rates of reversible capital error than others. Some counties in those states have higher error rates than other counties in the same and in different states. And death verdicts imposed in some years are reversed more often than verdicts imposed in other years. These differences pose one of the central questions addressed in the remainder of this Report: What factors tend to be present when capital reversal rates are especially high or low? In addition, some verdicts are reversed at a particular stage of court review while other verdicts are approved at that stage. In regard to the federal habeas stage we ask a related question: What factors tend to be present when capital verdicts are reversed, and when they are affirmed?

These questions invite educated guesses, or hypotheses, about particular conditions that logic or experience suggests might have an effect on the existence and amount of reversible error. One might predict, for example, that poorly funded courts are more likely to impose flawed verdicts than better funded courts. Such hypotheses can be tested by answering the following question: Do the existence and amount of reversible error change from one state, county, year or case to the next when particular conditions are and are not present, or are present to a different degree?

Although one can examine each condition separately, more is learned by examining all of them simultaneously. Doing so helps identify a stronger overall explanation based on multiple conditions, and also takes into account relationships among the conditions as well as between each of them and amounts and rates of capital reversals.234 For example, crude initial analyses might reveal that changes in each of two conditions coincide with changes in capital reversal rates: reversal rates go up (1) when the speed with which relevant jurisdictions decide all court cases goes down, and (2) when spending on courts goes down. This might seem to suggest that both decision delays and underfunded courts "cause" high reversal rates. Suppose, though, that poor funding leads to both delay in deciding cases and flawed death verdicts, so decision delays and flawed verdicts follow somewhat similar patterns of change—not because they are causally related to each other but because both are affected by funding changes. In this event, simultaneously examining the effect on reversal rates of both funding and decision time would likely reveal that only funding has a strong relationship to reversal rates.
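The funding-and-delay example above can be illustrated with a small simulation. The sketch below is not the Report's analysis: it uses invented data, ordinary least-squares slopes rather than logistic regression, and hypothetical effect sizes (-0.8, -0.5) chosen only for illustration. It "controls for" funding by regressing what is left of delay and of reversal rates after each one's linear relationship with funding is removed, which yields the same coefficient as a regression that includes both factors at once.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing its linear relationship with x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(0)
n = 5000  # hypothetical jurisdiction-years

# Hypothetical world: poor funding raises BOTH delay and reversal rates;
# delay itself has no direct effect on reversals.
funding = [rng.gauss(0, 1) for _ in range(n)]
delay = [-0.8 * f + rng.gauss(0, 0.6) for f in funding]
reversal = [-0.5 * f + rng.gauss(0, 0.6) for f in funding]

# Examined alone, delay appears to be related to reversal rates.
delay_alone = slope(delay, reversal)

# Examined simultaneously with funding, delay's relationship vanishes
# while funding's relationship remains.
delay_controlled = slope(residuals(funding, delay), residuals(funding, reversal))
funding_controlled = slope(residuals(delay, funding), residuals(delay, reversal))

print(f"delay alone:        {delay_alone:+.2f}")
print(f"delay | funding:    {delay_controlled:+.2f}")
print(f"funding | delay:    {funding_controlled:+.2f}")
```

In this invented setting the slope for delay examined alone is clearly positive, while the slope for delay after controlling for funding is close to zero and the slope for funding stays clearly negative.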

Revised in light of the importance of analyzing multiple factors at the same time, our central research question is this: What set of potentially explanatory conditions tends to be present, and to increase in amount or intensity, in those places, years and cases in which serious capital error is present and increases in amount? Answering this empirical question may help answer two policy questions posed by chronic capital error:

  • What reforms might keep serious capital error from occurring as often in the future as it has in the past?

  • Can such reforms reasonably contain the risks and costs that currently overwhelm the capital system?

This Part discusses the methods and data used to answer our central research question. Parts IV-VI discuss the results of 19 statistical analyses applying these methods to our data on capital reversals. Based on those results, Part VII reaches a comprehensive set of conclusions about the factors associated with high numbers and rates of serious capital error. Then, based on those conclusions, Part VIII addresses the two policy questions above, listing reform options for moderating the problem of serious capital error.

    B. Why It Is Helpful to Use Regression Analysis to Identify Statistically Significant Relationships Between Changes in Potentially Explanatory Conditions and Changes in Capital Reversal Rates

    1. The chronic nature of capital error makes it important to look for explanations for reversals beyond the reasons courts give in each case.

Errors that occur at capital trials or in investigations leading up to them are typically discovered in capital appeals whose purpose is to inspect trials for serious error and, if it is found, to reverse the verdict and send the case back for a new trial at which the flaw is avoided. In this sense, the question, Why do reversals occur?, can be answered with lists like those at pp. 40-43 and notes 167, 169 of the reasons courts give for reversing death verdicts. By this measure, reversals most often occur at the review stages about which information is available for study because of egregiously incompetent defense lawyers, the withholding by police and prosecutors of important evidence that the defendant did not commit a capital offense, instructions to jurors that badly misstate the law, and judge and juror bias.235

Although there is a temptation to accept this information as a complete answer to the question of why error occurs, it actually raises much more fundamental questions. Although the public and policy makers have not always been kept informed, key actors in the death penalty system (state's attorneys, the capital defense bar and trial and appellate judges) have known for a long time (1) that most death verdicts are overturned because of serious errors, and (2) what those errors are.236 Yet, knowing this has not led those actors to reform the system.237 Why? Are high amounts of capital error inevitable? Or are there reasons, more basic than the particular flaws leading to each reversal, why the same flaws occur again and again, causing the entire system to fail,238 and why participants cannot or will not fix it? And if those reasons are discovered, can they help the public, policy makers and others outside the system to reform it and avoid its worst mistakes?

Rather than assuming from the start that serious capital error is inevitable, and that the public must either resign itself to chronic flaws or abolish the penalty, this Report pursues rigorous and systematic answers to the research questions set out above, which we boil down to these two: Why are some states, counties, periods and cases more prone to capital mistakes than others? What conditions are present when such mistakes are common, and what factors prevail when they are rare? Regression analysis provides a useful way to answer these questions because it enables us to:

  • identify conditions, more deep-seated than the actions that led to reversals in each case, that are associated with an increased probability that reversible error will be found in capital cases;

  • explain why error chronically recurs despite frequent reversals; and

  • sort out the relationships between factors that are individually associated with higher rates of serious error by testing them simultaneously to see if the relationship remains after controlling for other relevant factors.

    2. The uses of statistically significant results.

The goal of our regression analyses is to identify conditions that tend to be present, and to increase in amount or intensity, when reversible error is present and increases, where that pattern is statistically significant. Statistical significance means the probability is small—5% or less—that the pattern appears in the data purely by chance.239 A statistically significant finding poses an additional question: Is there a cause and effect relationship between reversible error and conditions that are linked to reversible error in a statistically significant way?

Answering this question requires more than statistical analysis. One must use common sense based on other information and experience. Suppose delay in processing court cases is high in places where capital reversal rates are high. This might imply a cause-and-effect relationship between the two conditions, but it does not prove it or show which way causation runs. Courts bogged down with many cases might generate faulty death verdicts. Or having to review many faulty death verdicts might delay other cases. Or both conditions might result from a third factor such as poor funding.

Even so, discovering significant relationships between capital reversal rates and other conditions on which we have data reveals much more than we knew before, and may strongly suggest causal hypotheses and policy options not previously known. In some contexts, relationships between events of possible importance are relatively obvious based on everyday observation, as are their implications for action—e.g., the relationship between clouds and rain, and what it tells us about when to grab an umbrella before heading out the door. But this is not true of a system as complex and removed from most citizens' everyday experience as the death penalty. In that context, recognizing significant relationships between conditions we can change (e.g., high caseloads or poor funding) and outcomes we want to avoid (e.g., unreliable death verdicts, costly errors and retrials and executing the innocent) requires supplements to everyday observation. Providing those supplements is the goal of the statistical analyses we describe below.

    C. Study Methods

    1. Six traits of the capital system and available data that shape analysis.

The nature of a system being studied, and the information available about it, shape the way researchers study it. Six such considerations shape our study.

      a. A lack of variation, given nearly uniform bottom-line failure.

As we note above, few death verdicts pass inspection and are carried out. Although the 34 states with active death penalties during the 1973-1995 period imposed 5826 death verdicts during that period, only 358 (6%) of the verdicts were approved for execution by all three sets of courts that review death verdicts. And only 313 (5%) of the verdicts were carried out—about half of which were from only two states (Texas and Virginia), with the rest sprinkled among about 20 other states.240 Thus, although it is possible to define executions in this context as success and everything else as failure, success is too rare and failure too uniform to provide a useful basis for analyzing the factors associated with either.

This calls for another measure of failure besides the across-the-board failure at the bottom line. The measure we use is court reversals based on findings of serious error, which are highly correlated with a failure to carry out death sentences.241 When all three review stages are considered, reversals and their opposite (affirmances) number in the thousands and vary considerably across place and time—although, as we have noted, most of the variation is above a disturbingly high level of reversals that is common to most places and years.242

      b. Different dates when states began imposing death verdicts under valid capital statutes.

After the U.S. Supreme Court invalidated all existing capital statutes and sentences in 1972,243 the 34 states in our study moved at different speeds to adopt revised capital statutes. Then, in 1976, the Supreme Court invalidated a substantial number, but not all, of the new statutes because they made the death penalty mandatory for entire classes of murder, in violation of the U.S. Constitution.244 States that had adopted mandatory death penalties and wished to continue imposing capital sentences had to draft revised statutes a second time, which they again did at different speeds. For these reasons, the 34 study states began imposing the death verdicts we study here (those imposed under valid modern statutes245) at times spread over about a decade. The fact that different study states started capital sentencing under valid statutes in different years is a complication, because it means that the states have capital-sentencing experiences of different durations, and potentially of a different character based on when they began.

To permit simultaneous comparison of states that began using the death penalty under valid statutes in different years, we include time in most of our regressions. To begin with, we measure reversal rates not only by state but also by year. Making the unit of comparison each state's capital experience in each year when it imposed at least one death verdict under a valid statute, rather than each state's overall capital experience during the entire study period, helps account for the different duration of each state's capital-sentencing experience, and for differences among states tied to particular years when some were and others were not applying valid capital statutes. Second, we typically include year as a random effect, an analytic strategy which assumes that verdicts imposed in the same year (even if they are imposed by different states) are probably more alike than verdicts imposed in different years. Third, we include the time trend as a potential explanatory factor, to identify patterns of reversal rates over time that are not accounted for by other factors in the analysis.

As we point out below, including time adds complications of its own in interpreting the effect on reversal rates of time, and especially the time trend.246 For that reason, and also to gauge the importance of time and to provide policy makers with a direct comparison of the 34 states' overall capital-sentencing experiences under valid statutes during the 23-year study period, we remove time from consideration in a small number of supplemental analyses.247

      c. A choice between reversal rates for reviewed and imposed verdicts.

Deciding to compare reversal rates across place and time poses another question: Should we try to explain differences in the number of death verdicts reversed during the study period as a proportion of the number of verdicts that were finally reviewed during the period, or as a proportion of the number of verdicts that were imposed in the period? Arithmetically, the question is:

number reversed ÷ number reviewed vs. number reversed ÷ number imposed ?

As we note above, the first of these measures is the true error rate—the proportion of verdicts that were inspected and found to be flawed.248 In contrast, the latter ratio gauges error and delay, because the rate of imposed verdicts that were reversed as of a given time is affected by both conditions: It may be that many verdicts were inspected and only a few were found to be flawed. In that event, a low error rate leads to a low reversal rate. Or it may be that the review process was delayed, so only a few verdicts were reviewed, and thus only a small proportion of imposed verdicts were reversed. In the latter event, even if all of the verdicts that were reviewed were found to be flawed, the reversal rate (as a proportion of all imposed verdicts) will still be very low, because most verdicts got mired in the review process and did not generate reversals or affirmances. In the latter event, it is delay in the review process that leads to a low reversal rate as a proportion of imposed verdicts.
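The difference between the two ratios can be made concrete with a short calculation. The figures below are hypothetical and are not drawn from the study's data; the function name and arguments are likewise illustrative. The sketch shows two invented jurisdictions with the same rate of imposed verdicts reversed but very different rates of reviewed verdicts reversed.

```python
def reversal_rates(imposed, reviewed, reversed_):
    """Return (reversed / reviewed, reversed / imposed) for one jurisdiction."""
    if imposed <= 0 or not (0 <= reversed_ <= reviewed <= imposed):
        raise ValueError("need 0 <= reversed <= reviewed <= imposed and imposed > 0")
    rate_of_reviewed = reversed_ / reviewed if reviewed else None  # error rate
    rate_of_imposed = reversed_ / imposed                          # error and delay
    return rate_of_reviewed, rate_of_imposed

# Hypothetical jurisdiction with prompt review and few flawed verdicts.
few_errors = reversal_rates(imposed=100, reviewed=90, reversed_=9)

# Hypothetical jurisdiction with long delays and mostly flawed verdicts.
long_delay = reversal_rates(imposed=100, reviewed=10, reversed_=9)

print(few_errors)   # low error rate, low rate of imposed verdicts reversed
print(long_delay)   # high error rate, same low rate of imposed verdicts reversed
```

Both invented jurisdictions show 9 reversals per 100 imposed verdicts, but the first reflects a 10% error rate among reviewed verdicts and the second a 90% error rate masked by delay.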

If the ratio of reversed verdicts to imposed verdicts is used to measure error, therefore, it is necessary to "control for" delay. This means identifying one or more factors known to be related to delay in reviewing death verdicts, and measuring their relationship—along with the relationship of conditions thought to be associated with error, not delay—to rates of imposed death verdicts being reversed. When all such factors are studied together, the factors known to be associated with delay can be used to capture that influence on reversal rates, leaving other conditions associated with differences in reversal rates to be linked to error.

We have identified two factors that control for the proportion of imposed verdicts that have not been reversed as of a given time because of delay. The first factor is the year of the death verdict. The more recently a verdict was imposed during the study period, the less likely it is that the verdict will have been reviewed by the end of that period. Because review takes time, the later a death verdict enters the queue of capital verdicts awaiting review, the later it is likely to be reviewed, and the more probable it is that it will not have completed that process by the time the study period ended. As we develop in more detail just below, this means that in studies of reversals as proportions of imposed verdicts, time trend—the year a verdict was imposed—is likely to be negatively associated with reversal rates, not because later verdicts are less prone to error, but simply because later verdicts are less likely to have completed the review process.249

A second measure of delay—the number of capital verdicts awaiting review at any given time—is sensitive not only to how far back in the queue of capital appeals a case is located, but also to how slowly the queue is moving. Thus, the number of backlogged capital cases measures both the number of capital cases experiencing appellate delays, and the severity of a key condition that causes delay—namely, capital litigation itself. Every capital appeal requires a huge commitment of court time, given the factual and legal complexity of capital cases and given the frequency with which capital verdicts are marred by reversible error.250 In addition, all capital appeals must pass once, and usually twice, through a small bottleneck in all states. Nearly all capital cases are reviewed in the first, and usually a second, instance by a single state high court with approximately the same small number of judges from one state to the next—often five or seven. The higher the number of capital appeals pending at any given time, the more clogged the appellate bottleneck is likely to be with these difficult cases, and the more slowly capital appeals are likely to move.

      d. Uncounted reversals of death verdicts imposed in later years.

As we have just noted, comparing states based on the proportion of imposed verdicts that were reversed as of the end of the study period makes it difficult to gauge the relationship of the passage of time to error, as opposed to its relationship to unfinished or delayed review. Suppose we find that the proportion of death verdicts imposed in 1993 that were reversed as of the end of 1995 is smaller than the proportion of death verdicts imposed 10 years earlier that were reversed by that point. This result could mean that death verdicts imposed in 1993 were freer of error than those imposed in 1983. But it could also mean that verdicts imposed in 1993 were just as error-ridden as ones imposed in 1983 (or were worse) but that reviewing courts only had two years, not 12, to find all the errors, leaving many errors still to be discovered by the time the study ended. A decline in the rate of imposed verdicts that were reversed over time thus is not a useful measure of the trend of error over time.

One way to improve the power of the time trend (the year verdicts were imposed) to gauge whether error is increasing or decreasing over time after controlling for other factors is to compare rates of reviewed verdicts that were reversed, instead of comparing rates of imposed verdicts that were reversed. Unfortunately, comparing reversal rates for reviewed cases does not entirely avoid the link between recent verdicts and unfinished appeals, because at the third, federal habeas stage of review, flawed verdicts take longer to review than verdicts without reversible error. Figure 10, p. 93 below, shows that:

  • All verdicts finally reviewed on federal habeas during the study period spent much more time under review in state and federal court in later years than in earlier years, rising from about 5½ years on average from sentence to final habeas review for verdicts finally reviewed in 1981, to 12 years for verdicts finally reviewed in 1995.

  • Of greater interest here, the amount of time taken to complete federal habeas review has generally been longer for cases in which relief was granted than for cases in which habeas relief was denied: by the end of the study period it averaged about 13 years for the former, about two years longer than the roughly 11 years for the latter.

Given the latter fact, our time-limited study systematically understates reversal rates over time at the federal habeas stage: Death verdicts without reversible error are over-represented among the verdicts imposed in any given year that were finally reviewed on federal habeas by the end of the study period, because those verdicts take less time to review. Conversely, death verdicts with reversible flaws are under-represented among verdicts finally reviewed on habeas by the end date, because they take longer to review. Because the number of unreviewed verdicts rises as the sentencing year gets more recent (fewer 1989 death verdicts were finally reviewed as of 1995 than 1988 ones; fewer 1988 verdicts were reviewed as of then than 1987 ones, and so forth251), the impact of the bias against counting reversible error as of 1995 that eventually will be discovered and reversed grows with each successive sentencing year.

The result is a false impression that later death verdicts are cleaner than they are. What instead is happening is that cleaner verdicts move to the front of the line of cases getting finally reviewed, shoving flawed verdicts to the back of the line. As the sentencing year gets later, the proportion of verdicts awaiting review as of the cut-off date gets larger, as does the proportion of flawed verdicts towards the back of the line that have not yet been reversed—and, so, are not counted in our study.
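The mechanism described above can be reproduced in a toy simulation. Everything in the sketch below is assumed for illustration: a constant true error rate of 40%, review times drawn uniformly from 5 to 11 years for clean verdicts and 7 to 13 years for flawed ones, and 4,000 verdicts per sentencing year. None of these numbers comes from the study. The point is only that a fixed cut-off date combined with slower review of flawed verdicts produces an apparent decline in reversal rates among reviewed verdicts even when the underlying error rate never changes.

```python
import random

CUTOFF = 1995            # end of the study period
TRUE_ERROR_RATE = 0.4    # hypothetical, identical in every sentencing year

def observed_rate(sentence_year, n_verdicts, rng):
    """Reversal rate among verdicts whose review finished by the cut-off."""
    reviewed = 0
    reversed_ = 0
    for _ in range(n_verdicts):
        flawed = rng.random() < TRUE_ERROR_RATE
        # Assumption: flawed verdicts take about two years longer to review.
        years_to_review = rng.uniform(7, 13) if flawed else rng.uniform(5, 11)
        if sentence_year + years_to_review <= CUTOFF:
            reviewed += 1
            if flawed:
                reversed_ += 1
    return reversed_ / reviewed if reviewed else None  # None: nothing reviewed yet

rng = random.Random(1)
rates = {year: observed_rate(year, 4000, rng) for year in range(1980, 1994)}

for year, rate in rates.items():
    shown = "no verdicts reviewed" if rate is None else f"{rate:.2f}"
    print(year, shown)
```

Under these assumptions the observed rate for the earliest sentencing years is close to the true 40%, falls to roughly 25% for 1986 verdicts, and drops further for later years, although every simulated year is equally error-prone.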

[Figure 10: Average Time to Final Federal Habeas Reversals and Affirmances, by Year of Final Decision, 1981-1995]

[Flow chart: The Capital Criminal Process: Trial Through State Post-Conviction and Federal Habeas]

      e. Large number of verdicts trapped in a multi-stage review process.

Analyzing only reviewed death verdicts also poses a choice about the reviewed verdicts to study. One might study only verdicts fully reviewed at all three review stages that are depicted in the stylized flow chart on p. 94 above—state direct appeal in state high courts, state post-conviction review in state trial and high courts, and federal habeas review in federal trial and appellate courts. But of the 5826 death verdicts imposed during the study period, only 598 were reviewed at all three stages. Studying only those verdicts limits analysis to state differences, because only a few counties have fully reviewed verdicts, and ignores information from thousands of court decisions reviewing death verdicts at earlier stages. And the state differences that this approach explores are limited to a subset of active death penalty states in which cases had progressed through the entire three-stage review process as of 1995.

Or one could combine all final decisions at each of the three review stages, even if the verdict never made it to a later review stage. But any reversal at the second review stage necessarily follows an affirmance of the same verdict at the first review stage; and any reversal at the third review stage necessarily follows an affirmance of the verdict at two prior stages. As a result, any condition found in a case that leads to a second-stage reversal will also necessarily be found in a case with the opposite outcome, affirming the verdict—i.e., the case reviewing the same verdict at the first review stage. And any condition found in a case that leads to a third-stage reversal will also necessarily be found in two cases that affirmed the verdict—the cases reviewing the same verdict at the first and second stages of review. Studying reversals as a proportion of court decisions at all three stages combined thus might dilute the effect of forces that account for reversals in proportion to how late in the review process the reversal occurred.

The only way to analyze reversals at all three stages combined, therefore, is to consider reversals as a proportion of imposed death verdicts, not reviewed verdicts. Then, each verdict only counts once, as either a reversal or a non-reversal (the latter meaning either an affirmance at some or all stages, or that the case was delayed and not decided at all). If analysis of only reviewed verdicts is desired, the data must be divided into three clumps—one for reversals and affirmances issued at each of the three review stages.
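A small count shows how pooling decisions from all three stages dilutes reversals that occur late in the process. The sketch below uses an invented set of 100 verdicts, all of them eventually reversed, and assumes for simplicity that every verdict is reviewed until it is reversed or has cleared all three stages. The function and variable names are illustrative only.

```python
def pooled_and_per_verdict_rates(reversal_stages):
    """
    reversal_stages: one entry per verdict, giving the stage (1, 2 or 3)
    at which it was reversed, or None if it was affirmed at all three stages.
    Returns (reversals / court decisions, reversals / verdicts).
    """
    decisions = 0
    reversals = 0
    for stage in reversal_stages:
        if stage is None:
            decisions += 3          # three affirmances
        else:
            decisions += stage      # (stage - 1) affirmances, then one reversal
            reversals += 1
    return reversals / decisions, reversals / len(reversal_stages)

# Hypothetical: 50 verdicts reversed at stage 1, 30 at stage 2, 20 at stage 3.
verdicts = [1] * 50 + [2] * 30 + [3] * 20
pooled, per_verdict = pooled_and_per_verdict_rates(verdicts)

print(f"reversals per court decision: {pooled:.3f}")
print(f"reversals per verdict:        {per_verdict:.3f}")
```

Every one of these invented verdicts is reversed, so the per-verdict rate is 100%, yet the pooled rate is only 100 reversals out of 170 decisions, because each later-stage reversal brings one or two earlier affirmances of the same verdict into the count.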

      f. Smaller number of cases at second and third review stages.

The value of the latter approach (separately examining reviewed verdicts at each review stage) is moderated by the successively smaller number of cases reaching the second and third review stages. Stage-by-stage analysis generally can only compare states, because there are too few cases spread among too many counties to reach useful conclusions at the county level of analysis.252 And even when comparing states, the small number of cases at the latter two stages makes it more difficult to reach comprehensive conclusions.

As we discuss above, moreover, we do not know how many cases were reviewed at the second, state post-conviction stage, but only how many were reversed at that stage. This information enables us to conservatively estimate state post-conviction reversal rates based on the assumptions that no delays occurred at the second review stage and that every case clearing the first stage was immediately and fully reviewed at the second stage.253 But these same assumptions do not permit us, at this second stage of review, to achieve the advantages of comparing reversal rates among verdicts that were actually reviewed, as opposed to those that were imposed and were merely available for review.

    2. A prudent strategy for reaching conclusions: the results of a principal analysis, tested by the results of 18 follow-up analyses.

These theoretical and empirical considerations call for our main analysis, Analysis 1, to use over-dispersed binomial logistic regression analysis254 to study:

  • state-level factors,

  • that explain differences from state to state and year to year

  • in capital reversal rates calculated as the number of reversals at all three review stages combined as a proportion of all imposed verdicts,

  • using time trend and the number of backlogged death verdicts awaiting review255 to capture the relationship between unfinished and delayed review and lower reversal rates, leaving other significant factors to explain differences in capital error rates that lead to differences in capital reversal rates.

We include these components in our main analysis because they make the best use of all of the detailed information we have collected to identify conditions associated with serious capital error.

To begin to test whether our main results are robust, we first follow up our main binomial logistic regression with an otherwise identical analysis that uses a different kind of regression, Poisson logarithmic regression, with different assumptions about patterns of capital reversal rates. Robust results are ones that consistently reveal similar relationships between capital error and the explanatory factors the analyses identify, regardless of the type of regression used.

As further tests of the robustness of our main results, we conduct 16 additional follow-up analyses, each altering one or more components of the main analysis to see if its results depend on that particular aspect of the study design. Substantial similarity between the results of the main and follow-up analyses would be a very good indication that our results reflect actual relationships in the data and are not sensitive to the details of the statistical method being used. Each follow-up study substitutes one or more of the following components for the corresponding aspects of the main analysis:

  • analysis of reversal rates at each separate capital review stage, as opposed to reversal rates at all three stages combined, as the outcome to be explained;

  • analysis of reversal rates as a proportion of verdicts reviewed, as opposed to verdicts imposed, during the study period;

  • different assumptions about the importance of time and the year death verdicts were imposed;

  • inclusion of county-level explanatory factors for state reversal rates;

  • use of county-to-county (rather than, and sometimes simultaneously with, state-to-state) differences in reversal rates as the outcome to be explained, with a variety of assumptions about the effect of states on counties and vice versa;

  • use of the results of individual appeals of capital verdicts at a particular review stage (as opposed to reversal rates from the aggregate of all such appeals) as the outcome to be explained, calling for a different type of regression analysis; and

  • different tests of the relationship between delay and rates of reversal and error.


This strategy is prudent because it limits conclusions to relationships:

  • present in 23 years' worth of carefully compiled and conservatively measured information about the existence and amount of capital outcomes in this country;256 and

  • confirmed by multiple overlapping results using different sets of explanatory factors, different levels of aggregation of data, different definitions of the outcome being explained and different probability measures.

The next section describes our main analysis and 18 follow-up analyses. The analyses are grouped by those examining differences in:

  • capital reversal rates across states (main Analysis 1; follow-up Analyses 2-6, 14, 15);

  • capital reversal rates across counties (and, sometimes, states) (Analyses 7-13, 16-18); and

  • outcomes of federal habeas appeals reviewing capital verdicts (Analysis 19).

    3. Detailed description of the main and 18 follow-up analyses.

      a. Main analysis and seven follow-up analyses explaining differences in rates of serious capital error across states.

i. Main Analysis 1: over-dispersed binomial logistic regression analysis of the probability of reversal at all three review stages combined. Our main analysis, Analysis 1, explains variation in the number of reversals at all three review stages, as a proportion of the total number of death verdicts imposed in each of the 34 study states in each of the 23 study years in which the state imposed at least one death verdict. The total number of combinations of states and years compared in this analysis (i.e., the sum, across the 34 study states, of the number of years out of the 23 studied in which each state imposed at least one death verdict) is 519. (Although 34 states times 23 years establishes a maximum of 782 possible "state-years," not all states imposed death verdicts under valid death-sentencing statutes in each of the 23 years.) The analysis uses an over-dispersed binomial logistic regression technique designed to explain conditions (here, rates of serious, reversible error) with a known range of possible outcomes (here, values ranging from 0 to 100%).257

This analysis tests the explanatory power of a variety of specific factors and conditions that might explain variations in reversible error across time and place (e.g., court funding and caseloads). One of those factors is time as a linear trend, which asks whether a pattern of increasing or decreasing amounts of error over time explains changes in reversal rates. As discussed above, time trend and backlogs of cases awaiting review are included to isolate the effect of unfinished review and delay, as opposed to error, on reversal rates: By lowering the number of final decisions reviewing imposed death verdicts, unfinished appeals and delay decrease the number of reversals without corresponding gains in the quality of death verdicts.258

Analysis 1 treats each of the 34 study states and each of the 23 study years as random effects. That gives these generalized factors the maximum ability to explain variance in reversal rates. Doing so helps assure that the specific explanatory factors we examine (e.g., court caseloads or funding) do not get credit for explaining variance that instead is attributable to conditions we have not studied but that vary from state to state (in which case each of the states is likely to get credit for explaining the variance that the missing factor causes) or that vary over time (meaning years may get credit for the variance the missing time-dependent factor causes).

ii. Analysis 2: over-dispersed Poisson logarithmic regression analysis of the probability of reversal at all three review stages combined. Analysis 2 uses a different statistical technique, over-dispersed Poisson logarithmic regression, which is used to explain counts of events that are relatively rare.259 Because we deflate reversal rates in this analysis (as in Analysis 1) by calculating them as a proportion of all imposed death verdicts, not all reviewed verdicts,260 the distribution of the condition we are explaining (serious, reversible errors) might reasonably be explained by an over-dispersed Poisson regression. Another advantage of the Poisson regression is that its results provide a somewhat more easily interpreted description than binomial regressions of how much one expects reversal rates to rise or fall based on a specified change in a significant factor (e.g., given an additional $100 per capita in public spending on the courts). Otherwise, Analysis 2 is similar to Analysis 1.
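The interpretive advantage described above follows from the logarithmic scale: a fixed change in a factor multiplies the expected reversal rate by a fixed ratio. The following is a minimal sketch under stated assumptions. The coefficients are hypothetical and are not estimates from this study, and treating the number of imposed verdicts as an offset is an assumption about the model's form, not a description of the authors' specification.

```python
import math

# Hypothetical coefficients (NOT estimates from the study).
INTERCEPT = -1.0       # log of the reversal rate when the factor equals 0
COEF_FACTOR = -0.002   # change in the log rate per one-unit change in the factor

def expected_reversals(n_imposed, factor_value):
    """Poisson mean on the log scale; n_imposed plays the role of an offset."""
    return n_imposed * math.exp(INTERCEPT + COEF_FACTOR * factor_value)

def rate_ratio(delta):
    """Multiplicative change in the expected rate when the factor rises by delta."""
    return math.exp(COEF_FACTOR * delta)

# Effect of 100 additional units of the factor (e.g., dollars per capita).
ratio_100 = rate_ratio(100)
```

With these assumed numbers, `ratio_100` equals exp(-0.2), so the expected reversal rate falls to about 82% of its prior level whatever the starting point, which is the kind of direct statement a binomial analysis does not provide as readily.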

iii. Analysis 3: binomial regression analysis of the probability of reversal on direct appeal. Analysis 3, another follow-up inquiry, explains variation in the number of reversals at a single stage of review: state direct appeal. Limiting analysis to one stage allows us to identify factors related to reversals as a proportion of death verdicts that were fully reviewed during the study period.261 Analysis 3 (like Analyses 4 and 6 and, to some extent, Analysis 5, below) thus enables us to see if:

  • reversal rates at one review stage behave differently from those at all three stages combined;

  • the results of Analyses 1 and 2 for all three review stages combined miss factors operating only at particular review stages; and

  • results differ when the reversal rates being studied are calculated as a proportion of reviewed, not imposed, verdicts.

The total number of combinations of states and years compared in Analysis 3 (the sum of the 34 study states times the number of years out of the 23 studied in which each state imposed at least one death verdict that was fully reviewed at the state direct appeal stage during the study period) is 453. This analysis compares fewer state-years than the prior analyses because it explains variation in reversal rates only for state-years in which at least one death verdict was finally reviewed at the direct appeal stage during the study period, rather than focusing on the larger number of state-years in which at least one verdict was imposed during the period even if the verdict was never finally reviewed during that period. In other respects, this analysis is similar to Analysis 1.
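The difference between the two base numbers can be made concrete with a small sketch. The counts are hypothetical and are not study data; the function name `reversal_rate` is introduced only for illustration.

```python
# Hypothetical counts for one state-year (NOT study data).
imposed = 20     # death verdicts imposed
reviewed = 12    # verdicts finally reviewed on direct appeal in the study period
reversed_ = 6    # verdicts reversed on direct appeal

def reversal_rate(reversals, base):
    """Reversals as a proportion of the chosen base number (denominator)."""
    if base <= 0:
        raise ValueError("the base must include at least one verdict")
    return reversals / base

rate_of_imposed = reversal_rate(reversed_, imposed)     # larger base, lower rate
rate_of_reviewed = reversal_rate(reversed_, reviewed)   # smaller base, higher rate
```

The same six reversals yield a rate of 30% of imposed verdicts but 50% of reviewed verdicts, and a state-year with no finally reviewed verdict has no reviewed-verdict rate at all, which is why Analysis 3 has fewer observations than Analysis 1.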

iv. Analysis 4: Poisson regression analysis of the probability of reversal on direct appeal. Analysis 4 also explains differences in the number of reversals of death verdicts at the direct appeal stage as a proportion of fully reviewed verdicts. It is similar to Analysis 3 in other respects, as well, except that it uses a Poisson regression, rather than a binomial regression, for reasons discussed above in regard to Analysis 2.262

v. Analysis 5: Poisson regression analysis of the probability of reversal on state post-conviction review. Analysis 5 explains variation in the number of reversals of death verdicts at the state post-conviction stage. As is discussed above, we do not know how many verdicts were finally reviewed at this stage, but we do know how many were available for review in that they had been approved at the direct appeal stage that immediately precedes the state post-conviction stage.263 Analysis 5 accordingly considers the number of state post-conviction reversals of death verdicts as a proportion of the number of verdicts available for state post-conviction review after being approved on direct appeal. Analysis 5 includes 26, not 34, states: only those in which death verdicts completed state post-conviction review during the study period for which reversal data are available. The total number of combinations of states and years in this analysis is 359. Analysis 5 uses Poisson regression because of the relatively large proportion of values less than .5 being explained, given relatively low reversal rates at the state post-conviction stage and the use in this analysis of a large base number, or denominator (the number of verdicts available for review, not the number actually reviewed), which as we have shown depresses the reversal rates being studied.264

vi. Analysis 6: binomial regression analysis of the probability of reversal on federal habeas review. Analysis 6 explains differences in the number of reversals of death verdicts at the federal habeas stage as a proportion of the number of death verdicts that were fully reviewed at this stage. Analysis 6 includes 28 states, all those in which one or more death verdicts completed federal habeas review during the study period. The total number of combinations of states and years examined in this analysis is 161. The smaller number of observed reversal rates to be explained (161 state-years, compared to 519 for Analyses 1 and 2, 453 for Analyses 3 and 4 and 354 for Analysis 5) makes it more difficult for explanatory values to achieve statistical significance, especially given the treatment of states and years as random effects. Because Analysis 6 examines reversal rates on habeas that range fairly evenly from 0 to 100% (given relatively high reversal rates and the use here of the smaller of the two denominators, i.e., reviewed verdicts, not imposed verdicts), we use a binomial regression.265

vii. Analysis 14: binomial regression analysis of the probability of reversal at all review stages combined, with state but not year as a random effect. Analysis 14266 (a binomial regression) modifies main Analysis 1 in an important respect. Whereas Analysis 1 treats both the state and year in which death verdicts were imposed as random effects, Analysis 14 treats only state as a random effect. Analysis 14 thus assumes that reversal rates for all years in each state are relatively responsive to the same set of factors, without making the same assumption about reversal rates for all states in each of the 23 study years. In other words, Analysis 14 clusters the 519 observed reversal rates (one for each relevant state and year) into 34 groups based on the state where the verdicts were imposed and attempts to explain differences among the clusters. But it does not create or attempt to explain differences among cross-cutting clusters based on the year in which death verdicts were imposed. Analysis 14 thus is designed to explain differences in the 34 study states' experiences with capital reversals over the 23-year study period as a whole, with each state's experience being the composite of its reversal rates in all study years in which it imposed death verdicts.

Removing time as a consideration and comparing each state's 23-year experience with capital reversals to that of the other 33 study states makes Analysis 14 useful to policy makers who want to know how much, and why, their state's capital success and failure rates in the modern capital period as a whole differ from the success and failure rates of other capital states in the same period. A comparison of Analysis 1 and Analysis 14 also gauges the importance of time by gauging how much results differ when reversal rates are and are not clustered based on death-sentencing year.

viii. Analysis 15: Poisson regression analysis of the probability of reversal at all review stages combined, with state but not year as a random effect. Analysis 15 is the same as Analysis 14, except that it uses a Poisson regression to make sure that results are not tied to the binomial method of analysis.

      b. Ten analyses explaining differences in rates of serious capital error across counties as well as states.

The analyses in this section consider whether results change, or whether more is learned about factors related to capital reversal rates, when county-level as well as state-level reversal rates are studied and when county-level as well as state-level explanations for reversal rates are tested. Because these analyses are designed to identify conditions associated with court reversals of capital verdicts, and not the conditions that produced the verdicts themselves, we analyze only those counties that imposed one or more death verdicts during the study period.

i. Analysis 7: Poisson regression analysis of county explanations for county reversal rates at all review stages combined. Analysis 7 begins our county-level inquiries by comparing county reversal rates and county-level explanations for those rates among the 967 counties within the 34 study states that imposed at least one death verdict during the 23-year study period, where the year of that verdict is known.267 This analysis considers only potential county-level explanations, as well as time, but not state-level explanations, for differences in county reversal rates. Because Analysis 7 omits states altogether, even as categories within which counties are grouped for purposes of analysis, it treats each death-sentencing county in the U.S. as an entirely separate unit on a par with all other counties, regardless of whether or not the counties are in the same state. This approach is only a starting point, to identify the full range of conditions operating at the county level that might have an effect on reversal rates. Very possibly, however, introducing states back into the analysis—as we do in our subsequent county analyses—will show that states or state-level explanations turn out to explain variation that a county-only analysis at first seems to attribute to county-level conditions.

Analysis 7 explains variation in the number of reversals at all three stages of review, as a proportion of the total number of death verdicts imposed in each of the relevant counties and years. The total number of combinations of counties and years compared in this analysis (the 967 study counties times the number of years out of the 23 studied in which each imposed at least one death verdict) is 3054. (Not all counties imposed death verdicts under valid capital statutes in each of the 23 years.) Analysis 7 tests the explanatory power of county-level conditions comparable to those operating at the state level that are evaluated in Analyses 1-6. It uses Poisson regression analysis and treats time as both a random and a fixed effect.268

ii. Analysis 8: county-within-state binomial regression analysis of county and state explanations for county reversal rates at all review stages combined. Like Analysis 7, Analysis 8 explains differences in county capital reversal rates, calculated as the number of reversals at all three stages of review as a proportion of the total number of death verdicts imposed in each of the 967 study counties269 and 23 study years. The total number of combinations of counties and years compared is, again, 3054. Unlike Analysis 7, Analysis 8 gauges the effect of state, as well as county, factors. In recognition of the fact that state and county explanations might operate differently from each other given their different jurisdictional levels, Analysis 8 treats county level factors as random effects and state level factors as fixed effects. In addition, each of the 967 counties is treated as a subject variable nested within the state among the 34 studied where the county is located. This analysis thus assumes that counties in the same state are more like each other than counties in different states. Analysis 8 is a binomial regression study.

iii. Analysis 9: county-within-state Poisson regression analysis of county and state explanations for county reversal rates at all review stages combined. Analysis 9 is like Analysis 8, except that it uses Poisson regression analysis.

iv. Analysis 10: county-within-state Poisson regression analysis of county and state explanations for county reversal rates on direct appeal. Analysis 10 is like Analysis 8, but studies reversals as a proportion of death verdicts actually reviewed at the single, direct appeal stage. Analysis 10 compares reversal rates for death verdicts in each of 851 counties in each of the 23 years in which at least one death verdict was imposed and was fully reviewed on direct appeal, generating 2472 observations.

v. Analysis 11: binomial regression analysis using predicted values from state analyses and other county factors to explain county reversal rates at all review stages combined. Analysis 11 is a binomial regression designed to determine how well state-level factors explain county reversal rates, while also testing county-level explanations for county reversal rates. Analysis 11 uses Analysis 1's binomial analysis of factors related to state-level reversal rates to generate predicted values for each death-sentencing county in each of the 34 study states. The predicted values are the capital error rates the state analysis predicts for each county given the number of death verdicts the county imposed. The predicted values then are examined along with other county-level factors to see which are significantly associated with county reversal rates. The predicted values are derived from 519 observed state reversal rates (34 states times each of the 23 study years in which the state imposed at least one death verdict) and are used along with other county-level factors to explain 3054 observed county reversal rates (967 capital counties times each of the 23 years in which the county imposed at least one death verdict). Reversal rates are the proportion of death verdicts imposed in the relevant county that were reversed at one of the three review stages.

There are two ways in which Analysis 11 is a more demanding test than Analyses 7-10 of the explanatory power of specific state and county-level factors:

  • In the analysis used to derive predicted values, state and year are treated as random effects. As a result, predicted values are based only on factors that significantly explain variance in reversal rates that is not explained by the state and year in which the verdict was imposed.270

  • In the analysis used to identify county-level factors that significantly explain county reversal rates, the county imposing the death verdict is treated as a random effect. As a result, specific county-level factors (including the predicted values derived from state-level factors) can achieve significance only by reliably explaining variance in county reversal rates that is not explained by the variation among the counties themselves.

vi. Analysis 12: Poisson regression analysis using predicted values from state analyses and other county factors to explain county reversal rates at all review stages combined. Analysis 12 is the same as Analysis 11 except that it is a Poisson regression technique, and it uses Analysis 2 (also a Poisson regression analysis) to derive predicted reversal rates for death-sentencing counties based on significant state-level explanations for state reversal rates.

vii. Analysis 13: county-within-state Poisson regression analysis of county reversal rates at all review stages combined, with county and state explanations as fixed effects. Analysis 13 is a Poisson regression analysis of 3054 observed county reversal rates for each of 967 counties in the nation that imposed at least one death verdict during the study period in each of the 23 study years in which the county imposed a death verdict. Reversal rates are proportions of imposed death verdicts reversed at one of the three review stages in the 23-year study period. By nesting counties in states, this study (like Analyses 8-10) assumes that counties in the same state behave more similarly than counties in different states. Analysis 13 treats potential state- and county-level explanations for county reversal rates as fixed effects and has no random effects.

viii. Analysis 16: county-within-state binomial regression analysis of county and state explanations for county reversal rates at all review stages combined, averaged over time. Like Analysis 8, Analysis 16 is a binomial regression analysis of explanations for differences in county reversal rates, in which county-level explanations are treated as random effects and state-level explanations are treated as fixed effects, and in which reversal rates are the proportion of each county's imposed verdicts that were reversed at one of the three review stages. Unlike in Analysis 8, the county reversal rates being explained in Analysis 16 are each county's total number of reversals during the entire 23-year study period (not the number reversed in each death-sentencing year) divided by its total number of death verdicts. Analysis 16 clusters counties into 34 groups based on the state where each county is located. This technique assumes that counties in the same state behave more similarly than counties in different states. As a result of these steps, Analysis 16 can be understood as examining differences in the 34 study states' experiences with capital reversals during the entire 23-year study period, with each state's capital experience being the composite of its counties' experiences during the period.

By taking a single picture of each state's 23-year experience with capital reversals, Analysis 16 (like state-only Analyses 14-15 above271) allows a more direct examination of differences in the 34 states' experiences in the modern capital era. Because Analysis 16 omits time as a consideration, it includes 35 additional counties that imposed death verdicts in the study period but in an unknown year. This increases to 1002 the number of observed county reversal rates being explained.

ix. Analysis 17: county-within-state Poisson regression analysis of county and state explanations for county reversal rates at all review stages combined, averaged over time. Analysis 17 is like Analysis 16, except that it uses a Poisson regression analysis.

x. Analysis 18: Poisson regression analyses of county reversal rates at all review stages combined in Florida, Georgia and Texas. Analysis 18 consists of three Poisson analyses (one each for Florida, Georgia and Texas) of county-level factors that explain variation in reversals among each state's death-sentencing counties in each year in which the county imposed at least one death verdict. We chose Florida, Georgia and Texas because they are the three states with the highest number of death verdicts that underwent at least one stage of review in the 23-year study period and because each imposed death sentences in all 23 study years. County and year are treated as random effects, and county-level explanatory factors are treated as fixed effects.

      c. Analysis 19: case-level logistic regression study of factors associated with capital federal habeas reversals.

Analysis 19 supplements the prior analyses' explanations of differences in reversal rates among states and counties with a logistic regression study of differences in the outcomes of particular capital federal habeas cases. Using a data base containing hundreds of items of data we collected on each of the nearly 600 capital cases reviewed on federal habeas between 1973 and 1995, Analysis 19 examines circumstances about defendants, victims, lawyers, judges, court procedures, evidence and timing to see whether they predict the probability that capital verdicts will be reversed at that stage.

    4. Eleven inter-related tests of factors' success in explaining variation in rates of serious, reversible error.

      a. Three tests applied to individual explanatory factors.

We already have mentioned two tests we apply to any factor or condition that might help explain increases or decreases in the amount of reversible capital error across states and counties or over time: statistical significance and validation by multiple analyses.

i. Statistical significance. The more times we observe how two conditions relate to each other (say, a high number of death verdicts per 1000 homicides and a high probability that any given death verdict is seriously flawed), the more confident we can be that any relationship between the conditions we think we observe is consistent and cannot be explained simply as a chance product of the variation among states or counties. The same is true, the more closely changes in one condition track changes in the other. Statistical significance is a test that uses these criteria—number of observations, clarity and closeness of the observed relationship—to calculate the degree of confidence we can have that the observed relationship could not have occurred by chance. We follow the usual practice of starting to pay attention if the probability that the observed relationship could have occurred by chance is less than 10%, and using results as bases for conclusions when that probability is less than 5%.
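The two thresholds just described amount to a simple decision rule, sketched below. The function name and the tier labels are hypothetical conveniences; only the 10% and 5% cut-offs come from the text.

```python
def significance_tier(p_value):
    """Classify a p-value using the 10% and 5% thresholds described in the text."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.05:
        return "basis for conclusions"
    if p_value < 0.10:
        return "worth attention"
    return "not significant"

# Example: a relationship with a 7% probability of arising by chance is
# noted but is not, by itself, used as a basis for conclusions.
tier = significance_tier(0.07)
```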

ii. Consistency of significance across analyses. Different statistical analyses assume different things about events being compared across place and time, e.g., that they are rare or common; that they have a yes-or-no form (as is true of the outcomes of capital habeas cases, which are either approved or not) or have a range of possible outcomes (as is true of reversal rates); that the range of possible outcomes is bounded or infinite; and that the regression relationship can be expressed on the logistic or logarithmic scale (based on different expected distributions of the outcomes being explained). Where, as is usually true, the relevant events do not follow any of these patterns perfectly, it helps to use multiple analyses to determine whether factors that in one analysis have a significant relationship to the condition under study reveal the same relationship when a different analysis is used.272 The more ways researchers analyze data, the more confidence they have in results supported by most or all analyses. On the other hand, relationships that are not robust (i.e., that only occasionally appear) are less convincing.

As we note above,273 we begin by identifying the empirically and theoretically soundest and most comprehensive methods for analyzing our detailed data about state and county capital reversal rates (Analysis 1, discussed at pp. 99-100 above and pp. 146-91 below). Then, to provide a strict test for robustness, and for whether our results reflect actual relationships in the data between significant explanatory factors and capital reversals and reversal rates and are not dependent on particular study methods, we conduct 18 follow-up analyses that:

  • use different statistical techniques with different assumptions and capacities;

  • vary the power of general factors (e.g., state, time), and specific factors (e.g., homicide rates and death-sentencing patterns) to explain variance;

  • calculate rates using all possible base numbers (all death verdicts, only those available for review at a particular stage or only those actually reviewed);

  • examine decisions made at different stages of the review process (direct appeal, post-conviction, federal habeas and all three combined);

  • consider counties, as well as states, as the possible location of explanatory factors;

  • treat counties as independent and, alternatively, as substantially influenced by the relevant state, and treat states as independent and, alternatively, as collections of counties;

  • compare states in each relevant year and as composites of all years; and

  • treat time as the unit for measuring states' and counties' activities, as sporadically important (to see whether particular events have a big effect on the amount of serious error found by the courts), and as continuously important (to see whether the amount of serious error increases or decreases over time).

For each major set of analyses, we conclude by arraying all outcomes in a summary table showing which factors are usually significantly related to serious error and which are only occasionally related to error.274 We emphasize the former factors in making findings and reform proposals. Sometimes, however, a finding that appears in only one analysis is logically explained by a condition that is studied in that analysis alone and that distinguishes it from all others. In that case, we note the explanatory factor and the evident reason for its significance in one context but not others.

iii. Consistency of significance within analyses. As we note above, reversal rates and a potentially explanatory factor may have about the same pattern of increases and decreases over place and time and yet not be causally linked. Both conditions may be independently reacting to a third factor.275 To probe relationships that appear to be significant, we test the effect—in each of our separate analyses—of many different possible explanatory factors and combinations. In eight analyses, we carry the process a step further, simultaneously testing the effect not only of a variety of different factors but also the effect of the same factor measured at state and county levels. For instance, homicide rates might have some effect on reversal rates; if so, one may wonder whether local or state homicide rates (or both) have that effect. In those eight studies, we examine both.

Some factors are significantly related to capital reversal rates, whether or not other factors or combinations of them are included in the study. Other factors only sporadically appear to be significant, depending on the other factors being tested. When factors follow the latter pattern, we drop them from consideration. We also drop factors that are never significant. The factors we consider are discussed on pp. 135-40 below and described more fully in Appendix E.

      b. Six tests applied to overall sets of explanatory factors.

The tests listed above consider the strength of individual explanatory factors. As we have noted, it is important to examine more than one factor at a time.276 Doing so tests the strength of the relationship between a factor and reversal rates by revealing whether the factor is significant when other factors are also considered. Doing so also reflects the fact that harmful conditions usually have several interrelated explanations, which must be studied together to be understood. We do not stop, therefore, with tests of the consistent significance of individual explanatory factors. Instead, we add six overlapping tests of the strength and consistency of sets of explanatory factors.

i. Five within-analysis tests: fit and explained variance compared to a baseline and to other sets of explanations. The first five tests compare each set of explanatory factors to all other sets analyzed within a particular analysis (i.e., all others analyzed using the same statistical technique). The set of factors that does best overall on the five tests is the best set of explanations the study or statistical technique can provide when applied to the available data.

These tests first ask if there is enough variation among places and times to study. If so, they ask additional questions about the overall explanation for reversal rates that the set of factors provides:

  • Does the explanation fit the data better or worse than the explanation provided by the place (state, county) and year in which the relevant death verdict was imposed—i.e., when no other, more specific factors are analyzed? Lack of fit is gauged by the combined distance between the amounts of reversible error the set of explanatory factors predicts would occur in each place and year, and the amounts actually observed there and then. (If more places and times are studied, lack of fit will be higher, because there are more "distances" between each actually observed value and each predicted value to be added up. As a result, fit can only be compared within, not across, analyses.) When two sets of explanatory factors using the same statistical technique are compared, the set that overall is closer to the actually observed outcomes is better. This fit comparison starts with a baseline analysis of the effect on reversible error of (1) the state where the death verdict was imposed, (2) the year it was imposed and (3) any trend in reversal rates over time. The test asks whether the amount of reversible error predicted by a given set of specific factors is, overall, closer to the observed amount of error than the amount predicted by the baseline analysis of only state, year and the trend of reversal rates over time and, if so, whether the improvement is statistically significant.277

  • Using a different statistical measure of a related condition, as explained further in the accompanying note, we ask: Does the explanation account for more variation among reversal rates across place and time than the explanation given by the baseline analysis of state, time and trend?278

  • Does the explanation fit the data better or worse than the explanations provided by other sets of specific explanatory factors? Here, the fit analysis is the same as above but the explanation generated by each set of factors is compared, not to the explanation generated by the baseline inquiry, but to the explanations generated by other sets of specific factors.

  • Does the explanation account for more of the variation across place and time than the explanations provided by other sets of specific factors?
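The idea of lack of fit as a combined distance between predicted and observed amounts of error can be sketched as follows. This is an illustration under stated assumptions, not the study's computation: it uses the binomial deviance as one common measure of that distance, and all counts and predicted rates are hypothetical. Whether an improvement is statistically significant is a further step not shown here.

```python
import math

def binomial_deviance(observed, predicted_rates):
    """Total lack of fit, summed over place-and-year cells.

    observed: list of (reversals, verdicts) pairs, one per cell.
    predicted_rates: predicted reversal probability per cell (0 < p < 1).
    """
    total = 0.0
    for (y, n), p in zip(observed, predicted_rates):
        mu = n * p  # predicted number of reversals in the cell
        if y > 0:
            total += y * math.log(y / mu)
        if y < n:
            total += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * total

# Hypothetical cells and predictions (NOT study data).
observed = [(1, 2), (3, 4), (0, 5), (4, 5)]
baseline_pred = [0.5, 0.5, 0.5, 0.5]    # stand-in for a baseline explanation
factor_pred = [0.5, 0.7, 0.1, 0.75]     # stand-in for a set of specific factors

lack_of_fit_baseline = binomial_deviance(observed, baseline_pred)
lack_of_fit_factors = binomial_deviance(observed, factor_pred)

# A positive value means the specific factors sit closer to the observed outcomes.
improvement = lack_of_fit_baseline - lack_of_fit_factors
```

Because every cell adds a non-negative amount to the total, studying more places and times raises the total even when predictions are equally good, which is why fit can be compared only within an analysis.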

ii. A cross-analysis test: consistently favorable fit and explained variance. Our confidence in the explanation for reversible error provided by a given set of factors is further enhanced if the various fit and explained variance measures indicate that the group is one of the best overall explanations not only within a particular analysis (i.e., in a comparison of sets of explanatory factors using the same statistical technique) but also among multiple analyses (i.e., in comparing sets of factors using more than one statistical technique).

      c. A tenth test: gauging the size of the effect of each significant explanatory factor, holding other factors constant.

We focus only on statistically significant relationships between serious, reversible error and potentially explanatory factors,279 and only then if confidence in their importance is confirmed by the other tests described above. But even if an explanatory condition is significantly related to error rates (meaning an increase in one tends to coincide with increases or decreases in the other), the size of the effect may be too small to warrant attention. If, for example, a 500% increase in per capita funding of courts is associated with a 1% decrease in serious capital error, the relationship between funding and error is not interesting, even if the relationship is highly significant (in the sense that it is highly unlikely the relationship could appear by chance).

The regression techniques we use allow researchers to estimate effect size, which answers the following question: Taking into consideration all explanatory factors tested in an analysis, how much of an increase or decrease in reversal rates is expected to occur if a given factor is increased or decreased by a specified amount? The analyses we use generate estimates of the increase or decrease in reversible error associated with each measurable increase or decrease in the explanatory factor.280

i. Effect-size estimates for binomial analyses. About half our analyses are binomial regressions.281 Those analyses predict that, for each increase of one unit in the value of an explanatory factor, taking all other factors in the analysis into account, the "odds" that a death verdict imposed in that jurisdiction will be reversed change by a factor of x, where x is the amount of the effect-size estimate reported in our results. If that estimate is greater than 1, the analysis predicts that the odds of reversal increase as the value of the factor increases; if the estimate is less than 1, the analysis predicts that the odds of reversal decrease as the value of the factor increases. As an example, consider a binomial analysis in which: increases in homicide rates are significantly associated with increases in capital error rates; homicide rates range from 1 to 10 homicides per 100,000 residents in different states and years; and the effect-size estimate for the homicide rate is 1.4. In that event, the analysis predicts that for each increase of 1 in the number of homicides per 100,000 residents, the "odds" that a death verdict will be reversed increase by a factor of 1.4. An example is given in the accompanying endnote.282 The formula differs for explanatory factors whose values have been logged.283
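The arithmetic behind an odds-based effect size can be sketched in a few lines. This is an illustration only, not a reproduction of any analysis in this Report: the 40% baseline reversal probability is a hypothetical figure, the function names are invented for the example, and only the effect-size estimate of 1.4 is taken from the illustration above.

```python
def odds_from_prob(prob: float) -> float:
    """Convert a probability (0 < prob < 1) into odds."""
    return prob / (1.0 - prob)


def prob_from_odds(odds: float) -> float:
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)


def shifted_probability(baseline_prob: float, odds_ratio: float,
                        unit_increase: float) -> float:
    """Predicted probability after the explanatory factor rises by
    `unit_increase` units, given a per-unit odds multiplier."""
    new_odds = odds_from_prob(baseline_prob) * odds_ratio ** unit_increase
    return prob_from_odds(new_odds)


BASELINE_PROB = 0.40   # hypothetical reversal probability, for illustration
ODDS_RATIO = 1.4       # effect-size estimate used in the text's example

after_one_unit = shifted_probability(BASELINE_PROB, ODDS_RATIO, 1)
after_two_units = shifted_probability(BASELINE_PROB, ODDS_RATIO, 2)

print(f"baseline:        {BASELINE_PROB:.1%}")
print(f"+1 homicide/100k: {after_one_unit:.1%}")
print(f"+2 homicides/100k: {after_two_units:.1%}")
```

Under these assumed figures, multiplying the odds by 1.4 raises the predicted probability from 40% to roughly 48%, not to 56%. The multiplier applies to the odds rather than to the reversal rate itself.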

To help readers interpret effect-size estimates, we often graph the predicted reversal rate associated with each of the range of values for particular explanatory factors that different states in our study have. See, e.g., Figures 22A, 22B, p. 175 below. In graphing the effect size of homicide rates, for example, the range of homicide rates in all study states and years is indicated on the horizontal (x) axis, and the range of possible reversal rates from 0 to 100% is indicated on the vertical (y) axis. For binomial analyses, each point on the line of the graph represents the predicted reversal rate indicated on the vertical axis, for a state and year in which the homicide rate is the value indicated on the horizontal axis—assuming all other explanatory factors are held constant at their average value.284 Another way to say this is that each point on the graph identifies the predicted probability that a death verdict will be reversed, as indicated on the vertical axis, in a state that has the homicide rate indicated on the horizontal axis and otherwise is average in all respects. If a point on the graph corresponds to 4 on the horizontal axis and 49 on the vertical axis, the predicted capital reversal rate for states with 4 homicides per 100,000 residents is 49%. Or, put the other way, the predicted probability that a death verdict will be reversed in a state with 4 homicides per 100,000 residents is 49%, other factors equal.
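A curve of the kind just described can be approximated with a short sketch. It rests on assumptions that are not part of our reported results: it combines the illustrative effect-size estimate of 1.4 with the illustrative point (4 homicides per 100,000 residents, 49% predicted reversal rate) into a single hypothetical curve, it assumes a logistic form, and it folds all other factors, held at their averages, into one intercept. The variable and function names are invented for the example.

```python
import math

ODDS_RATIO = 1.4     # illustrative per-unit effect size from the text
ANCHOR_RATE = 4      # homicides per 100,000 residents (text's example point)
ANCHOR_PROB = 0.49   # predicted reversal probability at that point


def logit(prob: float) -> float:
    """Log-odds of a probability."""
    return math.log(prob / (1.0 - prob))


# On the log-odds scale, each one-unit increase adds log(odds ratio).
slope = math.log(ODDS_RATIO)
# Choose the intercept so the curve passes through the anchor point.
# It stands in for all other factors held at their average values.
intercept = logit(ANCHOR_PROB) - slope * ANCHOR_RATE


def predicted_reversal_prob(homicide_rate: float) -> float:
    """Predicted reversal probability at a given homicide rate."""
    linear = intercept + slope * homicide_rate
    return 1.0 / (1.0 + math.exp(-linear))


# One point per homicide rate across the illustrative range of 1 to 10.
curve = {rate: predicted_reversal_prob(rate) for rate in range(1, 11)}

for rate, prob in curve.items():
    print(f"{rate:>2} homicides per 100,000 -> predicted reversal rate {prob:.1%}")
```

Each printed pair corresponds to one point on such a graph: the homicide rate on the horizontal axis and the predicted reversal probability on the vertical axis. Because the curve is logistic, it never leaves the 0 to 100% range, however large the homicide rate becomes.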

ii. Effect-size estimates for Poisson analyses. The effect-size estimates our Poisson regression analyses generate are interpreted differently. Here, the analysis predicts that for each increase of one in the explanatory factor—and taking all other factors into account—the rate (not the odds) of reversal increases by a factor of x, with x being the effect-size estimate. Assume, again, that an analysis—now a Poisson analysis—finds that increases in homicide rates are significantly associated with increases in capital error rates, where homicide rates range from 1 to 10 homicides per 100,000 residents in different states and years, and where the effect-size estimate for the homicide rate is 1.4. In that event, the analysis predicts that for each increase of 1 in the number of homicides per 100,000 residents, the capital reversal rate will increase by a factor of 1.4. An example is given in the accompanying endnote.285 The formula is different for explanatory factors whose values have been logged.286
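The difference between a rate multiplier and an odds multiplier can be seen by applying the same factor both ways. The sketch below is illustrative only: the 40% baseline is a hypothetical figure, the function names are invented, and only the factor of 1.4 comes from the example in the text.

```python
def poisson_predicted_rate(baseline_rate: float, rate_ratio: float,
                           unit_increase: float) -> float:
    """Poisson-style reading: the rate itself is multiplied by the
    effect-size estimate for each one-unit increase in the factor."""
    return baseline_rate * rate_ratio ** unit_increase


def binomial_predicted_prob(baseline_prob: float, odds_ratio: float,
                            unit_increase: float) -> float:
    """Binomial-style reading: the odds, not the rate, are multiplied
    by the effect-size estimate for each one-unit increase."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio ** unit_increase
    return odds / (1.0 + odds)


BASELINE = 0.40      # hypothetical starting reversal rate
EFFECT_SIZE = 1.4    # illustrative estimate used in the text

poisson_result = poisson_predicted_rate(BASELINE, EFFECT_SIZE, 1)
binomial_result = binomial_predicted_prob(BASELINE, EFFECT_SIZE, 1)

print(f"Poisson reading:  {BASELINE:.0%} -> {poisson_result:.1%}")
print(f"Binomial reading: {BASELINE:.0%} -> {binomial_result:.1%}")
```

With these assumed numbers, the rate-based reading moves the predicted reversal rate from 40% to 56%, while the odds-based reading moves it only to about 48%. The same numerical estimate therefore implies a larger change in the reversal rate under the Poisson interpretation.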

Sometimes, we graph Poisson effect sizes. See, e.g., Figures 22C and 22D, p. 175 below. These graphs have a different interpretation from the binomial graphs, as is indicated by the different label of the vertical (y) axis. On these graphs, it is not possible to link a particular value for the relevant explanatory factor as indicated on the horizontal (x) axis (say a particular state's homicide rate) to a particular reversal rate indicated on the vertical axis. Instead, these graphs are interpreted by comparing two points on the horizontal axis and calculating the percent change in the associated points on the vertical axis.287 Suppose the graph indicates that a homicide rate per 100,000 residents of 5, indicated on the horizontal axis, has a corresponding value of .2 on the vertical axis, and a homicide rate per 100,000 residents of 6 (indicated on the horizontal axis) has a corresponding value of .4 on the vertical axis. The graph shows that the predicted reversal rate increases by 100% as the homicide rate per 100,000 residents increases from 5 to 6, assuming other factors hold steady at their averages.288 Or, one could say that the predicted probability of reversal of any death verdict increases by 100% as the homicide rate per 100,000 residents rises from 5 to 6, holding other factors constant.
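The percent-change reading of a Poisson graph amounts to a one-line calculation, sketched here with the two hypothetical points from the paragraph above (vertical-axis values of .2 and .4). The function names are invented for the illustration.

```python
def percent_change(value_at_start: float, value_at_end: float) -> float:
    """Percent change between two vertical-axis values read off the graph."""
    if value_at_start == 0:
        raise ValueError("percent change is undefined for a starting value of 0")
    return (value_at_end - value_at_start) / value_at_start * 100.0


def percent_change_from_ratio(rate_ratio: float) -> float:
    """Equivalent percent change implied by a per-unit rate multiplier."""
    return (rate_ratio - 1.0) * 100.0


# Points from the text: homicide rate 5 -> 0.2, homicide rate 6 -> 0.4.
change = percent_change(0.2, 0.4)
print(f"Predicted reversal rate changes by {change:.0f}%")

# A per-unit multiplier of 1.4 corresponds to a 40% increase per unit.
print(f"A multiplier of 1.4 implies a {percent_change_from_ratio(1.4):.0f}% increase")
```

Only the ratio between the two vertical-axis values carries meaning here. The values .2 and .4 are not themselves predicted reversal rates for states with those homicide rates.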

      d. Analogous tests for case-level analyses.

The diagnostic tests described above apply to binomial and Poisson regression analyses of factors explaining differences in capital reversal rates from one place (state or county) to another. Analysis 19 is instead a logistic regression study of factors associated with decisions reversing, as opposed to approving, death verdicts in 600 federal habeas cases. In presenting Analysis 19 below, we use measures of fit and effect size that are analogous to but not the same as the tests described above. The analogous tests are discussed in connection with Analysis 19 below.289

      e. A final test: does the explanation square with common sense and experience?

Statistical studies aid judgment, but they are no substitute for it. Even if explanations for reversible error do well on the tests listed above, they still do not qualify as bases for firm conclusions if they don't jibe with common sense and experience. Even if explanatory factors have a significant, robust and sizeable statistical relationship to serious, reversible error, therefore, we do not rely on that relationship unless we can give a reasoned and practical account of how the two are related.