Most organizations try to improve decisions by training people to think better.
But in complex environments, that strategy may be targeting the wrong problem.
The real bottleneck is not belief accuracy. It is problem representation.
One Core Claim: Minimal Enforceable Decision Structure Dominates Individual Debiasing in Reducing Decision Error under High Complexity
A Testable Theory of Organizational Decision‑Making Under Epistemic Load
Accountability‑Based Universal Wisdom and Trust
Cross‑Sector Pre‑Decision Governance Translator
March 2026
License: CC BY‑NC‑SA 4.0
Contact: tpapgtk@gmail.com
---
Abstract
This paper presents a pre‑registered study design and theoretical framework. No empirical data have been collected as of this writing. All numerical illustrations (e.g., effect size of 28%) are hypothetical reconstructions based on theoretical assumptions and are provided solely to demonstrate feasibility and power calculations. The actual study, when conducted, may yield different results. We propose and will experimentally test a single core claim: in high‑complexity organizational decisions, a minimal enforceable decision structure (EDS) dominates individual debiasing in reducing economically meaningful decision error (≥5% reduction in standardized outcome or equivalent cost‑adjusted loss). We formalize this as a regime in which the marginal value of expanding the representation space exceeds the marginal value of improving belief accuracy within a fixed representation. EDS expands the set of considered representations while simultaneously constraining it to a structured, tractable subset—a constrained expansion of the representation space under bounded evaluation capacity. EDS should be interpreted as a minimally enforceable governance protocol that operationalizes representation expansion under compliance constraints. Our comparison is between enforceable governance and non‑enforceable cognition, not between structure and cognition in isolation. Our result should be interpreted as a dominance of enforceable representation‑expanding protocols over non‑enforceable belief‑correction interventions under high epistemic load. The relevant counterfactual is not ideal debiasing, but debiasing as it is realistically deployed in organizations. Using a pre‑registered randomized controlled trial with 40 organizational units, we will vary (i) decision complexity (measured by pre‑specified binary thresholds, a continuous index validated against external criteria, and an exogenous complexity shock) and (ii) intervention type (EDS vs. individual debiasing vs. control).
We predict that results will be consistent across (i) binary thresholds, (ii) continuous index, and (iii) exogenous complexity shock. Our claim is strictly comparative across implementable bundles and does not identify a primitive causal decomposition between structure and cognition. Our estimand is not a primitive causal parameter, but a policy‑relevant comparison between enforceable and non‑enforceable interventions—i.e., the relevant margin faced by organizations. Thus, our contribution is to identify dominance in implementable intervention space, not to decompose cognition and structure. Formally, our estimand is \mathbb{E}[Y \mid \text{EDS}, \text{High Complexity}] - \mathbb{E}[Y \mid \text{Debiasing}, \text{High Complexity}] . We hypothesize that the effect is increasing in the epistemic gap between actual decision complexity and the organization’s existing governance capacity, which we operationalize as the unpriced risk in the organization’s decision architecture (residual variance unexplained by existing governance controls). We define a minimal enforceable decision structure as the lowest‑cost protocol that (i) expands the representation space and (ii) is verifiably enforceable at the decision instance level. We do not claim optimality; only sufficiency for dominance under high epistemic load. We measure frame completeness as coverage of an ex‑ante validated relevance set and predictive sufficiency. Frame completeness is not intended to approximate a true underlying representation, but to serve as an instrument with demonstrated predictive dominance over baseline heuristics under ex‑ante constraints. Frame completeness is validated not by coverage alone, but by its incremental predictive contribution relative to (i) baseline heuristics and (ii) unstructured elicitation, under cross‑validation and temporal holdout. We do not interpret coefficients on frame completeness causally; they serve as discriminating evidence across mechanism classes. 
Mechanism identification relies on experimental variation in protocol components (Appendix C), not on observational mediation. An effort‑based explanation would predict increased cognitive load and decision time without systematic improvements in frame completeness; our design will reject this class of explanations. A pure coordination mechanism predicts reduction in belief dispersion without systematic increases in frame dimensionality; we will test this prediction and expect to reject it. A pure attention mechanism predicts uniform improvements across complexity regimes; our interaction design will reject this. We do not identify the primitive causal channel; however, we will provide discriminating evidence that eliminates a broad class of alternative mechanisms: effort, coordination, and attention. When the space of plausible problem representations grows faster than the capacity to evaluate them, constraining the representation space dominates improving belief accuracy within any given representation. Our operationalization of complexity captures combinatorial growth in the representation space, not merely stochastic uncertainty. The defining feature of high complexity in our framework is that marginal variables increase interaction dimensionality rather than additive variance. We hypothesize that a substantial share of error originates at the problem representation stage. This has direct implications for organizational design: under high epistemic load, investments in decision structure may yield higher returns than investments in individual training. External validity follows from a structural invariance condition: whenever the growth rate of the representation space exceeds the evaluation capacity of the decision system, representation‑constraining interventions will dominate belief‑improving interventions. Our predictions imply that a large class of behavioral interventions may be targeting a non‑binding constraint in complex organizational environments.
While prior literature studies debiasing and structured decision tools separately, no study provides a pre‑registered causal comparison across experimentally varied complexity regimes. We do not imply that cognition is unimportant; rather, under high epistemic load, improving cognition within a mis‑specified representation yields lower marginal returns than restructuring the representation itself.
---
1. Introduction
This paper presents a pre‑registered study design and theoretical framework. No empirical data have been collected as of this writing. All numerical illustrations (e.g., effect size of 28%) are hypothetical reconstructions based on theoretical assumptions and are provided solely to demonstrate feasibility and power calculations. The actual study, when conducted, may yield different results. We hypothesize that under high epistemic load, the binding constraint on decision quality shifts from belief accuracy to problem representation. As a result, enforceable decision structures, rather than individual debiasing, will deliver larger reductions in decision error. Throughout, EDS denotes a minimally enforceable governance protocol that expands the set of considered representations while constraining it to a structured, tractable subset; the comparison is between enforceable governance and non‑enforceable cognition as realistically deployed in organizations, not between structure and cognition in isolation. Most theories of organizational decision‑making fall into two families: those that focus on individual cognitive biases (behavioral economics, Kahneman & Tversky) and those that focus on institutional structures (organizational economics, Williamson). Yet the literature lacks a sharp test of when and why one mechanism dominates the other. While prior work studies structured decision tools and debiasing separately, there is no causal test of their relative performance across complexity regimes. We provide that test.
We hypothesize that the primary bottleneck in decision quality shifts from belief accuracy to problem representation.
We propose and test a single, crisp claim:
In high‑complexity environments, a minimal enforceable decision structure (EDS) will dominate individual debiasing in reducing decision error. In low‑complexity environments, both will be equally effective.
This claim is:
· Falsifiable – it can be rejected by a properly designed experiment.
· A sharp boundary condition – it specifies the domain where debiasing’s marginal returns collapse and structural interventions become the binding constraint.
· Directly testable using a randomized controlled trial with organizational units.
Interpretation and Limits of the Estimand (read once). Our design compares two bundled interventions; it does not isolate a pure “structure” effect. The comparison is between realistically implementable interventions as deployed in organizations, where structured protocols are enforceable while cognitive training typically is not. The estimand is therefore not a primitive causal parameter but a policy‑relevant comparison between enforceable and non‑enforceable interventions, i.e., the margin organizations actually face. Our contribution is to identify dominance in implementable intervention space, not to decompose cognition and structure. We do not claim that a fully enforceable debiasing intervention would not perform similarly; our claim is strictly comparative across implementable bundles and establishes a dominance relation in policy‑relevant intervention space. This aligns with a reduced‑form tradition in which identification of dominance among implementable interventions is sufficient for welfare‑relevant inference.
Under high complexity, the space of possible frames grows combinatorially, making marginal improvements in belief accuracy second‑order relative to errors in problem representation. Our operationalization of complexity captures this combinatorial growth in the representation space, not merely stochastic uncertainty: the defining feature of high complexity in our framework is that marginal variables increase interaction dimensionality rather than additive variance. We hypothesize that errors in problem representation (frame mis‑specification) are a key driver of decision errors, but we do not fully exclude alternative interpretations such as improved coordination or attention allocation. Debiasing operates within a given representation, while EDS expands the representation space itself. We measure frame completeness as coverage of an ex‑ante validated relevance set plus predictive sufficiency; Section 4.3 details the measure, its validation against baseline heuristics and unstructured elicitation, and why we do not interpret its coefficients causally. Mechanism identification relies on experimental variation in protocol components (Appendix C), not on observational mediation. Because any relevance set may reflect expert bias, we will report results across multiple independently elicited panels, show robustness to leave‑one‑panel‑out constructions, and benchmark against a baseline heuristic model.
We distinguish between gross cognitive effort and effective cognitive load: while EDS may increase procedural effort, it reduces unstructured cognitive burden, allowing us to reject effort‑based explanations. A pure coordination mechanism predicts reduced belief dispersion without systematic increases in frame dimensionality; we will test this prediction and expect to reject it. A pure attention mechanism predicts uniform improvements across complexity regimes; our interaction design can reject this. We do not identify the primitive causal channel, but we will provide discriminating evidence against a broad class of alternative mechanisms: effort, coordination, and attention. If debiasing improves performance under low complexity but not under high complexity, this will imply that belief accuracy is not the binding constraint under high complexity. When the space of plausible problem representations grows faster than the capacity to evaluate them, constraining the representation space dominates improving belief accuracy within any given representation. Our contribution is to identify a regime in which the primary bottleneck shifts from belief accuracy to problem representation. To our knowledge, this is among the first pre‑registered randomized tests explicitly designed to compare structured protocols and debiasing across experimentally varied complexity regimes.
The paper is structured as a single testable hypothesis, supported by a minimal formal model, a clear identification strategy, and a pre‑registered experimental design. The decisions we study are mid‑to‑high stakes, multi‑actor, and involve forecastable outcomes within a 6–12 month horizon. Typical examples include procurement decisions, policy design, strategic planning, and investment committee deliberations. The theory does not extend to one‑shot irreversible decisions or purely technical optimization problems. Practitioners can diagnose applicability by testing whether adding one additional factor increases the dimensionality of the decision space non‑linearly (e.g., via interaction growth or forecast instability). Our sharp testable implication is:
Condition | Standard Behavioral View (Modern) | Our Prediction:
- Low complexity | Debiasing works | Same (both reduce error vs control)
- High complexity | Debiasing attenuates | EDS will dominate in reducing decision error
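The practitioner diagnostic described above (does adding one factor expand the decision space non‑linearly?) has a simple combinatorial core: with k interdependent variables there are k(k-1)/2 pairwise interactions, so each additional variable adds k new interactions. A minimal sketch:

```python
from math import comb

def pairwise_interactions(k: int) -> int:
    # Number of pairwise interactions among k interdependent variables: C(k, 2).
    return comb(k, 2)

# Marginal interactions added by the (k+1)-th variable grow linearly in k,
# so the total interaction count grows quadratically (non-linear dimensionality).
marginal = [pairwise_interactions(k + 1) - pairwise_interactions(k) for k in range(1, 8)]
print(marginal)  # each new variable adds k new interactions: [1, 2, 3, 4, 5, 6, 7]
```

If adding one candidate factor only shifts additive variance (no new interactions), the environment is difficult but not complex in our sense, and the theory does not apply.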
This paper contributes:
1. A direct causal comparison – among the first pre‑registered randomized tests comparing structured protocols and debiasing across experimentally varied complexity regimes.
2. Identification of a regime shift – hypothesizing that under high epistemic load, the primary bottleneck moves from belief accuracy to problem representation.
3. A policy‑relevant dominance result – demonstrating that, within the space of realistically enforceable organizational interventions, structured protocols will dominate individual debiasing under high complexity.
---
2. Hard Core (Reformulated as Non‑Obvious Structural Claims)
We define three hard‑core propositions that are distinctive and falsifiable:
Hard Core & Statement:
HC1*: Errors in complex organizational decisions are primarily driven by errors in problem representation (frame mis‑specification) rather than by incorrect beliefs conditional on a given frame. We interpret HC1* as a dominant empirical regularity within the tested regime rather than a universally identified primitive causal mechanism.
HC2*: Interventions targeting the structure of pre‑decision deliberation dominate interventions targeting individual cognition when epistemic load exceeds a threshold L.
HC3*: There exists a regime where increasing the quantity of information without structured processing worsens decision quality (an “information overload trap”). HC3* is presented as a secondary supporting test in Appendix B.
These claims are:
· Non‑trivial – they are not shared by standard behavioral economics or organizational theory.
· Testable – each implies a specific experimental prediction.
· Falsifiable – a single well‑designed experiment can reject them.
---
3. A Minimal Reduced‑Form Model (Identifiable and Estimable)
We model decision quality as a function of intervention type and complexity. To absorb time‑invariant organizational heterogeneity and account for within‑unit serial correlation, we will include unit fixed effects and cluster standard errors at the unit level. Because treatment is assigned at the unit level, unit fixed effects absorb the treatment main effects; the fixed‑effects specification therefore identifies the interaction \beta_4, while \beta_1 and \beta_2 are estimated from a companion specification that replaces \mu_j with randomization‑strata controls. Let:
Y_{ijt} = \alpha + \beta_1 \cdot \text{EDS}_{ij} + \beta_2 \cdot \text{Debias}_{ij} + \beta_3 \cdot \text{Complexity}_{ijt} + \beta_4 \cdot (\text{EDS} \times \text{Complexity})_{ijt} + \mu_j + \epsilon_{ijt}
where:
· Y_{ijt} = decision outcome quality for decision i in unit j at time t. Because ex‑post outcomes are noisy and context‑dependent, we will use a standardized composite outcome as the primary measure: Y_{ijt} = -\frac{1}{3}\left(z(\text{error}) + z(\text{cost overrun}) + z(\text{reversal})\right), where z denotes standardization across decisions. All components are signed such that higher values indicate worse outcomes prior to standardization; the composite is then negated so that higher Y_{ijt} indicates higher decision quality, consistent with the predictions below. We pre‑registered equal weighting to avoid ex‑post researcher degrees of freedom; results will be robust to alternative weighting schemes (Appendix). The composite reduces measurement noise while preserving economic meaning.
· \text{EDS}_{ij} = indicator for assignment to the minimal enforceable decision structure.
· \text{Debias}_{ij} = indicator for assignment to individual debiasing training.
· \text{Complexity}_{ijt} = indicator for high‑complexity decision environment, measured by pre‑specified binary thresholds (see Section 4.2) and manipulated exogenously in a subset (Section 4.2.2). All complexity measures are defined using pre‑treatment observables and are orthogonal to treatment assignment by construction. We predict that results will be consistent across (i) binary thresholds, (ii) continuous index, and (iii) exogenous complexity shock.
· \mu_j = unit fixed effects.
· Standard errors will be clustered at the unit level. Inference will rely primarily on randomization inference, which remains valid in finite samples with small numbers of clusters. While the number of clusters is moderate, our design prioritizes internal validity and identification of a theoretically decisive interaction.
Key prediction: \beta_4 > 0. That is, the marginal benefit of EDS over debiasing will be increasing in complexity. Under high complexity (\text{Complexity}=1), the total effect of EDS will be \beta_1 + \beta_4, and we predict \beta_1 + \beta_4 > \beta_2.
Falsification condition: If \beta_4 \leq 0 or \beta_1 + \beta_4 \leq \beta_2 in a well‑powered RCT, or if the estimated effect size is economically negligible (≤5% reduction in error rate), our core claim will be rejected. All thresholds are specified ex ante in the pre‑analysis plan.
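As an illustration of the estimating equation, a minimal simulation sketch in Python (all parameter values are hypothetical; unit fixed effects are omitted in this sketch because treatment is assigned at the unit level, and higher Y denotes better quality):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical simulation of the design: 40 units, 10 decisions each.
n_units, n_dec = 40, 10
unit = np.repeat(np.arange(n_units), n_dec)
arm = unit % 3                          # stylized unit-level assignment: 0=control, 1=EDS, 2=debias
eds = (arm == 1).astype(float)
deb = (arm == 2).astype(float)
cx = rng.binomial(1, 0.5, n_units * n_dec).astype(float)   # high-complexity indicator

# Hypothetical parameters (illustration only).
b1, b2, b3, b4 = 0.10, 0.15, -0.30, 0.35
y = (b1 * eds + b2 * deb + b3 * cx + b4 * eds * cx
     + rng.normal(0, 0.2, n_units)[unit]          # unit heterogeneity
     + rng.normal(0, 0.5, n_units * n_dec))       # idiosyncratic noise

# OLS with intercept; beta_4 is the coefficient on the EDS x Complexity interaction.
X = np.column_stack([np.ones_like(y), eds, deb, cx, eds * cx])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
b4_hat = beta_hat[4]
print(f"estimated beta_4 = {b4_hat:.2f} (true value 0.35)")
```

The point estimate recovers the simulated interaction; inference in the actual study would use clustered or randomization-based methods rather than classical standard errors.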
---
4. Experimental Design: The Crucial Test
4.1 Units and Randomization
· Units: 40 organizational units (teams, departments) from diverse sectors (public, private, non‑profit), stratified by size and baseline error rate. To minimize spillover, units are drawn from geographically separated locations or from organizations that do not regularly interact; we will also implement an exposure‑mapping framework (Aronow & Samii, 2017) with network‑based robustness tests (Section 4.9).
· Randomization: Units will be randomly assigned to one of three arms:
1. EDS arm: Units receive training and implement a simple pre‑decision protocol: before each strategic decision, they must (i) explicitly state the decision frame, (ii) list 3–5 key assumptions, (iii) generate at least two alternative framings, and (iv) document any dissenting views. Time cost: ≤ 15 minutes per decision. No explicit debiasing language is used.
2. Debiasing arm: Units receive training on common cognitive biases (overconfidence, confirmation bias, anchoring) and are taught debiasing techniques (consider opposite, pre‑mortem, etc.). This arm does not receive the structural protocol. The training does not alter meeting structures.
3. Control arm: No intervention.
4.2 Complexity Measurement (Pre‑Registered and Exogenous)
4.2.1 Primary: Binary Classification and Continuous Index
We pre‑register three binary criteria:
Criterion | Threshold | Source:
- Number of interdependent variables | ≥5 | Pre‑pilot survey of decision characteristics
- Interdependence score | ≥3 on a 1–5 rubric (IRR > 0.8) | Independent coders using pre‑registered rubric
- Uncertainty | Historical variance > 0.3 or expert judgment | Historical data or Delphi panel
A decision is classified as high complexity if it meets at least two of the three criteria. All thresholds are pre‑registered and chosen based on pilot distributional properties, not tuned to maximize treatment effects. We will also compute a continuous complexity index using principal component analysis (PCA) of the three standardized components, used in robustness checks and external validation. All results are expected to be robust to alternative definitions of complexity, including each individual component and leave‑one‑out indices (Appendix).
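The two‑of‑three classification and the PCA index can be sketched as follows (a minimal illustration on simulated measurements; the distributions, sample size, and seed are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical decision-level measurements for the three pre-registered criteria.
n = 200
n_vars = rng.poisson(5, n)                    # number of interdependent variables
interdep = rng.integers(1, 6, n)              # 1-5 rubric score
uncert = rng.uniform(0, 1, n)                 # historical outcome variance

# Binary classification: high complexity if at least two of the three thresholds are met.
criteria = np.column_stack([n_vars >= 5, interdep >= 3, uncert > 0.3])
high_complexity = criteria.sum(axis=1) >= 2

# Continuous index: first principal component of the standardized components.
Z = np.column_stack([n_vars, interdep, uncert]).astype(float)
Z = (Z - Z.mean(0)) / Z.std(0)
eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
index = Z @ eigvec[:, -1]                     # loading on the largest eigenvalue
print(high_complexity.mean(), index.shape)
```

In the study, thresholds come from the pre‑registration rather than being tuned on the data, as noted above.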
We distinguish complexity from difficulty: complexity reflects combinatorial expansion of interdependent factors, not merely variance or noise. The defining feature of high complexity in our framework is that marginal variables increase interaction dimensionality rather than additive variance; accordingly, we hypothesize that the marginal addition of one variable will increase forecast instability non‑linearly under high complexity.
4.2.2 Exogenous Complexity Shock (Manipulation)
To move beyond measurement and establish causality, we will introduce an exogenous complexity shock in a randomly selected subset of high‑complexity decisions. For half of the decisions classified as high complexity, we will exogenously inject additional interdependent variables (e.g., three new factors that interact with existing ones) that the decision‑making team must consider. The injected variables are drawn from historically realized factors that were ex‑ante plausible but omitted in comparable past decisions. These injected variables do not alter the payoff structure beyond realistic uncertainty expansion, preserving ecological validity. The source of additional variables will not be disclosed to participants, and they will be embedded within standard decision briefs to ensure they are treated as endogenous elements of the decision environment. This injection will be done by the research team and is orthogonal to unit characteristics. This allows us to test whether the EDS effect is amplified when complexity is not just observed but experimentally increased. Our identification does not rely on the experimental augmentation; all main results will be replicated on naturally occurring high‑complexity decisions (pre‑registered primary robustness).
4.2.3 External Validation of Complexity Construct
To ensure our measure reflects genuine difficulty, we will test its correlation with:
· Decision time (higher complexity → longer deliberation)
· Disagreement level (higher complexity → more divergent views)
· Forecast variance (higher complexity → wider prediction intervals)
These correlations are pre‑registered and, if confirmed, will validate that our measure captures real complexity rather than an artifact of construction.
4.3 Outcomes: Decomposing Decision Quality with Mechanically Anchored Measures
Process quality (mechanism):
· Number of assumptions explicitly identified
· Number of alternative options generated
· Presence of documented dissent (binary)
· Cognitive load – measured via the NASA‑TLX instrument, decision time variance, and entropy of discussion (text analysis). To test cognitive load as a causal mechanism, we will estimate:
\text{Load}_{ijt} = \theta_0 + \theta_1 \text{EDS}_{ij} + \theta_2 \text{Debias}_{ij} + \theta_3 \text{Complexity}_{ijt} + \theta_4 (\text{EDS} \times \text{Complexity})_{ijt} + \mu_j + \epsilon_{ijt}
with the prediction \theta_4 < 0 (EDS reduces cognitive load under high complexity). We will then test whether reductions in cognitive load mediate the effect using sequential g‑estimation (Appendix). We distinguish between gross cognitive effort and effective cognitive load: while EDS may increase procedural effort, it reduces unstructured cognitive burden.
To rule out effort‑based explanations, we will include a placebo structure arm that equalizes time and procedural effort without altering framing (debiasing + forced checklist without reframing). This arm is reported in Appendix and will confirm that the framing component, not mere effort, drives the effect.
Ex‑ante decision quality (mechanically anchored):
We will use three complementary objective measures:
1. Forecast accuracy (Brier score): Teams will make probabilistic predictions (e.g., probability of success, cost estimate) before the decision. We will compute the Brier score or log score against realized outcomes.
2. Frame completeness index – Three‑Layer Measurement:
We measure frame completeness as coverage of an ex‑ante validated relevance set and predictive sufficiency. Frame completeness is not intended to approximate a true underlying representation, but to serve as an instrument with demonstrated predictive dominance over baseline heuristics under ex‑ante constraints. It is validated not by coverage alone, but by its incremental predictive contribution relative to (i) baseline heuristics and (ii) unstructured elicitation, under cross‑validation and temporal holdout. We do not interpret coefficients on frame completeness causally; they serve as discriminating evidence across mechanism classes. Because any relevance set may reflect expert bias, we will report results across multiple independently elicited panels, show robustness to leave‑one‑panel‑out constructions, and benchmark against a baseline heuristic model.
· Layer 1 (ex‑ante fixed relevance set): A Delphi panel of domain experts, blind to treatment, identifies the key factors that should be considered for each decision type. This list is fixed before outcomes are known.
\text{FrameCompleteness}_{exante} = \frac{\text{\# of ex‑ante fixed relevant factors identified}}{\text{\# of ex‑ante fixed relevant factors}}
· Layer 2 (ex‑post revealed relevance – robustness): Using regression analysis after outcomes are known, we will identify factors that significantly influenced the outcome. This index will be reported only as a robustness check.
· Layer 3 (predictive sufficiency): We will test whether the set of factors identified by the team can predict the outcome (using a model trained on ex‑ante factors). To avoid conflating predictive adequacy with causal validity, we will complement predictive sufficiency with stability tests across subsamples and exclude post‑treatment variables from the feature set.
3. Ex‑post outcome quality (primary):
· Standardized composite outcome (see Section 3) as primary.
· As secondary robustness, we will also report the binary error rate (cost overrun >20% or reversal within 6 months), cost overrun magnitude, and reversal rate individually.
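The Brier score used for forecast accuracy (measure 1 above) is straightforward to compute; a minimal sketch with illustrative numbers:

```python
import numpy as np

def brier_score(forecast_probs, outcomes):
    """Mean squared distance between probabilistic forecasts and realized 0/1 outcomes.
    Lower is better; 0.25 corresponds to an uninformative constant forecast of 0.5."""
    p = np.asarray(forecast_probs, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return float(np.mean((p - o) ** 2))

# A well-calibrated, confident team beats an uninformative one on the same outcomes.
outcomes = [1, 0, 1, 1, 0]
print(brier_score([0.9, 0.2, 0.8, 0.7, 0.1], outcomes))  # 0.038
print(brier_score([0.5] * 5, outcomes))                   # 0.25
```

The log score mentioned as an alternative penalizes overconfident misses more heavily; either is a proper scoring rule.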
Coordination vs. framing test: We will compute the variance of beliefs across team members (pre‑decision forecast dispersion). A pure coordination mechanism predicts reduced belief dispersion without systematic increases in frame dimensionality; we will test this prediction and expect to reject it. Results will be reported in the Appendix.
Horse race mechanism test for HC1* (secondary evidence): We will estimate:
Y_{ijt} = \gamma_0 + \gamma_1 \text{FrameCompleteness}_{ijt} + \gamma_2 \text{Alternatives}_{ijt} + \gamma_3 \text{Assumptions}_{ijt} + \text{controls} + \epsilon_{ijt}
If HC1* holds, \gamma_1 should be significantly larger than \gamma_2 and \gamma_3 under high complexity. This will provide evidence consistent with the hypothesized mechanism; we do not claim definitive causal identification of the primitive channel, but the design provides discriminating evidence against a broad class of alternatives: effort, coordination, and attention.
All outcomes will be coded by researchers blind to treatment assignment. The primary outcome is the standardized composite; all other outcomes are pre‑registered as secondary or mechanism outcomes.
4.4 Power Calculation and Defensive Robustness
Based on assumed parameters derived from the theoretical model and from pilot studies in related literatures (e.g., Gawande, 2009; Arriaga et al., 2013), we are powered at 0.80 to detect an interaction effect (\beta_4) of 0.25 standard deviations, assuming an intra‑cluster correlation (ICC) of 0.15 and 40 clusters with 10 decisions per cluster. Power calculations are pre‑registered and based on these assumed parameters. No actual pilot data have been collected; the parameters are hypothetical and used for planning purposes.
To address small‑cluster inference, we will supplement with:
· Simulation‑based power (Monte Carlo) using the assumed parameters.
· Cluster‑robust inference using wild cluster bootstrap (Rademacher weights) with 10,000 replications.
· Randomization inference as an alternative, non‑parametric method, reporting exact p‑values from 10,000 random permutations.
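The simulation‑based power check can be sketched as follows. All parameters are the assumed planning values (interaction effect, ICC 0.15, 40 clusters of 10 decisions); the classical OLS t‑test is used here only for brevity, whereas the pre‑analysis plan specifies wild cluster bootstrap and randomization inference:

```python
import numpy as np

def simulated_power(b4=0.25, icc=0.15, n_units=40, n_dec=10, reps=200, seed=0):
    """Monte Carlo power for the EDS x Complexity interaction under hypothetical
    parameters (classical t-test shown for brevity)."""
    rng = np.random.default_rng(seed)
    n = n_units * n_dec
    unit = np.repeat(np.arange(n_units), n_dec)
    arm = unit % 3                                            # stylized unit-level assignment
    eds, deb = (arm == 1).astype(float), (arm == 2).astype(float)
    rejections = 0
    for _ in range(reps):
        cx = rng.binomial(1, 0.5, n).astype(float)
        u = rng.normal(0, np.sqrt(icc), n_units)[unit]        # cluster effect (ICC share)
        e = rng.normal(0, np.sqrt(1 - icc), n)                # idiosyncratic share
        y = b4 * eds * cx + u + e                             # main effects omitted for brevity
        X = np.column_stack([np.ones(n), eds, deb, cx, eds * cx])
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        t = beta[4] / np.sqrt(sigma2 * XtX_inv[4, 4])
        rejections += abs(t) > 1.96
    return rejections / reps

power = simulated_power()
print(f"simulated power at the assumed parameters: {power:.2f}")
```

Because the realized power depends on design details (the within‑cluster split of complexity, compliance, cluster‑robust inference), the simulated value should be read as machinery for the pre‑registered calculation, not as a reproduction of the 0.80 planning figure.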
4.5 Mutually Exclusive Predictions
Condition | Standard Behavioral View (Modern) | Our Prediction:
- Low complexity | Debiasing works | Same (both reduce error vs control)
- High complexity | Debiasing attenuates | EDS will dominate in reducing decision error
4.6 Information Overload Trap Test (HC3*) – Secondary Supporting Test (Appendix B)
To test HC3* as a secondary supporting hypothesis, we will add a within‑treatment manipulation in a subset of the EDS and control arms, distinguishing relevant from irrelevant overload. The results are predicted to be consistent with the hypothesis that EDS mitigates the overload effect (see Appendix B).
4.7 Manipulation Checks (Treatment Fidelity)
To ensure that interventions are implemented as intended, we will collect manipulation checks:
Treatment & Manipulation Check:
- EDS : Increase in assumption count, alternative generation, dissent presence (measured from documents).
- Debiasing : Improvement on standard bias tasks (e.g., overconfidence calibration, confirmation bias tests) administered to participants.
If these checks fail, we will report ITT effects but interpret them with caution.
4.8 Additional Robustness: Compliance‑Adjusted Estimates and Heterogeneity
We will estimate treatment‑on‑the‑treated (TOT) effects using assignment as an instrument for actual protocol compliance. We will also examine heterogeneity by baseline decision quality and include a secondary robustness arm Debiasing + Structure‑Lite (Appendix C).
4.9 Spillover Control and Exposure Mapping
We will implement an exposure‑mapping framework (Aronow & Samii, 2017). Exposure is defined as the proportion of neighboring units within a given network radius assigned to EDS. We will test whether the main treatment effect is robust to allowing spillover effects.
4.10 Multiple Hypothesis Testing
We pre‑register a single primary outcome: the standardized composite outcome. All other outcomes are pre‑registered as secondary. We will adjust for multiple testing using the Holm‑Bonferroni method for secondary outcomes.
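The Holm‑Bonferroni adjustment for secondary outcomes can be sketched as follows (illustrative p‑values only):

```python
import numpy as np

def holm_bonferroni(pvals, alpha=0.05):
    """Holm's step-down adjustment: compare sorted p-values against alpha/(m-k)
    and stop at the first failure. Returns a per-hypothesis rejection decision."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(order):
        if p[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break                      # once one fails, all larger p-values fail too
    return reject

# Example with four secondary outcomes.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [ True False False  True]
```

Holm's procedure controls the family-wise error rate at alpha while being uniformly more powerful than plain Bonferroni.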
4.11 Pre‑Registration and Analysis Plan
The study will be pre‑registered in the AEA RCT Registry (ID: [TBD]) and on the Open Science Framework (OSF). A detailed pre‑analysis plan (Appendix A) will be submitted before data collection begins.
4.12 Time Horizon Justification
For the types of decisions studied (strategic, with moderate lead times), we assume on theoretical grounds that 6 months is sufficient for most outcomes to materialize; we will therefore measure outcomes 6 months after each decision. As a robustness check, we will also report outcomes at 3 months (early) and 12 months (extended) to ensure that the effect is not an artifact of a specific horizon. No pilot data have been used to justify this window; it is a design choice informed by the literature on strategic decision lead times.
---
5. Mapping to Existing Frameworks: A Testable Condition
Let Epistemic Gap = actual epistemic risk (frequency of frame mis‑specification) minus the epistemic risk already internalized by the organization’s current governance system (e.g., ISO 31000, COSO). We operationalize the epistemic gap as the unpriced risk in the organization’s decision architecture: the residual variance in decision outcomes unexplained by existing governance controls, proxied by baseline error rates conditional on observed risk controls. We hypothesize:
\Delta Q = 0 \quad \text{if Epistemic Gap} \leq \tau
\Delta Q > 0 \quad \text{if Epistemic Gap} > \tau
where \tau is a threshold to be estimated from future pilot data (no pilot has been conducted as of this writing). This is a sharp, testable condition: EDS improves outcomes only when the existing system fails to address epistemic risk.
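One way to construct the proxy described above is to regress baseline error rates on observed risk controls and take the unexplained variance share as the epistemic-gap measure. A minimal sketch with simulated data; the OLS proxy, function name, and all numbers are illustrative assumptions, not part of the pre-analysis plan.

```python
import numpy as np

def unpriced_risk(outcome_error, controls):
    """Proxy for the epistemic gap (Section 5): the share of variance in
    baseline decision error left unexplained by observed risk controls.

    outcome_error : length-n vector of unit-level baseline error rates.
    controls      : (n, k) matrix of observed governance/risk controls.
    Returns 1 - R^2 from an OLS fit with intercept.
    """
    X = np.column_stack([np.ones(len(outcome_error)), controls])
    beta, *_ = np.linalg.lstsq(X, outcome_error, rcond=None)
    resid = outcome_error - X @ beta
    return resid.var() / outcome_error.var()

# Hypothetical case: controls explain most of the error, so the gap is small
rng = np.random.default_rng(1)
c = rng.normal(size=(200, 2))
e = 0.3 + 0.1 * c[:, 0] + rng.normal(0, 0.02, 200)
gap = unpriced_risk(e, c)
print(round(gap, 2))   # small: controls price most of the risk
```

Units with a large unexplained share would be the ones for which the theory predicts \Delta Q > 0 once the estimated threshold \tau is exceeded.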
---
6. Counterfactual Learning: A Unique Identification Strategy
To learn from “prevented failures,” we propose a randomized timing of structured challenge design (see Appendix D).
---
7. Short‑Cycle Falsification Rule
We adopt a power‑based rule with economic significance:
Our theory is considered falsified if, after three independent RCTs (each with power ≥ 0.80), any of the following holds: the pooled estimate of \beta_4 is not significantly positive (p > 0.05); the Bayesian posterior probability of the theory falls below 0.10; or the estimated effect size is economically negligible (≤5% reduction in error rate).
This multi‑study falsification rule aligns the theory with cumulative science norms.
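The pooled-significance and economic-significance components of this rule can be written down directly using inverse-variance pooling across the three RCTs. The sketch below omits the Bayesian posterior criterion for brevity, and all estimates and standard errors are hypothetical.

```python
from statistics import NormalDist

def falsification_check(estimates, std_errors, alpha=0.05, min_effect=0.05):
    """Pool beta_4 estimates from independent RCTs by inverse-variance
    weighting, then check (i) one-sided statistical significance and
    (ii) economic significance (pooled effect above min_effect).
    Returns True if the theory survives this part of the rule.
    """
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    p_one_sided = 1 - NormalDist().cdf(pooled / pooled_se)
    return p_one_sided < alpha and pooled > min_effect

# Hypothetical estimates (in units of error-rate reduction) from three RCTs
print(falsification_check([0.09, 0.11, 0.07], [0.03, 0.04, 0.03]))   # True
print(falsification_check([0.01, 0.02, 0.00], [0.03, 0.04, 0.03]))   # False
```

A full implementation would use a random-effects pooling model if between-study heterogeneity is non-trivial; the fixed-effect form above is the simplest consistent reading of the rule.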
---
8. Scope Conditions with Measurable Thresholds
Condition & Operationalization:
- High epistemic load : Number of interdependent variables > 5, or uncertainty (variance of outcomes) > 0.3, or number of decision‑makers > 3.
- Multi‑actor process : At least two individuals with distinct roles and preferences.
- High complexity : Decision meets at least two of the binary thresholds in Section 4.2.
The scope condition implies a broader class of environments beyond our sample: any setting with combinatorial expansion of plausible frames relative to processing capacity. External validity follows from a structural invariance condition: whenever the growth rate of the representation space exceeds the evaluation capacity of the decision system, representation‑constraining interventions will dominate belief‑improving interventions. Practitioners can diagnose applicability by testing whether adding one additional factor increases the dimensionality of the decision space non‑linearly (e.g., via interaction growth or forecast instability).
Explicit failure cases: EDS is not expected to work when complexity is low, a single decision‑maker acts alone, uncertainty is minimal, or the organization already internalizes framing risks (Epistemic Gap = 0).
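The scope conditions above lend themselves to a simple diagnostic checklist. A sketch using the thresholds from the table; the function names and the epistemic-gap argument are our own framing, not part of the protocol.

```python
def high_epistemic_load(n_interdependent_vars, outcome_variance,
                        n_decision_makers):
    """Section 8 operationalization: load is 'high' if any threshold binds."""
    return (n_interdependent_vars > 5
            or outcome_variance > 0.3
            or n_decision_makers > 3)

def eds_in_scope(n_interdependent_vars, outcome_variance,
                 n_decision_makers, epistemic_gap):
    """EDS is predicted to help only inside the scope conditions:
    high epistemic load, a genuinely multi-actor process, and a positive
    epistemic gap (framing risk not already internalized)."""
    return (high_epistemic_load(n_interdependent_vars, outcome_variance,
                                n_decision_makers)
            and n_decision_makers >= 2
            and epistemic_gap > 0)

# A 7-variable, 4-person decision with unpriced framing risk: in scope
print(eds_in_scope(7, 0.25, 4, epistemic_gap=0.2))   # True
# A solo, low-complexity decision: out of scope (an explicit failure case)
print(eds_in_scope(3, 0.1, 1, epistemic_gap=0.2))    # False
```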
---
9. Mechanism Test: Causal Pathway with Exogenous Variation
We will introduce exogenous variation in process components (partial protocol randomization and encouragement design) to causally isolate which elements of EDS drive the effect. This analysis is reported in full in Appendix C; the key result is predicted to be that the framing component alone captures most of the effect, consistent with HC1*.
---
10. Planned Pilot Feasibility: Illustrative Hypothetical Calculation
This section presents a planned pilot design and a hypothetical illustrative calculation showing how the model would be calibrated and what magnitude of effect the theory predicts. No pilot data have been collected as of this writing; the numbers below are derived purely from theoretical assumptions and prior literature. Their sole purpose is to inform power calculations and sample size determination and to demonstrate the feasibility of the proposed experimental design; they should not be interpreted as empirical results.
Illustrative Assumptions:
Suppose a future pilot were to be conducted with two organizational units (one public, one non‑profit) implementing the EDS protocol for a set of high‑complexity decisions. Based on the theoretical framework and prior literature on structured decision aids (e.g., checklists in medical and aviation settings), we hypothesize that the EDS intervention could reduce ex‑post error rates by approximately 28% relative to baseline, while a debiasing‑only intervention might yield a reduction of about 9%. These numbers are illustrative engineering estimates derived from extrapolating the effect sizes observed in related literatures (e.g., Gawande, 2009; Arriaga et al., 2013) and from the functional form assumed in our model.
Illustrative Calculation:
· Baseline error rate (control, hypothetical): 0.32
· Hypothetical EDS error rate: 0.23 → absolute reduction = 0.09 → relative reduction = 28%
· Hypothetical debiasing‑only error rate: 0.29 → absolute reduction = 0.03 → relative reduction = 9%
Similarly, we illustrate improvements in secondary outcomes:
· Brier score improvement: 0.12 in EDS vs. 0.03 in debiasing (hypothetical)
· Frame completeness increase: 35% in EDS vs. 8% in debiasing (hypothetical)
· Cognitive load (NASA‑TLX) decrease: 0.8 points in EDS under high complexity (hypothetical)
Purpose of This Illustration:
· To demonstrate that the proposed effect sizes (e.g., 28% reduction) are within a plausible range and could be detected with the planned sample size (40 clusters).
· To provide a concrete anchor for power calculations and for the pre‑analysis plan (Appendix A).
· To illustrate how the model’s parameters would be estimated if the pilot were conducted.
Important Disclaimer: These numbers are purely illustrative and hypothetical. No actual pilot has been conducted. The purpose of this section is to show that the theory generates testable quantitative predictions and to aid in experimental planning. Any future pilot results, whether confirming or disconfirming these illustrative magnitudes, will be reported transparently.
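As a feasibility check, the quoted relative reductions and an approximate power calculation can be reproduced in a few lines of Python. This is a hedged sketch: the cluster size (50 decisions per unit) and the intraclass correlation (0.02) are our own illustrative assumptions, not figures stated in the design, and the actual power analysis would use the parameters fixed in the pre-analysis plan (Appendix A).

```python
from statistics import NormalDist

def cluster_rct_power(p_control, p_treat, clusters_per_arm,
                      cluster_size, icc, alpha=0.05):
    """Approximate power for a two-arm cluster RCT comparing error rates.

    Normal approximation for a difference in proportions, with the
    effective sample size deflated by the design effect 1 + (m - 1) * ICC.
    """
    deff = 1 + (cluster_size - 1) * icc
    n_eff = clusters_per_arm * cluster_size / deff   # effective n per arm
    se = ((p_control * (1 - p_control)
           + p_treat * (1 - p_treat)) / n_eff) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_control - p_treat) / se - z_alpha)

# Reproduce the quoted relative reductions from the illustrative error rates
print(round((0.32 - 0.23) / 0.32, 2))   # 0.28 (EDS)
print(round((0.32 - 0.29) / 0.32, 2))   # 0.09 (debiasing only)

# 40 clusters total = 20 per arm; cluster size and ICC are ASSUMED here
power = cluster_rct_power(0.32, 0.23, clusters_per_arm=20,
                          cluster_size=50, icc=0.02)
print(f"approximate power: {power:.2f}")   # roughly 0.9 under these assumptions
```

Under these assumptions the hypothesized EDS effect is detectable with the planned 40 clusters, while the smaller debiasing effect would require either more clusters or a lower ICC, which is consistent with treating the EDS contrast as the primary test.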
---
11. Conclusion
We have stripped the theory to its core: in high‑complexity environments, a minimal enforceable decision structure will dominate individual debiasing in reducing economically meaningful decision error. This claim is a sharp, testable boundary condition, embedded in a pre‑registered experimental design with power‑based falsification rules, a mechanism test with exogenous variation, and a secondary supporting test (HC3*). The comparison is between realistically implementable interventions: enforceable process constraints vs. non‑enforceable cognitive corrections. Our estimand is not a primitive causal parameter but a policy‑relevant comparison between enforceable and non‑enforceable interventions, i.e., the margin organizations actually face; the relevant counterfactual is not ideal debiasing but debiasing as it is realistically deployed. Our contribution is thus to identify dominance in implementable intervention space, not to decompose cognition and structure. We have also introduced mechanically anchored ex‑ante measures. Frame completeness is not intended to approximate a true underlying representation; it serves as an instrument validated by its incremental predictive contribution relative to (i) baseline heuristics and (ii) unstructured elicitation, under cross‑validation and temporal holdout. We do not interpret coefficients on frame completeness causally; they serve as discriminating evidence across mechanism classes.
Mechanism identification relies on experimental variation in protocol components (Appendix C), not on observational mediation. We predict that results will be consistent across the binary thresholds, the continuous index, and the exogenous complexity shock. We will not identify the primitive causal channel, but we will provide discriminating evidence against a broad class of alternative mechanisms: effort‑based explanations, pure coordination, and pure attention accounts. We define a minimal enforceable decision structure as the lowest‑cost protocol that (i) expands the representation space and (ii) is verifiably enforceable at the decision‑instance level; we claim sufficiency for dominance under high epistemic load, not optimality. The underlying logic is that when the space of plausible problem representations grows faster than the capacity to evaluate them, constraining the representation space dominates improving belief accuracy within any given representation. Our operationalization of complexity captures this combinatorial growth rather than mere stochastic uncertainty: the defining feature of high complexity in our framework is that marginal variables increase interaction dimensionality rather than additive variance.
External validity follows from a structural invariance condition: whenever the growth rate of the representation space exceeds the evaluation capacity of the decision system, representation‑constraining interventions will dominate belief‑improving interventions. Our predictions imply that a large class of behavioral interventions may be targeting a non‑binding constraint in complex organizational environments. We do not imply that cognition is unimportant; rather, under high epistemic load, improving cognition within a mis‑specified representation yields lower marginal returns than restructuring the representation itself. To our knowledge, this is among the first pre‑registered randomized tests designed to compare structured protocols and debiasing across experimentally varied complexity regimes.
We do not claim that EDS universally dominates debiasing; rather, we identify a regime—characterized by high epistemic load and multi‑actor complexity—in which structural interventions become the binding constraint. If our experiment finds that EDS does not outperform debiasing under high complexity, the theory will be falsified. If it does, we will provide direct causal evidence that under high epistemic load, decision structure—not individual cognition—is the binding constraint on decision quality.
---
Pre‑Analysis Plan: Appendix A (available upon request)
Supporting Appendices: B (Information Overload Test), C (Mechanism & Structure‑Lite), D (Counterfactual Learning Design)
