Pre-Decision Governance for Government AI:
Epistemic Infrastructure, Reasoning Accountability, and Procurement Integration
ABSTRACT
Governments around the world are increasingly adopting algorithmic decision-making systems (algorithmic governance)—from determining welfare eligibility and predicting criminal risk to recommending economic policies. However, scandals such as Robodebt in Australia, COMPAS in the United States, and the A-level grading algorithm in the United Kingdom demonstrate that technology adoption has not been accompanied by adequate reasoning accountability mechanisms. Algorithms amplify existing biases and assumptions—about poverty, risk, or human behavior—that are never tested, challenged, or publicly documented. This article argues that the root problem is an epistemic crisis in algorithmic governance: design assumptions go untested, historical data contain latent biases, there is no space for dissenting views, and humans lose decision sovereignty through excessive delegation to machines. This epistemic crisis cannot, however, be separated from the broader political economy—the austerity, surveillance, and fiscal rationality that shape the landscape in which algorithms are developed. Drawing on epistemic governance literature (Jalonen, 2025; Lidskog & Sundqvist, 2025), epistemic injustice (Fricker, 2007), critical algorithm studies (O'Neil, 2016; Noble, 2018; Eubanks, 2018), deliberative governance (Habermas, 1996; Ansell & Gash, 2007), algorithmic accountability theory (Diakopoulos, 2020; Floridi et al., 2018), public procurement governance literature (Sanchez-Graells, 2021; Yukins, 2022), and theories of power and the state (Hobbes, 1651/1996; Foucault, 1977; Scott, 1998), this article develops a Pre-Decision Governance (PDG) framework for Government AI—an epistemic infrastructure operating at the pre-design level of algorithmic systems.
Theoretically, this article makes three main contributions to governance literature. First, it introduces the concept of reasoning accountability as an extension of the idea of public reason (Rawls, 1993; Habermas, 1996) into the algorithmic domain—addressing the challenge that algorithms make decisions but cannot provide reasons. Second, it extends deliberative governance theory from the realm of public policy forums into the realm of technical design, demonstrating how deliberative principles can be operationalized within the algorithmic development pipeline. Third, it positions PDG as a meta-governance layer—a layer that governs how other AI governance frameworks (EU AI Act, NIST AI RMF, Model Cards) are activated, determining whether compliance will be substantive or formalistic.
Unlike existing frameworks, PDG specifically regulates the process of assumption construction and framing before algorithms are written. PDG also differs from conventional deliberative governance because it is embedded in the technical design pipeline, not merely a policy discussion forum; it produces technical artifacts (assumption sheet, model portfolio) that become procurement requirements. Four protocols are proposed: (1) open testing of algorithm design assumptions, with explicit definition of critical assumptions; (2) mandatory consideration of alternative framings of public problems; (3) multi-option mandate for algorithmic models with different assumptions; and (4) structured dissent mechanisms through AI ethics boards designed with awareness of co-optation risks. To ensure substantive implementation, this article develops a Reasoning Quality Index (RQI)—a set of evaluative indicators measuring reasoning trace completeness, framing diversity, model diversity scores, and responsiveness to dissent—while explicitly acknowledging the risk of documentation ritualism that could turn these protocols into "ethics-washing 2.0." This article also defines the scope conditions for PDG application: most relevant for high-risk AI systems impacting socio-economic rights, and less relevant for low-stakes administrative systems. PDG deliberately slows down the pre-design phase to reduce the risk of systemic downstream losses—a repositioning of intervention points, not procedural expansion. This framework contributes to algorithmic governance literature by shifting focus from code transparency to reasoning accountability. PDG does not claim to neutralize power asymmetries; it only makes them more traceable. The article concludes with a concrete empirical research agenda, including quasi-experimental designs to compare error rates and litigation levels between projects with and without PDG.
Keywords: algorithmic governance, pre-decision governance, epistemic governance, artificial intelligence, algorithmic bias, reasoning accountability, Robodebt, public reason, reasoning quality index, ethics-washing, procurement governance, critical assumptions, meta-governance, scope conditions, deliberative governance, epistemic injustice
1. INTRODUCTION
1.1 Concentration of Power in Automated Systems
We are entering a new era in the history of public administration: one in which more and more strategic decisions are made or recommended by algorithms. The term "algorithmic Leviathan" is used metaphorically in this article to describe the concentration of power in automated systems that broadly affect citizens' rights—a power that is often invisible, undocumented, and unchallengeable. This metaphor consciously references the concept of Leviathan in the tradition of political philosophy, particularly Hobbes (1651/1996), who described the state as a sovereign entity whose absolute power is necessary to prevent chaos but which potentially becomes tyrannical if left unchecked. In contemporary contexts, surveillance state literature (Foucault, 1977; Scott, 1998) has warned how states use technology to enhance their capacities for surveillance and control, often at the expense of citizen autonomy and rights. The algorithmic Leviathan is a new manifestation of this phenomenon: power exercised no longer through traditional state apparatuses, but through code, data, and algorithms that are invisible, undocumented, and unchallengeable.
In Australia, the Robodebt system automatically calculated welfare recipients' debts from averaged annual income—rather than actual fortnightly earnings—and pursued collection without human verification. The result: thousands of citizens were wrongly billed, suicides occurred, and the government had to pay billions of dollars in compensation (Carney, 2021). In the United States, the COMPAS criminal risk prediction system was shown to be biased against minority groups; a ProPublica investigation (Angwin et al., 2016) found that the algorithm systematically assigned higher risk scores to black defendants than to white defendants with similar criminal backgrounds. In the United Kingdom, the A-level grading algorithm used during the COVID-19 pandemic produced injustice on a mass scale, with students from low-performing schools systematically disadvantaged, sparking national protests (Amoore, 2020). Also in the United States, automated welfare fraud detection systems in various states wrongly flagged thousands of poor families, causing them to lose access to basic services (Eubanks, 2018). In Indonesia, discussion of using AI for welfare eligibility determination and corruption prediction continues to develop without adequate assumption-testing mechanisms.
The problem is not algorithms themselves, but the absence of epistemic infrastructure ensuring that the assumptions underlying algorithm design—about poverty, risk, or human behavior—are tested, challenged, and documented before system implementation. Algorithms are not neutral. They are frozen assumptions in code. And when those assumptions are wrong, algorithms multiply errors on an industrial scale.
1.2 Epistemic Crisis in Algorithmic Governance: Between Technical and Political
Critical algorithm studies literature has long warned of these dangers. O'Neil (2016) in Weapons of Math Destruction shows how algorithms can become "mathematical weapons of mass destruction" that amplify social injustice. Noble (2018) in Algorithms of Oppression reveals how search engines reproduce racism and discrimination. Eubanks (2018) in Automating Inequality documents how algorithmic systems are increasingly used to manage poverty, often with devastating results for poor citizens.
However, it is important to recognize that the epistemic crisis cannot be separated from broader political economy. Robodebt, for instance, was not merely a technical or epistemic failure. It was a product of austerity politics—the desire to cut welfare budgets through seemingly scientific "efficiency." It was an instrument of state surveillance targeting poor citizens. It was a manifestation of fiscal rationality that placed budget savings above citizen protection (Carney, 2021). COMPAS, similarly, cannot be separated from punitive criminal politics and the long history of racial discrimination in the US criminal justice system (Angwin et al., 2016). The UK A-level algorithm was a product of political pressure to conduct examinations during a pandemic, ignoring educators' warnings about resulting injustice (Amoore, 2020).
The PDG framework does not claim to resolve the political economy of austerity, surveillance, or exploitative fiscal rationality. It operates at the epistemic layer where structural failures are operationalized into code and algorithms. In other words, PDG intervenes at the point where political assumptions are translated into computational logic.
1.3 Research Questions
This article addresses three main questions:
- Why do government AI systems often fail systemically, despite being developed with seemingly adequate technical procedures, and how are these failures related to broader political economy?
- How can the Pre-Decision Governance (PDG) framework be integrated into government AI development and procurement cycles to prevent epistemic failures—and what distinguishes it from conventional deliberative governance frameworks and existing AI governance frameworks?
- What protocols are needed to ensure that algorithm design assumptions are tested, alternative framings considered, options explored, and dissenting views documented, and how can the quality of this process be measurably evaluated?
1.4 Contributions and Structure
This article makes three main theoretical contributions to governance literature.
First, it introduces the concept of reasoning accountability as an extension of the idea of public reason in the liberal-democratic tradition (Rawls, 1993; Habermas, 1996). Public reason requires that political decisions be justifiable with reasons acceptable to free and equal citizens. In the algorithmic context, the challenge is: algorithms make decisions but cannot provide reasons. Reasoning accountability bridges this gap by requiring that the reasoning processes preceding algorithm development—assumptions, framings, options, dissent—be documented and examinable. This is not a substitute for public reason but an epistemic infrastructure enabling public reason to operate in the algorithmic age.
Second, this article extends deliberative governance theory from the realm of public policy forums into the realm of technical design. Deliberative governance has traditionally focused on dialogue and argumentation in public spaces and policy forums (Habermas, 1996; Ansell & Gash, 2007). This article demonstrates how deliberative principles—openness to different perspectives, assumption testing, reason documentation—can be operationalized within the algorithm development pipeline, producing technical artifacts that become procurement requirements.
Third, this article positions PDG as a meta-governance layer—a layer that governs how other AI governance frameworks (EU AI Act, NIST AI RMF, Model Cards) are activated. PDG determines whether compliance with these frameworks will be substantive or formalistic. By requiring assumption documentation, framing testing, and multi-option exploration, PDG ensures that compliance processes do not become mere administrative rituals. It also governs relationships between various frameworks—for example, how Model Cards results should be informed by assumptions tested in PDG.
Beyond theoretical contributions, this article also provides practical contributions: developing four operational protocols, integrating them into government procurement cycles, developing the Reasoning Quality Index (RQI) as an evaluation tool, defining PDG scope conditions, illustrating framework application across various real-world cases (Robodebt, COMPAS, UK A-level), and providing a concrete empirical research agenda.
The article is structured as follows. Section 2 reviews literature on algorithmic governance, algorithmic bias, epistemic governance and epistemic injustice, deliberative governance, public procurement, and theories of power. Section 3 develops the PDG framework for Government AI, explaining its four protocols, explicitly distinguishing it from deliberative governance, and defining critical assumptions. Section 4 integrates these protocols into government procurement cycles. Section 5 develops the Reasoning Quality Index as an evaluation tool, including discussion of validity and reliability. Section 6 presents case illustrations from various countries (Robodebt, COMPAS, UK A-level). Section 7 discusses PDG's position within the landscape of existing AI governance frameworks, scope conditions, speed-versus-reflection trade-offs, PDG's role as meta-governance layer, theoretical contributions, practical implications, and limitations. Section 8 contains conclusions and a concrete future research agenda.
2. LITERATURE REVIEW AND THEORETICAL POSITION
2.1 Algorithmic Governance and the Dangers of Hidden Bias
Algorithmic governance refers to the use of algorithmic systems to make, recommend, or automate public decisions (Yeung, 2018; Danaher et al., 2017). Its efficiency potential is enormous, but its risks are equally significant. Empirical studies show that algorithms can:
- Amplify historical bias: The COMPAS criminal risk prediction system in the US was trained on arrest data biased against minorities, thus reproducing injustice (Angwin et al., 2016). Other studies show that recruitment algorithms can discriminate against women and minorities if trained on biased historical data (Raghavan et al., 2020).
- Blur responsibility: When decisions are made by algorithms, it becomes difficult to determine who is responsible when errors occur (Diakopoulos, 2020). In the Robodebt case, the Australian government repeatedly blamed the "automated system," avoiding political responsibility (Carney, 2021).
- Create impenetrable "black boxes": Complex algorithms (deep learning) often cannot be explained even by their developers (Burrell, 2016). This creates serious accountability problems, especially when algorithms are used for decisions affecting citizens' rights.
2.2 From Code Transparency to Reasoning Accountability
Responses to these problems typically revolve around transparency (opening source code) and auditability (testing outcomes). However, code transparency is insufficient. Code can be open, but the assumptions underlying design—about what constitutes "poverty," "risk," or "success"—remain hidden in mathematical notation understood only by a few.
Frameworks like Model Cards (Mitchell et al., 2019) and Datasheets for Datasets (Gebru et al., 2021) have improved transparency by documenting model and data characteristics. The NIST AI Risk Management Framework (NIST, 2023) provides comprehensive guidance for managing AI risks. The EU AI Act (European Commission, 2021) classifies AI systems based on risk levels and requires conformity assessments for high-risk systems.
However, these frameworks, while valuable, have a fundamental gap: they do not regulate the process of assumption construction and framing at the pre-design stage. They focus on risk classification and documentation after assumptions have been established. This is where PDG's unique contribution lies.
2.3 Epistemic Governance and Epistemic Injustice
Epistemic governance (Jalonen, 2025; Lidskog & Sundqvist, 2025) emphasizes understanding how knowledge is produced, validated, and disseminated within governance systems. Jalonen (2025) defines epistemic governance as "the processes shaping collective perceptions and influencing the understanding of a situation," emphasizing that in complex, crisis-prone environments, governance must move beyond traditional models to embrace uncertainty and diverse forms of knowledge. Lidskog and Sundqvist (2025), in their study of the Intergovernmental Panel on Climate Change (IPCC), identify how "epistemic hierarchies" and disciplinary diversity create challenges for maintaining coherence in global assessments.
In the AI context, epistemic questions become central: Whose knowledge is considered valid? Whose assumptions are coded into algorithms? Whose voices are heard in design processes? Why are certain voices systematically silenced?
Fricker's (2007) concept of epistemic injustice is highly relevant here. Fricker distinguishes two forms of epistemic injustice: testimonial injustice (when someone's word is disbelieved due to prejudice about their identity) and hermeneutical injustice (when someone lacks the conceptual resources to understand and articulate their experiences). In government algorithm development contexts, vulnerable groups often experience both forms of injustice. When they warn about negative system impacts, their warnings are ignored (testimonial injustice). When they struggle to articulate how algorithms harm them due to lack of access to technical language, they experience hermeneutical injustice.
This literature also highlights how epistemic failures are often politically convenient. In the Robodebt case, the inability or unwillingness to test the "income averaging" assumption enabled the government to quickly achieve budget savings targets—at least until the scandal erupted. Structured dissent becomes a political risk because it can slow processes, reveal weaknesses, and create resistance. Bureaucracies, with their incentives for harmony and speed, systematically silence epistemic minorities—those with different knowledge who lack the power to be heard.
2.4 Deliberative Governance and Its Limits
Deliberative governance (Habermas, 1996; Ansell & Gash, 2007) emphasizes the importance of dialogue and reasoned argumentation in collective decision-making. Habermas (1996) developed the concept of communicative rationality—the idea that legitimate decisions emerge from reasoned argumentation among free and equal participants. Ansell and Gash (2007) in their seminal study of collaborative governance identified key success conditions: face-to-face dialogue, trust-building, commitment to process, shared understanding, and intermediate outcomes.
This literature has made significant contributions to our understanding of democratic legitimacy. However, it is important to distinguish PDG from conventional deliberative governance, because PDG is not merely a policy discussion forum. PDG is embedded in the technical design pipeline; it produces technical artifacts (assumption sheets, model portfolios) that become part of system specifications, not merely recommendations; and, most importantly, it can be integrated into government procurement cycles as binding requirements rather than normative appeals. Section 3.3 presents a dimension-by-dimension comparison of the two approaches.
2.5 Theories of Power and the State: Leviathan, Surveillance, and Governance
The "algorithmic Leviathan" metaphor used in this article consciously references the tradition of political philosophy. Hobbes (1651/1996) described Leviathan as a sovereign entity whose absolute power is necessary to prevent chaos but also potentially becomes tyranny if unchecked. In contemporary contexts, Foucault (1977) developed the concept of panopticism to describe how modern power operates through continuous surveillance and bodily discipline. Scott (1998) in Seeing Like a State shows how modern states attempt to make societies "legible"—readable, measurable, and controllable—often ignoring local knowledge and informal practices.
The algorithmic Leviathan is a new manifestation of this phenomenon: power no longer exercised through traditional state apparatuses, but through invisible, undocumented, and unchallengeable code, data, and algorithms. PDG, with its documentation and transparency mechanisms, attempts to make this power more "legible" in a different sense: not to control citizens, but to make algorithmic decision-making processes examinable and accountable.
2.6 Public Procurement and Algorithmic Risks
Integrating PDG into procurement cycles requires understanding public procurement governance literature. Sanchez-Graells (2021) shows that algorithmic system procurement has specific risks not covered by conventional procurement frameworks: design bias risk, supplier dependency risk, and technical evaluation incapacity risk. Yukins (2022) highlights the importance of incorporating ethical and transparency requirements into technology procurement contracts. This literature provides the foundation for the proposed integration of PDG into procurement requirements, as elaborated in Section 4.
2.7 Identified Gaps
From synthesizing the above literature, the following gaps can be identified:
| Framework | Focus | Gap |
|---|---|---|
| EU AI Act | Risk classification, conformity assessment | Does not regulate pre-design assumptions |
| NIST AI RMF | Organizational risk management | Focuses on technical risks, not assumptions |
| Model Cards | Model transparency | Documentation after model completion |
| Datasheets | Data transparency | Does not regulate data interpretation |
| Fairness Metrics | Statistical bias measurement | Bias measured, causal assumptions untested |
| Deliberative Governance | Policy discussion forums | Not integrated into technical design |
| Procurement Frameworks | Efficiency, compliance | Does not cover reasoning quality |
PDG fills these gaps by focusing on the most upstream stage: when assumptions are formulated, framings established, and design options explored—and by integrating these protocols into technical procurement requirements.
3. PRE-DECISION GOVERNANCE FRAMEWORK FOR GOVERNMENT AI
3.1 Definition and Scope
Pre-Decision Governance for Government AI is defined as:
A set of systematic protocols governing reasoning processes before algorithmic systems are developed or adopted by public institutions, encompassing design assumption testing, exploration of alternative problem framings, multi-option algorithmic model consideration, and documentation of dissenting views—with these protocols integrated into procurement cycles as binding technical requirements, not merely normative recommendations.
This framework is designed for integration into existing AI system procurement and development cycles, without requiring major structural changes, but with explicit acknowledgment that its effectiveness depends on the presence of caring actors and protection mechanisms for those who speak up.
3.2 Four PDG Protocols for Government AI
Protocol 1: Algorithm Design Assumption Testing
| Element | Description |
|---|---|
| Function | Ensure that critical assumptions underlying algorithm design are identified, documented, and tested before system development. |
| Definition of Critical Assumptions | Critical assumptions are defined as assumptions meeting one or more of the following criteria: • Affect target variables: assumptions about causal relationships between algorithm inputs and outputs. • Determine classification logic: assumptions about how entities are categorized. • Affect risk distribution across groups: assumptions that would disproportionately impact vulnerable groups. • Have significant legal or ethical implications. |
| Key Questions | What assumptions about society, human behavior, or causal relationships are encoded in this algorithm? Are these assumptions supported by evidence? Whose assumptions are considered valid, and why? |
| Operational Elements | • Assumption documentation in technical specifications. • Assumption testing against independent data, academic research, or expert judgment. • Contingency scenarios if assumptions prove false. |
| Output | Algorithm Design Assumption Sheet |
Quality Rubric:
- Level 0: No assumption documentation
- Level 1: Assumptions documented but not tested
- Level 2: Assumptions tested with secondary data or literature
- Level 3: Assumptions tested with primary data, expert consultation, or sensitivity analysis
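To make the Assumption Sheet and its rubric more concrete, the following is a minimal illustrative sketch in Python of how an agency might record critical assumptions and their testing level. The schema and field names are hypothetical; the protocol itself does not prescribe any particular format.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List

class TestingLevel(IntEnum):
    """Rubric levels for Protocol 1 (Section 3.2)."""
    NOT_DOCUMENTED = 0    # no assumption documentation
    DOCUMENTED_ONLY = 1   # documented but not tested
    SECONDARY_TESTED = 2  # tested with secondary data or literature
    PRIMARY_TESTED = 3    # tested with primary data, experts, or sensitivity analysis

@dataclass
class CriticalAssumption:
    """One row of a hypothetical Algorithm Design Assumption Sheet."""
    statement: str                   # the assumption in plain language
    affects_target_variable: bool
    determines_classification: bool
    affects_group_risk: bool
    legal_ethical_implications: bool
    evidence: List[str] = field(default_factory=list)  # sources used to test it
    testing_level: TestingLevel = TestingLevel.NOT_DOCUMENTED
    contingency: str = ""            # what happens if the assumption proves false

    def is_critical(self) -> bool:
        """An assumption is 'critical' if it meets at least one Protocol 1 criterion."""
        return any([self.affects_target_variable, self.determines_classification,
                    self.affects_group_risk, self.legal_ethical_implications])

# Example entry, loosely modeled on the Robodebt case discussed in Section 6.1.
income_averaging = CriticalAssumption(
    statement="Averaged annual income approximates actual fortnightly income",
    affects_target_variable=True,
    determines_classification=True,
    affects_group_risk=True,
    legal_ethical_implications=True,
    evidence=[],  # never tested in practice
    testing_level=TestingLevel.NOT_DOCUMENTED,
)
assert income_averaging.is_critical()
```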
Protocol 2: Counter-Framing Mandate
| Element | Description |
|---|---|
| Function | Ensure that the public problem to be addressed by the algorithm is defined from multiple perspectives, especially the perspectives of the most affected groups. |
| Key Questions | Is the problem definition used in algorithm design the only possible definition? Which groups' perspectives might be overlooked? Which framings are politically advantageous? |
| Operational Elements | • Documentation of the dominant framing and at least two alternative framings. • Involvement of affected groups in formulating alternative framings. • Analysis of policy implications for each framing. |
| Output | Alternative Framing Matrix |
Quality Rubric:
- Level 0: Only one framing
- Level 1: Two framings but alternative is weak/strawman
- Level 2: Three framings with substantive differences
- Level 3: Alternative framings developed with affected group involvement
Protocol 3: Multi-Option Model Mandate
| Element | Description |
|---|---|
| Function | Ensure that multiple algorithmic models with different assumptions and architectures are explored and compared before one model is selected. |
| Key Questions | Is only one algorithmic model being considered? How would other models (with different assumptions) perform? Which model is fairest to vulnerable groups? |
| Operational Elements | • Development of at least three models with different assumptions, variables, or architectures. • Comparison of performance, fairness, and transparency across models. • Documentation of reasons for the final model selection. |
| Output | Algorithmic Model Portfolio |
Quality Rubric:
- Level 0: Only one model
- Level 1: Two models but minimal differences
- Level 2: Three models with substantively different assumptions/architectures
- Level 3: Models systematically compared (performance, fairness, transparency)
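As an illustration of how a Model Portfolio comparison could be operationalized, the sketch below (in Python, with hypothetical metrics and model names) records three candidate models built on different assumptions and reports their performance, fairness, and transparency side by side, as the rubric's Level 3 requires.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ModelCandidate:
    """One entry in a hypothetical Algorithmic Model Portfolio (Protocol 3)."""
    name: str
    core_assumption: str    # the distinguishing assumption or architecture
    accuracy: float         # overall predictive performance (0-1)
    group_error_gap: float  # absolute gap in error rates between groups (lower is fairer)
    explainable: bool       # can individual decisions be explained to affected citizens?

def portfolio_report(models: List[ModelCandidate]) -> List[Dict[str, object]]:
    """Side-by-side comparison of performance, fairness, and transparency,
    produced before a final model is selected."""
    return [{
        "model": m.name,
        "assumption": m.core_assumption,
        "accuracy": round(m.accuracy, 3),
        "group_error_gap": round(m.group_error_gap, 3),
        "explainable": m.explainable,
    } for m in models]

# Hypothetical portfolio with three substantively different designs (rubric Level 2-3).
portfolio = [
    ModelCandidate("rule_based_sampling", "verify a random sample manually", 0.78, 0.02, True),
    ModelCandidate("logistic_regression", "risk is a linear function of income volatility", 0.84, 0.06, True),
    ModelCandidate("gradient_boosting", "complex interactions dominate risk", 0.88, 0.11, False),
]
for row in portfolio_report(portfolio):
    print(row)
```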
Protocol 4: Structured Dissent
| Element | Description |
|---|---|
| Function | Create institutional space for dissenting views in AI system development and procurement processes, with documentation and follow-up mechanisms—while explicitly acknowledging that this space can be co-opted and requires extra protection. |
| Key Questions | Who has objections to this algorithm's design or implementation? Are their objections heard and documented? How can this mechanism be prevented from becoming an empty formality? |
| Operational Elements | • Establishment of an AI Ethics Board with membership designed to minimize co-optation risk (fixed terms, cross-sector composition, process transparency). • Dissent forms for developers, civil servants, or citizens. • Mandatory documentation and written responses to every objection. • Whistleblower protection guarantees. • Periodic independent audit of dissent mechanism effectiveness. |
| Output | Dissent and Response Record |
Quality Indicators (scored in the RQI, Section 5.1):
- Response Latency: Time between dissent submission and official response
- Closure Transparency: Are reasons for rejection/acceptance publicly documented?
- Substantive Objection Rate: Proportion of dissent receiving analytical (not merely administrative) responses
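The three indicators above can be computed directly from a Dissent and Response Record. The following sketch, using hypothetical field names and dates, shows one possible way to derive response latency, closure transparency, and the substantive objection rate; it is an illustration, not a prescribed implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class DissentEntry:
    """One objection in a hypothetical Dissent and Response Record (Protocol 4)."""
    submitted: date
    responded: Optional[date]     # None if still unanswered
    response_is_analytical: bool  # substantive engagement, not a form letter
    publicly_documented: bool

def dissent_metrics(entries: List[DissentEntry]) -> dict:
    """Compute the three quality indicators listed above."""
    answered = [e for e in entries if e.responded is not None]
    latencies = [(e.responded - e.submitted).days for e in answered]
    return {
        "mean_response_latency_days": sum(latencies) / len(latencies) if latencies else None,
        "closure_transparency_rate": sum(e.publicly_documented for e in answered) / len(entries) if entries else 0.0,
        "substantive_objection_rate": sum(e.response_is_analytical for e in answered) / len(entries) if entries else 0.0,
    }

record = [
    DissentEntry(date(2024, 3, 1), date(2024, 3, 6), True, True),
    DissentEntry(date(2024, 3, 10), date(2024, 4, 20), False, False),
    DissentEntry(date(2024, 4, 2), None, False, False),  # ignored objection
]
print(dissent_metrics(record))
```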
3.3 Why PDG Differs from Conventional Deliberative Governance
| Dimension | Deliberative Governance | Pre-Decision Governance (PDG) |
|---|---|---|
| Locus | Policy forums, public space | Technical design pipeline |
| Actors | Stakeholders, citizens | Developers, engineers, procurement officials |
| Mechanisms | Discussion, deliberation, argumentation | Assumption documentation, framing testing, model portfolios |
| Outputs | Consensus, policy recommendations | Technical artifacts (assumption sheet, model portfolio) |
| Status | Normative-democratic | Procurement requirement |
| Timing | Throughout policy cycle | Before technical design begins |
| Binding Nature | Voluntary, participatory | Contractually binding |
PDG is not merely "deliberative governance + documentation." It is an epistemic infrastructure embedded in the technical design pipeline, producing artifacts that become part of system specifications, and can be mandated through procurement contracts.
4. INTEGRATING PDG INTO GOVERNMENT PROCUREMENT CYCLES
To avoid becoming merely a normative ideal, PDG must be integrated into existing government procurement cycles. Public procurement literature (Sanchez-Graells, 2021; Yukins, 2022) shows that technical requirements in contracts can include not only functional specifications but also process requirements—how systems must be developed, not just what must be produced.
4.1 Integration Mapping for High-Risk AI Systems
| Procurement Stage | Standard Activity | PDG Integration Document/Output |
|---|---|---|
| 1. Needs Planning | Problem identification, needs analysis | Require documentation of dominant framing and at least two alternative framings. → Alternative Framing Matrix |
| 2. Technical Specification Development | Drafting Terms of Reference | Require assumption testing as part of specifications. Define critical assumptions. → Algorithm Design Assumption Sheet (template in ToR) |
| 3. Supplier Selection | Technical proposal evaluation | Evaluation weight for reasoning quality (assumption testing, model diversity). → Expanded evaluation criteria |
| 4. Prototype Development | Initial model development | Require portfolio of at least three models with different assumptions. → Algorithmic Model Portfolio |
| 5. Testing and Validation | Technical performance testing | Test assumptions with independent data; validate framing with affected groups. → Assumption Testing Report |
| 6. Finalization and Implementation | Go-live | AI Ethics Board provides final opinion; dissent record attached. → Dissent and Response Record |
| 7. Monitoring and Evaluation | Periodic evaluation | Post-implementation reasoning quality audit (Reasoning Quality Index). → Epistemic Audit Report |
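One way to operationalize this mapping is a simple stage-gate check that flags missing PDG artifacts before a procurement stage is closed. The sketch below is illustrative; the stage and artifact identifiers are hypothetical labels for the documents listed in the table above.

```python
# Hypothetical mapping of procurement stages to required PDG artifacts (Section 4.1).
REQUIRED_ARTIFACTS = {
    "needs_planning": ["alternative_framing_matrix"],
    "technical_specification": ["assumption_sheet"],
    "supplier_selection": ["reasoning_quality_evaluation"],
    "prototype_development": ["model_portfolio"],
    "testing_and_validation": ["assumption_testing_report"],
    "finalization": ["dissent_and_response_record", "ethics_board_opinion"],
    "monitoring": ["epistemic_audit_report"],
}

def missing_artifacts(stage: str, submitted: set) -> list:
    """Return required PDG artifacts not yet submitted at a given stage."""
    return [a for a in REQUIRED_ARTIFACTS.get(stage, []) if a not in submitted]

# Example: a project entering finalization without a dissent record would be flagged.
print(missing_artifacts("finalization", {"ethics_board_opinion"}))
# -> ['dissent_and_response_record']
```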
4.2 Example Procurement Clause (Indicative)
The following is example language that could be included in Terms of Reference (ToR) for AI system procurement:
"The provider must explicitly document the critical assumptions underlying algorithm design (Algorithm Design Assumption Sheet). Critical assumptions are defined as assumptions affecting target variables, determining classification logic, affecting risk distribution across groups, or having significant legal/ethical implications. These assumptions must be tested against independent data or credible academic references.
The provider must present at least three algorithmic models with different assumptions or architectures (Algorithmic Model Portfolio) along with comparative analysis of performance and fairness.
All dissenting views arising during the development process must be documented (Dissent and Response Record) and followed up. An independent AI Ethics Board (with transparent composition and processes) will be established to evaluate these documents before implementation.
Technical evaluation will assign 30% weight to reasoning quality as reflected in these documents."
5. EVALUATING PDG QUALITY: REASONING QUALITY INDEX (RQI)
One potential criticism of frameworks like PDG is the risk of documentation ritualism—institutions can fill forms formally, create weak alternative framings, or simulate dissent. To anticipate this, an evaluation tool measuring substantive quality, not merely document existence, is needed.
5.1 Reasoning Quality Index (RQI) Indicators
| Dimension | Indicator | Operational Definition / Score (0-3) |
|---|---|---|
| Assumption Completeness | Completeness of Critical Assumption Mapping | Proportion of critical assumptions documented out of total assumptions identifiable through independent analysis. 0: <30%; 1: 30-60%; 2: 60-90%; 3: >90% |
| Assumption Testing Quality | Assumption Test Rigor | Depth of testing (see the Protocol 1 rubric in Section 3.2). 0-3 per rubric |
| Framing Diversity | Framing Diversity Score | Number of substantively different framings considered 0: 1 framing; 1: 2 framings; 2: 3 framings; 3: >3 framings |
| Alternative Framing Quality | Counter-Framing Quality | Were alternative framings developed with affected group involvement? 0: No; 1: Limited consultation; 2: Substantive participation |
| Model Diversity | Model Diversity Score | Number of models with different assumptions/architectures 0: 1 model; 1: 2 models; 2: 3 models; 3: >3 models |
| Model Comparison Quality | Model Comparison Quality | Does comparison cover performance, fairness, and transparency? 0: None; 1: Partial; 2: Complete |
| Dissent Responsiveness | Dissent Response Latency | Time between dissent submission and official response 0: >30 days; 1: 15-30 days; 2: 7-14 days; 3: <7 days |
| Dissent Transparency | Dissent Closure Transparency | Are reasons for rejection/acceptance publicly documented? 0: No; 1: Limited internal; 2: Limited public; 3: Full public |
| Substantive Objections | Substantive Dissent Rate | Proportion of dissent receiving analytical (not merely administrative) responses 0: <30%; 1: 30-60%; 2: 60-90%; 3: >90% |
5.2 Composite Score and Interpretation
| RQI Score | Category | Interpretation |
|---|---|---|
| 0-9 | Critical | Weak reasoning process, high risk of epistemic failure |
| 10-18 | Needs Improvement | Documentation exists but quality low, prone to ritualism |
| 19-24 | Adequate | Sufficient reasoning process, minimum baseline |
| 25-27 | Good | Good reasoning quality, can serve as model |
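A minimal sketch of the composite calculation follows, assuming each indicator in Section 5.1 is scored on its stated scale (most 0-3, two capped at 2) and summed, with the bands above used for interpretation. The function and dictionary names are hypothetical.

```python
from typing import Dict, Tuple

RQI_BANDS = [  # composite thresholds from Section 5.2
    (0, 9, "Critical"),
    (10, 18, "Needs Improvement"),
    (19, 24, "Adequate"),
    (25, 27, "Good"),
]

def rqi_composite(scores: Dict[str, int]) -> Tuple[int, str]:
    """Sum the nine indicator scores from Table 5.1 and return (total, category)."""
    total = sum(scores.values())
    for low, high, label in RQI_BANDS:
        if low <= total <= high:
            return total, label
    raise ValueError(f"score {total} outside the 0-27 range")

example = {
    "assumption_completeness": 2, "assumption_test_rigor": 1,
    "framing_diversity": 2, "counter_framing_quality": 1,
    "model_diversity": 2, "model_comparison_quality": 1,
    "dissent_response_latency": 2, "dissent_closure_transparency": 1,
    "substantive_dissent_rate": 2,
}
print(rqi_composite(example))  # -> (14, 'Needs Improvement')
```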
5.3 Notes on Validity and Reliability
The Reasoning Quality Index (RQI) proposed above is a conceptual evaluative framework designed to guide assessment of PDG implementation quality. These indicators have not been empirically tested and require further research to validate:
- Construct validity: Do these indicators truly measure reasoning quality?
- Inter-rater reliability: Would two different evaluators give the same scores for the same documents?
- Correlation with outcomes: Do projects with high RQI scores have lower failure rates?
Therefore, RQI is currently offered as a starting point for further development, not as an established instrument. Future empirical research needs to test and refine these indicators through case studies and field experiments.
5.4 Anticipating Ritualism Risk
The Reasoning Quality Index cannot fully prevent ritualism, but it makes ritualism more visible. When a project scores high on assumption completeness but low on testing quality, or has many models but superficial comparison, these indicators will reveal such inconsistencies. RQI also enables periodic epistemic audits by independent parties to verify whether documentation matches actual practice.
It is important to emphasize: no framework can fully prevent ethics-washing. What can be done is create infrastructure that makes such practices easier to detect and challenge. PDG, with its RQI, is such infrastructure.
6. CASE ILLUSTRATIONS FROM VARIOUS COUNTRIES
To strengthen the argument that the identified problems are systemic and not limited to a single case, this section presents three case illustrations from different countries: Robodebt (Australia), COMPAS (United States), and the A-level grading algorithm (United Kingdom).
6.1 Case 1: Robodebt Australia
Brief Chronology: Robodebt was an automated system introduced by the Australian government in 2016 to detect welfare overpayments. Instead of using actual recipient income data, the system used income averaging to calculate debts. If someone's annual income was slightly higher than estimated, the system considered it an overpayment and automatically sent a debt notice.
The problem: this method had no legal basis and was statistically flawed. Thousands of citizens were wrongly billed, some experienced severe psychological distress, and at least two suicides were linked to the pressure of Robodebt debts. Courts subsequently found the income-averaging method unlawful, and the government ultimately had to pay billions of dollars in refunds and compensation (Carney, 2021).
Epistemic Diagnosis: The critical assumption "average annual income = actual income" was never tested. Problem framing solely as "debt collection efficiency" ignored alternative framing of "protecting vulnerable citizens from system errors." Only one model (full automation) was considered, without exploring alternatives like sampling verification. Civil servant objections were ignored and undocumented.
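The arithmetic flaw in the income-averaging assumption can be shown with a small worked example using hypothetical figures: a person who earned all of their income in the second half of the year, and truthfully reported zero income while receiving benefits in the first half, appears under averaging to have exceeded the income threshold in every benefit fortnight.

```python
# Hypothetical illustration of the untested "income averaging" assumption (Section 6.1).
# A person earns $26,000 in the second half of the year and correctly reports
# zero income while receiving benefits in the first half.
fortnights = 26
annual_income = 26_000
actual_fortnightly = [0] * 13 + [2_000] * 13       # what actually happened
averaged_fortnightly = annual_income / fortnights   # what the system assumed: $1,000 every fortnight

income_free_threshold = 300  # hypothetical income-free area per benefit fortnight
# Under averaging, every benefit fortnight appears to exceed the threshold,
# so the system raises a "debt" even though the person reported truthfully.
false_overpayment_fortnights = sum(
    1 for actual in actual_fortnightly[:13]
    if actual <= income_free_threshold < averaged_fortnightly
)
print(false_overpayment_fortnights)  # -> 13 fortnights wrongly flagged
```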
6.2 Case 2: COMPAS United States
Brief Chronology: COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is an algorithm used in many US courts to predict defendant recidivism risk. The scores it generates are used to inform sentencing, bail, and other important decisions. A ProPublica investigation (Angwin et al., 2016) found COMPAS systematically biased against black defendants: they were nearly twice as likely to be falsely labeled high-risk as white defendants with similar criminal backgrounds.
Epistemic Diagnosis: Critical assumptions about relationships between input variables (arrest history, age, etc.) and recidivism risk were inadequately tested. Historical data used to train the algorithm contained systemic bias from discriminatory law enforcement practices. Problem framing as "objective risk prediction" ignored alternative framings like "reproducing racial injustice through technology." Critic and academic objections were documented but not acted upon by courts and agencies adopting the system.
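The disparity ProPublica reported can be expressed as a difference in false positive rates between groups. The sketch below computes group-wise false positive rates on illustrative data whose magnitudes roughly match the reported pattern; it does not use the actual COMPAS data.

```python
from typing import List, Tuple

def false_positive_rate(records: List[Tuple[bool, bool]]) -> float:
    """records: (labeled_high_risk, actually_reoffended).
    FPR = share of non-reoffenders wrongly labeled high-risk."""
    non_reoffenders = [r for r in records if not r[1]]
    if not non_reoffenders:
        return 0.0
    return sum(1 for labeled, _ in non_reoffenders if labeled) / len(non_reoffenders)

# Illustrative data: one group's non-reoffenders are flagged high-risk
# roughly twice as often as the other's.
group_a = [(True, False)] * 45 + [(False, False)] * 55 + [(True, True)] * 30 + [(False, True)] * 20
group_b = [(True, False)] * 23 + [(False, False)] * 77 + [(True, True)] * 30 + [(False, True)] * 20

print(round(false_positive_rate(group_a), 2))  # -> 0.45
print(round(false_positive_rate(group_b), 2))  # -> 0.23
```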
6.3 Case 3: UK A-level Algorithm
Brief Chronology: During the COVID-19 pandemic, A-level examinations in the UK were canceled and replaced by an algorithm developed by Ofqual to calculate student grades. This algorithm used school historical data to predict individual student grades, with the result that students from low-performing schools were systematically disadvantaged—their grades were significantly downgraded compared to students from high-performing schools with equivalent prior exam performance. Results sparked mass protests, and the government eventually abandoned the algorithm (Amoore, 2020).
Epistemic Diagnosis: The critical assumption that "school historical performance is a good predictor of individual student performance" was inadequately tested. Problem framing as "grade standardization during a pandemic" ignored alternative framing of "fairness to students from disadvantaged backgrounds." Objections from educators and headteachers were documented but ignored in decision-making.
6.4 Synthesis: Patterns of Epistemic Failure Across Cases
| Case | Untested Assumption | Narrow Framing | Limited Options | Ignored Dissent |
|---|---|---|---|---|
| Robodebt | Income averaging = actual income | Collection efficiency | Full automation | Civil servant objections |
| COMPAS | Historical data unbiased | Objective prediction | Single proprietary model | Academic criticism |
| A-level | School performance = individual performance | Grade standardization | Single central algorithm | Educator warnings |
This pattern shows that epistemic failures in algorithmic governance are not isolated phenomena but systemic problems requiring structured responses like PDG.
7. POSITIONING PDG WITHIN THE AI GOVERNANCE LANDSCAPE
7.1 Comparison with Existing Frameworks
| Framework | Focus | Method | Gap |
|---|---|---|---|
| EU AI Act | Risk classification | Conformity assessment | Does not regulate pre-design assumptions |
| NIST AI RMF | Organizational risk | Risk management cycle | Focuses on risks, not assumptions |
| Model Cards | Model transparency | Documentation | Documentation, not assumption testing |
| Datasheets | Data transparency | Documentation | Documentation, not interpretation |
| Fairness Metrics | Bias measurement | Statistical metrics | Bias measured, causal assumptions untested |
| Deliberative Governance | Policy forums | Discussion, deliberation | Not integrated into technical design |
| Procurement Frameworks | Efficiency, compliance | Contract specifications | Does not cover reasoning quality |
| PDG (proposed) | Reasoning accountability | Pre-design protocols + procurement + RQI | Fills upstream gap |
The table above shows that PDG does not replace existing frameworks but complements them at the most upstream stage. EU AI Act, NIST RMF, and other frameworks regulate what happens after assumptions are set and design begins. PDG regulates the process before that—when assumptions are formulated, framings established, and options explored.
7.2 Scope Conditions of PDG: When is PDG Most Relevant?
PDG is most relevant for:
- High-risk AI systems meeting criteria in EU AI Act or similar frameworks, especially those impacting fundamental citizen rights.
- Decisions impacting socio-economic rights, such as access to welfare, healthcare, public housing, or eligibility determination for government programs.
- Systems processing sensitive historical data containing structural bias, such as police data for criminal prediction or educational data for student assessment.
- Systems involving complex assumptions about human behavior, such as risk assessment algorithms or service need prediction.
PDG is less relevant for:
- Low-stakes administrative systems without significant impact on individual rights, such as delivery route optimization or inventory management.
- Non-normative operational AI operating in closed technical domains without broad social implications, such as industrial control systems.
- Systems with direct and significant human oversight at every decision stage, allowing algorithm assumptions to be corrected in real-time.
These scope limitations are important to demonstrate that PDG is a proportional and contextual framework, not a universal solution imposed on all types of AI systems.
7.3 Speed versus Reflection Trade-off
One potential criticism of PDG is that it could slow innovation and AI system development. It is important to explicitly acknowledge this trade-off:
PDG deliberately slows down the pre-design phase to reduce the risk of systemic downstream losses.
This slowing is not an unintended side effect but a deliberate repositioning of intervention points: PDG does not add new procedural layers to existing systems; it shifts the focus from outcome evaluation to pre-design reasoning evaluation. By investing more time and reflection upfront, PDG aims to avoid much larger failure costs downstream.
In high-risk systems, failure costs—financial, social, and reputational—far outweigh the cost of additional upfront reflection. The Robodebt scandal, costing billions of dollars and claiming lives, is a concrete example of expensive downstream costs resulting from upstream neglect. The COMPAS and UK A-level cases likewise generated substantial litigation, remediation, and reputational costs.
PDG offers upfront time investment to save downstream costs. The following table illustrates this logic:
| Cost Type | Without PDG | With PDG |
|---|---|---|
| Pre-design costs | Low (fast) | Higher (slower) |
| Downstream failure costs | High (scandals, litigation, compensation) | Lower (early detection, correction) |
| Total costs | Potentially very high | More controlled |
For low-risk systems, proportional approaches can be applied—for example, lighter PDG versions or selective application to critical components only.
7.4 PDG as Meta-Governance Layer
One of this article's most important theoretical contributions is positioning PDG not merely as an additional protocol but as a meta-governance layer—a layer governing how other AI governance frameworks are activated and implemented.
PDG as meta-governance layer functions to:
- Determine whether compliance with other frameworks will be substantive or formalistic. By requiring assumption documentation, framing testing, and multi-option exploration, PDG ensures that compliance processes do not become mere administrative rituals. Without PDG, a project could meet all EU AI Act documentation requirements while still being based on untested assumptions and biased framings. PDG adds a verification layer that documentation reflects quality reasoning processes.
- Govern relationships between various frameworks. For example, how Model Cards results should be informed by assumptions tested in PDG, or how NIST RMF risk assessments should consider documented alternative framings. PDG creates coherence between frameworks that often operate separately.
- Create infrastructure for reasoning accountability beyond technical compliance. PDG asks not only "are documents filled?" but also "is the reasoning process underlying those documents of quality?" This is a shift from formal compliance to substantive compliance.
- Bridge the gap between technical design and public deliberation. PDG-generated artifacts (Assumption Sheets, Framing Matrices, Model Portfolios, Dissent Records) can become inputs for public deliberative forums, enabling citizens to assess reasoning quality behind systems affecting their lives.
In other words, PDG is an epistemic infrastructure enabling other AI governance frameworks to function as intended. Without this meta-layer, these frameworks are vulnerable to becoming empty compliance rituals. With PDG, there is a mechanism ensuring compliance is based on quality reasoning processes.
7.5 Reasoning Accountability as Extension of Public Reason
To further emphasize this article's theoretical contributions, it is important to define reasoning accountability more philosophically. This concept can be understood as an extension of the idea of public reason in the liberal-democratic tradition (Rawls, 1993; Habermas, 1996). Public reason requires that political decisions be justifiable with reasons acceptable to free and equal citizens. In the algorithmic context, the challenge is: algorithms make decisions but cannot provide reasons.
Reasoning accountability bridges this gap. It requires that the reasoning processes preceding algorithm development—assumptions, framings, options, dissent—be documented and examinable. This is not a substitute for public reason but an epistemic infrastructure enabling public reason to operate in the algorithmic age. When citizens question algorithmic decisions, they are entitled not only to know "what the output was" but also "what assumptions underlie it," "what alternative framings were ignored," "what other models were not chosen," and "whose voices were unheard."
In other words, reasoning accountability is a prerequisite for democratic deliberation about algorithmic systems. Without reasoning traces, citizens face only black boxes. With reasoning traces, they have material to assess, criticize, and reshape systems affecting their lives.
Theoretically, this article extends public reason into the algorithmic domain through the concept of reasoning accountability, extends deliberative governance from policy forums into technical design realms, and introduces a new category—meta-governance layer—governing how other AI governance frameworks are activated.
7.6 Limitations and Risks
The PDG framework, like all governance frameworks, has limitations that need explicit acknowledgment:
- Does not resolve political economy: PDG does not stop austerity politics, surveillance, or exploitative fiscal rationality. It only ensures that when such policies are operationalized into algorithms, their assumptions become visible, alternative framings recorded, and warning voices not lost. PDG cannot neutralize power asymmetries; it can only make them more traceable.
- Co-optation risk: Structured dissent protocols and independent ethics boards can become empty formalities if not supported by healthy organizational culture, public pressure, and genuine accountability mechanisms. Experience with various corporate ethics boards shows that "independence" is often illusory (Metzinger, 2019).
- Documentation ritualism risk: This is the most tangible risk. Institutions can fill assumption sheets formally, create weak (strawman) alternative framings, and simulate dissent. The Reasoning Quality Index is designed to make such ritualism more visible, but cannot fully prevent it. PDG cannot guarantee substance; it can only create infrastructure making substance absence easier to detect.
- Administrative burden: These protocols add documentation layers that can slow development. For low-risk systems, proportional approaches are needed.
- Dependence on caring actors: Like all process-based governance frameworks, PDG will only be effective if there are actors within the system who care about reasoning quality. It does not create them, does not automatically protect them, and does not prevent their removal.
With acknowledgment of these limitations, PDG is offered as a tool for reformist minorities—those seeking to improve systems from within—not as a magic solution claiming to solve all problems.
7.7 Who Determines Dominant Framings? Who Chooses Final Models? Who Controls Ethics Boards?
These questions touch the core of implementation realpolitik. In hierarchical bureaucratic structures, dominant framings are typically determined by actors with highest power—unit heads, finance ministries, or political executives. Final models are chosen by procurement officials with strong influence from technology providers. Ethics boards, if not carefully designed, can be co-opted by institutional interests.
PDG does not claim to change this reality. What it can do is:
- Make dominant framings explicit: When dominant framings are documented, they become objects that can be challenged.
- Create traces about who chose final models: Model selection reason documentation enables future accountability.
- Make ethics board composition and processes transparent: This enables public and academic assessment of board independence.
In other words, PDG is infrastructure for power transparency, not a tool for neutralizing power. It does not answer "who holds power?" but answers "how is power operationalized and traceable?"
8. CONCLUSION AND FUTURE RESEARCH AGENDA
We can no longer allow government algorithms to be developed in epistemic vacuums. Every line of code embodies frozen assumptions. Every model is a choice that ignores alternatives. Every automated decision carries risks that may never have been considered.
The Pre-Decision Governance framework for Government AI offers a way forward: not by rejecting technology, but by slowing down early processes, testing assumptions with discipline, considering alternative framings, exploring different model options, and documenting every dissenting view. This is not procedural expansion but repositioning of intervention points—shifting focus from outcome evaluation to pre-design reasoning evaluation.
Theoretically, this article makes three main contributions to governance literature:
- Introduces the concept of reasoning accountability as an extension of public reason (Rawls, 1993; Habermas, 1996) into the algorithmic domain, addressing the challenge that algorithms make decisions but cannot provide reasons.
- Extends deliberative governance theory from public policy forums into technical design realms, demonstrating how deliberative principles can be operationalized within algorithm development pipelines.
- Positions PDG as a meta-governance layer governing how other AI governance frameworks (EU AI Act, NIST RMF, Model Cards) are activated, determining whether compliance will be substantive or formalistic.
PDG differs from existing AI governance frameworks because it focuses on the most upstream stage: assumption construction and framing. It complements EU AI Act, NIST RMF, Model Cards, and other frameworks by adding a reflective layer previously overlooked. It differs from conventional deliberative governance because it is embedded in technical design pipelines, produces technical artifacts, and can be integrated into procurement cycles as binding requirements.
PDG also has clear scope conditions: most relevant for high-risk AI systems impacting socio-economic rights. It deliberately slows the pre-design phase to reduce systemic downstream loss risk—a conscious repositioning of intervention points, not procedural expansion.
To ensure these protocols do not become mere documentation rituals, the Reasoning Quality Index provides evaluation tools measuring substantive reasoning process quality. Indicators like assumption completeness (with explicit critical assumption definitions), framing diversity, model diversity, and dissent responsiveness enable more meaningful epistemic audits than mere compliance checklists. However, RQI remains a conceptual framework requiring further empirical testing to validate reliability and validity.
However, PDG does not claim to resolve austerity politics or surveillance political economy. It operates at the epistemic layer where structural failures are operationalized into code. It cannot neutralize power asymmetries; it can only make them more traceable. It does not stop power, but it makes power more visible by forcing power to write its assumptions, document its framings, and record silenced voices. In a world where algorithms increasingly determine our lives, making power visible is the first step toward accountability.
A central lesson of the Robodebt Royal Commission is that the scheme persisted because the system was unable, or unwilling, to hear warning voices. PDG is designed to be that hearing system. It does not guarantee that warning voices will always be heard, but it ensures that when they are ignored, there are traces through which accountability can later be pursued.
8.1 Future Research Agenda: Concrete Empirical Designs
To test and refine the proposed framework, future research should:
- Empirical RQI testing across three sectors: Apply the Reasoning Quality Index to AI projects in welfare, tax, and judicial sectors across various countries. Measure inter-rater reliability by involving two independent evaluation teams assessing the same documents. Test construct validity by comparing RQI scores with quality assessments by expert panels.
- Quasi-experimental before-after studies: Compare error rates and litigation levels between projects developed before PDG implementation (control group) and projects after PDG implementation (treatment group) within the same government agencies. Control for other factors like project complexity and budget. This design can be implemented in states or provinces adopting PDG earlier than other regions.
- Cross-country comparative studies: Compare PDG implementation across different administrative systems and political cultures (e.g., Scandinavian countries with strong transparency traditions vs. Asian countries with hierarchical bureaucracies) to identify contextual factors affecting effectiveness.
- 3-5 year longitudinal studies: Track several AI projects implementing PDG to see whether generated reasoning documentation is actually used in subsequent policy evaluation and improvement. Measure how much dissent records influence corrective decisions.
- Low-risk system instrument development: Develop lighter PDG versions for low-risk systems, along with practical guides for PDG integration into procurement processes across various jurisdictions.
- Controlled field experiments: Design experiments where similar AI system development occurs with and without PDG protocols, then compare outcome quality, development time, and stakeholder satisfaction. These experiments could be conducted in partnership with willing government agency research partners.
- In-depth adoption case studies: Conduct ethnographic studies in government agencies attempting to adopt PDG, to understand cultural, political, and organizational barriers to implementation.
These studies will help test whether the PDG framework can fulfill its promise: not only creating reasoning traces but also improving algorithmic decision quality and public accountability in an era where machines increasingly determine citizens' fates.
REFERENCES
- Amoore, L. (2020). Cloud ethics: Algorithms and the attributes of ourselves and others. Duke University Press.
- Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias. ProPublica, May 23.
- Ansell, C., & Gash, A. (2007). Collaborative governance in theory and practice. Journal of Public Administration Research and Theory, 18(4), 543-571.
- Burrell, J. (2016). How the machine 'thinks': Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 1-12.
- Carney, T. (2021). Robodebt: The failure of automated administrative justice. Federal Law Review, 49(3), 345-372.
- Danaher, J., Hogan, M. J., Noone, C., Kennedy, R., Behan, A., De Paor, A., ... & Shankar, K. (2017). Algorithmic governance: Developing a research agenda through the power of collective intelligence. Big Data & Society, 4(2), 1-21.
- Diakopoulos, N. (2020). Automating the news: How algorithms are rewriting the media. Harvard University Press.
- Eubanks, V. (2018). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin's Press.
- European Commission. (2021). Proposal for a regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). COM(2021) 206 final.
- Floridi, L., Cowls, J., Beltrametti, M., et al. (2018). AI4People—An ethical framework for a good AI society. Minds and Machines, 28, 689-707.
- Foucault, M. (1977). Discipline and punish: The birth of the prison. Pantheon Books.
- Fricker, M. (2007). Epistemic injustice: Power and the ethics of knowing. Oxford University Press.
- Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for datasets. Communications of the ACM, 64(12), 86-92.
- Habermas, J. (1996). Between facts and norms: Contributions to a discourse theory of law and democracy. MIT Press.
- Hobbes, T. (1996). Leviathan (R. Tuck, Ed.). Cambridge University Press. (Original work published 1651)
- Jalonen, H. (2025). Epistemic governance in the context of crisis: A complexity-informed approach. Administration & Society, 57(2), 218-253.
- Lidskog, R., & Sundqvist, G. (2025). Expert advice and global environmental governance: Institutional and epistemic challenges. Sustainability, 17(17), 7876.
- Metzinger, T. (2019). Ethics washing made in Europe. Der Tagesspiegel, April 8.
- Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., ... & Gebru, T. (2019). Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 220-229.
- NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology.
- Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.
- O'Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown.
- Raghavan, M., Barocas, S., Kleinberg, J., & Levy, K. (2020). Mitigating bias in algorithmic hiring: Evaluating claims and practices. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 469-481.
- Rahwan, I. (2018). Society-in-the-loop: Programming the algorithmic social contract. Ethics and Information Technology, 20(1), 5-14.
- Rawls, J. (1993). Political liberalism. Columbia University Press.
- Sanchez-Graells, A. (2021). Public procurement and the EU competition rules (3rd ed.). Hart Publishing.
- Scott, J. C. (1998). Seeing like a state: How certain schemes to improve the human condition have failed. Yale University Press.
- Wagner, B. (2018). Ethics as an escape from regulation: From "ethics-washing" to ethics-shopping? In M. Hildebrandt (Ed.), Being profiled: Cogitas ergo sum (pp. 84-89). Amsterdam University Press.
- Yeung, K. (2018). Algorithmic regulation: A critical interrogation. Regulation & Governance, 12(4), 505-523.
- Yukins, C. R. (2022). Public procurement and artificial intelligence: A research agenda. George Washington University Law School Public Law Research Paper, No. 2022-12.