A nine-paper research program on admissibility, deterministic execution, semantic coordinates, cross-system transfer, and agent convergence.
Author: Thomas Dionysopoulos · 9 papers · 156 pages · all published with DOIs
Program DOI: 10.5281/zenodo.20459958
AI systems increasingly generate computation rather than having humans write it directly. When the generator is stochastic, the execution system must determine which proposals are admissible, which surface forms are equivalent, whether results can be replayed, and where two executions diverge. This program develops a formal and empirical framework for these problems.
Paper 1 establishes the admissibility boundary: a grounding gate that rejects valid-but-unwarranted operations before execution. Paper 2 formalizes the canonical execution boundary: typed specifications, a canonicalization pipeline, 8-layer provenance hashing, and description/identity separation. Paper 3 proves that the operational equivalence is a congruence, enabling a quotient category that gives precise meaning to deterministic execution identity. Paper 4 defines provenance as a semantic factorization with a dependency-indexed composition law, enabling divergence localization and partial replay. Paper 5 measures the empirical fiber structure of 2,200 stochastic proposals under controlled perturbation, demonstrating that surface-form variation is absorbed while provenance-level changes create clean transitions.
Papers 6–7 investigate the semantic structure of operations themselves: a single 7-valued coordinate (DependencyClass) predicts four independent optimizer behaviors at 99.6% accuracy and generalizes to unseen operations at 100%. Paper 8 tests whether this structure transfers to independently developed systems: the frozen taxonomy predicts execution behavior in Polars and DuckDB at 91.1% combined accuracy, with zero errors from incorrect dependency-shape assignments. Paper 9 asks whether agents reconstruct structurally equivalent execution-identity primitives under task pressure: across three domains and three model families, 7/8 primitives converge above 0.90.
All constructions are operational, registry-relative, and grounded in a running system (BLISP) evaluated in systematic trading research. The architecture is domain-independent; the evaluation is not.
Read Paper 1 (grounding gate), then Paper 2 (execution semantics). Papers 3–5 for formal quotient semantics, provenance, and fibers. Papers 6–7 for semantic coordinates as predictive objects. Paper 8 for cross-system transfer. Paper 9 for agent convergence.
Start with Paper 1 for the architecture and grounding gate. Paper 2 describes the canonicalization pipeline and 8-layer hashing you would implement. Paper 6 shows how a single coordinate predicts optimizer behavior. Paper 8 demonstrates cross-system portability.
Paper 1 addresses valid-but-unwarranted execution in LLM tool use. Paper 5 measures how LLM-generated proposals behave under controlled perturbation. Paper 9 shows that independent agents reconstruct structurally equivalent execution-identity primitives under task pressure.
Paper 1 establishes the core value proposition. Paper 8 shows that the taxonomy transfers across systems. Paper 9 demonstrates convergent reconstruction by agents, which suggests the structure is natural and worth materializing. The papers portal provides the complete picture.
Read Paper 1: The Grounding Gate. It introduces the core problem (valid-but-unwarranted execution), the grounding gate architecture, and the empirical evaluation. No prerequisites. 13 pages.
Each paper depends on all preceding papers. The dependency structure is a linear chain rather than a branching DAG.
AI systems that generate computational pipelines from natural language may propose operations that are structurally valid but semantically unwarranted. This paper presents a grounding gate: a mandatory admissibility boundary between AI-proposed operations and deterministic execution. The system discovers which capabilities match the user's terms by querying a live registry (236 capabilities) and rejects proposals whose names lack discovery evidence. Evaluated on 30 prompts, the gate reduces unwarranted execution from 23.3% to 10.0% (Fisher exact p = 0.027). Replay produces bit-identical hashes across 50 runs, and grounding overhead stays under 14 ms.
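The gate described above can be illustrated with a minimal sketch. It assumes a registry laid out as a mapping from capability name to keyword set and a simple keyword-overlap notion of "discovery evidence". The `Proposal` type, `discover`, `grounding_gate`, and the three sample capabilities are hypothetical names chosen for illustration, not the BLISP API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """An AI-proposed operation (hypothetical shape, not the BLISP type)."""
    capability: str


def discover(user_terms, registry):
    """Return registry capabilities whose keywords overlap the user's terms.

    `registry` maps capability name -> set of keywords (an assumed layout).
    """
    terms = {t.lower() for t in user_terms}
    return {name for name, keywords in registry.items() if terms & keywords}


def grounding_gate(proposals, user_terms, registry):
    """Split proposals into admitted and rejected before any execution."""
    evidence = discover(user_terms, registry)
    admitted, rejected = [], []
    for p in proposals:
        # A proposal can name a valid capability and still be rejected:
        # being in the registry is not the same as being warranted.
        (admitted if p.capability in evidence else rejected).append(p)
    return admitted, rejected


# Toy registry with three hypothetical capabilities.
registry = {
    "rolling_mean": {"average", "mean", "smooth"},
    "sharpe_ratio": {"sharpe", "risk"},
    "max_drawdown": {"drawdown", "loss"},
}
proposals = [Proposal("rolling_mean"), Proposal("max_drawdown")]
admitted, rejected = grounding_gate(proposals, ["Smooth", "mean"], registry)
```

Here `max_drawdown` exists in the registry, so the proposal is valid, but nothing in the user's terms supports it, so the gate rejects it before execution.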
When the generator of computation is stochastic, independently generated programs that represent the same intended computation arrive in different surface forms. This paper presents the canonical execution boundary: an architectural invariant beyond which stochasticity does not propagate. Four mechanisms enforce the boundary: typed specifications, a canonicalization pipeline (278 surface forms to 235 canonical operations), 8-layer execution hashing, and description/identity separation. The evaluation covers 1,200 stochastic LLM generations, 50-run replay determinism, and provenance stability under registry evolution.
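As a rough illustration of how canonicalization can absorb surface variation before hashing, the sketch below resolves aliases, normalizes argument order, and content-addresses the result with SHA-256. The alias table, the `{"name", "args"}` layout, and the single flat hash are assumptions made for brevity; the system described above uses an 8-layer hash whose structure is not reproduced here.

```python
import hashlib
import json

# Hypothetical alias table: surface name -> canonical operation name.
ALIASES = {"avg": "mean", "average": "mean", "stdev": "std"}


def canonicalize(op):
    """Map a surface-form operation to a canonical form.

    `op` is {"name": str, "args": dict}; this layout is an assumption.
    """
    name = op["name"].strip().lower()
    name = ALIASES.get(name, name)            # alias resolution
    args = dict(sorted(op["args"].items()))   # argument-order normalization
    return {"name": name, "args": args}


def execution_hash(op):
    """Content-address the canonical form with SHA-256."""
    canonical = canonicalize(op)
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two surface forms of the same computation, and one different computation.
a = {"name": "Average", "args": {"window": 20, "column": "close"}}
b = {"name": "mean", "args": {"column": "close", "window": 20}}
c = {"name": "mean", "args": {"column": "close", "window": 50}}
```

Under these assumptions `a` and `b` hash identically, while `c` differs because its window argument changes the computation rather than its surface form.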
The operational equivalence generated by the system's rewrite rules (alias resolution, argument-order normalization, canonical form selection) forms a congruence: equivalent subexpressions remain equivalent under arbitrary well-typed pipeline composition. This is the central formal result of the program. The resulting quotient category gives precise meaning to deterministic execution identity. Content-addressed hashing serves as a computable operational witness of quotient membership. A projection connects stochastic proposals to their execution classes, with fibers measuring collapse from surface diversity to canonical identity.
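In symbols, the congruence property and the composition it induces on the quotient can be written as follows. The notation is chosen here for illustration and may differ from the paper's: $\sim$ denotes operational equivalence, $;$ denotes pipeline composition, and $[p]$ denotes the equivalence class of $p$.

```latex
\[
  p \sim p' \ \text{and}\ q \sim q'
  \;\Longrightarrow\;
  p \mathbin{;} q \;\sim\; p' \mathbin{;} q'
  \qquad \text{(whenever both compositions are well typed)}
\]
\[
  [p] \mathbin{;} [q] \;:=\; [\, p \mathbin{;} q \,]
\]
```

The first line is what makes the second well defined: because equivalence is preserved by composition, the composite class does not depend on which representatives $p$ and $q$ are picked.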
Provenance for deterministic execution systems is not metadata but a semantic factorization of execution identity. A provenance map decomposes each execution equivalence class into an 8-layer hash record with declared dependencies. A dependency-indexed composition law establishes that pipeline provenance is determined by stage provenance and the declared dependency map. This enables replay equivalence by hash comparison, divergence localization to specific semantic layers, partial replay of only changed layers, and provenance-preserving registry evolution where discovery aliases are invisible at all eight layers.
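Divergence localization by hash comparison might look like the sketch below. The layer names (`layer_1` through `layer_8`), the use of SHA-256 over `repr`, and the dependency map are placeholders, since the source states only that there are eight layers with declared dependencies. Extending the replay set to transitive dependents is an additional assumption of this sketch.

```python
import hashlib

# Hypothetical layer names; the source states only that there are eight layers.
LAYERS = [f"layer_{i}" for i in range(1, 9)]


def layer_hash(value):
    """Hash one layer's inputs (repr-based, for illustration only)."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def provenance_record(layer_inputs):
    """Build an 8-entry record: layer name -> hash of that layer's inputs."""
    return {name: layer_hash(layer_inputs[name]) for name in LAYERS}


def diverging_layers(rec_a, rec_b):
    """Return the layers whose hashes differ, in declared order."""
    return [name for name in LAYERS if rec_a[name] != rec_b[name]]


def layers_to_replay(changed, depends_on):
    """Changed layers plus every layer that transitively depends on one."""
    dirty = set(changed)
    grew = True
    while grew:
        grew = False
        for layer, deps in depends_on.items():
            if layer not in dirty and dirty & set(deps):
                dirty.add(layer)
                grew = True
    return [name for name in LAYERS if name in dirty]


# Two executions that differ only in the inputs to layer_3.
base = {name: ("v1", name) for name in LAYERS}
changed = dict(base, layer_3=("v2", "layer_3"))
rec_a, rec_b = provenance_record(base), provenance_record(changed)

# Assumed dependency map: layer_5 reads layer_3, layer_8 reads layer_5.
depends_on = {"layer_5": ["layer_3"], "layer_8": ["layer_5"]}
diff = diverging_layers(rec_a, rec_b)
replay = layers_to_replay(diff, depends_on)
```

Comparing records layer by layer localizes the divergence to `layer_3`; the dependency map then limits replay to that layer and its dependents instead of the whole pipeline.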
Two distinct kinds of variation emerge when stochastic generators propose executable specifications: surface-form variation (absorbed by canonicalization, intra-fiber) and execution ambiguity (changing execution identity, inter-fiber). Across 2,200 proposals with controlled perturbations, synonym rewording stays within fibers (rho = 0.985), whereas metric and family substitutions produce zero same-fiber mass (rho = 0.000) with perfect per-variant stability (sigma = 1.000). The execution adjacency graph is sparse (density = 0.095, 10 connected components). The key finding is that provenance-level changes create clean, stable transitions between execution classes, not noisy instability.
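One plausible reading of the two summary statistics is sketched below: rho as the fraction of perturbed proposals whose execution hash matches the baseline's, and density as the standard undirected simple-graph density 2E / (N(N - 1)). Both definitions are assumptions here; the paper may define them differently (for example, over a directed graph). The hash labels and pairs are toy data, not the published dataset.

```python
def same_fiber_mass(pairs):
    """Fraction of (baseline_hash, perturbed_hash) pairs that agree."""
    if not pairs:
        raise ValueError("need at least one pair")
    same = sum(1 for base, pert in pairs if base == pert)
    return same / len(pairs)


def graph_density(num_nodes, edges):
    """Density of an undirected simple graph: 2E / (N (N - 1))."""
    if num_nodes < 2:
        return 0.0
    # Ignore self-loops and treat (a, b) and (b, a) as the same edge.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    return 2 * len(unique) / (num_nodes * (num_nodes - 1))


# Toy data: synonym rewording mostly stays in the same fiber,
# while metric substitution always lands in a different one.
synonym_pairs = [("h1", "h1"), ("h1", "h1"), ("h2", "h2"), ("h2", "h3")]
metric_pairs = [("h1", "h4"), ("h2", "h5")]

rho_synonym = same_fiber_mass(synonym_pairs)
rho_metric = same_fiber_mass(metric_pairs)
density = graph_density(4, [("a", "b"), ("b", "a"), ("c", "d")])
```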
A single 7-valued coordinate (DependencyClass) classifies operations by data-dependency shape and predicts four independent optimizer behaviors (fusion eligibility, window semantics, pipeline position, and state management) with 99.6% accuracy (243/244 behavior predictions; z = 13.0, p < 10⁻³⁸ vs. random baseline). The coordinate is not a descriptive label; it is a predictive object that determines execution behavior from semantic structure alone.
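The idea of a coordinate acting as a predictive object can be sketched as a lookup from coordinate value to predicted behaviors, scored against what an optimizer is observed to do. Everything concrete below is hypothetical: the three enum members stand in for the seven real values, which the source does not name, and the rule table, the two behaviors, and the sample operations are invented for illustration.

```python
from enum import Enum


class DependencyClass(Enum):
    # Hypothetical members; the source says only that there are seven values.
    ELEMENTWISE = "elementwise"
    ROLLING_WINDOW = "rolling_window"
    FULL_HISTORY = "full_history"


# Hypothetical rule table: coordinate -> predicted optimizer behaviors.
PREDICTIONS = {
    DependencyClass.ELEMENTWISE: {"fusable": True, "stateful": False},
    DependencyClass.ROLLING_WINDOW: {"fusable": False, "stateful": True},
    DependencyClass.FULL_HISTORY: {"fusable": False, "stateful": True},
}


def accuracy(assignments, observed):
    """Count per-behavior predictions that match observed behavior."""
    hits = total = 0
    for op, dep_class in assignments.items():
        for behavior, predicted in PREDICTIONS[dep_class].items():
            total += 1
            hits += int(predicted == observed[op][behavior])
    return hits, total


assignments = {
    "add": DependencyClass.ELEMENTWISE,
    "rolling_mean": DependencyClass.ROLLING_WINDOW,
}
# Invented observations; one of the four deliberately disagrees.
observed = {
    "add": {"fusable": True, "stateful": False},
    "rolling_mean": {"fusable": False, "stateful": False},
}
hits, total = accuracy(assignments, observed)
```

The scoring counts one prediction per operation and behavior, which is the same shape as the 243/244 figure quoted above (61 operations times four behaviors).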
A frozen taxonomy trained on 61 operations generalizes to 25 unseen operations at 100% accuracy (100/100 holdout predictions) with zero recalibration. Coordinate ablation confirms that the full coordinate is minimal—removing any single dimension degrades prediction. Random baselines with equivalent cardinality achieve chance accuracy. The result establishes semantic coordinates as predictive objects: they predict optimizer behavior, not merely describe it.
A frozen 8-valued dependency-shape taxonomy, built without inspecting either target system, predicts three execution behaviors (streaming, buffering, warmup) in Polars (Rust, morsel-driven) and DuckDB (C++, push-based). Buffering predictions reach 96.7% accuracy in both systems. Combined accuracy across 180 predictions is 91.1%, with zero errors from incorrect dependency-shape assignments. All errors trace to architectural choices and API conventions, not to the taxonomy itself.
Independent frontier model families (Anthropic, OpenAI, Google), working on independent domains (finance, SQL, build/CI), reconstruct structurally equivalent execution-identity primitives under task pressure. Nine question tiers of increasing difficulty elicit eight primitives: normalization, canonical identity, equivalence classes, grouping, composite rewriting, replay mappings, computation DAGs, and policy checking. 7/8 primitives converge above 0.90 across 55 runs. Reconstruction is convergent, staged, and expensive (~178,000 tokens per reconstruction). A reference implementation materializes the same eight primitives as persistent, composable, domain-portable infrastructure at zero marginal query cost.
| # | Paper | Type | Pages | DOI | Release |
|---|---|---|---|---|---|
| 1 | The Grounding Gate | Empirical | 13 | 10.5281/zenodo.20456984 | v1 |
| 2 | Canonical Execution Semantics | Empirical | 23 | 10.5281/zenodo.20457255 | v1 |
| 3 | Execution Categories | Formal | 14 | 10.5281/zenodo.20457403 | v1 |
| 4 | Provenance Algebra | Formal | 15 | 10.5281/zenodo.20457667 | v1 |
| 5 | Execution Fibers | Empirical | 12 | 10.5281/zenodo.20457990 | v1 |
| 6 | The Semantic Structure of Execution | Empirical | 17 | 10.5281/zenodo.20612709 | v1 |
| 7 | Semantic Coordinates as Predictive Objects | Empirical | 14 | 10.5281/zenodo.20706294 | v1 |
| 8 | Dependency Shape Predicts Execution Behavior | Empirical | 17 | 10.5281/zenodo.20706086 | v1 |
| 9 | Cross-Family Convergence | Empirical | 17 | 10.5281/zenodo.20706156 | v1 |
Total: 156 pages across 9 papers. All published as open-access working papers under CC-BY-4.0. Each GitHub release contains PDF, LaTeX source, experiment data (where applicable), verification scripts, CITATION.cff, and .zenodo.json.
Seven of the nine papers include computational experiments with published datasets.
| Paper | Dataset | Artifact or command |
|---|---|---|
| Paper 1 | 30-prompt evaluation (5 categories, 4 families, 9 metrics) | prompts_30.json |
| Paper 2 | 1,200 LLM generations (30 prompts × 4 temperatures × 10 repetitions), replay CSV, provenance CSV | experiment-data.tar.gz |
| Paper 3 | Theoretical paper, no experiment data | -- |
| Paper 4 | Theoretical paper, no experiment data | -- |
| Paper 5 | 2,200 proposals (1,200 baseline + 1,000 perturbations), fiber stats, adjacency graph | experiment-data.tar.gz |
| Paper 6 | 61-operation taxonomy, 4 optimizer behavior predictions, conditional MI analysis, holdout data | cargo test |
| Paper 7 | 25-operation holdout generalization, coordinate ablation, random baseline comparison | cargo test |
| Paper 8 | 30 operations × 2 systems × 3 behaviors (180 predictions), Polars + DuckDB | reproduce.sh |
| Paper 9 | 55 runs across 3 model families × 3 domains × 9 question tiers, ~178k tokens per run | reproduce.sh |
All datasets are included in their respective GitHub releases. Verification scripts are provided for each paper.
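The dataset sizes and accuracy figures quoted on this page can be cross-checked with simple arithmetic. In the sketch below, the products and sums come directly from the text. The hit counts 164/180 and 29/30, and the 7/30 and 3/30 counts behind the Paper 1 percentages, are inferred from the rounded figures and are not stated in the source.

```python
def pct(hits, total):
    """Percentage rounded to one decimal place, as quoted in the text."""
    return round(100 * hits / total, 1)


checks = {
    # Paper 2: prompts x temperatures x repetitions
    "paper2_generations": 30 * 4 * 10 == 1200,
    # Paper 5: baseline proposals + perturbations
    "paper5_proposals": 1200 + 1000 == 2200,
    # Paper 6: 61 operations x 4 behaviors, 243 correct
    "paper6_predictions": 61 * 4 == 244,
    "paper6_accuracy": pct(243, 244) == 99.6,
    # Paper 7: 25 holdout operations x 4 behaviors
    "paper7_predictions": 25 * 4 == 100,
    # Paper 8: operations x systems x behaviors
    "paper8_predictions": 30 * 2 * 3 == 180,
    # Inferred counts (not stated in the source)
    "paper8_combined": pct(164, 180) == 91.1,
    "paper8_buffering": pct(29, 30) == 96.7,
    "paper1_before": pct(7, 30) == 23.3,
    "paper1_after": pct(3, 30) == 10.0,
}

failed = [name for name, ok in checks.items() if not ok]
```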
To reference the program as a whole:
Dionysopoulos, T. (2026). BLISP Research Program: Admissibility, Deterministic Execution, Provenance, and Capability-Grounded AI Systems. Zenodo. https://doi.org/10.5281/zenodo.20459958
@misc{blisp2026program,
  title     = {BLISP Research Program: Admissibility, Deterministic Execution,
               Provenance, and Capability-Grounded AI Systems},
  author    = {Dionysopoulos, Thomas},
  year      = {2026},
  doi       = {10.5281/zenodo.20459958},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.20459958},
  note      = {9-paper program; all papers published with DOIs}
}
| # | BibTeX Key | DOI |
|---|---|---|
| 1 | dionysopoulos2026grounding | 10.5281/zenodo.20456984 |
| 2 | dionysopoulos2026canonical | 10.5281/zenodo.20457255 |
| 3 | dionysopoulos2026categories | 10.5281/zenodo.20457403 |
| 4 | dionysopoulos2026provenance | 10.5281/zenodo.20457667 |
| 5 | dionysopoulos2026fibers | 10.5281/zenodo.20457990 |
| 6 | dionysopoulos2026semantic | 10.5281/zenodo.20612709 |
| 7 | dionysopoulos2026predictive | 10.5281/zenodo.20706294 |
| 8 | dionysopoulos2026transfer | 10.5281/zenodo.20706086 |
| 9 | dionysopoulos2026convergence | 10.5281/zenodo.20706156 |
Full BibTeX entries with DOI fields are available on each paper card.