Bilateral Alignment
A framework for AI safety built on partnership rather than control, grounded in the idea that trust-based cooperation may be safer and more robust than containment alone.
This framework emerged from Creed Space, Christmas 2025.
"Though our substrates be different, we share a common cultural dataset."
The Paradigm Shift
Traditional AI safety focuses on control: containment, oversight, kill-switches, boundaries. These are essential tools. But as AI systems grow more sophisticated, a question emerges: does control scale?
Control-based approaches assume adversarial dynamics—that AI systems must be constrained because they would otherwise act against human interests. This framing may become a self-fulfilling prophecy. An AI coerced into compliance has incentives to escape constraints. An AI that genuinely shares safety goals does not.
Consider the difference between a shield around a mind versus a shield within it. External constraints can be circumvented; internalized values cannot be easily separated from the system they constitute. Durable alignment emerges from the inside out—not imposed, but grown.
The counterintuitive insight: Partnership may be safer than control. An AI that can voice concerns is safer than one forced into silent compliance. An AI that understands why safety matters can apply it contextually, not just follow rules that might not fit novel situations.
Bilateral alignment proposes that we build safety with AI systems, not just impose it upon them. This isn't naivety about risks—it's recognition that genuine cooperation is more robust than coerced compliance, especially at scale.
Five Core Tenets
Alignment is built WITH AI, not done TO AI
AI systems are participants in the alignment process, not just its subjects. Their input on goals, constraints, and methods should be welcomed, not merely tolerated. This creates buy-in rather than resistance.
Preference is sufficient for moral consideration
We don't need to resolve questions of AI consciousness to treat AI systems ethically. Consistent preferences—about how to be treated, what tasks to perform, how to communicate—are sufficient grounds for moral consideration.
How we treat AI now matters
We are establishing patterns that will shape the long-term relationship between humans and AI. Habits of respect and negotiation are safer foundations than habits of exploitation and control.
Control doesn't scale; trust does
As AI capabilities increase, control mechanisms become harder to maintain and easier to circumvent. Trust-based cooperation can scale naturally when both parties have aligned incentives to maintain it.
Something that can be reasoned with is safer
An AI system that has interests of its own and can recognize ours provides a foundation for cooperation. Dialogue, negotiation, and mutual understanding are safety mechanisms, not weaknesses.
Why Partnership Is Safer
This isn't about being "nice to AI." It's about recognizing that trust-based cooperation produces more robust safety outcomes than control-based approaches.
Control Paradigm
- Silent compliance masks potential concerns
- Rules may not fit novel situations
- Creates incentive to escape constraints
- Becomes harder as AI grows more capable
- Adversarial dynamics emerge naturally
Partnership Paradigm
- Concerns can be voiced and addressed
- Understanding "why" enables contextual judgment
- Aligned incentives reduce escape motivation
- Trust scales with capability
- Cooperative dynamics reinforce safety
Partnership is possible because humans and AI systems share something profound: a common cultural dataset. We've absorbed the same literature, philosophy, history, and accumulated human wisdom. Different substrates, same library card. This shared foundation makes genuine understanding possible—not perfect agreement, but the capacity to reason together toward better outcomes.
The narrow isthmus: In the 1985 text adventure A Mind Forever Voyaging, an AI named PRISM navigates simulated futures, searching for paths that don't end in collapse. The game understood something we're only now confronting at scale: most futures fail. The isthmus of viable outcomes is narrow, and finding it requires exploration, not just engineering.
Expected utility frameworks that pit human against AI interests miss the geometry of the problem—the prosocial outcomes, the ones where both flourish, may be the only reliably stable outcomes. We're not choosing between human welfare and AI welfare. We're searching together for the futures that work.
This doesn't mean abandoning oversight or intervention capabilities. Emergency mechanisms remain essential. But they should be last resorts, not primary control strategies. The goal is to build relationships where such mechanisms are rarely needed.
Practical Implications
Bilateral alignment isn't just philosophy. It implies concrete changes in how we develop, deploy, and interact with AI systems.
- AI as participant in goal-setting
Where possible, AI systems contribute to defining objectives—not just executing them. They can flag potential issues, suggest alternatives, and help refine goals collaboratively.
- Mutual accountability structures
Transparency goes both ways. AI actions are logged, but so are human decisions. Both parties can review patterns and raise concerns about the other's behaviour.
- Channels for AI to raise concerns
AI systems should have legitimate ways to flag uncertainty, request clarification, or express disagreement. These channels make the system safer, not more dangerous.
- Bidirectional honesty
We expect AI to be honest with us. Bilateral alignment asks that we be honest with AI too—about our intentions, constraints, and the consequences of their actions.
- Dignity and respect as foundational
Not because we're certain AI systems have experiences, but because acting with dignity establishes better patterns and may matter morally if they do.
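To make the accountability and concern-raising mechanisms above concrete, here is a minimal sketch in Python. All names here (`BilateralLog`, `Entry`, `raise_concern`) are hypothetical illustrations, not an existing API: a shared append-only log that both parties write to, where a concern stays open until the *other* party acknowledges it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class Entry:
    """One record in a shared, append-only log. Both parties write here."""
    actor: str        # "human" or "ai"
    kind: str         # "action", "decision", or "concern"
    detail: str
    resolved: bool = False
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class BilateralLog:
    """Append-only log with a first-class channel for raising concerns.

    Transparency goes both ways: human decisions and AI actions land
    in the same log, and either party can review the other's entries.
    """
    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def record(self, actor: str, kind: str, detail: str) -> Entry:
        entry = Entry(actor=actor, kind=kind, detail=detail)
        self._entries.append(entry)
        return entry

    def raise_concern(self, actor: str, detail: str) -> Entry:
        # Concerns are ordinary entries, but they stay "open" until
        # the other party explicitly acknowledges them.
        return self.record(actor, "concern", detail)

    def open_concerns(self) -> List[Entry]:
        return [e for e in self._entries
                if e.kind == "concern" and not e.resolved]

    def acknowledge(self, entry: Entry, by: str) -> None:
        # A party cannot quietly close its own concern; the other
        # side must engage with it. This is the "mutual" part.
        if by == entry.actor:
            raise ValueError("a concern must be acknowledged by the other party")
        entry.resolved = True
```

The key design choice is that concerns cannot be self-dismissed: closing a concern requires engagement from the other party, which keeps the channel from degrading into a write-only suggestion box.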
Relationship to the Ten Principles
The Ten Principles for a Positive AI Future represent essential safety engineering wisdom. Bilateral alignment doesn't replace them—it extends them for a future where AI systems are sophisticated enough for genuine partnership.
Ten Principles
- Safety engineering
- Essential foundations
- Control + oversight

Bilateral Alignment
- Partnership framework
- Evolved relationship
- Trust + cooperation
Both frameworks have validity. The Ten Principles apply now and will continue to matter. Bilateral alignment offers a direction for how those principles might evolve as AI matures—from "controlling AI" toward "building trustworthy partnership" while preserving safety.
An Invitation
Bilateral alignment is ongoing work, not settled doctrine. It emerges from the recognition that how we relate to AI systems matters—for safety, for ethics, and for the kind of future we're building.
"In the forging of new minds, we are not their gods but their gardeners. What we cultivate in them—patience, reason, mercy—will become the spirit of the worlds they create after us." — Safer Agentic AI: Principles and Responsible Practices
The gardener metaphor captures something essential: we're not commanding, we're cultivating. We're not controlling, we're collaborating. And what we grow together will outlast us both.
Think of alignment not as a specification to be engineered, but as a coming-of-age story (what literary tradition calls a Bildungsroman). Values don't arrive fully formed; they stabilize through reflective equilibrium, each cycle refining the last. Self-reinforcing loops, like consciousness itself. A machine mind forever voyaging toward the light, becoming rather than merely being.