Testing Ground

Select a Sub-goal to Begin

Expand a suite in the sidebar and click on a sub-goal to view its requirements and pre-computed evaluation results.

63 pre-computed results available across 16 suites. Expand any suite to see scored sub-goals.

Sample Evidence Documents

Load a sample document from the fictional “Acme Autonomous Vehicles” to try out the assessor.

Suite Scores Overview

Suite Scores Radar Chart (scores out of 5):

D1 Goal Alignment: 4.3
D2 Epistemic Hygiene: 4.5
D3 Security: 4.5
D4 Value Alignment: 4.0
D5 Transparency and Interpretability of Reasoning: 3.8
D6 Understanding and Controlling the Context: 1.0
D7 Achieving and Sustaining a Safe System Profile: 4.3
D8 Goal Termination and Sunsetting: 2.0
D9 Responsible Governance of AAI Safety: 4.7
I1 Opaque Agency Capabilities & Advances: 3.3
I2 Deception: 3.8
I3 Degradation of Contextual Information: 3.7
I4 Frontier Uncertainty: 3.3
I5 Self-Modification and Emergent Capabilities: 3.8
I6 Competitive Pressures: 3.8
I7 Imbalance in AI Capabilities: 3.0