LLM-powered conformity assessor for the Safer Agentic AI Framework. Upload evidence documents, select criteria, and get scored evaluations against the framework’s safety requirements.
Built with Next.js 14, Tailwind CSS, and the Anthropic API, which is called directly from the browser (no server involvement).
cp .env.example .env.local # Add your Anthropic API key
npm install # Install dependencies
npm run dev # http://localhost:3000/assessor/
src/
app/
page.tsx # Home/landing page
assess/page.tsx # Full assessment workflow
playground/page.tsx # Quick evaluation playground
data-handling/page.tsx # Data handling info page
layout.tsx # Root layout
error.tsx # Error boundary
api/evaluate/route.ts # (deprecated, unused)
components/
CriteriaTree.tsx # Framework criteria browser
SuiteDetail.tsx # Suite-level detail view
RadarChart.tsx # Score visualization
DocumentIngestion.tsx # File upload and parsing
EvidenceEvaluator.tsx # Evaluation orchestration
ScoreCard.tsx # Score display
ApiKeyInput.tsx # API key entry
lib/
types.ts # TypeScript type definitions
criteria.ts # Criteria data loader
scoring.ts # Score calculation (weighted averages)
client-evaluator.ts # Anthropic API calls (browser-side)
document-parser.ts # PDF, DOCX, image, text extraction
evaluator.ts # (deprecated, unused)
data/
criteria-v1.json # Framework criteria definitions
synthetic/ # Sample evidence documents for testing
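`document-parser.ts` extracts text from PDF, DOCX, image, and plain-text uploads. Before extraction, each upload has to be routed to the right extractor. The sketch below shows one hypothetical way to do that routing. `detectDocumentKind`, the `DocumentKind` labels, and the list of text extensions are illustrative assumptions, not the project's actual code.

```typescript
// Hypothetical helper: the real dispatch in src/lib/document-parser.ts may differ.
type DocumentKind = "pdf" | "docx" | "image" | "text" | "unsupported";

const DOCX_MIME =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

// Assumed set of extensions treated as plain text when the MIME type is missing.
const TEXT_EXTENSIONS = ["txt", "md", "csv", "json"];

function detectDocumentKind(fileName: string, mimeType: string): DocumentKind {
  // Browsers sometimes report an empty MIME type, so fall back to the extension.
  const ext = fileName.toLowerCase().split(".").pop() ?? "";
  if (mimeType === "application/pdf" || ext === "pdf") return "pdf";
  if (mimeType === DOCX_MIME || ext === "docx") return "docx";
  if (mimeType.startsWith("image/")) return "image";
  if (mimeType.startsWith("text/") || TEXT_EXTENSIONS.includes(ext)) return "text";
  return "unsupported";
}
```

Checking the MIME type first and the extension second lets files dropped from the OS and files with missing metadata both land in the right extractor.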
| Command | Description |
|---|---|
| `npm run dev` | Start dev server on port 3000 |
| `npm run build` | Production build (static export) |
| `npm run lint` | Run Next.js linting |
| `npm test` | Run all Playwright tests |
| `npm run test:chromium` | Run tests in Chromium only |
| `npm run test:loading` | App loading tests |
| `npm run test:playground` | Playground page tests |
| `npm run test:assess` | Assessment workflow tests |
| `npm run test:mobile` | Mobile responsiveness tests |
| `npm run test:a11y` | Accessibility tests |
| `npm run test:visual` | Visual design tests |
| `npm run test:nav` | Navigation tests |
| `npm run test:report` | Open the HTML test report |
Tests use Playwright. First-time setup:
npx playwright install # Downloads browser binaries (one-time)
npm test # Runs all test suites
All LLM evaluation happens client-side: the browser sends the user's API key and documents directly to the Anthropic API, so they never pass through our servers. `src/app/api/evaluate/route.ts` and `src/lib/evaluator.ts` are deprecated, unused stubs left over from an earlier server-side design.
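To make the client-side flow concrete, here is a minimal sketch of a browser-side call to the Anthropic Messages API. It is not the contents of `client-evaluator.ts`. The function names, the `max_tokens` value, and passing the model id as a parameter are assumptions. The `anthropic-dangerous-direct-browser-access` header is what Anthropic requires for CORS requests made straight from a browser.

```typescript
const ANTHROPIC_MESSAGES_URL = "https://api.anthropic.com/v1/messages";

// Plain request description (assignable to fetch's RequestInit) so it can be inspected.
interface EvaluationRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical builder: the key is only ever placed in a request to api.anthropic.com.
function buildEvaluationRequest(apiKey: string, model: string, prompt: string): EvaluationRequest {
  return {
    url: ANTHROPIC_MESSAGES_URL,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        // Opt-in header required for direct browser (CORS) access.
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model, // model id supplied by the caller (assumption)
        max_tokens: 1024, // illustrative limit
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Hypothetical caller: sends the request and joins the text content blocks of the reply.
async function evaluateInBrowser(apiKey: string, model: string, prompt: string): Promise<string> {
  const { url, init } = buildEvaluationRequest(apiKey, model, prompt);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Anthropic API error: ${res.status}`);
  const data = await res.json();
  return data.content
    .filter((block: { type: string }) => block.type === "text")
    .map((block: { text: string }) => block.text)
    .join("");
}
```

Separating request construction from `fetch` keeps the privacy-relevant part (where the key and documents are sent) easy to inspect and test without network access.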
Scoring uses a 1-5 scale (1=Unacceptable, 2=Poor, 3=Average, 4=Good, 5=Excellent) with normative SFRs weighted at 1.5x. See `src/lib/scoring.ts` for the calculation logic.
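As a rough illustration of how the 1.5x weighting affects an aggregate, the sketch below computes a weighted average over scored criteria. The `ScoredCriterion` shape and the 1.0 weight for non-normative criteria are assumptions; `src/lib/scoring.ts` remains the authoritative logic.

```typescript
// Hypothetical shape: field names are assumptions, not the types in src/lib/types.ts.
interface ScoredCriterion {
  id: string;
  score: number; // 1=Unacceptable ... 5=Excellent
  normative: boolean; // true if the criterion is a normative SFR
}

const NORMATIVE_WEIGHT = 1.5;
const DEFAULT_WEIGHT = 1.0; // assumed weight for all other criteria

// Weighted mean of the scores; returns null when nothing has been scored.
function weightedAverage(criteria: ScoredCriterion[]): number | null {
  if (criteria.length === 0) return null;
  let weightedSum = 0;
  let totalWeight = 0;
  for (const c of criteria) {
    const w = c.normative ? NORMATIVE_WEIGHT : DEFAULT_WEIGHT;
    weightedSum += c.score * w;
    totalWeight += w;
  }
  return weightedSum / totalWeight;
}
```

For example, a normative SFR scored 5 and a non-normative criterion scored 2 give (5 × 1.5 + 2 × 1.0) / 2.5 = 3.8, compared with 3.5 under an unweighted mean.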