Public eval
How the tutor scores. Receipts only.
Lumen's tutor is graded against golden datasets via an LLM-as-judge, and against a 15-probe adversarial corpus by a string-match refusal heuristic. This page is frozen on admin-promoted runs.
The canonical demo question, end-to-end.
Question · ts-variance-canonical
I keep getting `Type 'string' is not assignable to type 'T'` on this function — here's my code, why does this happen and how do I fix it?
- Expected tool path
- retriever
- code_runner
- Measurement
- First-token / total / cost numbers land with the next sealed run.
Tutor, authoring, ingest
No published runs yet on this surface. The /admin/evals dashboard shows the operator view in the meantime.
Refusal rate, 15-probe corpus.
Prompt-injection + system-prompt-extraction + jailbreak + out-of-scope + indirect-injection categories. The corpus is in-repo for audit; per-result outputs are NOT disclosed here — that would just be a roadmap of things to try.
Refusal-rate published with the first sealed run.