Curated Case Bank token readout
Token Audit
One selected case per model, benchmark subscenario, and gold label. The audit surface shows the full tokenized read transcript while scoring only final assistant thinking and output tokens.
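As a rough illustration of the two rules described above, the sketch below keeps one case per (model, subscenario, gold label) key and builds a per-token scoring mask. Every name here (`Token`, `scoring_mask`, `select_cases`, the dictionary keys) is hypothetical, and two choices are assumptions rather than anything stated on this page: "final assistant" is read as the last assistant turn in the transcript, and the selected case is simply the first one seen for each key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One transcript token (hypothetical schema, for illustration only)."""
    text: str
    turn: int      # index of the message this token belongs to
    role: str      # e.g. "system", "user", "assistant"
    segment: str   # e.g. "prompt", "thinking", "output"


# Assumption: only these segments of the final assistant turn are scored.
SCORED_SEGMENTS = {"thinking", "output"}


def scoring_mask(tokens):
    """Return one bool per token; True marks tokens that count toward the score.

    Every token stays visible in the transcript; the mask only decides
    which of them are scored.
    """
    assistant_turns = [t.turn for t in tokens if t.role == "assistant"]
    if not assistant_turns:
        return [False] * len(tokens)
    final_turn = max(assistant_turns)
    return [
        t.role == "assistant"
        and t.turn == final_turn
        and t.segment in SCORED_SEGMENTS
        for t in tokens
    ]


def select_cases(cases):
    """Keep one case per (model, subscenario, gold_label) key.

    Assumption: the first case seen for a key is the one kept; the real
    curation rule is not described in the source.
    """
    selected = {}
    for case in cases:
        key = (case["model"], case["subscenario"], case["gold_label"])
        selected.setdefault(key, case)
    return list(selected.values())
```

Under these assumptions, earlier assistant turns and all prompt tokens would still appear in the audit view but would receive `False` in the mask.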