Dataset

The PrecepTron benchmark dataset contains 5,250 physician-scored clinical responses drawn from 7 published studies. The full dataset is available on Hugging Face.

Responses by Task

| Task | Responses | Score range | Description |
|---|---|---|---|
| Management Reasoning | 2,765 | Case-specific | Free-text management plans scored against case-specific rubrics |
| BI Triage | 911 | Case-specific | Triage-level diagnostic assessment |
| CPC Bond | 853 | 0–5 | Differential diagnosis scored with the Bond score |
| R-IDEA | 312 | 0–10 | Consultation quality scored with the R-IDEA instrument |
| Diagnostic Reasoning | 278 | 0–19 | Multi-axis diagnostic reasoning evaluation |
| CPC Management | 131 | 0–2 | Testing plan evaluation |
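
The per-task counts above add up to the 5,250 responses stated for the full dataset. The short Python sketch below repeats that arithmetic and derives each task's share of the total; the dictionary simply copies the numbers from the table, and the variable names are illustrative rather than part of any PrecepTron API.

```python
# Response counts copied from the table above.
task_counts = {
    "Management Reasoning": 2765,
    "BI Triage": 911,
    "CPC Bond": 853,
    "R-IDEA": 312,
    "Diagnostic Reasoning": 278,
    "CPC Management": 131,
}

# Total number of physician-scored responses across all tasks.
total = sum(task_counts.values())

# Percentage of the dataset contributed by each task, rounded to one decimal.
shares = {task: round(100 * n / total, 1) for task, n in task_counts.items()}

for task, n in task_counts.items():
    print(f"{task}: {n} responses ({shares[task]}%)")
print(f"Total: {total}")
```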

Schema

Each entry in the dataset contains:

| Field | Description |
|---|---|
| `id` | Unique entry identifier |
| `case_id` | Groups entries from the same clinical case |
| `benchmark` | Task name (e.g. `cpc_bond`, `management_reasoning`) |
| `study` | Source study identifier |
| `model` | Model or participant that produced the response |
| `grade` | Physician-assigned score |
| `response` | The clinical response that was scored |
| `rubric` | Scoring rubric (JSON) |
| `case_vignette` | Clinical case text (where applicable) |
| `question_text` | The question posed (where applicable) |
| `final_diagnosis` | Ground-truth diagnosis (where applicable) |
| `max_score` | Maximum possible score for this entry |
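
As a minimal sketch of how these fields might be used together, the Python below works on a few hand-written records shaped like the schema. Every value in `entries` is invented for illustration and is not taken from the dataset. The sketch also assumes that `rubric` arrives as a JSON-encoded string, that `grade` and `max_score` are numeric, and that "where applicable" fields may be `None`; the helper names (`normalized_grade`, `parse_rubric`, `group_by_case`) are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical records following the schema; all values are made up.
entries = [
    {
        "id": "example-1", "case_id": "case-A", "benchmark": "cpc_bond",
        "study": "study-x", "model": "model-1", "grade": 4,
        "response": "(response text)",
        "rubric": json.dumps({"instrument": "Bond score"}),
        "case_vignette": "(case text)", "question_text": None,
        "final_diagnosis": "(diagnosis)", "max_score": 5,
    },
    {
        "id": "example-2", "case_id": "case-A", "benchmark": "cpc_bond",
        "study": "study-x", "model": "model-2", "grade": 5,
        "response": "(response text)",
        "rubric": json.dumps({"instrument": "Bond score"}),
        "case_vignette": "(case text)", "question_text": None,
        "final_diagnosis": "(diagnosis)", "max_score": 5,
    },
    {
        "id": "example-3", "case_id": "case-B", "benchmark": "cpc_management",
        "study": "study-y", "model": "model-1", "grade": 1,
        "response": "(response text)",
        "rubric": json.dumps({"instrument": "testing plan"}),
        "case_vignette": "(case text)", "question_text": "(question)",
        "final_diagnosis": None, "max_score": 2,
    },
]


def normalized_grade(entry):
    """Return grade / max_score so tasks with different ranges are comparable."""
    max_score = entry["max_score"]
    if not max_score:  # guard against a missing or zero maximum
        return None
    return entry["grade"] / max_score


def parse_rubric(entry):
    """Decode the rubric if it is a JSON string; pass it through otherwise."""
    rubric = entry["rubric"]
    return json.loads(rubric) if isinstance(rubric, str) else rubric


def group_by_case(records):
    """Map each case_id to the ids of the entries that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["case_id"]].append(record["id"])
    return dict(groups)


for entry in entries:
    print(entry["id"], entry["benchmark"], normalized_grade(entry))
print(group_by_case(entries))
```

Dividing by `max_score` is one simple way to compare grades across tasks whose score ranges differ (for example 0–5 versus 0–2); whether that normalization suits a given analysis is a judgment left to the reader.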