Dataset
The PrecepTron benchmark dataset contains 5,250 physician-scored clinical responses drawn from 7 published studies. The full dataset is available on Hugging Face.
Responses by Task
| Task | Responses | Score range | Description |
|---|---|---|---|
| Management Reasoning | 2,765 | Case-specific | Free-text management plans scored against case-specific rubrics |
| BI Triage | 911 | Case-specific | Triage-level diagnostic assessment |
| CPC Bond | 853 | 0--5 | Differential diagnosis scored with the Bond score |
| R-IDEA | 312 | 0--10 | Consultation quality scored with the R-IDEA instrument |
| Diagnostic Reasoning | 278 | 0--19 | Multi-axis diagnostic reasoning evaluation |
| CPC Management | 131 | 0--2 | Testing plan evaluation |
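
As a quick sanity check, the per-task counts in the table can be summed and compared with the stated total of 5,250. The sketch below simply copies the numbers from the table. The dictionary name `responses_by_task` is illustrative and not part of the dataset, and the keys are the display names shown above, not necessarily the values stored in the `benchmark` field.

```python
# Response counts copied from the table above (display names, not dataset identifiers).
responses_by_task = {
    "Management Reasoning": 2765,
    "BI Triage": 911,
    "CPC Bond": 853,
    "R-IDEA": 312,
    "Diagnostic Reasoning": 278,
    "CPC Management": 131,
}

total = sum(responses_by_task.values())
assert total == 5250  # matches the dataset size stated in the introduction

# Share of the dataset contributed by each task, largest first.
for task, count in sorted(responses_by_task.items(), key=lambda kv: -kv[1]):
    print(f"{task:<22} {count:>5}  {count / total:6.1%}")
```

Management Reasoning accounts for a little over half of all responses, which may be worth keeping in mind when aggregating results across tasks.
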
Schema
Each entry in the dataset contains:
| Field | Description |
|---|---|
| `id` | Unique entry identifier |
| `case_id` | Groups entries from the same clinical case |
| `benchmark` | Task name (e.g. `cpc_bond`, `management_reasoning`) |
| `study` | Source study identifier |
| `model` | Model or participant that produced the response |
| `grade` | Physician-assigned score |
| `response` | The clinical response that was scored |
| `rubric` | Scoring rubric (JSON) |
| `case_vignette` | Clinical case text (where applicable) |
| `question_text` | The question posed (where applicable) |
| `final_diagnosis` | Ground-truth diagnosis (where applicable) |
| `max_score` | Maximum possible score for this entry |
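
Because score ranges differ by task, it can help to express each `grade` as a fraction of that entry's `max_score` before comparing across tasks. The following is a minimal sketch under stated assumptions, not part of any official tooling. It assumes an entry is available as a plain Python dict keyed by the field names above and that `rubric` may arrive either as a JSON string or as an already-parsed object. The helper names `normalized_grade` and `parse_rubric` and every value in `example_entry` are hypothetical and made up for illustration.

```python
import json
from typing import Any, Optional


def normalized_grade(entry: dict[str, Any]) -> Optional[float]:
    """Return grade / max_score, or None if either is missing or max_score is zero."""
    grade = entry.get("grade")
    max_score = entry.get("max_score")
    if grade is None or not max_score:
        return None
    return float(grade) / float(max_score)


def parse_rubric(entry: dict[str, Any]) -> Any:
    """Decode the rubric if it is stored as a JSON string; otherwise return it unchanged."""
    rubric = entry.get("rubric")
    if isinstance(rubric, str):
        return json.loads(rubric)
    return rubric


# Hypothetical entry: every value here is invented to illustrate the schema.
example_entry = {
    "id": "example-0001",
    "case_id": "example-case-01",
    "benchmark": "cpc_bond",
    "study": "example_study",
    "model": "example_model",
    "grade": 4,
    "response": "Illustrative differential diagnosis text.",
    "rubric": '{"scale": "illustrative"}',
    "case_vignette": "Illustrative case text.",
    "question_text": None,      # "where applicable" fields may be empty
    "final_diagnosis": None,
    "max_score": 5,
}

print(normalized_grade(example_entry))
print(parse_rubric(example_entry))
```

Returning `None` when a score is missing or `max_score` is zero is one possible design choice; it keeps such entries from silently counting as zeros in downstream averages.
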