Dataset

The PrecepTron benchmark dataset contains 5,250 physician-scored clinical responses drawn from 7 published studies. The full dataset is available on Hugging Face.

Responses by Task

| Task | Responses | Score range | Description |
|---|---|---|---|
| Management Reasoning | 2,765 | Case-specific | Free-text management plans scored against case-specific rubrics |
| BI Triage | 911 | Case-specific | Triage-level diagnostic assessment |
| CPC Bond | 853 | 0–5 | Differential diagnosis scored with the Bond score |
| R-IDEA | 312 | 0–10 | Consultation quality scored with the R-IDEA instrument |
| Diagnostic Reasoning | 278 | 0–19 | Multi-axis diagnostic reasoning evaluation |
| CPC Management | 131 | 0–2 | Testing plan evaluation |
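
As a quick consistency check, the per-task counts in the table above can be tallied against the stated total of 5,250 responses. The sketch below hard-codes the counts from the table; the names `task_counts` and `task_shares` are illustrative and not part of the dataset.

```python
# Per-task response counts copied from the table above.
task_counts = {
    "Management Reasoning": 2765,
    "BI Triage": 911,
    "CPC Bond": 853,
    "R-IDEA": 312,
    "Diagnostic Reasoning": 278,
    "CPC Management": 131,
}


def task_shares(counts):
    """Return each task's share of all responses, as a percentage."""
    total = sum(counts.values())
    return {task: 100.0 * n / total for task, n in counts.items()}


total = sum(task_counts.values())
shares = task_shares(task_counts)

# Print one row per task, then the overall total.
for task, n in task_counts.items():
    print(f"{task:<22} {n:>5}  {shares[task]:5.1f}%")
print(f"{'Total':<22} {total:>5}")
```

The counts sum to 5,250, and Management Reasoning alone accounts for just over half of all responses, which is worth keeping in mind when aggregating results across tasks.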

Schema

Each entry in the dataset contains:

| Field | Description |
|---|---|
| id | Unique entry identifier |
| case_id | Groups entries from the same clinical case |
| benchmark | Task name (e.g. cpc_bond, management_reasoning) |
| study | Source study identifier |
| model | Model or participant that produced the response |
| grade | Physician-assigned score |
| response | The clinical response that was scored |
| rubric | Scoring rubric (JSON) |
| case_vignette | Clinical case text (where applicable) |
| question_text | The question posed (where applicable) |
| final_diagnosis | Ground-truth diagnosis (where applicable) |
| max_score | Maximum possible score for this entry |
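
Because score ranges differ by task, one plausible use of `max_score` is to rescale `grade` to a common 0 to 1 range before comparing entries. The following is a minimal sketch under several assumptions: that an entry is available as a plain dictionary keyed by the field names above, that `rubric` may arrive as a JSON-encoded string, and that "where applicable" fields are `None` when absent. The helper `normalize_entry`, the `normalized_grade` key, and every value in the sample entry are hypothetical and do not come from the dataset.

```python
import json


def normalize_entry(entry):
    """Return a copy of `entry` with the rubric parsed and a 0-1 grade added."""
    out = dict(entry)

    # Assumption: the rubric may be stored as a JSON string; parse it if so.
    rubric = out.get("rubric")
    if isinstance(rubric, str):
        out["rubric"] = json.loads(rubric)

    grade = out.get("grade")
    max_score = out.get("max_score")
    # Rescale only when both values exist and max_score is non-zero.
    if grade is not None and max_score:
        out["normalized_grade"] = float(grade) / float(max_score)
    else:
        out["normalized_grade"] = None
    return out


# Hypothetical entry: all values below are invented for illustration.
example = {
    "id": "example-0001",
    "case_id": "case-01",
    "benchmark": "cpc_bond",
    "study": "example_study",
    "model": "example_model",
    "grade": 4,
    "response": "Example differential diagnosis text.",
    "rubric": '{"scale": "Bond score", "max": 5}',
    "case_vignette": None,
    "question_text": None,
    "final_diagnosis": None,
    "max_score": 5,
}

normalized = normalize_entry(example)
print(normalized["normalized_grade"])
```

The helper returns a shallow copy rather than modifying the entry in place, so the original record stays available for inspection.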