Task packet design
Each task packet should mirror how GovCon work is assigned, performed, reviewed, and defended.
Packets can include solicitations, amendments, award records, budgets, past performance, internal templates, contract files, and evaluation criteria. Synthetic data is acceptable when public release of real data is not approved.
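One way to picture a packet is as a small typed record. The sketch below is illustrative only: the class names, field names, and the release_blockers helper are assumptions made for this example, not part of any existing schema. It lists the document types named above and flags real (non-synthetic) documents that have not been approved for public release.

```python
from dataclasses import dataclass, field
from enum import Enum


class DocumentKind(Enum):
    """Document types a packet may contain, taken from the list above."""
    SOLICITATION = "solicitation"
    AMENDMENT = "amendment"
    AWARD_RECORD = "award_record"
    BUDGET = "budget"
    PAST_PERFORMANCE = "past_performance"
    INTERNAL_TEMPLATE = "internal_template"
    CONTRACT_FILE = "contract_file"
    EVALUATION_CRITERIA = "evaluation_criteria"


@dataclass
class PacketDocument:
    # All field names here are hypothetical.
    kind: DocumentKind
    title: str
    synthetic: bool = False             # True when the document stands in for real data
    approved_for_release: bool = False  # only relevant for real (non-synthetic) data


@dataclass
class TaskPacket:
    task_id: str
    instructions: str
    documents: list[PacketDocument] = field(default_factory=list)

    def release_blockers(self) -> list[str]:
        """Titles of real documents not yet approved for public release."""
        return [
            d.title
            for d in self.documents
            if not d.synthetic and not d.approved_for_release
        ]


# Usage: one synthetic document and one unapproved real document.
packet = TaskPacket(
    task_id="demo-001",
    instructions="Draft a compliance matrix from the solicitation and its amendment.",
    documents=[
        PacketDocument(DocumentKind.SOLICITATION, "Synthetic RFP", synthetic=True),
        PacketDocument(DocumentKind.AMENDMENT, "Real amendment 0001"),
    ],
)
print(packet.release_blockers())  # ['Real amendment 0001']
```

A packet with an empty blocker list contains only synthetic or release-approved material.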
Rubric scoring
Rubrics cover source faithfulness, completeness, compliance awareness, GovCon domain reasoning, output usability, auditability, human-review readiness, and security posture awareness.
Reviewers should score the work product, not just the final answer. Strong outputs cite sources, preserve assumptions, expose uncertainty, and produce artifacts a human can review.
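A reviewer score sheet for these criteria might be sketched as follows. The 0 to 4 scale, the unweighted average, and the function name are assumptions chosen for illustration; the rubric itself does not prescribe a scale or weighting. The sketch rejects sheets that skip a criterion, so a work product cannot be scored on its final answer alone.

```python
# Criterion keys mirror the rubric list above; the identifiers are hypothetical.
CRITERIA = (
    "source_faithfulness",
    "completeness",
    "compliance_awareness",
    "govcon_domain_reasoning",
    "output_usability",
    "auditability",
    "human_review_readiness",
    "security_posture_awareness",
)

MIN_SCORE, MAX_SCORE = 0, 4  # assumed scale, not defined by the rubric


def score_work_product(scores: dict[str, int]) -> float:
    """Return the unweighted mean across all criteria.

    Raises ValueError if a criterion is missing, unknown, or out of range.
    """
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    unknown = [c for c in scores if c not in CRITERIA]
    if unknown:
        raise ValueError(f"unknown criteria: {unknown}")
    for name in CRITERIA:
        if not MIN_SCORE <= scores[name] <= MAX_SCORE:
            raise ValueError(f"{name} must be between {MIN_SCORE} and {MAX_SCORE}")
    return sum(scores[name] for name in CRITERIA) / len(CRITERIA)


# Usage: a solid output that is weak on auditability.
sheet = {name: 3 for name in CRITERIA}
sheet["auditability"] = 1
print(score_work_product(sheet))  # 2.75
```

An unweighted mean is the simplest choice; a team that considers some criteria more important could swap in weights without changing the completeness check.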
Publication rules
Do not publish competitor scores until the methodology and evaluation data are ready and legal review is complete.
Until then, benchmark pages should focus on the evaluation standard, sample task structure, and why these work products require a domain-specific approach.
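If benchmark pages are generated by a script, the rule can be enforced as a simple gate. This is a minimal sketch under assumptions: the class, its field names, and the section labels are hypothetical, and it assumes competitor scores may be added once all three conditions hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicationReadiness:
    # Hypothetical flags, one per precondition in the rule above.
    methodology_ready: bool
    evaluation_data_ready: bool
    legal_review_ready: bool

    def may_publish_competitor_scores(self) -> bool:
        """All three preconditions must hold before scores go public."""
        return all(
            (self.methodology_ready, self.evaluation_data_ready, self.legal_review_ready)
        )


def page_sections(readiness: PublicationReadiness) -> list[str]:
    """Sections a benchmark page should carry for the given readiness state."""
    sections = [
        "evaluation standard",
        "sample task structure",
        "why a domain-specific approach",
    ]
    if readiness.may_publish_competitor_scores():
        sections.append("competitor scores")
    return sections


# Usage: legal review is still open, so scores stay off the page.
pending = PublicationReadiness(
    methodology_ready=True,
    evaluation_data_ready=True,
    legal_review_ready=False,
)
print(page_sections(pending))
```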