
minimal-run-and-audit
PopularRigor Run skill for README-first deep learning repo reproduction. Use when the task is specifically to capture or normalize evidence from the selected smoke test or documented inference or evaluation command and write standardized `repro_outputs/` files, including patch notes when repository files changed. Do not use for training execution, initial repo intake, generic environment setup, paper lookup, target selection, hidden scientific-meaning changes, or end-to-end orchestration by itself.
Rigor Run skill for README-first deep learning repo reproduction. Use when the task is specifically to capture or normalize evidence from the selected smoke test or documented inference or evaluation command and write standardized `repro_outputs/` files, including patch notes when repository files changed. Do not use for training execution, initial repo intake, generic environment setup, paper lookup, target selection, hidden scientific-meaning changes, or end-to-end orchestration by itself.
minimal-run-and-audit
Use this as the Rigor Run skill. The installed slug remains
minimal-run-and-audit for compatibility.
Use the shared operating principles in
../../references/agent-operating-principles.md; this skill should make run
evidence auditable without turning every command into a rigid protocol.
When to apply
- After a reproduction target and setup plan exist.
- When the main skill needs execution evidence and normalized outputs.
- When a smoke test, documented inference run, documented evaluation run, or other short non-training verification is appropriate.
- When the user already knows what command should be attempted and wants execution plus reporting only.
When not to apply
- During initial repo scanning.
- When environment or assets are still undefined enough to make execution meaningless.
- When the task is a literature lookup rather than repository execution.
- When the user is still deciding which reproduction target should count as the main run.
Clear boundaries
- This skill owns normalized reporting for an attempted command.
- It may receive execution evidence from the main skill or a thin helper.
- It does not choose the overall target on its own.
- It does not perform broad paper analysis.
- It does not own training startup, resume, or long-running training state.
- It should not normalize risky code edits into acceptable practice.
- It must not hide changes that alter evaluation, preprocessing, checkpoints,
metrics, or other scientific meaning.
Input expectations
- selected reproduction goal
- runnable commands or smoke commands
- environment and asset assumptions
- optional patch metadata
Output expectations
- execution result summary
- standardized
repro_outputs/files SCIENTIFIC_CHANGELOG.mdfor changed scientific meaning and evidence statusCOMPARABILITY_REPORT.mdfor README/paper/baseline comparability- clear distinction between verified, partial, and blocked states
PATCHES.mdwhen repo files changed
Notes
Use references/reporting-policy.md, ../../references/research-rigor-principles.md, scripts/run_command.py, and scripts/write_outputs.py.
You Might Also Like
Related Skills

writing-skills
Use when creating new skills, editing existing skills, or verifying skills work before deployment
obra
doc-coauthoring
Guide users through a structured workflow for co-authoring documentation. Use when user wants to write documentation, proposals, technical specs, decision docs, or similar structured content. This workflow helps users efficiently transfer context, refine content through iteration, and verify the doc works for readers. Trigger when user mentions writing docs, creating proposals, drafting specs, or similar documentation tasks.
anthropics
mcp-builder
Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).
anthropics
xlsx
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like \"the xlsx in my downloads\") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
anthropics
docx
Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of 'Word doc', 'word document', '.docx', or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a 'report', 'memo', 'letter', 'template', or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation.
anthropics