# Quality Audit Skill
Systematic framework for evaluating skill quality across four dimensions: Clarity, Completeness, Accuracy, and Usefulness.
## When to Use This Skill
- Reviewing a new skill before adding it to the registry
- Auditing existing skills for quality improvements
- Creating quality rubrics for skill validation
- Standardizing skill quality across the library
- Preparing skills for production use
## Core Principles
### The Four Quality Dimensions
| Dimension | Weight | Focus |
|---|---|---|
| Clarity | 25% | Structure, readability, progressive disclosure |
| Completeness | 25% | Coverage, examples, edge cases, anti-patterns |
| Accuracy | 30% | Correctness, best practices, security |
| Usefulness | 20% | Real-world applicability, production-readiness |
### Scoring Scale (1-5)
| Score | Label | Meaning |
|---|---|---|
| 1 | Unacceptable | Fundamentally broken, dangerous, or unusable |
| 2 | Needs Work | Major issues requiring significant revision |
| 3 | Acceptable | Meets minimum standards, functional |
| 4 | Good | High quality, minor improvements possible |
| 5 | Excellent | Exemplary, production-ready, best-in-class |
### Passing Criteria
- Minimum: 3.0 weighted average (acceptable)
- Target: 4.0 weighted average (good)
- Exceptional: 4.5+ weighted average (excellent)
- Blocking: Accuracy must be ≥3.0 (no dangerous advice)
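The passing criteria reduce to a small calculation. The Python sketch below is a hypothetical helper (`weighted_score` and `audit_status` are illustrative names, not part of the `cortex` CLI) that applies the weights from the dimension table, fails any audit whose Accuracy is below 3.0 regardless of the total, and maps the weighted average onto the bands above.

```python
# Hypothetical helper; weights and thresholds mirror the tables above.
WEIGHTS = {"clarity": 0.25, "completeness": 0.25, "accuracy": 0.30, "usefulness": 0.20}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine four 1-5 dimension scores into a weighted average."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for dimension in WEIGHTS:
        if not 1 <= scores[dimension] <= 5:
            raise ValueError(f"{dimension} score {scores[dimension]} is outside 1-5")
    # Round to two decimals to hide floating-point noise in the sum.
    return round(sum(WEIGHTS[d] * scores[d] for d in WEIGHTS), 2)


def audit_status(scores: dict[str, float]) -> str:
    """Map scores to the passing bands, enforcing the Accuracy blocking rule."""
    total = weighted_score(scores)
    if scores["accuracy"] < 3.0:
        return "fail"  # Blocking: possible dangerous advice, whatever the total
    if total >= 4.5:
        return "excellent"
    if total >= 4.0:
        return "good"
    if total >= 3.0:
        return "acceptable"
    return "fail"
```

For example, scores of 4, 3, 4 and 5 give 1.00 + 0.75 + 1.20 + 1.00 = 3.95, which clears the 3.0 minimum but misses the 4.0 target.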
## Audit Workflow
### Phase 1: Structure Check
```yaml
checklist:
  structure:
    - "[ ] Has valid YAML frontmatter"
    - "[ ] Contains required metadata (name, description)"
    - "[ ] Follows progressive disclosure (Tier 1 → 2 → 3)"
    - "[ ] Sections are logically ordered"
    - "[ ] Token estimate is reasonable (<5000 for core)"
```
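Parts of this checklist can be automated. The sketch below assumes a skill is a single Markdown file whose frontmatter sits between two `---` lines at the top. `check_structure` is a hypothetical name, the key parsing is deliberately naive (top-level `key:` lines only, not a real YAML parser), and the token estimate uses a rough four-characters-per-token heuristic rather than a real tokenizer.

```python
import re

# Frontmatter is assumed to be the first block delimited by '---' lines.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
REQUIRED_KEYS = ("name", "description")


def check_structure(skill_md: str, token_budget: int = 5000) -> list[str]:
    """Return structural problems found in a skill's Markdown text."""
    problems = []
    match = FRONTMATTER_RE.match(skill_md)
    if match is None:
        problems.append("missing YAML frontmatter delimited by '---' lines")
    else:
        # Naive parse: collect top-level 'key:' names; not a full YAML parser.
        keys = {
            line.split(":", 1)[0].strip()
            for line in match.group(1).splitlines()
            if line and not line[0].isspace() and ":" in line
        }
        problems.extend(
            f"frontmatter missing required key: {key}"
            for key in REQUIRED_KEYS
            if key not in keys
        )
    # Rough heuristic (assumption): about four characters per token.
    estimated_tokens = len(skill_md) // 4
    if estimated_tokens > token_budget:
        problems.append(f"about {estimated_tokens} tokens, over the {token_budget} budget")
    return problems
```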
### Phase 2: Content Evaluation
```yaml
checklist:
  content:
    - '[ ] "When to Use" section is clear'
    - "[ ] Core principles are well-defined"
    - "[ ] Code examples are complete and runnable"
    - "[ ] Anti-patterns are documented"
    - "[ ] Troubleshooting guidance exists"
```
### Phase 3: Dimension Scoring
For each dimension, evaluate against specific criteria:
Clarity Criteria:
- Well-organized sections with logical flow
- Concise explanations without jargon overload
- Code examples are readable and well-commented
- Progressive disclosure from simple to complex
Completeness Criteria:
- Covers core concepts thoroughly
- Includes edge cases and error handling
- Provides both do's and don'ts
- Has working examples for main use cases
Accuracy Criteria:
- Code examples compile/run without errors
- Follows current best practices (not deprecated)
- Security considerations are correct
- Performance claims are verifiable
Usefulness Criteria:
- Examples solve real-world problems
- Can be applied immediately
- Scales to production use cases
- Includes troubleshooting guidance
### Phase 4: Report Generation
```markdown
## Audit Report: {skill_name}

**Date**: {date}
**Auditor**: {auditor}
**Status**: {PASS|FAIL|NEEDS_REVIEW}

### Scores

| Dimension | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| Clarity | {x}/5 | 25% | {x*0.25} |
| Completeness | {x}/5 | 25% | {x*0.25} |
| Accuracy | {x}/5 | 30% | {x*0.30} |
| Usefulness | {x}/5 | 20% | {x*0.20} |
| **Total** | | | **{sum}/5** |

### Issues Found

- [CRITICAL] {issue description}
- [MAJOR] {issue description}
- [MINOR] {issue description}

### Recommendations

1. {actionable recommendation}
2. {actionable recommendation}
```
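The Scores table in this template can be filled in mechanically once the four scores are known. This is a minimal sketch with a hypothetical `render_scores_table` function; it assumes scores are keyed by the dimension names used in the table and does not cover the rest of the report.

```python
# Hypothetical renderer for the Scores section of the report template.
WEIGHTS = {"Clarity": 0.25, "Completeness": 0.25, "Accuracy": 0.30, "Usefulness": 0.20}


def render_scores_table(scores: dict[str, int]) -> str:
    """Build the Markdown Scores table from 1-5 scores keyed by dimension."""
    rows = [
        "| Dimension | Score | Weight | Weighted |",
        "|-----------|-------|--------|----------|",
    ]
    total = 0.0
    for dimension, weight in WEIGHTS.items():
        weighted = scores[dimension] * weight  # fills the {x*0.25}-style cells
        total += weighted
        rows.append(f"| {dimension} | {scores[dimension]}/5 | {weight:.0%} | {weighted:.2f} |")
    rows.append(f"| **Total** | | | **{total:.2f}/5** |")
    return "\n".join(rows)
```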
## Implementation Patterns
### Pattern 1: Quick Audit (5-minute review)
Use for rapid assessment of skill quality:
```bash
# Run automated structure checks
cortex skills audit <skill-name> --quick
# Output: Pass/Fail with basic metrics
```
Quick Audit Checks:
- YAML frontmatter valid?
- Required sections present?
- Code blocks have language tags?
- No TODO/FIXME markers?
- Token count reasonable?
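Most of these checks are simple text scans. The sketch below is a hypothetical illustration, not the actual `cortex skills audit --quick` implementation: it looks for section titles taken from the Phase 2 checklist with a naive substring match, assumes code fences start at column 0 and alternate open/close, and flags `TODO`/`FIXME` only as whole words.

```python
import re

# Section titles taken from the Phase 2 content checklist (an assumption; adjust as needed).
REQUIRED_SECTIONS = ("When to Use", "Core Principles", "Anti-Patterns", "Troubleshooting")
# Matches fence lines at column 0 and captures the text right after the backticks.
FENCE_RE = re.compile(r"^```(\S*)", re.MULTILINE)


def quick_checks(skill_md: str) -> dict[str, bool]:
    """Run a few quick-audit style checks on a skill's Markdown text."""
    fences = FENCE_RE.findall(skill_md)
    # Assumes fences alternate open/close, so even indexes are opening fences.
    opening_tags = fences[0::2]
    return {
        # Naive substring match; a real check would parse headings.
        "sections_present": all(title in skill_md for title in REQUIRED_SECTIONS),
        "fences_balanced": len(fences) % 2 == 0,
        "code_blocks_tagged": all(opening_tags),
        "no_todo_markers": re.search(r"\b(?:TODO|FIXME)\b", skill_md) is None,
    }
```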
### Pattern 2: Full Audit (15-30 minute review)
Comprehensive evaluation with human review:
```bash
# Generate full audit report
cortex skills audit <skill-name> --full

# Interactive mode for scoring
cortex skills audit <skill-name> --interactive
```
Full Audit Process:
1. Run automated checks
2. Read through content manually
3. Test code examples
4. Score each dimension
5. Document issues and recommendations
6. Generate report
### Pattern 3: Comparative Audit
Compare a skill against a reference implementation:
```bash
# Compare against template-skill-enhanced
cortex skills audit <skill-name> --compare template-skill-enhanced
```
### Pattern 4: Batch Audit
Audit multiple skills for registry health:
```bash
# Audit all skills in a category
cortex skills audit --category security

# Audit skills below threshold
cortex skills audit --below-score 3.5
```
## CLI Commands
```bash
# Basic audit
cortex skills audit <skill-name>

# Options
#   --quick            Quick structural check only
#   --full             Full audit with all dimensions
#   --interactive      Interactive scoring mode
#   --output FILE      Write report to file
#   --format FORMAT    Output format (markdown|json|yaml)
#   --compare SKILL    Compare against reference skill
#   --fix              Auto-fix simple issues (formatting)
```
## Creating Custom Rubrics
Skills can define custom rubrics in `validation/rubric.yaml`:
```yaml
# validation/rubric.yaml
version: "1.0.0"
skill_name: my-skill

dimensions:
  clarity:
    weight: 25
    criteria:
      - "API examples use realistic data"
      - "Error handling is shown for each operation"
  completeness:
    weight: 25
    criteria:
      - "Covers all HTTP methods"
      - "Includes pagination patterns"
  accuracy:
    weight: 30
    criteria:
      - "Follows REST conventions"
      - "Security headers documented"
  usefulness:
    weight: 20
    criteria:
      - "Examples work with common frameworks"

passing_criteria:
  minimum_score: 3.5  # Higher bar for this skill
  required_dimensions:
    - accuracy
    - completeness
```
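Before relying on a custom rubric, it can help to sanity-check it. The sketch below takes the rubric after it has been parsed from YAML into a plain dictionary (for example with a YAML library) and applies a few assumed rules: only the four standard dimensions are allowed, weights must total 100 as they do in the example, each dimension lists at least one criterion, `minimum_score` stays on the 1-5 scale (falling back to the global 3.0 minimum), and every entry in `required_dimensions` names a defined dimension. `validate_rubric` is a hypothetical name, and the sketch does not try to enforce what `required_dimensions` means at scoring time.

```python
# Hypothetical validator for a rubric already parsed from YAML into a dict.
DIMENSIONS = ("clarity", "completeness", "accuracy", "usefulness")


def validate_rubric(rubric: dict) -> list[str]:
    """Return a list of problems found in a custom rubric."""
    errors = []
    dims = rubric.get("dimensions", {})
    unknown = set(dims) - set(DIMENSIONS)
    if unknown:
        errors.append(f"unknown dimensions: {sorted(unknown)}")
    # Assumption: weights are percentages that must total 100.
    total_weight = sum(spec.get("weight", 0) for spec in dims.values())
    if total_weight != 100:
        errors.append(f"weights sum to {total_weight}, expected 100")
    for name, spec in dims.items():
        if not spec.get("criteria"):
            errors.append(f"{name}: no criteria listed")
    passing = rubric.get("passing_criteria", {})
    # Falls back to the global 3.0 minimum when the rubric does not set one.
    minimum = passing.get("minimum_score", 3.0)
    if not 1 <= minimum <= 5:
        errors.append(f"minimum_score {minimum} is outside the 1-5 scale")
    for name in passing.get("required_dimensions", []):
        if name not in dims:
            errors.append(f"required dimension '{name}' is not defined")
    return errors
```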
## Best Practices
### Do
- Be specific - "Line 45: SQL query vulnerable to injection" not "has security issues"
- Be actionable - Include how to fix each issue
- Be fair - Use the same standards consistently
- Document evidence - Quote specific content for each score
- Prioritize - Critical issues first, suggestions last
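One way to keep findings specific, actionable, and prioritized is to record each one as a structured issue and sort by severity before writing the report. In this sketch the `Issue` class and `format_issues` function are hypothetical; the severity labels come from the Issues Found section of the report template.

```python
from dataclasses import dataclass

# Severity labels from the report template; lower rank sorts first.
SEVERITY_RANK = {"CRITICAL": 0, "MAJOR": 1, "MINOR": 2}


@dataclass
class Issue:
    severity: str  # CRITICAL, MAJOR, or MINOR
    location: str  # e.g. "Line 45", so the finding is specific
    problem: str   # what is wrong
    fix: str       # how to fix it, so the finding is actionable

    def render(self) -> str:
        return f"- [{self.severity}] {self.location}: {self.problem}. Fix: {self.fix}"


def format_issues(issues: list[Issue]) -> str:
    """Render issues for the report, critical issues first."""
    ordered = sorted(issues, key=lambda issue: SEVERITY_RANK[issue.severity])
    return "\n".join(issue.render() for issue in ordered)
```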
### Don't
- Score based on personal style preferences
- Mark deprecated patterns without suggesting alternatives
- Fail skills for missing optional sections
- Overlook security issues because the other scores are high
- Rush through audits for complex skills
## Anti-Patterns
### The Rubber Stamp
Problem: Approving skills without thorough review
Why it's bad: Low-quality skills erode trust in the library
Fix: Use the full audit checklist and test the code examples
### The Perfectionist Block
Problem: Failing skills for minor issues
Why it's bad: Prevents useful skills from being available
Fix: Distinguish between blocking issues and suggestions
### Score Inflation
Problem: Giving high scores without justification
Why it's bad: Makes scores meaningless
Fix: Document specific evidence for each score
## Integration with CI/CD
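```yaml
# .github/workflows/skill-quality.yml
name: Skill Quality Gate

on:
  pull_request:
    paths:
      - 'skills/**'

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # The default shallow clone (depth 1) has no HEAD~1 to diff against
          fetch-depth: 2
      - name: Install cortex
        run: pip install cortex
      - name: Audit changed skills
        run: |
          # Skip deleted files; sort -u de-duplicates skill names (uniq needs sorted input)
          for skill in $(git diff --name-only --diff-filter=d HEAD~1 | grep '^skills/' | cut -d'/' -f2 | sort -u); do
            cortex skills audit "$skill" --quick --fail-under 3.0
          done
```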
## Troubleshooting
### "Audit fails but skill looks fine"
- Check YAML frontmatter syntax
- Verify all required sections exist
- Ensure code blocks have language tags
- Check for hidden characters (copy/paste issues)
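Hidden characters are easy to miss by eye but simple to find programmatically. This sketch flags Unicode format characters (category `Cf`, which includes the byte order mark and the zero-width space) plus the no-break space, and reports a 1-based line and column for each; `find_hidden_chars` is a hypothetical name, and the choice of which characters count as suspicious is an assumption you may want to widen.

```python
import unicodedata

# Characters that look like ordinary spaces but are not; flagged alongside Cf characters.
EXTRA_SUSPECTS = {"\u00a0"}  # NO-BREAK SPACE


def find_hidden_chars(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character name) for invisible or look-alike characters."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category Cf covers the BOM, zero-width space, and similar format characters.
            if ch in EXTRA_SUSPECTS or unicodedata.category(ch) == "Cf":
                hits.append((line_no, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits
```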
### "Scores seem inconsistent"
- Review the scoring guide for each dimension
- Calibrate by auditing `template-skill-enhanced` first
- Use `--interactive` mode for clearer criteria
## Changelog
### 1.0.0 (2026-01-05)
- Initial release
- Four-dimension scoring framework
- CLI integration
- CI/CD workflow example