---
name: Fine-Tuning Assistant
description: Guide model fine-tuning processes for customized AI performance
version: "1.0.0"
---

# Fine-Tuning Assistant

The Fine-Tuning Assistant skill guides you through the process of adapting pre-trained models to your specific use case. Fine-tuning can dramatically improve model performance on specialized tasks, teach models your preferred style, and add capabilities that prompting alone cannot achieve.

This skill covers when to fine-tune versus prompt engineer, preparing training data, selecting base models, configuring training parameters, evaluating results, and deploying fine-tuned models. It applies modern techniques including LoRA, QLoRA, and instruction tuning to make fine-tuning practical and cost-effective.

Whether you are fine-tuning GPT models via API, running local training with open-source models, or using platforms like Hugging Face, this skill ensures you approach fine-tuning strategically and effectively.

## Core Workflows

### Workflow 1: Decide Whether to Fine-Tune

1. **Assess the problem:**
   - Can prompting achieve the goal?
   - Is the task format or style consistent?
   - Do you have quality training data?
   - Is the expected gain worth the investment?
2. **Compare approaches:**

   | Approach | When to Use | Investment |
   |---|---|---|
   | Better prompts | First attempt, variable tasks | Low |
   | Few-shot examples | Consistent format, limited data | Low |
   | RAG | Knowledge-intensive, dynamic data | Medium |
   | Fine-tuning | Consistent style, specialized task | High |

3. **Evaluate requirements:**
   - At least 100-1,000 high-quality examples
   - Clear evaluation criteria
   - Budget for training and hosting
4. **Decision:** fine-tune only if prompting and RAG are insufficient.
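For illustration only, the comparison table above can be folded into a small helper. The flags and the 100-example threshold are assumptions for the sketch, not hard rules:

```python
def recommend_approach(prompting_suffices: bool,
                       needs_external_knowledge: bool,
                       consistent_style: bool,
                       n_quality_examples: int) -> str:
    """Illustrative decision helper mirroring the comparison table above."""
    if prompting_suffices:
        return "better prompts / few-shot examples"
    if needs_external_knowledge:
        return "RAG"
    # Fine-tune only when the task is consistent and data is sufficient
    if consistent_style and n_quality_examples >= 100:
        return "fine-tuning"
    return "collect more data, keep iterating on prompts"

print(recommend_approach(False, False, True, 500))  # fine-tuning
```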

### Workflow 2: Prepare Fine-Tuning Dataset

1. **Collect training examples:**
   - Representative of the target use case
   - High quality (no errors in outputs)
   - Diverse coverage of task variations
2. **Format for training:**

   ```json
   {"messages": [
     {"role": "system", "content": "You are a helpful assistant..."},
     {"role": "user", "content": "User input here"},
     {"role": "assistant", "content": "Ideal response here"}
   ]}
   ```

3. **Quality assurance:**
   - Manually review a sample of examples
   - Check for consistency in style and format
   - Remove duplicates and low-quality entries
4. **Split into train/validation/test sets.**
5. **Validate the dataset format.**
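Steps 4-5 can be sketched with the standard library. The format checks and split fractions here are illustrative, not a full schema validator:

```python
import json
import random

def validate_example(line: str) -> bool:
    """Check that one JSONL line has the expected chat format."""
    record = json.loads(line)
    messages = record.get("messages", [])
    roles = [m.get("role") for m in messages]
    return (all("content" in m for m in messages)
            and roles.count("user") >= 1
            and roles.count("assistant") >= 1)

def split_dataset(lines, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle deterministically and split into train/validation/test."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    n = len(lines)
    n_val, n_test = int(n * val_frac), int(n * test_frac)
    return (lines[n_val + n_test:],        # train
            lines[:n_val],                 # validation
            lines[n_val:n_val + n_test])   # test
```

Fixing the shuffle seed keeps the split reproducible across runs, which matters once you start versioning experiments.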

### Workflow 3: Execute Fine-Tuning

1. **Select a base model:**
   - Weigh size against capability
   - Match the model to task complexity
   - Check licensing for your use case
2. **Configure training:**

   ```python
   # OpenAI fine-tuning
   training_config = {
       "model": "gpt-4o-mini-2024-07-18",
       "training_file": "file-xxx",
       "hyperparameters": {
           "n_epochs": 3,
           "batch_size": "auto",
           "learning_rate_multiplier": "auto"
       }
   }

   # LoRA fine-tuning (local)
   lora_config = {
       "r": 16,  # rank of the low-rank update matrices
       "lora_alpha": 32,
       "lora_dropout": 0.05,
       "target_modules": ["q_proj", "v_proj"]
   }
   ```

3. **Monitor training:**
   - Watch the loss curves
   - Check for overfitting
   - Validate on the held-out set
4. **Evaluate results:**
   - Compare against the baseline model
   - Test on diverse inputs
   - Check for regressions
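The overfitting check in step 3 can be automated with a simple plateau test on the validation-loss history. A minimal sketch; the `patience` and `min_delta` values are illustrative defaults, not recommendations:

```python
def should_stop_early(val_losses, patience=2, min_delta=0.01):
    """Return True once validation loss has failed to improve on its
    previous best by at least min_delta for `patience` evaluations."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent = val_losses[-patience:]
    return all(loss > best_before - min_delta for loss in recent)

# Falling loss: keep training
print(should_stop_early([1.2, 0.9, 0.7, 0.5]))          # False
# Plateaued loss: stop
print(should_stop_early([1.2, 0.9, 0.88, 0.89, 0.9]))   # True
```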

## Quick Reference

| Action | Command/Trigger |
|---|---|
| Decide approach | "Should I fine-tune for [task]" |
| Prepare data | "Format data for fine-tuning" |
| Choose model | "Which model to fine-tune for [task]" |
| Configure training | "Fine-tuning parameters for [goal]" |
| Evaluate results | "Evaluate fine-tuned model" |
| Debug training | "Fine-tuning loss not decreasing" |

## Best Practices

- **Start with Prompting:** fine-tuning is expensive; exhaust cheaper options first
  - Can better prompts achieve 80% of the goal?
  - Try few-shot examples in the prompt
  - Consider RAG for knowledge tasks
- **Quality Over Quantity:** 100 excellent examples beat 10,000 mediocre ones
  - Each example should be a gold standard
  - Have humans verify examples where possible
  - Remove anything you wouldn't want the model to learn
- **Match Format to Use Case:** training examples should mirror real usage
  - Same prompt structure as production
  - Realistic input variations
  - Cover edge cases explicitly
- **Don't Over-Train:** more epochs aren't always better
  - Watch validation loss for overfitting
  - Start with 1-3 epochs
  - Stop early when validation loss plateaus
- **Evaluate Properly:** training loss isn't the goal
  - Use a held-out test set
  - Compare to the baseline on the same tests
  - Check for capability regressions
  - Test edge cases explicitly
- **Version Everything:** fine-tuning is iterative
  - Version your training data
  - Track experiment configurations
  - Document what worked and what didn't
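One lightweight way to act on the versioning advice, sketched with the standard library only (the config fields shown are illustrative): record a content hash of the training file alongside each run's configuration, so any change to the data produces a new fingerprint.

```python
import hashlib
import json

def dataset_fingerprint(jsonl_text: str) -> str:
    """Content hash that changes whenever any training example changes."""
    return hashlib.sha256(jsonl_text.encode("utf-8")).hexdigest()[:12]

def experiment_record(jsonl_text: str, config: dict) -> dict:
    """Bundle the data fingerprint with the run configuration for tracking."""
    return {
        "data_fingerprint": dataset_fingerprint(jsonl_text),
        "config": config,
    }

record = experiment_record('{"messages": []}', {"n_epochs": 3})
print(json.dumps(record, indent=2))
```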

## Advanced Techniques

### LoRA (Low-Rank Adaptation)

Efficient fine-tuning for large models:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                           # rank of the update matrices
    lora_alpha=32,                  # scaling factor
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA to an already-loaded Hugging Face causal LM
model = get_peft_model(base_model, lora_config)

# Typically well under 1% of parameters are trainable
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

### QLoRA (Quantized LoRA)

Fine-tune large models on consumer hardware:

```python
import torch
from peft import get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load the model in 4-bit precision
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config
)

# Apply LoRA on top (lora_config as defined above)
model = get_peft_model(model, lora_config)
```

### Instruction Tuning Dataset Creation

Convert raw data to instruction format:

```python
def create_instruction_example(raw_data):
    return {
        "messages": [
            {
                "role": "system",
                "content": "You are a customer service agent for TechCorp..."
            },
            {
                "role": "user",
                "content": f"Customer inquiry: {raw_data['inquiry']}"
            },
            {
                "role": "assistant",
                "content": raw_data['ideal_response']
            }
        ]
    }

# Apply to the full dataset
instruction_dataset = [create_instruction_example(d) for d in raw_dataset]
```
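Fine-tuning services commonly accept this chat format serialized as JSONL, one example per line. A minimal writer sketch (the file name is illustrative):

```python
import json

def write_jsonl(examples, path):
    """Serialize chat-format examples as JSONL, one example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")

examples = [{"messages": [{"role": "user", "content": "hi"},
                          {"role": "assistant", "content": "hello"}]}]
write_jsonl(examples, "train.jsonl")
```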

### Evaluation Framework

Comprehensive assessment of fine-tuned models:

```python
import numpy as np

# check_correctness, matches_expected_format, style_similarity, and
# compare_general_capability are task-specific metrics you supply.
def evaluate_fine_tuned_model(model, test_set, baseline_model=None):
    results = {
        "task_accuracy": [],
        "format_compliance": [],
        "style_match": [],
        "regression_check": []
    }

    for example in test_set:
        output = model.generate(example.input)

        # Task-specific accuracy
        results["task_accuracy"].append(
            check_correctness(output, example.expected)
        )

        # Format compliance
        results["format_compliance"].append(
            matches_expected_format(output)
        )

        # Style matching (for style-transfer tasks)
        results["style_match"].append(
            style_similarity(output, example.expected)
        )

        # Regression on general capabilities
        if baseline_model:
            results["regression_check"].append(
                compare_general_capability(model, baseline_model, example)
            )

    # Skip empty metric lists (e.g. regression_check with no baseline)
    return {k: np.mean(v) for k, v in results.items() if v}
```

### Curriculum Learning

Order training data by difficulty:

```python
# score_complexity is a task-specific difficulty scorer you define
def create_curriculum(dataset):
    # Score examples by complexity (lower = easier)
    scored = [(score_complexity(ex), ex) for ex in dataset]
    scored.sort(key=lambda x: x[0])

    # Build epochs of increasing difficulty
    n = len(scored)
    curriculum = {
        "epoch_1": [ex for _, ex in scored[:n // 3]],        # easy
        "epoch_2": [ex for _, ex in scored[:2 * n // 3]],    # easy + medium
        "epoch_3": [ex for _, ex in scored],                 # all
    }
    return curriculum
```
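The `score_complexity` function is left task-specific above. One simple (admittedly crude) proxy is the length of the target response, sketched here on chat-format examples:

```python
def score_complexity(example: dict) -> int:
    """Proxy difficulty score: whitespace-token length of the target response."""
    assistant_turns = [m["content"] for m in example["messages"]
                       if m["role"] == "assistant"]
    return sum(len(text.split()) for text in assistant_turns)

examples = [
    {"messages": [{"role": "assistant", "content": "It depends on several factors."}]},
    {"messages": [{"role": "assistant", "content": "Yes."}]},
]
ordered = sorted(examples, key=score_complexity)
print(score_complexity(ordered[0]))  # 1
```

Length is only a stand-in; a perplexity score from the base model or a human difficulty label would usually rank examples better.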

## Common Pitfalls to Avoid

- Fine-tuning when better prompting would suffice
- Using low-quality or inconsistent training examples
- Not holding out a proper test set
- Training for too many epochs (overfitting)
- Ignoring capability regressions from fine-tuning
- Not versioning training data and configurations
- Expecting fine-tuning to add factual knowledge (use RAG instead)
- Fine-tuning on data that doesn't match production use
