domain-ml

domain-ml

Populer

Use when building ML/AI apps in Rust. Keywords: machine learning, ML, AI, tensor, model, inference, neural network, deep learning, training, prediction, ndarray, tch-rs, burn, candle, 机器学习, 人工智能, 模型推理

534bintang
53fork
Diperbarui 1/24/2026
SKILL.md
readonlyread-only
name
domain-ml
description

"Use when building ML/AI apps in Rust. Keywords: machine learning, ML, AI, tensor, model, inference, neural network, deep learning, training, prediction, ndarray, tch-rs, burn, candle, 机器学习, 人工智能, 模型推理"

Machine Learning Domain

Layer 3: Domain Constraints

Domain Constraints → Design Implications

Domain Rule Design Constraint Rust Implication
Large data Efficient memory Zero-copy, streaming
GPU acceleration CUDA/Metal support candle, tch-rs
Model portability Standard formats ONNX
Batch processing Throughput over latency Batched inference
Numerical precision Float handling ndarray, careful f32/f64
Reproducibility Deterministic Seeded random, versioning

Critical Constraints

Memory Efficiency

RULE: Avoid copying large tensors
WHY: Memory bandwidth is bottleneck
RUST: References, views, in-place ops

GPU Utilization

RULE: Batch operations for GPU efficiency
WHY: GPU overhead per kernel launch
RUST: Batch sizes, async data loading

Model Portability

RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle

Trace Down ↓

From constraints to design (Layer 2):

"Need efficient data pipelines"
    ↓ m10-performance: Streaming, batching
    ↓ polars: Lazy evaluation

"Need GPU inference"
    ↓ m07-concurrency: Async data loading
    ↓ candle/tch-rs: CUDA backend

"Need model loading"
    ↓ m12-lifecycle: Lazy init, caching
    ↓ tract: ONNX runtime

Use Case → Framework

Use Case Recommended Why
Inference only tract (ONNX) Lightweight, portable
Training + inference candle, burn Pure Rust, GPU
PyTorch models tch-rs Direct bindings
Data pipelines polars Fast, lazy eval

Key Crates

Purpose Crate
Tensors ndarray
ONNX inference tract
ML framework candle, burn
PyTorch bindings tch-rs
Data processing polars
Embeddings fastembed

Design Patterns

Pattern Purpose Implementation
Model loading Once, reuse OnceLock<Model>
Batching Throughput Collect then process
Streaming Large data Iterator-based
GPU async Parallelism Data loading parallel to compute

Code Pattern: Inference Server

use std::sync::OnceLock;
use tract_onnx::prelude::*;

static MODEL: OnceLock<SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>> = OnceLock::new();

fn get_model() -> &'static SimplePlan<...> {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .unwrap()
            .into_optimized()
            .unwrap()
            .into_runnable()
            .unwrap()
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    let input = tract_ndarray::arr1(&input).into_shape((1, input.len()))?;
    let result = model.run(tvec!(input.into()))?;
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}

Code Pattern: Batched Inference

async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> Vec<Vec<f32>> {
    let mut results = Vec::with_capacity(inputs.len());

    for batch in inputs.chunks(batch_size) {
        // Stack inputs into batch tensor
        let batch_tensor = stack_inputs(batch);

        // Run inference on batch
        let batch_output = model.run(batch_tensor).await;

        // Unstack results
        results.extend(unstack_outputs(batch_output));
    }

    results
}

Common Mistakes

Mistake Domain Violation Fix
Clone tensors Memory waste Use views
Single inference GPU underutilized Batch processing
Load model per request Slow Singleton pattern
Sync data loading GPU idle Async pipeline

Trace to Layer 1

Constraint Layer 2 Pattern Layer 1 Implementation
Memory efficiency Zero-copy ndarray views
Model singleton Lazy init OnceLock
Batch processing Chunked iteration chunks() + parallel
GPU async Concurrent loading tokio::spawn + GPU

Related Skills

When See
Performance m10-performance
Lazy initialization m12-lifecycle
Async patterns m07-concurrency
Memory efficiency m01-ownership

You Might Also Like

Related Skills

summarize

summarize

179Kresearch

Summarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).

openclaw avataropenclaw
Ambil
prompt-lookup

prompt-lookup

143Kresearch

Activates when the user asks about AI prompts, needs prompt templates, wants to search for prompts, or mentions prompts.chat. Use for discovering, retrieving, and improving prompts.

skill-lookup

skill-lookup

143Kresearch

Activates when the user asks about Agent Skills, wants to find reusable AI capabilities, needs to install skills, or mentions skills for Claude. Use for discovering, retrieving, and installing skills.

sherpa-onnx-tts

sherpa-onnx-tts

88Kresearch

Local text-to-speech via sherpa-onnx (offline, no cloud)

moltbot avatarmoltbot
Ambil
openai-whisper

openai-whisper

87Kresearch

Local speech-to-text with the Whisper CLI (no API key).

moltbot avatarmoltbot
Ambil
seo-review

seo-review

66Kresearch

Perform a focused SEO audit on JavaScript concept pages to maximize search visibility, featured snippet optimization, and ranking potential

leonardomso avatarleonardomso
Ambil