m10-performance

Performance Optimization

Layer 2: Design Choices

Core Question

What's the bottleneck, and is optimization worth it?

Before optimizing:

Have you measured? (Don't guess)
What's the acceptable performance?
Will optimization add complexity?

Performance Decision → Implementation

Goal	Design Choice	Implementation
Reduce allocations	Pre-allocate, reuse	`with_capacity`, object pools
Improve cache locality	Contiguous data	`Vec`, `SmallVec`
Parallelize	Data parallelism	`rayon`, threads
Avoid copies	Zero-copy	References, `Cow<T>`
Reduce indirection	Inline data	`smallvec`, arrays
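As an illustration of the "Avoid copies" row, here is a minimal sketch of returning `Cow<str>` so that the common case borrows and only the rare case allocates. The function name `expand_tabs` and the tab-replacement task are hypothetical, chosen only to show the pattern.

```rust
use std::borrow::Cow;

/// Replaces each tab with a single space.
/// Allocates a new String only when the input actually contains a tab;
/// otherwise the original slice is handed back untouched.
/// (`expand_tabs` is a hypothetical helper, not part of any library.)
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Slow path: a modified copy is required.
        Cow::Owned(input.replace('\t', " "))
    } else {
        // Fast path: zero-copy, just a borrowed view of the input.
        Cow::Borrowed(input)
    }
}
```

Callers can treat the result as a `&str` through deref, and call `.into_owned()` only if they really need a `String`.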

Thinking Prompt

Before optimizing:

Have you measured?
- Profile first → flamegraph, perf
- Benchmark → criterion, cargo bench
- Identify actual hotspots
What's the priority?
- Algorithm (10x-1000x improvement)
- Data structure (2x-10x)
- Allocation (2x-5x)
- Cache (1.5x-3x)
What's the trade-off?
- Complexity vs speed
- Memory vs CPU
- Latency vs throughput
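To make "measure first, then fix the algorithm" concrete, the sketch below compares a quadratic and a hashed duplicate check and wraps each call in a small timing helper. The names `has_duplicate_quadratic`, `has_duplicate_hashed`, and `timed` are hypothetical, and a single `Instant` measurement is only a rough stand-in for a proper `criterion` benchmark.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// O(n^2): compares every pair of elements.
fn has_duplicate_quadratic(items: &[u32]) -> bool {
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            if items[i] == items[j] {
                return true;
            }
        }
    }
    false
}

/// O(n) on average: remembers the values seen so far in a HashSet.
fn has_duplicate_hashed(items: &[u32]) -> bool {
    let mut seen = HashSet::with_capacity(items.len());
    // `insert` returns false when the value was already present.
    items.iter().any(|x| !seen.insert(*x))
}

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
/// One run is noisy; treat the number as a hint, not a benchmark.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

A call such as `timed(|| has_duplicate_hashed(&data))` returns both the answer and a `Duration`. Build with `--release` before comparing the two durations.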

Trace Up ↑

To domain constraints (Layer 3):

"How fast does this need to be?"
    ↑ Ask: What's the performance SLA?
    ↑ Check: domain-* (latency requirements)
    ↑ Check: Business requirements (acceptable response time)

Question	Trace To	Ask
Latency requirements	domain-*	What's acceptable response time?
Throughput needs	domain-*	How many requests per second?
Memory constraints	domain-*	What's the memory budget?

Trace Down ↓

To implementation (Layer 1):

"Need to reduce allocations"
    ↓ m01-ownership: Use references, avoid clone
    ↓ m02-resource: Pre-allocate with_capacity

"Need to parallelize"
    ↓ m07-concurrency: Choose rayon or threads
    ↓ m07-concurrency: Consider async for I/O-bound

"Need cache efficiency"
    ↓ Data layout: Prefer Vec over HashMap when possible
    ↓ Access patterns: Sequential over random access
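For the "Need to parallelize" path, a minimal data-parallel sketch using only the standard library is shown here. It uses `std::thread::scope` as a stand-in for `rayon` so that the example has no external dependencies. The function name `parallel_sum` and the chunking strategy are assumptions made for illustration.

```rust
use std::thread;

/// Sums `data` by splitting it into at most `workers` contiguous chunks
/// and summing each chunk on its own scoped thread.
/// (`parallel_sum` is a hypothetical helper for illustration.)
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in a chunk;
    // clamp to 1 because `chunks(0)` would panic.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn all workers first so they run concurrently...
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // ...then join them and combine the partial sums.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

Each thread reads one contiguous slice, which also matches the "sequential over random access" advice. For small inputs the thread start-up cost will likely outweigh any gain, so measure before adopting this.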

Quick Reference

Tool	Purpose
`cargo bench`	Micro-benchmarks
`criterion`	Statistical benchmarks
`perf` / `flamegraph`	CPU profiling
`heaptrack`	Allocation tracking
`valgrind --tool=cachegrind`	Cache analysis

Optimization Priority

1. Algorithm choice     (10x - 1000x)
2. Data structure       (2x - 10x)
3. Allocation reduction (2x - 5x)
4. Cache optimization   (1.5x - 3x)
5. SIMD/Parallelism     (2x - 8x)

Common Techniques

Technique	When	How
Pre-allocation	Known size	`Vec::with_capacity(n)`
Avoid cloning	Hot paths	Use references or `Cow<T>`
Batch operations	Many small ops	Collect then process
SmallVec	Usually small	`smallvec::SmallVec<[T; N]>`
Inline buffers	Fixed-size data	Arrays over Vec
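The pre-allocation row can be sketched as follows. The function `squares` is hypothetical, and it returns the initial capacity alongside the vector purely so the effect can be observed. Real code would return only the `Vec`.

```rust
/// Builds the squares of 0..n, reserving room for all n elements up front.
/// Returns the vector plus the capacity observed right after allocation,
/// so a caller can check that no growth happened while pushing.
fn squares(n: usize) -> (Vec<u64>, usize) {
    let mut out: Vec<u64> = Vec::with_capacity(n);
    let initial_capacity = out.capacity();
    for i in 0..n as u64 {
        // No reallocation here: len never exceeds the reserved capacity.
        out.push(i * i);
    }
    (out, initial_capacity)
}
```

Because the length never exceeds the reserved capacity, the vector's capacity after the loop equals its capacity before the loop, which means the buffer was allocated exactly once.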

Common Mistakes

Mistake	Why Wrong	Better
Optimize without profiling	Wrong target	Profile first
Benchmark in debug mode	Unoptimized timings are not representative	Always `--release`
Use LinkedList	Cache unfriendly	`Vec` or `VecDeque`
Hidden `.clone()`	Unnecessary allocs	Use references
Premature optimization	Wasted effort	Make it work first
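For the `LinkedList` row, a FIFO work queue is a typical case where `VecDeque` is a drop-in replacement backed by one contiguous ring buffer. The sketch below is illustrative only: `drain_jobs` and its "job `n` enqueues job `n - 1`" rule are invented to give the queue something to do.

```rust
use std::collections::VecDeque;

/// Processes jobs in FIFO order. Finishing job `n > 0` enqueues job `n - 1`.
/// Returns the order in which jobs were processed.
/// (`drain_jobs` is a hypothetical example, not a library function.)
fn drain_jobs(initial: &[u32]) -> Vec<u32> {
    let mut queue: VecDeque<u32> = initial.iter().copied().collect();
    let mut order = Vec::new();
    // pop_front and push_back are both O(1) (amortized for push) on VecDeque.
    while let Some(job) = queue.pop_front() {
        order.push(job);
        if job > 0 {
            queue.push_back(job - 1);
        }
    }
    order
}
```

Calling `drain_jobs(&[2, 0])` processes the jobs in the order `[2, 0, 1, 0]`.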

Anti-Patterns

Anti-Pattern	Why Bad	Better
Clone to avoid lifetimes	Performance cost	Proper ownership
Box everything	Indirection cost	Stack when possible
HashMap for small sets	Overhead	Vec with linear search
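String concat in loop	Repeated reallocation and copying, O(n^2) when each step rebuilds the whole string	`String::with_capacity` + `push_str`, or `write!` into one buffer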
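A minimal sketch of the string-building fix: compute the final length once, reserve it, then append in place. The helper name `join_csv` is hypothetical. For this exact task the standard `[&str]::join` would normally be the simpler choice, so the manual loop is shown only to make the allocation pattern visible.

```rust
/// Joins `parts` with commas, reserving the full output length up front
/// so the loop appends into a single buffer without reallocating.
fn join_csv(parts: &[&str]) -> String {
    // Total bytes = all parts + one comma between each adjacent pair.
    let total = parts.iter().map(|p| p.len()).sum::<usize>()
        + parts.len().saturating_sub(1);
    let mut out = String::with_capacity(total);
    for (i, part) in parts.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str(part);
    }
    out
}
```

With this approach `join_csv(&["a", "bb", "ccc"])` yields `"a,bb,ccc"` after a single allocation.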

Related Skills

When	See
Reducing clones	m01-ownership
Concurrency options	m07-concurrency
Smart pointer choice	m02-resource
Domain requirements	domain-*
