
phoenix-cli
Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, and inspect datasets. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
Phoenix CLI
Debug and analyze LLM applications using the Phoenix CLI (px).
Quick Start
Installation
npm install -g @arizeai/phoenix-cli
# Or run directly with npx
npx @arizeai/phoenix-cli
Configuration
Set environment variables before running commands:
export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key # if authentication is enabled
CLI flags override environment variables when specified.
Debugging Workflows
Debug a failing LLM application
- Fetch recent traces to see what's happening:
px traces --limit 10
- Find failed traces:
px traces --limit 50 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
- Get details on a specific trace:
px trace <trace-id>
- Look for errors in spans:
px trace <trace-id> --format raw | jq '.spans[] | select(.status_code != "OK")'
Find performance issues
- Get the slowest traces:
px traces --limit 20 --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'
- Analyze span durations within a trace:
px trace <trace-id> --format raw | jq '.spans | sort_by(-.duration_ms) | .[0:5] | .[] | {name, duration_ms, span_kind}'
Analyze LLM usage
Extract models and token counts:
px traces --limit 50 --format raw --no-progress | \
jq -r '.[].spans[] | select(.span_kind == "LLM") | {model: .attributes["llm.model_name"], prompt_tokens: .attributes["llm.token_count.prompt"], completion_tokens: .attributes["llm.token_count.completion"]}'
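When a single jq expression gets unwieldy, the same aggregation can be done in a short script over the JSON that px traces --format raw --no-progress emits. A minimal Python sketch, assuming the span and attribute names shown under Trace Structure (verify them against your Phoenix version):

```python
from collections import defaultdict

def token_usage_by_model(traces):
    """Sum prompt/completion tokens per model across a list of traces.

    Attribute keys (llm.model_name, llm.token_count.*) follow the
    Trace Structure section and are assumptions to verify.
    """
    totals = defaultdict(lambda: {"prompt": 0, "completion": 0})
    for trace in traces:
        for span in trace.get("spans", []):
            if span.get("span_kind") != "LLM":
                continue
            attrs = span.get("attributes", {})
            model = attrs.get("llm.model_name", "unknown")
            totals[model]["prompt"] += attrs.get("llm.token_count.prompt", 0) or 0
            totals[model]["completion"] += attrs.get("llm.token_count.completion", 0) or 0
    return dict(totals)

# Usage (hypothetical): traces = json.load(sys.stdin); token_usage_by_model(traces)
```

Feed it the output of px traces piped to a file, or read it from stdin in place of the jq step above.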
Review experiment results
- List datasets:
px datasets
- List experiments for a dataset:
px experiments --dataset my-dataset
- Analyze experiment failures:
px experiment <experiment-id> --format raw --no-progress | \
jq '.[] | select(.error != null) | {input: .input, error}'
- Calculate average latency:
px experiment <experiment-id> --format raw --no-progress | \
jq '[.[].latency_ms] | add / length'
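The failure and latency checks above can be combined into one summary when you want more than a single number. A Python sketch assuming the error and latency_ms fields used in the jq examples (field names may differ across Phoenix versions):

```python
from statistics import mean

def summarize_runs(runs):
    """Summarize exported experiment runs: total, failures, mean latency.

    Assumes each run dict carries "error" (null on success) and
    "latency_ms", matching the jq examples above.
    """
    failures = [r for r in runs if r.get("error") is not None]
    latencies = [r["latency_ms"] for r in runs if "latency_ms" in r]
    return {
        "total": len(runs),
        "failed": len(failures),
        "avg_latency_ms": mean(latencies) if latencies else None,
    }
```

Run it over the JSON saved by px experiment <experiment-id> --format raw --no-progress.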
Command Reference
px traces
Fetch recent traces from a project.
px traces [directory] [options]
| Option | Description |
|---|---|
| [directory] | Save traces as JSON files to directory |
| -n, --limit <number> | Number of traces (default: 10) |
| --last-n-minutes <number> | Filter by time window |
| --since <timestamp> | Fetch since ISO timestamp |
| --format <format> | pretty, json, or raw |
| --include-annotations | Include span annotations |
px trace
Fetch a specific trace by ID.
px trace <trace-id> [options]
| Option | Description |
|---|---|
| --file <path> | Save to file |
| --format <format> | pretty, json, or raw |
| --include-annotations | Include span annotations |
px datasets
List all datasets.
px datasets [options]
px dataset
Fetch examples from a dataset.
px dataset <dataset-name> [options]
| Option | Description |
|---|---|
| --split <name> | Filter by split (repeatable) |
| --version <id> | Specific dataset version |
| --file <path> | Save to file |
px experiments
List experiments for a dataset.
px experiments --dataset <name> [directory]
| Option | Description |
|---|---|
| --dataset <name> | Dataset name or ID (required) |
| [directory] | Export experiment JSON to directory |
px experiment
Fetch a single experiment with run data.
px experiment <experiment-id> [options]
px prompts
List all prompts.
px prompts [options]
px prompt
Fetch a specific prompt.
px prompt <prompt-name> [options]
Output Formats
- pretty (default): Human-readable tree view
- json: Formatted JSON with indentation
- raw: Compact JSON for piping to jq or other tools
Use --format raw --no-progress when piping output to other commands.
Trace Structure
Traces contain spans with OpenInference semantic attributes:
{
"traceId": "abc123",
"spans": [{
"name": "chat_completion",
"span_kind": "LLM",
"status_code": "OK",
"attributes": {
"llm.model_name": "gpt-4",
"llm.token_count.prompt": 512,
"llm.token_count.completion": 256,
"input.value": "What is the weather?",
"output.value": "The weather is sunny..."
}
}],
"duration": 1250,
"status": "OK"
}
Key span kinds: LLM, CHAIN, TOOL, RETRIEVER, EMBEDDING, AGENT.
Key attributes for LLM spans:
- llm.model_name: Model used
- llm.provider: Provider name (e.g., "openai")
- llm.token_count.prompt / llm.token_count.completion: Token counts
- llm.input_messages.*: Input messages (indexed, with role and content)
- llm.output_messages.*: Output messages (indexed, with role and content)
- input.value / output.value: Raw input/output as text
- exception.message: Error message if failed
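Given that structure, a saved trace can be scanned for failing spans programmatically. A minimal Python sketch; the field names (status_code, exception.message) mirror the example above and should be treated as assumptions:

```python
def failing_spans(trace):
    """Return name and error message for every span that is not OK.

    Assumes the span shape shown in Trace Structure: a "status_code"
    field and an "exception.message" attribute on failed spans.
    """
    return [
        {
            "name": span.get("name"),
            "error": span.get("attributes", {}).get("exception.message"),
        }
        for span in trace.get("spans", [])
        if span.get("status_code") != "OK"
    ]

# Usage (hypothetical): failing_spans(json.load(open("trace.json")))
```

Pair it with px trace <trace-id> --file trace.json to inspect a failure offline.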