---
name: phoenix-cli
description: Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, and inspect datasets. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
version: "1.0"
---

Phoenix CLI

Debug and analyze LLM applications using the Phoenix CLI (px).

Quick Start

Installation

npm install -g @arizeai/phoenix-cli
# Or run directly with npx
npx @arizeai/phoenix-cli

Configuration

Set environment variables before running commands:

export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key  # if authentication is enabled

CLI flags override environment variables when specified.

Debugging Workflows

Debug a failing LLM application

  1. Fetch recent traces to see what's happening:
px traces --limit 10
  2. Find failed traces:
px traces --limit 50 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
  3. Get details on a specific trace:
px trace <trace-id>
  4. Look for errors in spans:
px trace <trace-id> --format raw | jq '.spans[] | select(.status_code != "OK")'
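If jq is not available, step 4's filter can be reproduced in a few lines of Python over the raw trace JSON. This is a sketch: the sample trace below is hypothetical, with field names taken from the "Trace Structure" section of this document.

```python
import json

# Hypothetical raw trace, shaped like the documented trace structure.
trace = json.loads("""
{
  "traceId": "abc123",
  "status": "ERROR",
  "spans": [
    {"name": "chat_completion", "span_kind": "LLM", "status_code": "OK"},
    {"name": "retrieve_docs", "span_kind": "RETRIEVER", "status_code": "ERROR",
     "attributes": {"exception.message": "connection timed out"}}
  ]
}
""")

# Equivalent of: jq '.spans[] | select(.status_code != "OK")'
failed = [s for s in trace["spans"] if s.get("status_code") != "OK"]
for span in failed:
    msg = span.get("attributes", {}).get("exception.message", "")
    print(f'{span["name"]} ({span["span_kind"]}): {msg}')
```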

Find performance issues

  1. Get the slowest traces:
px traces --limit 20 --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'
  2. Analyze span durations within a trace:
px trace <trace-id> --format raw | jq '.spans | sort_by(-.duration_ms) | .[0:5] | .[] | {name, duration_ms, span_kind}'
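The span-duration sort in step 2 translates directly to Python. The spans below are hypothetical sample data; the `duration_ms` field name matches the jq example above.

```python
# Hypothetical spans from one trace.
spans = [
    {"name": "chat_completion", "duration_ms": 2400, "span_kind": "LLM"},
    {"name": "retrieve_docs", "duration_ms": 180, "span_kind": "RETRIEVER"},
    {"name": "format_prompt", "duration_ms": 3, "span_kind": "CHAIN"},
]

# Equivalent of: jq '.spans | sort_by(-.duration_ms) | .[0:5]'
slowest = sorted(spans, key=lambda s: -s["duration_ms"])[:5]
for s in slowest:
    print(f'{s["duration_ms"]:>6} ms  {s["span_kind"]:<10} {s["name"]}')
```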

Analyze LLM usage

Extract models and token counts:

px traces --limit 50 --format raw --no-progress | \
  jq -r '.[].spans[] | select(.span_kind == "LLM") | {model: .attributes["llm.model_name"], prompt_tokens: .attributes["llm.token_count.prompt"], completion_tokens: .attributes["llm.token_count.completion"]}'
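To go one step further than the jq pipeline and total token usage per model, the same filter can be written as a small aggregation. The traces below are hypothetical; the attribute keys are the OpenInference names documented under "Trace Structure".

```python
from collections import defaultdict

# Hypothetical raw output of `px traces --format raw`: a list of traces.
traces = [
    {"spans": [
        {"span_kind": "LLM",
         "attributes": {"llm.model_name": "gpt-4",
                        "llm.token_count.prompt": 512,
                        "llm.token_count.completion": 256}},
        {"span_kind": "CHAIN", "attributes": {}},
    ]},
    {"spans": [
        {"span_kind": "LLM",
         "attributes": {"llm.model_name": "gpt-4",
                        "llm.token_count.prompt": 300,
                        "llm.token_count.completion": 100}},
    ]},
]

# Sum prompt/completion tokens per model across all LLM spans.
totals = defaultdict(lambda: {"prompt": 0, "completion": 0})
for trace in traces:
    for span in trace["spans"]:
        if span.get("span_kind") != "LLM":
            continue
        attrs = span.get("attributes", {})
        model = attrs.get("llm.model_name", "unknown")
        totals[model]["prompt"] += attrs.get("llm.token_count.prompt", 0)
        totals[model]["completion"] += attrs.get("llm.token_count.completion", 0)

for model, t in totals.items():
    print(f'{model}: {t["prompt"]} prompt / {t["completion"]} completion tokens')
```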

Review experiment results

  1. List datasets:
px datasets
  2. List experiments for a dataset:
px experiments --dataset my-dataset
  3. Analyze experiment failures:
px experiment <experiment-id> --format raw --no-progress | \
  jq '.[] | select(.error != null) | {input: .input, error}'
  4. Calculate average latency:
px experiment <experiment-id> --format raw --no-progress | \
  jq '[.[].latency_ms] | add / length'
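Steps 3 and 4 can be combined in one pass over the raw experiment output. The run records here are hypothetical; the `error` and `latency_ms` fields match the jq examples above.

```python
# Hypothetical raw output of `px experiment <id> --format raw`: a list of runs.
runs = [
    {"input": "q1", "error": None, "latency_ms": 820},
    {"input": "q2", "error": "rate limit exceeded", "latency_ms": 4100},
    {"input": "q3", "error": None, "latency_ms": 640},
]

# Equivalent of: jq '.[] | select(.error != null)'
failures = [r for r in runs if r["error"] is not None]

# Equivalent of: jq '[.[].latency_ms] | add / length'
avg_latency = sum(r["latency_ms"] for r in runs) / len(runs)

print(f"{len(failures)} of {len(runs)} runs failed, avg latency {avg_latency:.0f} ms")
```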

Command Reference

px traces

Fetch recent traces from a project.

px traces [directory] [options]
| Option | Description |
| --- | --- |
| `[directory]` | Save traces as JSON files to directory |
| `-n, --limit <number>` | Number of traces (default: 10) |
| `--last-n-minutes <number>` | Filter by time window |
| `--since <timestamp>` | Fetch since ISO timestamp |
| `--format <format>` | `pretty`, `json`, or `raw` |
| `--include-annotations` | Include span annotations |

px trace

Fetch a specific trace by ID.

px trace <trace-id> [options]
| Option | Description |
| --- | --- |
| `--file <path>` | Save to file |
| `--format <format>` | `pretty`, `json`, or `raw` |
| `--include-annotations` | Include span annotations |

px datasets

List all datasets.

px datasets [options]

px dataset

Fetch examples from a dataset.

px dataset <dataset-name> [options]
| Option | Description |
| --- | --- |
| `--split <name>` | Filter by split (repeatable) |
| `--version <id>` | Specific dataset version |
| `--file <path>` | Save to file |

px experiments

List experiments for a dataset.

px experiments --dataset <name> [directory]
| Option | Description |
| --- | --- |
| `--dataset <name>` | Dataset name or ID (required) |
| `[directory]` | Export experiment JSON to directory |

px experiment

Fetch a single experiment with run data.

px experiment <experiment-id> [options]

px prompts

List all prompts.

px prompts [options]

px prompt

Fetch a specific prompt.

px prompt <prompt-name> [options]

Output Formats

  • pretty (default): Human-readable tree view
  • json: Formatted JSON with indentation
  • raw: Compact JSON for piping to jq or other tools

Use --format raw --no-progress when piping output to other commands.

Trace Structure

Traces contain spans with OpenInference semantic attributes:

{
  "traceId": "abc123",
  "spans": [{
    "name": "chat_completion",
    "span_kind": "LLM",
    "status_code": "OK",
    "attributes": {
      "llm.model_name": "gpt-4",
      "llm.token_count.prompt": 512,
      "llm.token_count.completion": 256,
      "input.value": "What is the weather?",
      "output.value": "The weather is sunny..."
    }
  }],
  "duration": 1250,
  "status": "OK"
}

Key span kinds: LLM, CHAIN, TOOL, RETRIEVER, EMBEDDING, AGENT.

Key attributes for LLM spans:

  • llm.model_name: Model used
  • llm.provider: Provider name (e.g., "openai")
  • llm.token_count.prompt / llm.token_count.completion: Token counts
  • llm.input_messages.*: Input messages (indexed, with role and content)
  • llm.output_messages.*: Output messages (indexed, with role and content)
  • input.value / output.value: Raw input/output as text
  • exception.message: Error message if failed
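The indexed message attributes can be reassembled into an ordered message list. The exact flattened key shape used below (`llm.input_messages.<i>.message.role` / `.content`) is an assumption about the OpenInference flattening convention; verify it against your own trace output before relying on it.

```python
import re

# Hypothetical flattened span attributes; the key shape is an assumption.
attrs = {
    "llm.input_messages.0.message.role": "system",
    "llm.input_messages.0.message.content": "You are a helpful assistant.",
    "llm.input_messages.1.message.role": "user",
    "llm.input_messages.1.message.content": "What is the weather?",
}

pattern = re.compile(r"^llm\.input_messages\.(\d+)\.message\.(role|content)$")
messages = {}
for key, value in attrs.items():
    m = pattern.match(key)
    if m:
        messages.setdefault(int(m.group(1)), {})[m.group(2)] = value

# Rebuild the ordered message list from the numeric indices.
ordered = [messages[i] for i in sorted(messages)]
for msg in ordered:
    print(f'{msg["role"]}: {msg["content"]}')
```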
