---
name: phoenix-cli
description: Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, and inspect datasets. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
version: "1.0"
---

Phoenix CLI

Debug and analyze LLM applications using the Phoenix CLI (px).

Quick Start

Installation

npm install -g @arizeai/phoenix-cli
# Or run directly with npx
npx @arizeai/phoenix-cli

Configuration

Set environment variables before running commands:

export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key  # if authentication is enabled

CLI flags override environment variables when specified.

Debugging Workflows

Debug a failing LLM application

  1. Fetch recent traces to see what's happening:
px traces --limit 10
  2. Find failed traces:
px traces --limit 50 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
  3. Get details on a specific trace:
px trace <trace-id>
  4. Look for errors in spans:
px trace <trace-id> --format raw | jq '.spans[] | select(.status_code != "OK")'
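If jq is not available, step 4's filter can be reproduced in a few lines of Python over the raw trace JSON. This is a sketch: the sample trace below is hypothetical, with field names taken from the "Trace Structure" section of this document.

```python
import json

# Hypothetical raw trace, shaped like the documented trace structure.
trace = json.loads("""
{
  "traceId": "abc123",
  "status": "ERROR",
  "spans": [
    {"name": "chat_completion", "span_kind": "LLM", "status_code": "OK"},
    {"name": "retrieve_docs", "span_kind": "RETRIEVER", "status_code": "ERROR",
     "attributes": {"exception.message": "connection timed out"}}
  ]
}
""")

# Equivalent of: jq '.spans[] | select(.status_code != "OK")'
failed = [s for s in trace["spans"] if s.get("status_code") != "OK"]
for span in failed:
    msg = span.get("attributes", {}).get("exception.message", "")
    print(f'{span["name"]} ({span["span_kind"]}): {msg}')
```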

Find performance issues

  1. Get the slowest traces:
px traces --limit 20 --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'
  2. Analyze span durations within a trace:
px trace <trace-id> --format raw | jq '.spans | sort_by(-.duration_ms) | .[0:5] | .[] | {name, duration_ms, span_kind}'
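The span-duration sort in step 2 translates directly to Python. The spans below are hypothetical sample data; the `duration_ms` field name matches the jq example above.

```python
# Hypothetical spans from one trace.
spans = [
    {"name": "chat_completion", "duration_ms": 2400, "span_kind": "LLM"},
    {"name": "retrieve_docs", "duration_ms": 180, "span_kind": "RETRIEVER"},
    {"name": "format_prompt", "duration_ms": 3, "span_kind": "CHAIN"},
]

# Equivalent of: jq '.spans | sort_by(-.duration_ms) | .[0:5]'
slowest = sorted(spans, key=lambda s: -s["duration_ms"])[:5]
for s in slowest:
    print(f'{s["duration_ms"]:>6} ms  {s["span_kind"]:<10} {s["name"]}')
```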

Analyze LLM usage

Extract models and token counts:

px traces --limit 50 --format raw --no-progress | \
  jq -r '.[].spans[] | select(.span_kind == "LLM") | {model: .attributes["llm.model_name"], prompt_tokens: .attributes["llm.token_count.prompt"], completion_tokens: .attributes["llm.token_count.completion"]}'
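To go one step further than the jq pipeline and total token usage per model, the same filter can be written as a small aggregation. The traces below are hypothetical; the attribute keys are the OpenInference names documented under "Trace Structure".

```python
from collections import defaultdict

# Hypothetical raw output of `px traces --format raw`: a list of traces.
traces = [
    {"spans": [
        {"span_kind": "LLM",
         "attributes": {"llm.model_name": "gpt-4",
                        "llm.token_count.prompt": 512,
                        "llm.token_count.completion": 256}},
        {"span_kind": "CHAIN", "attributes": {}},
    ]},
    {"spans": [
        {"span_kind": "LLM",
         "attributes": {"llm.model_name": "gpt-4",
                        "llm.token_count.prompt": 300,
                        "llm.token_count.completion": 100}},
    ]},
]

# Sum prompt/completion tokens per model across all LLM spans.
totals = defaultdict(lambda: {"prompt": 0, "completion": 0})
for trace in traces:
    for span in trace["spans"]:
        if span.get("span_kind") != "LLM":
            continue
        attrs = span.get("attributes", {})
        model = attrs.get("llm.model_name", "unknown")
        totals[model]["prompt"] += attrs.get("llm.token_count.prompt", 0)
        totals[model]["completion"] += attrs.get("llm.token_count.completion", 0)

for model, t in totals.items():
    print(f'{model}: {t["prompt"]} prompt / {t["completion"]} completion tokens')
```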

Review experiment results

  1. List datasets:
px datasets
  2. List experiments for a dataset:
px experiments --dataset my-dataset
  3. Analyze experiment failures:
px experiment <experiment-id> --format raw --no-progress | \
  jq '.[] | select(.error != null) | {input: .input, error}'
  4. Calculate average latency:
px experiment <experiment-id> --format raw --no-progress | \
  jq '[.[].latency_ms] | add / length'
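Steps 3 and 4 can be combined in one pass over the raw experiment output. The run records here are hypothetical; the `error` and `latency_ms` fields match the jq examples above.

```python
# Hypothetical raw output of `px experiment <id> --format raw`: a list of runs.
runs = [
    {"input": "q1", "error": None, "latency_ms": 820},
    {"input": "q2", "error": "rate limit exceeded", "latency_ms": 4100},
    {"input": "q3", "error": None, "latency_ms": 640},
]

# Equivalent of: jq '.[] | select(.error != null)'
failures = [r for r in runs if r["error"] is not None]

# Equivalent of: jq '[.[].latency_ms] | add / length'
avg_latency = sum(r["latency_ms"] for r in runs) / len(runs)

print(f"{len(failures)} of {len(runs)} runs failed, avg latency {avg_latency:.0f} ms")
```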

Command Reference

px traces

Fetch recent traces from a project.

px traces [directory] [options]
| Option | Description |
| --- | --- |
| `[directory]` | Save traces as JSON files to directory |
| `-n, --limit <number>` | Number of traces (default: 10) |
| `--last-n-minutes <number>` | Filter by time window |
| `--since <timestamp>` | Fetch since ISO timestamp |
| `--format <format>` | `pretty`, `json`, or `raw` |
| `--include-annotations` | Include span annotations |

px trace

Fetch a specific trace by ID.

px trace <trace-id> [options]
| Option | Description |
| --- | --- |
| `--file <path>` | Save to file |
| `--format <format>` | `pretty`, `json`, or `raw` |
| `--include-annotations` | Include span annotations |

px datasets

List all datasets.

px datasets [options]

px dataset

Fetch examples from a dataset.

px dataset <dataset-name> [options]
| Option | Description |
| --- | --- |
| `--split <name>` | Filter by split (repeatable) |
| `--version <id>` | Specific dataset version |
| `--file <path>` | Save to file |

px experiments

List experiments for a dataset.

px experiments --dataset <name> [directory]
| Option | Description |
| --- | --- |
| `--dataset <name>` | Dataset name or ID (required) |
| `[directory]` | Export experiment JSON to directory |

px experiment

Fetch a single experiment with run data.

px experiment <experiment-id> [options]

px prompts

List all prompts.

px prompts [options]

px prompt

Fetch a specific prompt.

px prompt <prompt-name> [options]

Output Formats

  • pretty (default): Human-readable tree view
  • json: Formatted JSON with indentation
  • raw: Compact JSON for piping to jq or other tools

Use --format raw --no-progress when piping output to other commands.

Trace Structure

Traces contain spans with OpenInference semantic attributes:

{
  "traceId": "abc123",
  "spans": [{
    "name": "chat_completion",
    "span_kind": "LLM",
    "status_code": "OK",
    "attributes": {
      "llm.model_name": "gpt-4",
      "llm.token_count.prompt": 512,
      "llm.token_count.completion": 256,
      "input.value": "What is the weather?",
      "output.value": "The weather is sunny..."
    }
  }],
  "duration": 1250,
  "status": "OK"
}

Key span kinds: LLM, CHAIN, TOOL, RETRIEVER, EMBEDDING, AGENT.

Key attributes for LLM spans:

  • llm.model_name: Model used
  • llm.provider: Provider name (e.g., "openai")
  • llm.token_count.prompt / llm.token_count.completion: Token counts
  • llm.input_messages.*: Input messages (indexed, with role and content)
  • llm.output_messages.*: Output messages (indexed, with role and content)
  • input.value / output.value: Raw input/output as text
  • exception.message: Error message if failed
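The indexed message attributes can be reassembled into an ordered message list. The exact flattened key shape used below (`llm.input_messages.<i>.message.role` / `.content`) is an assumption about the OpenInference flattening convention; verify it against your own trace output before relying on it.

```python
import re

# Hypothetical flattened span attributes; the key shape is an assumption.
attrs = {
    "llm.input_messages.0.message.role": "system",
    "llm.input_messages.0.message.content": "You are a helpful assistant.",
    "llm.input_messages.1.message.role": "user",
    "llm.input_messages.1.message.content": "What is the weather?",
}

pattern = re.compile(r"^llm\.input_messages\.(\d+)\.message\.(role|content)$")
messages = {}
for key, value in attrs.items():
    m = pattern.match(key)
    if m:
        messages.setdefault(int(m.group(1)), {})[m.group(2)] = value

# Rebuild the ordered message list from the numeric indices.
ordered = [messages[i] for i in sorted(messages)]
for msg in ordered:
    print(f'{msg["role"]}: {msg["content"]}')
```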
