fiftyone-dataset-export

Exports FiftyOne datasets to standard formats (COCO, YOLO, VOC, CVAT, CSV, etc.). Use when converting datasets, exporting for training, creating archives, or sharing data in specific formats.

Updated 1/22/2026

Export FiftyOne Datasets

Key Directives

ALWAYS follow these rules:

1. Load and understand the dataset first

set_context(dataset_name="my-dataset")
dataset_summary(name="my-dataset")

2. Confirm export settings with user

Before exporting, present:

  • Dataset name and sample count
  • Available label fields and their types
  • Proposed export format
  • Export directory path

3. Match format to label types

Different formats support different label types:

| Format | Label Types |
|---|---|
| COCO | detections, segmentations, keypoints |
| YOLO (v4, v5) | detections |
| VOC | detections |
| CVAT | classifications, detections, polylines, keypoints |
| CSV | all (custom fields) |
| Image Classification Directory Tree | classification |
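
The format table can also be used in reverse, to find which formats fit a given label type. The following is a minimal sketch: the dictionary copies the table, and `FORMAT_LABEL_TYPES` and `compatible_formats` are hypothetical names for illustration, not part of FiftyOne. CSV is left out because it accepts any label type through custom fields.

```python
# Label types each format accepts, copied from the table above.
# CSV is omitted: it accepts any label type through custom fields.
FORMAT_LABEL_TYPES = {
    "COCO": {"detections", "segmentations", "keypoints"},
    "YOLOv4": {"detections"},
    "YOLOv5": {"detections"},
    "VOC": {"detections"},
    "CVAT": {"classifications", "detections", "polylines", "keypoints"},
    "Image Classification Directory Tree": {"classification"},
}


def compatible_formats(label_type: str) -> list[str]:
    """Return the formats whose supported label types include label_type."""
    wanted = label_type.lower()
    return sorted(
        name for name, types in FORMAT_LABEL_TYPES.items() if wanted in types
    )


print(compatible_formats("keypoints"))  # ['COCO', 'CVAT']
```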

4. Use absolute paths

Always use absolute paths for export directories:

params={
    "export_dir": {"absolute_path": "/path/to/export"}
}
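
When the user supplies a relative or `~`-prefixed path, it can be normalized before it goes into `params`. This is a small sketch using only the standard library; `export_dir_param` is a hypothetical helper name.

```python
from pathlib import Path


def export_dir_param(path: str) -> dict:
    """Build an export_dir value from any path, relative or absolute."""
    # expanduser() handles "~"; resolve() anchors relative paths at the
    # current working directory and does not require the path to exist.
    absolute = Path(path).expanduser().resolve()
    return {"absolute_path": str(absolute)}


params = {"export_dir": export_dir_param("exports/coco")}
print(params["export_dir"]["absolute_path"])
```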

5. Warn about overwriting

Check whether the export directory exists before exporting. If it does, ask the user whether to overwrite it.
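
One way to implement this check is sketched below. It treats an empty directory as safe to reuse, which is an assumption rather than documented operator behavior, and `needs_overwrite_confirmation` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def needs_overwrite_confirmation(export_dir: str) -> bool:
    """Return True when exporting to export_dir could clobber existing files."""
    path = Path(export_dir)
    if not path.exists():
        return False  # nothing there yet
    if not path.is_dir():
        return True  # a file already occupies the path
    return any(path.iterdir())  # assumption: an empty directory is safe


with tempfile.TemporaryDirectory() as tmp:
    print(needs_overwrite_confirmation(tmp))  # False: empty directory
    (Path(tmp) / "labels.json").write_text("{}")
    print(needs_overwrite_confirmation(tmp))  # True: contains a file
```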

Complete Workflow

Step 1: Load Dataset and Understand Content

# Set context
set_context(dataset_name="my-dataset")

# Get dataset summary to see fields and label types
dataset_summary(name="my-dataset")

Identify:

  • Total sample count
  • Media type (images, videos, point clouds)
  • Available label fields and their types (Detections, Classifications, etc.)

Step 2: Get Export Operator Schema

# Discover export parameters dynamically
get_operator_schema(operator_uri="@voxel51/io/export_samples")

Step 3: Present Export Options to User

Before exporting, confirm with the user:

Dataset: my-dataset (5,000 samples)
Media type: image

Available label fields:
  - ground_truth (Detections)
  - predictions (Detections)

Export options:
  - Format: COCO (recommended for detections)
  - Export directory: /path/to/export
  - Label field: ground_truth

Proceed with export?
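
The confirmation message above can be generated from the values gathered in Step 1. This is a sketch only: `format_export_summary` and its argument names are hypothetical, and the values would come from the `dataset_summary` output.

```python
def format_export_summary(dataset_name, sample_count, media_type,
                          label_fields, export_format, export_dir, label_field):
    """Render the confirmation text; label_fields maps field name to label type."""
    lines = [
        f"Dataset: {dataset_name} ({sample_count:,} samples)",
        f"Media type: {media_type}",
        "",
        "Available label fields:",
    ]
    lines += [f"  - {name} ({kind})" for name, kind in label_fields.items()]
    lines += [
        "",
        "Export options:",
        f"  - Format: {export_format}",
        f"  - Export directory: {export_dir}",
        f"  - Label field: {label_field}",
        "",
        "Proceed with export?",
    ]
    return "\n".join(lines)


print(format_export_summary(
    "my-dataset", 5000, "image",
    {"ground_truth": "Detections", "predictions": "Detections"},
    "COCO (recommended for detections)", "/path/to/export", "ground_truth",
))
```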

Step 4: Execute Export

Export media and labels:

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/export"},
        "label_field": "ground_truth"
    }
)

Export labels only (no media copy):

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "COCO",
        "labels_path": {"absolute_path": "/path/to/labels.json"},
        "label_field": "ground_truth"
    }
)

Export media only (no labels):

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_ONLY",
        "export_dir": {"absolute_path": "/path/to/media"}
    }
)
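
The three calls above differ mainly in which path key they use. A hypothetical helper, `build_export_params`, can make that choice explicit. It covers only the three export types shown in this step and rejects anything else, because the path key for other types is not documented here.

```python
def build_export_params(export_type, path, dataset_type=None, label_field=None):
    """Assemble params for the three export types shown in Step 4."""
    if export_type == "LABELS_ONLY":
        # Labels-only exports write to a single labels file.
        params = {"export_type": export_type,
                  "labels_path": {"absolute_path": path}}
    elif export_type in ("MEDIA_AND_LABELS", "MEDIA_ONLY"):
        params = {"export_type": export_type,
                  "export_dir": {"absolute_path": path}}
    else:
        raise ValueError(f"unhandled export_type: {export_type!r}")
    # MEDIA_ONLY carries no label information, so skip the label keys.
    if export_type != "MEDIA_ONLY":
        if dataset_type is not None:
            params["dataset_type"] = dataset_type
        if label_field is not None:
            params["label_field"] = label_field
    return params


print(build_export_params("LABELS_ONLY", "/path/to/labels.json",
                          "COCO", "ground_truth"))
```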

Step 5: Verify Export

After export, verify the output:

ls -la /path/to/export

Report exported file count and structure to user.
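
Where a shell is not available, the same check can be done with the standard library. The sketch below counts files under each top-level entry of the export directory; `summarize_export` is a hypothetical helper, and the demo builds a throwaway directory instead of reading a real export.

```python
import tempfile
from collections import Counter
from pathlib import Path


def summarize_export(export_dir: str) -> dict:
    """Count files per top-level entry; files at the root are counted under '.'."""
    root = Path(export_dir)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            parts = path.relative_to(root).parts
            counts[parts[0] if len(parts) > 1 else "."] += 1
    return dict(counts)


# Demo on a throwaway directory shaped like a small export.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "image1.jpg").touch()
    (Path(tmp) / "data" / "image2.jpg").touch()
    (Path(tmp) / "labels.json").write_text("{}")
    print(summarize_export(tmp))
```

Comparing the per-directory counts against the dataset's sample count gives a quick sanity check before reporting back.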

Supported Export Formats

Detection Formats

| Format | dataset_type Value | Label Types | Labels-Only |
|---|---|---|---|
| COCO | "COCO" | detections, segmentations, keypoints | Yes |
| YOLOv4 | "YOLOv4" | detections | Yes |
| YOLOv5 | "YOLOv5" | detections | No |
| VOC | "VOC" | detections | Yes |
| KITTI | "KITTI" | detections | Yes |
| CVAT Image | "CVAT Image" | classifications, detections, polylines, keypoints | Yes |
| CVAT Video | "CVAT Video" | frame labels | Yes |
| TF Object Detection | "TF Object Detection" | detections | No |

Classification Formats

| Format | dataset_type Value | Media Type | Labels-Only |
|---|---|---|---|
| Image Classification Directory Tree | "Image Classification Directory Tree" | image | No |
| Video Classification Directory Tree | "Video Classification Directory Tree" | video | No |
| TF Image Classification | "TF Image Classification" | image | No |

Segmentation Formats

| Format | dataset_type Value | Label Types | Labels-Only |
|---|---|---|---|
| Image Segmentation | "Image Segmentation" | segmentation | Yes |

General Formats

| Format | dataset_type Value | Best For | Labels-Only |
|---|---|---|---|
| CSV | "CSV" | Custom fields, spreadsheet analysis | Yes |
| GeoJSON | "GeoJSON" | Geolocation data | Yes |
| FiftyOne Dataset | "FiftyOne Dataset" | Full dataset backup with all metadata | Yes |

Note: Formats with "Labels-Only: No" require export_type: "MEDIA_AND_LABELS" (cannot export labels without media).
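
This rule can be checked before calling the operator. The sketch below hard-codes the formats marked "Labels-Only: No" in the tables above; `NO_LABELS_ONLY` and `check_export_type` are hypothetical names, and the set should be kept in sync with the operator schema.

```python
# dataset_type values marked "Labels-Only: No" in the tables above.
NO_LABELS_ONLY = {
    "YOLOv5",
    "TF Object Detection",
    "Image Classification Directory Tree",
    "Video Classification Directory Tree",
    "TF Image Classification",
}


def check_export_type(dataset_type: str, export_type: str) -> None:
    """Raise ValueError for a LABELS_ONLY request the format cannot satisfy."""
    if export_type == "LABELS_ONLY" and dataset_type in NO_LABELS_ONLY:
        raise ValueError(
            f"{dataset_type} cannot export labels without media; "
            'use export_type "MEDIA_AND_LABELS" instead'
        )


check_export_type("COCO", "LABELS_ONLY")  # passes silently
try:
    check_export_type("YOLOv5", "LABELS_ONLY")
except ValueError as err:
    print(err)
```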

Export Type Options

| export_type Value | Description |
|---|---|
| "MEDIA_AND_LABELS" | Export both media files and labels |
| "LABELS_ONLY" | Export labels only (use labels_path instead of export_dir) |
| "MEDIA_ONLY" | Export media files only (no labels) |
| "FILEPATHS_ONLY" | Export CSV with filepaths only |

Target Options

Export from different sources:

| target Value | Description |
|---|---|
| "DATASET" | Export entire dataset (default) |
| "CURRENT_VIEW" | Export current filtered view |
| "SELECTED_SAMPLES" | Export selected samples only |

Common Use Cases

Use Case 1: Export to COCO Format

For training with frameworks that use COCO format:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/coco_export"},
        "label_field": "ground_truth"
    }
)

Output structure:

coco_export/
├── data/
│   ├── image1.jpg
│   └── image2.jpg
└── labels.json
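
To verify a COCO export, the image entries in `labels.json` can be compared with the files in `data/`. The sketch below relies on the standard COCO `images` list with a `file_name` per entry, and assumes each `file_name` is a bare file name relative to `data/`. `missing_coco_images` is a hypothetical helper, and the demo writes its own tiny export.

```python
import json
import tempfile
from pathlib import Path


def missing_coco_images(export_dir: str) -> list[str]:
    """Return file names listed in labels.json that are absent from data/."""
    root = Path(export_dir)
    coco = json.loads((root / "labels.json").read_text())
    listed = {image["file_name"] for image in coco.get("images", [])}
    on_disk = {p.name for p in (root / "data").iterdir() if p.is_file()}
    return sorted(listed - on_disk)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "image1.jpg").touch()
    labels = {"images": [{"id": 1, "file_name": "image1.jpg"},
                         {"id": 2, "file_name": "image2.jpg"}]}
    (Path(tmp) / "labels.json").write_text(json.dumps(labels))
    print(missing_coco_images(tmp))  # ['image2.jpg']
```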

Use Case 2: Export to YOLO Format

For training YOLOv5/v8 models:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "YOLOv5",
        "export_dir": {"absolute_path": "/path/to/yolo_export"},
        "label_field": "ground_truth"
    }
)

Output structure (shown for a train split; the subdirectory name follows the split being exported):

yolo_export/
├── images/
│   └── train/
│       └── image1.jpg
├── labels/
│   └── train/
│       └── image1.txt
└── dataset.yaml
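
Each file under `labels/` follows the usual YOLO text convention: one object per line, written as a class index followed by the normalized center x, center y, width, and height. FiftyOne stores boxes as `[top-left-x, top-left-y, width, height]` in relative coordinates. When spot-checking an export against the source labels, a conversion like the sketch below can help. `yolo_line_to_top_left` is a hypothetical helper, and the sample line is made up.

```python
def yolo_line_to_top_left(line: str):
    """Parse 'class cx cy w h' into (class_id, [x, y, w, h]) with a top-left origin."""
    class_id, cx, cy, w, h = line.split()
    cx, cy, w, h = float(cx), float(cy), float(w), float(h)
    # Shift from the box center to its top-left corner.
    return int(class_id), [cx - w / 2, cy - h / 2, w, h]


print(yolo_line_to_top_left("0 0.5 0.5 0.25 0.5"))  # (0, [0.375, 0.25, 0.25, 0.5])
```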

Use Case 3: Export Filtered View

Export only a subset of samples:

# Set context
set_context(dataset_name="my-dataset")

# Filter samples in the App
set_view(tags=["validated"])

# Export the filtered view
execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "target": "CURRENT_VIEW",
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "export_dir": {"absolute_path": "/path/to/validated_export"},
        "label_field": "ground_truth"
    }
)

Use Case 4: Export Labels Only

When media should stay in place:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "COCO",
        "labels_path": {"absolute_path": "/path/to/annotations.json"},
        "label_field": "ground_truth"
    }
)

Use Case 5: Export for Classification Training

For image classification datasets:

set_context(dataset_name="my-classification-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "Image Classification Directory Tree",
        "export_dir": {"absolute_path": "/path/to/classification_export"},
        "label_field": "ground_truth"
    }
)

Output structure:

classification_export/
├── cat/
│   ├── cat1.jpg
│   └── cat2.jpg
└── dog/
    ├── dog1.jpg
    └── dog2.jpg
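
Because each class maps to one subdirectory, the class distribution of the export can be read straight from the file system. The sketch below uses only the standard library; `class_counts` is a hypothetical helper, and the demo creates its own small tree.

```python
import tempfile
from pathlib import Path


def class_counts(export_dir: str) -> dict:
    """Map each class subdirectory name to the number of files it contains."""
    root = Path(export_dir)
    return {
        d.name: sum(1 for p in d.iterdir() if p.is_file())
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


with tempfile.TemporaryDirectory() as tmp:
    for label, names in {"cat": ["cat1.jpg", "cat2.jpg"], "dog": ["dog1.jpg"]}.items():
        (Path(tmp) / label).mkdir()
        for name in names:
            (Path(tmp) / label / name).touch()
    print(class_counts(tmp))  # {'cat': 2, 'dog': 1}
```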

Use Case 6: Export to CSV

For analysis in spreadsheets:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "LABELS_ONLY",
        "dataset_type": "CSV",
        "labels_path": {"absolute_path": "/path/to/data.csv"},
        "csv_fields": ["filepath", "ground_truth.detections.label"]
    }
)

Use Case 7: Export FiftyOne Dataset (Full Backup)

For complete dataset backup including all metadata:

set_context(dataset_name="my-dataset")

execute_operator(
    operator_uri="@voxel51/io/export_samples",
    params={
        "export_type": "MEDIA_AND_LABELS",
        "dataset_type": "FiftyOne Dataset",
        "export_dir": {"absolute_path": "/path/to/backup"}
    }
)

Output structure:

backup/
├── metadata.json
├── samples.json
├── data/
│   └── ...
├── annotations/
├── brain/
└── evaluations/

Python SDK Alternative

For more control, guide users to use the Python SDK directly:

import fiftyone as fo
import fiftyone.types as fot

# Load dataset
dataset = fo.load_dataset("my-dataset")

# Export to COCO format
dataset.export(
    export_dir="/path/to/export",
    dataset_type=fot.COCODetectionDataset,
    label_field="ground_truth",
)

# Export labels only
dataset.export(
    labels_path="/path/to/labels.json",
    dataset_type=fot.COCODetectionDataset,
    label_field="ground_truth",
)

# Export a filtered view
view = dataset.match_tags("validated")
view.export(
    export_dir="/path/to/validated",
    dataset_type=fot.YOLOv5Dataset,
    label_field="ground_truth",
)

Python SDK dataset types:

  • fot.COCODetectionDataset - COCO format
  • fot.YOLOv4Dataset - YOLOv4 format
  • fot.YOLOv5Dataset - YOLOv5 format
  • fot.VOCDetectionDataset - Pascal VOC format
  • fot.KITTIDetectionDataset - KITTI format
  • fot.CVATImageDataset - CVAT image format
  • fot.CVATVideoDataset - CVAT video format
  • fot.TFObjectDetectionDataset - TensorFlow Object Detection format
  • fot.ImageClassificationDirectoryTree - Classification folder structure
  • fot.VideoClassificationDirectoryTree - Video classification folders
  • fot.TFImageClassificationDataset - TensorFlow classification format
  • fot.ImageSegmentationDirectory - Segmentation masks
  • fot.CSVDataset - CSV format
  • fot.GeoJSONDataset - GeoJSON format
  • fot.FiftyOneDataset - Native FiftyOne format
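
When moving from the operator to the SDK, the `dataset_type` strings need to be translated into these class names. The lookup below pairs the operator values from the format tables with the classes listed above. The pairing is inferred from the two lists, not read from the plugin source, and `sdk_class_name` is a hypothetical helper.

```python
# Operator dataset_type value -> class name in fiftyone.types.
DATASET_TYPE_TO_SDK_CLASS = {
    "COCO": "COCODetectionDataset",
    "YOLOv4": "YOLOv4Dataset",
    "YOLOv5": "YOLOv5Dataset",
    "VOC": "VOCDetectionDataset",
    "KITTI": "KITTIDetectionDataset",
    "CVAT Image": "CVATImageDataset",
    "CVAT Video": "CVATVideoDataset",
    "TF Object Detection": "TFObjectDetectionDataset",
    "Image Classification Directory Tree": "ImageClassificationDirectoryTree",
    "Video Classification Directory Tree": "VideoClassificationDirectoryTree",
    "TF Image Classification": "TFImageClassificationDataset",
    "Image Segmentation": "ImageSegmentationDirectory",
    "CSV": "CSVDataset",
    "GeoJSON": "GeoJSONDataset",
    "FiftyOne Dataset": "FiftyOneDataset",
}


def sdk_class_name(dataset_type: str) -> str:
    """Return the fiftyone.types class name for an operator dataset_type value."""
    try:
        return DATASET_TYPE_TO_SDK_CLASS[dataset_type]
    except KeyError:
        raise ValueError(f"unknown dataset_type: {dataset_type!r}") from None


print(sdk_class_name("YOLOv5"))  # YOLOv5Dataset
```

With FiftyOne installed, `getattr(fiftyone.types, sdk_class_name("YOLOv5"))` should resolve to the class to pass as `dataset_type` in `dataset.export(...)`.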

Troubleshooting

Error: "Export directory already exists"

  • Add "overwrite": true to params
  • Or specify a different export directory

Error: "Label field not found"

  • Use dataset_summary() to see available label fields
  • Verify the field name spelling

Error: "Unsupported label type for format"

  • Check that the export format supports your label type
  • COCO: detections, segmentations, keypoints
  • YOLO: detections only
  • Classification formats: classification labels only

Error: "Permission denied"

  • Verify write permissions for the export directory
  • Check that the parent directory exists

Export is slow

  • Large datasets take time; consider exporting a view first
  • Export to local disk rather than network drives
  • For labels only, use LABELS_ONLY export type

Best Practices

  1. Understand your data first - Use dataset_summary() to know what fields and label types exist
  2. Match format to purpose - Use COCO/YOLO for training, CSV for analysis, FiftyOne Dataset for backups
  3. Confirm with user - Present export settings before executing
  4. Export filtered views - Only export what's needed rather than entire datasets
  5. Verify after export - Check exported file counts match expectations
  6. Use labels_path for LABELS_ONLY - When exporting labels only, use labels_path not export_dir
