firecrawl-knowledge-base

Name: firecrawl-knowledge-base
Author: firecrawl

Build a knowledge base from web content with Firecrawl. Use for local reference docs, RAG-ready chunks, fine-tuning datasets, documentation mirrors, topic corpora, or LLM-ready markdown organized from web sources.

Updated 6/25/2026

SKILL.md

name: firecrawl-knowledge-base
description: Firecrawl API key for hosted Firecrawl requests.
version: "0.1.0"

Firecrawl Knowledge Base

Use this to turn URLs or topics into organized LLM-ready content.

Onboarding Interview

Infer the source, goal, depth, and output location from context. If the source and goal are clear, proceed immediately.

Ask at most three concise questions, and only if blocked. Typical blockers are a missing source URL or topic, an unclear output goal (reference, RAG, training, or docs), or a missing training format when training is requested.

Firecrawl Collection Plan

Use Firecrawl map to discover pages on documentation sites and Firecrawl search to build topic-based corpora. Scrape each page into markdown, preserving code examples and tables.

For files, follow the Firecrawl download-style convention:

.firecrawl/
  <hostname>/
    <path>/
      index.md
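The layout above can be sketched as a small URL-to-path helper. This is a minimal illustration, not Firecrawl's own implementation: the function name `local_path_for` and the choice to treat every URL path segment as a directory are assumptions made here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path_for(url: str, root: str = ".firecrawl") -> PurePosixPath:
    """Map a page URL to <root>/<hostname>/<path>/index.md (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.hostname or "unknown-host"
    # Drop empty and relative segments so trailing slashes or ".." cannot
    # create empty directories or escape the output root.
    segments = [s for s in parsed.path.split("/") if s and s not in (".", "..")]
    return PurePosixPath(root, host, *segments, "index.md")


if __name__ == "__main__":
    print(local_path_for("https://docs.example.com/api/auth/"))
```

Query strings and fragments are ignored in this sketch, so two URLs that differ only in their query would map to the same file.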

Parallel Work

If appropriate, use sub-agents or equivalent parallel task runners:

by docs section: one researcher per section
by source type: official docs, tutorials, community discussions, and references
by pipeline stage: source scraping, chunk generation, and manifest generation
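Where sub-agents are not available, the "one section per worker" split can be approximated with a thread pool. The sketch below assumes a caller-supplied `scrape` callable that returns markdown for a URL; that callable, the function name `collect_in_parallel`, and the `sections` mapping are all hypothetical and stand in for whatever scraping call is actually used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def collect_in_parallel(
    sections: Dict[str, List[str]],
    scrape: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, Dict[str, str]]:
    """Run one task per docs section; each task scrapes its own URLs in order."""

    def run_section(urls: List[str]) -> Dict[str, str]:
        # `scrape` is an assumed callable: URL in, markdown out.
        return {url: scrape(url) for url in urls}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_section, urls) for name, urls in sections.items()}
        # result() re-raises any exception from the worker, so failures are not silent.
        return {name: future.result() for name, future in futures.items()}
```

Grouping by section keeps each worker's output independent, which makes it straightforward to write each section to its own directory afterwards.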

Output Modes

Reference: markdown files, index.md, and sources.json.
RAG: markdown files plus chunk files and manifest.json.
Training: scraped source files plus training-data.jsonl and training-metadata.json.
Docs mirror: complete markdown mirror with a table of contents.
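For the RAG mode, one possible way to produce chunks and a manifest entry is to split each markdown file at headings while leaving fenced code blocks intact. This is a sketch only: the skill does not specify a chunking strategy or a manifest.json schema, so `chunk_markdown`, `build_manifest`, and the field names below are assumptions.

```python
from typing import Dict, List

FENCE = "`" * 3  # markdown code-fence marker


def chunk_markdown(markdown: str) -> List[str]:
    """Split markdown at headings, never inside a fenced code block."""
    chunks: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # A heading outside a fence starts a new chunk.
        if not in_fence and line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def build_manifest(source_url: str, source_file: str, chunks: List[str]) -> Dict:
    """Describe one source file and its chunks (assumed schema, not Firecrawl's)."""
    return {
        "source_url": source_url,
        "source_file": source_file,
        "chunks": [
            {"id": f"{source_file}#chunk-{i}", "chars": len(text)}
            for i, text in enumerate(chunks)
        ],
    }
```

Splitting on headings keeps each chunk topically coherent, and tracking fences prevents a `#` comment inside a code example from being mistaken for a heading.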

Final Deliverable

# Knowledge Base: [Source]

## Summary
[What was collected and why]

## Output Structure
[Files/directories created]

## Coverage
[Sections, source types, counts]

## Usage Notes
[How to use in RAG, docs, training, or agent context]

## Sources
[URLs collected]

## Rerun Inputs
workflow: firecrawl-knowledge-base
source: [url/topic]
goal: [reference/rag/train/docs]
depth: [quick/thorough/exhaustive]
output_dir: [.firecrawl/]

Quality Bar

Preserve code examples and formatting.
Remove boilerplate navigation where possible.
Include source URLs in frontmatter or metadata.
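Recording the source URL in frontmatter could look like the sketch below. The helper name `with_frontmatter` and the field names `source_url` and `title` are assumptions, not a format the skill prescribes. Values are written as JSON-style double-quoted strings, which YAML parsers generally accept.

```python
import json


def with_frontmatter(markdown: str, source_url: str, title: str = "") -> str:
    """Prepend a small YAML frontmatter block to scraped markdown (hypothetical helper)."""
    fields = {"source_url": source_url}
    if title:
        fields["title"] = title
    # json.dumps quotes and escapes each value so colons or quotes cannot break the block.
    lines = [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    return "---\n" + "\n".join(lines) + "\n---\n\n" + markdown
```

The markdown body is passed through unchanged, so code examples and formatting in the scraped page are preserved.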
