
firecrawl-knowledge-base
Build a knowledge base from web content with Firecrawl. Use for local reference docs, RAG-ready chunks, fine-tuning datasets, documentation mirrors, topic corpora, or LLM-ready markdown organized from web sources.
Requires a Firecrawl API key for hosted Firecrawl requests.
Firecrawl Knowledge Base
Use this to turn URLs or topics into organized LLM-ready content.
Onboarding Interview
Infer the source, goal, depth, and output location from context. If the source and goal are clear, proceed immediately.
Ask at most three concise questions, and only if blocked: for example, the source URL or topic, whether the output is for reference, RAG, training, or docs, or the training format if training is requested.
Firecrawl Collection Plan
Use Firecrawl map for documentation sites, search for topic-based corpora, scrape pages into markdown, and preserve code examples and tables.
For files, follow the Firecrawl download-style convention:
.firecrawl/
  <hostname>/
    <path>/
      index.md
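A minimal sketch of the path convention above, assuming each scraped page is stored under its hostname and URL path. The helper name `local_path_for` and the `unknown-host` fallback are illustrative, not part of Firecrawl.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_path_for(url: str, root: str = ".firecrawl") -> Path:
    """Map a page URL to <root>/<hostname>/<path>/index.md."""
    parsed = urlparse(url)
    # Assumption: fall back to a placeholder when the URL has no hostname.
    hostname = parsed.hostname or "unknown-host"
    # Drop empty and relative segments so slashes never create blank
    # directories and a path cannot escape the output root.
    segments = [s for s in parsed.path.split("/") if s and s not in (".", "..")]
    return Path(root, hostname, *segments, "index.md")


print(local_path_for("https://docs.example.com/guides/quickstart/").as_posix())
# prints ".firecrawl/docs.example.com/guides/quickstart/index.md"
```

Query strings and fragments are ignored here; pages that differ only by query would need an extra disambiguating segment.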
Parallel Work
If appropriate, split the work across sub-agents or equivalent parallel task runners, for example:
- one researcher per docs section
- one researcher per source type: official docs, tutorials, community discussions, and references
- separate workers for source scraping, chunk generation, and manifest generation
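One way to split a docs site into per-section work items, sketched with the standard library only. It assumes the first URL path segment identifies the section; `collect_section` is a hypothetical stand-in for whatever a sub-agent or task runner would do with its share of URLs.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def group_by_section(urls):
    """Group URLs by their first path segment (assumed to be the docs section)."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        sections[segments[0] if segments else "(root)"].append(url)
    return dict(sections)


def collect_section(item):
    """Hypothetical worker: a real one would scrape and save each URL."""
    name, section_urls = item
    return name, len(section_urls)


urls = [
    "https://docs.example.com/guides/install",
    "https://docs.example.com/guides/quickstart",
    "https://docs.example.com/api/auth",
]
sections = group_by_section(urls)

# Each section is handled independently, so the workers can run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(collect_section, sections.items()))

print(results)
```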
Output Modes
- Reference: markdown files, index.md, and sources.json.
- RAG: markdown files plus chunk files and manifest.json.
- Training: scraped source files plus training-data.jsonl and training-metadata.json.
- Docs mirror: complete markdown mirror with a table of contents.
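For the RAG mode, chunking could look roughly like the sketch below: split scraped markdown at headings, but never inside a fenced code block, so code examples stay intact. The manifest fields (`source`, `chunk_count`, `chunks`, `id`, `chars`) are assumptions for illustration; the skill does not specify a manifest.json schema.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker


def chunk_markdown(text: str) -> list[str]:
    """Split markdown at headings, keeping fenced code blocks whole."""
    chunks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # A heading outside a fence starts a new chunk.
        if line.startswith("#") and not in_fence and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def build_manifest(source_url: str, chunks: list[str]) -> dict:
    """Assumed manifest shape: one entry per chunk with an id and size."""
    return {
        "source": source_url,
        "chunk_count": len(chunks),
        "chunks": [
            {"id": f"chunk-{i:04d}", "chars": len(c)} for i, c in enumerate(chunks)
        ],
    }


page = "\n".join(
    ["# Install", "Run the installer.", FENCE, "# a comment, not a heading", FENCE,
     "## Configure", "Set the API key."]
)
chunks = chunk_markdown(page)
print(json.dumps(build_manifest("https://docs.example.com/install", chunks), indent=2))
```

Heading-based splitting keeps each chunk self-describing; very long sections would still need a size-based fallback.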
Final Deliverable
# Knowledge Base: [Source]
## Summary
[What was collected and why]
## Output Structure
[Files/directories created]
## Coverage
[Sections, source types, counts]
## Usage Notes
[How to use in RAG, docs, training, or agent context]
## Sources
[URLs collected]
## Rerun Inputs
workflow: firecrawl-knowledge-base
source: [url/topic]
goal: [reference/rag/train/docs]
depth: [quick/thorough/exhaustive]
output_dir: [.firecrawl/]
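The rerun inputs above are plain `key: value` lines, so they can be read back and checked before a rerun. A minimal sketch, assuming the allowed values are exactly those listed in the template; `RerunInputs` and `parse_rerun_inputs` are illustrative names.

```python
from dataclasses import dataclass

GOALS = {"reference", "rag", "train", "docs"}
DEPTHS = {"quick", "thorough", "exhaustive"}


@dataclass(frozen=True)
class RerunInputs:
    workflow: str
    source: str
    goal: str
    depth: str
    output_dir: str = ".firecrawl/"

    def __post_init__(self):
        # Reject values outside the options listed in the template.
        if self.goal not in GOALS:
            raise ValueError(f"unknown goal: {self.goal!r}")
        if self.depth not in DEPTHS:
            raise ValueError(f"unknown depth: {self.depth!r}")


def parse_rerun_inputs(block: str) -> RerunInputs:
    """Parse 'key: value' lines; split on the first colon so URLs survive."""
    fields = {}
    for line in block.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return RerunInputs(**fields)


inputs = parse_rerun_inputs(
    """
    workflow: firecrawl-knowledge-base
    source: https://docs.example.com
    goal: rag
    depth: thorough
    output_dir: .firecrawl/
    """
)
print(inputs)
```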
Quality Bar
- Preserve code examples and formatting.
- Remove boilerplate navigation where possible.
- Include source URLs in frontmatter or metadata.
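To keep the source URL with each file, one option is YAML frontmatter at the top of the saved markdown. This is a sketch under assumptions: the keys `source_url` and `title` are illustrative, and values are written with `json.dumps` because a JSON string is also a valid double-quoted YAML scalar.

```python
import json


def with_frontmatter(markdown: str, source_url: str, title: str) -> str:
    """Prepend YAML frontmatter that records where the page came from."""
    lines = [
        "---",
        f"source_url: {json.dumps(source_url)}",
        f"title: {json.dumps(title)}",
        "---",
        "",  # ensures a newline after the closing delimiter
    ]
    # The body is left untouched apart from leading blank lines.
    return "\n".join(lines) + markdown.lstrip("\n")


doc = with_frontmatter(
    "# Quickstart\n", "https://docs.example.com/quickstart", "Quickstart"
)
print(doc)
```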