
# crawl-url

Crawl any website and save pages as local markdown files. Use when you need to download documentation, knowledge bases, or web content for offline access or analysis. No code required - just provide a URL.
"Crawl any website and save pages as local markdown files. Use when you need to download documentation, knowledge bases, or web content for offline access or analysis. No code required - just provide a URL."
## URL Crawler

Crawls websites using the Tavily Crawl API and saves each page as a separate markdown file in a flat directory structure.
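At its core, the crawl is a single authenticated request to Tavily's crawl endpoint. The sketch below shows how such a request might be assembled; the endpoint path and parameter names (`max_depth`, `max_breadth`, `limit`, `instructions`) are assumptions based on Tavily's crawl API and should be verified against the current API reference.

```python
import os

# Assumed endpoint path; check Tavily's API reference for the current URL.
TAVILY_CRAWL_ENDPOINT = "https://api.tavily.com/crawl"

def build_crawl_request(url, instruction=None, depth=2, breadth=50, limit=50):
    """Build the headers and JSON payload for a Tavily crawl request.

    The payload keys mirror the CLI options described in this document;
    the exact names Tavily expects are an assumption here.
    """
    api_key = os.environ.get("TAVILY_API_KEY")
    if not api_key:
        raise RuntimeError("TAVILY_API_KEY is not set; see Prerequisites")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "url": url,
        "max_depth": depth,
        "max_breadth": breadth,
        "limit": limit,
    }
    if instruction:
        payload["instructions"] = instruction
    return headers, payload
```

The returned headers and payload would then be sent with any HTTP client (e.g. a `POST` via `requests`); the response contains the crawled pages to be written out as markdown.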
## Prerequisites

**Tavily API Key Required** - Get your key at https://tavily.com

Add to `~/.claude/settings.json`:

```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```

Restart Claude Code after adding your API key.
## When to Use

Use this skill when the user wants to:

- Crawl and extract content from a website
- Download API documentation, framework docs, or knowledge bases
- Save web content locally for offline access or analysis
## Usage

Execute the crawl script with a URL and an optional instruction:

```bash
python scripts/crawl_url.py <URL> [--instruction "guidance text"]
```
### Required Parameters

- `URL`: The website to crawl (e.g., https://docs.stripe.com/api)

### Optional Parameters

- `--instruction, -i`: Natural language guidance for the crawler (e.g., "Focus on API endpoints only")
- `--output, -o`: Output directory (default: `<repo_root>/crawled_context/<domain>`)
- `--depth, -d`: Max crawl depth (default: 2, range: 1-5)
- `--breadth, -b`: Max links per level (default: 50)
- `--limit, -l`: Max total pages to crawl (default: 50)
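The options above map naturally onto a standard-library `argparse` interface. This is a sketch of how the script's CLI might be defined, with defaults taken from the parameter descriptions; the real `crawl_url.py` may differ in detail.

```python
import argparse

def build_parser():
    """CLI mirroring the documented options (a sketch, not the actual script)."""
    parser = argparse.ArgumentParser(
        description="Crawl a website and save pages as local markdown files")
    parser.add_argument("url",
                        help="The website to crawl, e.g. https://docs.stripe.com/api")
    parser.add_argument("--instruction", "-i", default=None,
                        help="Natural language guidance for the crawler")
    parser.add_argument("--output", "-o", default=None,
                        help="Output directory "
                             "(default: <repo_root>/crawled_context/<domain>)")
    parser.add_argument("--depth", "-d", type=int, default=2,
                        choices=range(1, 6), metavar="1-5",
                        help="Max crawl depth (default: 2)")
    parser.add_argument("--breadth", "-b", type=int, default=50,
                        help="Max links per level (default: 50)")
    parser.add_argument("--limit", "-l", type=int, default=50,
                        help="Max total pages to crawl (default: 50)")
    return parser
```

Constraining `--depth` with `choices=range(1, 6)` makes argparse reject out-of-range values with a usage error instead of failing mid-crawl.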
## Output

The script creates a flat directory structure at `<repo_root>/crawled_context/<domain>/` with one markdown file per crawled page. Filenames are derived from URLs (e.g., `docs_stripe_com_api_authentication.md`).

Each markdown file includes:

- Frontmatter with the source URL and crawl timestamp
- The extracted content in markdown format
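One plausible way to derive the flat filenames and frontmatter described above is sketched below. The exact sanitization rules and frontmatter fields used by the real script may differ; the helper names here are illustrative.

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlparse

def url_to_filename(url):
    """Flatten a URL into a safe markdown filename, e.g.
    https://docs.stripe.com/api/authentication
    -> docs_stripe_com_api_authentication.md
    """
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Collapse every run of non-alphanumeric characters into a single underscore.
    safe = re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_")
    return safe + ".md"

def render_page(url, content):
    """Prepend YAML frontmatter with the source URL and a crawl timestamp."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return f"---\nsource: {url}\ncrawled_at: {timestamp}\n---\n\n{content}"
```

Note that this mapping is lossy: distinct URLs can collapse to the same filename, which is why later files overwrite earlier ones (see Important Notes).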
## Examples

### Basic Crawl

```bash
python scripts/crawl_url.py https://docs.anthropic.com
```

Crawls the Anthropic docs with default settings and saves to `<repo_root>/crawled_context/docs_anthropic_com/`.

### With Instruction

```bash
python scripts/crawl_url.py https://react.dev --instruction "Focus on API reference pages and hooks documentation"
```

Uses a natural language instruction to guide the crawler toward specific content.

### Custom Output Directory

```bash
python scripts/crawl_url.py https://docs.stripe.com/api -o ./stripe-api-docs
```

Saves results to a custom directory.

### Adjust Crawl Parameters

```bash
python scripts/crawl_url.py https://nextjs.org/docs --depth 3 --breadth 100 --limit 200
```

Increases crawl depth, breadth, and page limit for more comprehensive coverage.
## Important Notes

- **API Key Required**: Set the `TAVILY_API_KEY` environment variable (loaded from `.env` if available)
- **Crawl Time**: Deeper crawls take longer (depth 3+ may take many minutes)
- **Filename Safety**: URLs are converted to safe filenames automatically
- **Flat Structure**: All files are saved in the `<repo_root>/crawled_context/<domain>/` directory regardless of the original URL hierarchy
- **Duplicate Prevention**: Files are overwritten if URLs generate identical filenames
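The ".env if available" fallback mentioned in the notes can be approximated without extra dependencies. The real script may use a library such as python-dotenv; this is a minimal, standard-library-only sketch of the same behavior.

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: set KEY=VALUE pairs into os.environ
    without overwriting variables that are already set.
    Skips blank lines, comments, and lines without '='.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault keeps any value already exported in the shell.
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

Using `setdefault` means a key exported in the shell (or via `~/.claude/settings.json`) always wins over the `.env` file.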