data-engineering

Master data engineering, ETL/ELT, data warehousing, SQL optimization, and analytics. Use when building data pipelines, designing data systems, or working with large datasets.

Updated on 1/5/2026

Data Engineering & Analytics Skill

Quick Start - SQL Data Pipeline

-- Create a staging table holding the last day's events of interest.
-- The interval syntax below is PostgreSQL-style; adjust it for your warehouse.
CREATE TABLE staging_events AS
SELECT
  event_id,
  user_id,
  event_type,
  event_time,
  properties
FROM raw_events
WHERE event_time >= CURRENT_DATE - INTERVAL '1 day'
  AND event_type IN ('click', 'purchase', 'view');

-- Aggregate metrics per day and user
SELECT
  DATE(event_time) AS event_date,
  user_id,
  COUNT(*) AS event_count,
  COUNT(DISTINCT event_type) AS unique_event_types
FROM staging_events
GROUP BY 1, 2
ORDER BY event_date DESC, event_count DESC;
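
To try the quick start without a warehouse, the same two steps can be sketched in Python against an in-memory SQLite database. This is a minimal sketch under stated assumptions, not a reference implementation: the `raw_events` column types, the sample rows, and the function names `build_staging`, `daily_metrics`, and `demo` are all hypothetical. Because SQLite has no `INTERVAL` syntax, the cutoff is passed in as an ISO 8601 string, and the staging table is created explicitly and filled with `INSERT ... SELECT`.

```python
import sqlite3


def build_staging(conn: sqlite3.Connection, cutoff: str) -> None:
    """Rebuild staging_events from raw_events, keeping recent events of interest.

    `cutoff` is an ISO 8601 timestamp string (assumed storage format), so plain
    string comparison orders timestamps correctly.
    """
    conn.execute("DROP TABLE IF EXISTS staging_events")
    conn.execute(
        """
        CREATE TABLE staging_events (
            event_id INTEGER, user_id INTEGER, event_type TEXT,
            event_time TEXT, properties TEXT
        )
        """
    )
    conn.execute(
        """
        INSERT INTO staging_events
        SELECT event_id, user_id, event_type, event_time, properties
        FROM raw_events
        WHERE event_time >= ?
          AND event_type IN ('click', 'purchase', 'view')
        """,
        (cutoff,),
    )


def daily_metrics(conn: sqlite3.Connection) -> list:
    """Aggregate staged events per day and user, newest and busiest first."""
    return conn.execute(
        """
        SELECT DATE(event_time) AS event_date,
               user_id,
               COUNT(*) AS event_count,
               COUNT(DISTINCT event_type) AS unique_event_types
        FROM staging_events
        GROUP BY DATE(event_time), user_id
        ORDER BY event_date DESC, event_count DESC
        """
    ).fetchall()


def demo() -> list:
    """Load a few made-up rows and run both steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_events (event_id INTEGER, user_id INTEGER, "
        "event_type TEXT, event_time TEXT, properties TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?, ?, ?, ?)",
        [
            (1, 10, "click", "2026-01-05 09:00:00", "{}"),
            (2, 10, "purchase", "2026-01-05 09:05:00", "{}"),
            (3, 11, "view", "2026-01-05 10:00:00", "{}"),
            (4, 11, "scroll", "2026-01-05 10:01:00", "{}"),  # dropped: type not staged
            (5, 10, "click", "2026-01-01 08:00:00", "{}"),   # dropped: before cutoff
        ],
    )
    build_staging(conn, cutoff="2026-01-04 00:00:00")
    return daily_metrics(conn)
```

Calling `demo()` returns one row per day and user. Taking the cutoff as a parameter, rather than reading the clock inside the query, keeps the run reproducible and easy to backfill.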

Core Technologies

Data Processing

  • Apache Spark
  • Apache Flink
  • Pandas / Polars
  • dbt (data transformation)

Data Warehousing

  • Snowflake
  • BigQuery (GCP)
  • Redshift (AWS)
  • Azure Synapse

ETL/ELT Tools

  • dbt
  • Airflow
  • Talend
  • Informatica

Streaming

  • Apache Kafka
  • AWS Kinesis
  • Apache Pulsar

ML & Analytics

  • scikit-learn
  • TensorFlow
  • Tableau / Power BI

Best Practices

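  1. Data Quality - Validate schemas, nulls, and duplicates before data reaches consumers
  2. Documentation - Keep table and column metadata clear and current
  3. Performance - Optimize queries and avoid unnecessary full scans
  4. Governance - Control access and secure sensitive data
  5. Monitoring - Alert on pipeline failures and delays
  6. Scalability - Design for growing data volumes
  7. Version Control - Keep code and configs in Git
  8. Testing - Test both the data and the pipeline logic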
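
The data quality and testing items above can be illustrated with a few SQL checks run from Python. The sketch below assumes the `staging_events` table from the quick start lives in a SQLite database. The check names and the functions `check_staging_events` and `assert_quality` are hypothetical, and a real pipeline might rely on dbt tests or a dedicated validation framework instead.

```python
import sqlite3

# Each check is a query that returns a single count of violations.
QUALITY_CHECKS = {
    # Rows with no user attached.
    "null_user_id": "SELECT COUNT(*) FROM staging_events WHERE user_id IS NULL",
    # Number of event_id values that occur more than once.
    "duplicate_event_id": """
        SELECT COUNT(*) FROM (
            SELECT event_id FROM staging_events
            GROUP BY event_id HAVING COUNT(*) > 1
        )
    """,
    # Rows whose event_type is missing or outside the staged set.
    "unexpected_event_type": """
        SELECT COUNT(*) FROM staging_events
        WHERE event_type IS NULL
           OR event_type NOT IN ('click', 'purchase', 'view')
    """,
}


def check_staging_events(conn: sqlite3.Connection) -> dict:
    """Run every check and return {check_name: violation_count}."""
    return {
        name: conn.execute(sql).fetchone()[0]
        for name, sql in QUALITY_CHECKS.items()
    }


def assert_quality(conn: sqlite3.Connection) -> None:
    """Raise ValueError listing every check that found violations."""
    failures = {
        name: count
        for name, count in check_staging_events(conn).items()
        if count
    }
    if failures:
        raise ValueError(f"data quality checks failed: {failures}")
```

Running `assert_quality(conn)` as the step after staging makes a bad load fail loudly instead of flowing into downstream aggregates, and the raised error is a natural hook for pipeline alerts.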
