
DuckGuard

Data Quality That Just Works

3 lines of code · Any data source · 10x faster

One API for CSV, Parquet, Snowflake, Databricks, BigQuery, and 15+ sources. No boilerplate.

pip install duckguard

from duckguard import connect

orders = connect("s3://warehouse/orders.parquet")       # or Snowflake, Databricks, CSV...
assert orders.customer_id.is_not_null()                 # Just like pytest!
assert orders.total_amount.between(0, 10000)
assert orders.status.isin(["pending", "shipped", "delivered"])

quality = orders.score()
print(f"Grade: {quality.grade}")  # A, B, C, D, or F

Same 3 lines whether your data lives in S3, Snowflake, Databricks, or a local CSV.
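
To make the three assertions above concrete, here is a rough sketch of what such checks amount to when written as plain SQL. It uses Python's built-in `sqlite3` as a stand-in engine and a made-up `orders` table; the helper `count_violations` is hypothetical, and none of this is DuckGuard's actual implementation.

```python
import sqlite3

# Stand-in data: a tiny in-memory "orders" table (values invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, total_amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 120.0, "pending"), (2, 9999.5, "shipped"), (3, 0.0, "delivered")],
)


def count_violations(bad_row_predicate: str) -> int:
    """Count rows matching a SQL predicate that describes a *bad* row.

    Hypothetical helper: the predicates below are hard-coded constants,
    never user input, so plain string formatting is acceptable here.
    """
    query = f"SELECT COUNT(*) FROM orders WHERE {bad_row_predicate}"
    return conn.execute(query).fetchone()[0]


# Rough SQL equivalents of is_not_null(), between(0, 10000), and isin([...]).
# NULL amounts or statuses are not counted as violations by these predicates.
null_ids = count_violations("customer_id IS NULL")
out_of_range = count_violations("total_amount NOT BETWEEN 0 AND 10000")
bad_status = count_violations(
    "status NOT IN ('pending', 'shipped', 'delivered')"
)

assert null_ids == 0
assert out_of_range == 0
assert bad_status == 0
```

Each check reduces to counting the rows that break a rule and asserting that the count is zero, which is why the assertions fit naturally inside a pytest test function.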


Why DuckGuard?

Most data quality tools make you write boilerplate before you can validate a single column: 50+ lines for Great Expectations in the comparison below. DuckGuard gives you a pytest-like API powered by DuckDB's speed.

| Feature | DuckGuard | Great Expectations | Soda Core | Pandera |
|---|---|---|---|---|
| Lines of code to start | 3 | 50+ | 10+ | 5+ |
| Time for 1GB CSV | ~4 sec | ~45 sec | ~20 sec | ~15 sec |
| Memory for 1GB CSV | ~200 MB | ~4 GB | ~1.5 GB | ~1.5 GB |
| Learning curve | Minutes | Days | Hours | Minutes |
Pytest-like API
DuckDB-powered Partial
PII detection Built-in
Anomaly detection 7 methods Partial
Data contracts

  • Getting Started

    Install and validate your first dataset in 30 seconds

  • Validation Guide

    Column checks, cross-dataset validation, conditional rules

  • Connectors

    CSV, Parquet, S3, PostgreSQL, Snowflake, BigQuery, and more

  • Snowflake · Databricks · Fabric · Kaggle

    Platform-specific guides for your data stack

  • Integrations

    pytest, dbt, Airflow, GitHub Actions, Slack, Teams

  • CLI Reference

    Command-line tools for validation, profiling, and reports

  • API Reference

    Complete Python API documentation


What's New in 3.0

DuckGuard 3.0 adds 23 new check types while maintaining 100% backward compatibility:

  • Conditional checks — validate only when conditions are met
  • Multi-column checks — composite keys, column pair relationships
  • Query-based checks — custom SQL with built-in security
  • Distributional checks — KS test, chi-square, normality testing
  • 7 anomaly detection methods — Z-score, IQR, ML baselines, seasonal
  • Enterprise security — multi-layer SQL injection prevention
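
Two of the anomaly detection methods named above, Z-score and IQR, are standard statistical rules and can be sketched with the Python standard library alone. The function names, default thresholds, and sample data below are illustrative assumptions, not DuckGuard's API or its implementation.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented sample: 20 ordinary order amounts plus one extreme value.
amounts = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99] * 2 + [5000]

print(zscore_outliers(amounts))
print(iqr_outliers(amounts))
```

Both rules flag the 5000 entry in this sample. On very small samples the Z-score rule can miss an extreme value, because with the population standard deviation no point can score above sqrt(n - 1); the IQR rule does not have that limitation, which is one reason to offer more than one method.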

Read the full changelog →