Quickstart
Get from zero to validated in 30 seconds.
Connect to Your Data
from duckguard import connect
# Files
orders = connect("orders.csv") # CSV
orders = connect("orders.parquet") # Parquet
orders = connect("orders.json") # JSON
# Cloud
orders = connect("s3://bucket/orders.parquet")
# Databases
orders = connect("postgres://localhost/db", table="orders")
# pandas DataFrame
import pandas as pd
orders = connect(pd.read_csv("orders.csv"))
DuckGuard connects to files, cloud storage, databases, and DataFrames. See all connectors →
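To try the snippets on this page without real data, a small `orders.csv` can be generated with the standard library. This is a minimal sketch: the column names come from the examples on this page, while the row values and the helper name `write_sample_orders` are made up for illustration.

```python
import csv
from pathlib import Path

# Columns used by the examples on this page; the values are invented sample data.
FIELDS = ["order_id", "quantity", "total_amount", "email", "status"]
ROWS = [
    (1, 2, 19.98, "ada@example.com", "pending"),
    (2, 1, 5.00, "bob@example.com", "shipped"),
    (3, 10, 120.50, "cy@example.com", "delivered"),
]


def write_sample_orders(path="orders.csv"):
    """Write a tiny orders file and return its path (hypothetical helper)."""
    target = Path(path)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(FIELDS)   # header row
        writer.writerows(ROWS)    # one line per order
    return target
```

Calling `write_sample_orders()` in the working directory produces a file that the `connect("orders.csv")` line above can point at.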
Validate Columns
Validations work like pytest assertions: they are readable and composable, and they tell you exactly what failed.
# Null & uniqueness
assert orders.order_id.is_not_null()
assert orders.order_id.is_unique()
# Range checks
assert orders.total_amount.between(0, 10000)
assert orders.quantity.greater_than(0)
# Patterns & enums
assert orders.email.matches(r'^[\w.+-]+@[\w-]+\.[\w.]+$')
assert orders.status.isin(["pending", "shipped", "delivered"])
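Before putting a pattern into a check, it can help to try it against a few values with Python's `re` module. The sketch below does not call DuckGuard; it only shows which strings the email pattern from the example accepts. The sample addresses are invented.

```python
import re

# Pattern copied from the `matches` example above.
EMAIL_PATTERN = re.compile(r'^[\w.+-]+@[\w-]+\.[\w.]+$')

samples = [
    "ada@example.com",
    "first.last+tag@mail.example.co.uk",
    "user@localhost",          # no dot after the host part
    "no-at-sign.example.com",  # missing "@"
]

# Map each sample to whether the pattern accepts it.
accepted = {value: bool(EMAIL_PATTERN.match(value)) for value in samples}

for value, ok in accepted.items():
    print(f"{value!r}: {'match' if ok else 'no match'}")
```

The pattern is deliberately permissive: it requires an `@` and at least one dot after it, but it is not a full email validator.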
Debug Failures
When a check fails, you get row-level details:
result = orders.quantity.between(1, 100)
if not result.passed:
    print(result.summary())
# Column 'quantity' has 3 values outside [1, 100]
#
# Sample of 3 failing rows (total: 3):
# Row 5: quantity=500 - Value outside range [1, 100]
# Row 23: quantity=-2 - Value outside range [1, 100]
# Row 29: quantity=0 - Value outside range [1, 100]
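The idea behind a row-level range report can be sketched in plain Python. This is an illustration of the concept, not DuckGuard's implementation: the function name `failing_rows` is hypothetical, and it assumes inclusive bounds, rows numbered from 1, and missing values skipped rather than reported.

```python
def failing_rows(values, low, high):
    """Return (row_number, value) pairs for values outside [low, high].

    Assumptions: bounds are inclusive, rows are numbered from 1, and
    missing values (None) are skipped rather than reported.
    """
    return [
        (row, value)
        for row, value in enumerate(values, start=1)
        if value is not None and not (low <= value <= high)
    ]


# Invented sample column with three out-of-range values.
quantities = [3, 12, 500, 7, -2, None, 0, 100]
failures = failing_rows(quantities, 1, 100)

print(f"{len(failures)} values outside [1, 100]")
for row, value in failures:
    print(f"Row {row}: quantity={value}")
```

Keeping the row number next to each failing value is what makes the report actionable: the offending records can be looked up directly in the source data.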
Score Your Data
Get an instant quality grade:
score = orders.score()
print(score.grade) # A, B, C, D, or F
print(score.completeness) # % non-null
print(score.uniqueness) # % unique keys
print(score.validity) # % passing checks
print(score.consistency) # % consistent format
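The comments above describe completeness as the share of non-null values and uniqueness as the share of unique keys. How DuckGuard computes or weights these dimensions is not specified here, so the following is only one plausible reading, written in plain Python with hypothetical function names.

```python
def completeness(values):
    """Percentage of values that are not None (0.0 for an empty column)."""
    if not values:
        return 0.0
    present = sum(value is not None for value in values)
    return 100.0 * present / len(values)


def uniqueness(values):
    """Percentage of non-null values that are distinct."""
    present = [value for value in values if value is not None]
    if not present:
        return 0.0
    return 100.0 * len(set(present)) / len(present)


# Invented sample column: one missing value, one duplicate.
order_ids = [101, 102, 102, None, 104]

print(f"completeness: {completeness(order_ids):.1f}%")
print(f"uniqueness:   {uniqueness(order_ids):.1f}%")
```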
Use YAML Rules
Define checks declaratively:
# duckguard.yaml
name: orders_validation
checks:
  order_id:
    - not_null
    - unique
  quantity:
    - between: [1, 1000]
  status:
    - allowed_values: [pending, shipped, delivered]
from duckguard import load_rules, execute_rules
rules = load_rules("duckguard.yaml")
result = execute_rules(rules, "orders.csv")
print(f"Passed: {result.passed_count}/{result.total_checks}")
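To make the declarative idea concrete, here is a rough sketch of how a rule file shaped like `duckguard.yaml` could be dispatched to check functions. It is not DuckGuard's rule engine: the rules are written as a plain dict so no YAML parser is needed, and `CHECKS`, `run_rules`, and the sample data are all hypothetical.

```python
# Rules mirroring the structure of duckguard.yaml, written as a plain dict.
RULES = {
    "order_id": ["not_null", "unique"],
    "quantity": [{"between": [1, 1000]}],
    "status": [{"allowed_values": ["pending", "shipped", "delivered"]}],
}

# Hypothetical check implementations: each takes a column and an argument.
CHECKS = {
    "not_null": lambda col, _: all(v is not None for v in col),
    "unique": lambda col, _: len(set(col)) == len(col),
    "between": lambda col, arg: all(arg[0] <= v <= arg[1] for v in col),
    "allowed_values": lambda col, arg: all(v in arg for v in col),
}


def run_rules(rules, columns):
    """Return a list of (column, check_name, passed) tuples."""
    results = []
    for column, checks in rules.items():
        for check in checks:
            # A bare string has no argument; a one-key dict carries one.
            if isinstance(check, str):
                name, arg = check, None
            else:
                name, arg = next(iter(check.items()))
            results.append((column, name, CHECKS[name](columns[column], arg)))
    return results


# Invented sample data; quantity 5000 violates the between rule.
columns = {
    "order_id": [1, 2, 3],
    "quantity": [2, 5000, 10],
    "status": ["pending", "shipped", "delivered"],
}
results = run_rules(RULES, columns)
passed = sum(ok for _, _, ok in results)
print(f"Passed: {passed}/{len(results)}")
```

Separating the rule data from the check functions is the point of the declarative style: rules can be reviewed and versioned without touching code.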
DuckGuard can also auto-discover rules from your data.
Run from CLI
# Validate
duckguard check orders.csv --config duckguard.yaml
# Profile
duckguard profile orders.csv
# Generate report
duckguard report orders.csv --output report.html
What's Next?
- Column Validation — All validation methods
- Cross-Dataset Checks — FK validation, reconciliation
- Anomaly Detection — 7 detection methods
- Data Contracts — Schema enforcement
- Integrations — pytest, dbt, Airflow, CI/CD