Modern Formats¶
Connect to Delta Lake and Apache Iceberg tables.
Delta Lake¶
DuckGuard reads Delta Lake tables through DuckDB's Delta extension.
Quick Start¶
from duckguard import connect
# Local Delta table
data = connect("path/to/delta_table/")
# Delta on S3
data = connect("s3://bucket/delta_table/")
Time Travel¶
DuckDB's Delta support enables reading specific versions:
# Read Delta table at a specific version via DuckDB SQL
data = connect("delta_table/")
rows = data.execute_sql(
"SELECT * FROM delta_scan('{source}', version=5)"
)
Requirements¶
Delta Lake support requires the DuckDB delta extension, which loads automatically on first use.
Validation¶
data = connect("s3://datalake/orders_delta/")
# Same API as any other source
assert data.row_count > 0
assert data.order_id.null_percent == 0
score = data.score()
print(f"Quality: {score.grade}")
Apache Iceberg¶
DuckGuard reads Iceberg tables through DuckDB's Iceberg extension.
Quick Start¶
from duckguard import connect
# Local Iceberg table
data = connect("path/to/iceberg_table/")
# Iceberg on S3
data = connect("s3://bucket/iceberg_table/")
Requirements¶
Iceberg support requires the DuckDB iceberg extension.
Validation¶
data = connect("s3://warehouse/orders_iceberg/")
assert data.row_count > 0
result = data.amount.between(0, 100000)
assert result.passed
Comparison¶
| Feature | Delta Lake | Iceberg |
|---|---|---|
| ACID transactions | ✅ | ✅ |
| Time travel | ✅ | ✅ |
| Schema evolution | ✅ | ✅ |
| Partition pruning | ✅ | ✅ |
| DuckDB extension | delta |
iceberg |
| Cloud support | S3, GCS, Azure | S3, GCS, Azure |
Practical Patterns¶
Quality gate for lakehouse¶
from duckguard import connect, load_rules, execute_rules
data = connect("s3://datalake/orders/")
rules = load_rules("duckguard.yaml")
result = execute_rules(rules, dataset=data)
if not result.passed:
raise RuntimeError(f"Quality gate failed: {result.failed_count} checks")
Compare Delta versions¶
current = connect("s3://lake/orders/")
backup = connect("s3://lake/orders_backup/")
result = current.row_count_matches(backup, tolerance=100)
assert result.passed