Changelog
All notable changes to DuckGuard will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[3.2.0] — 2026-02-03
AI-powered data quality checks, improved semantic type detection, and a move to the Apache 2.0 license.
Added
- AI Module (duckguard.ai) — Native LLM integration for data quality:
  - explainer.explain() — Natural language quality summaries
  - rules_generator.suggest_rules() — AI-powered validation rule generation
  - fixer.suggest_fixes() — AI-suggested data quality fixes
  - natural_language.natural_rules() — Plain English validation rules
- Multi-provider support: OpenAI, Anthropic, Ollama (local)
- AI CLI Commands — duckguard explain, duckguard suggest, duckguard fix
- Documentation Site — 34-page MkDocs Material site at xdatahubai.github.io/duckguard
- NOTICE file — Apache 2.0 attribution
- GitHub Templates — Bug report, feature request, and PR templates
- SECURITY.md — Vulnerability reporting policy
- Dependabot — Automated dependency updates
- PEP 561 — py.typed marker for static type checkers
Fixed
- Semantic type detection — Name patterns now take priority over ambiguous value patterns; latitude/longitude require 4+ decimal places; added currency patterns for tax, shipping, unit_price
- SQL injection prevention — Added _escape_sql_string() to column.py (matches, isin, _get_failed_rows) and conditional.py (isin_when, matches_when)
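The SQL injection fix above can be illustrated with a minimal sketch. It assumes the helper follows the standard SQL rule of escaping a single quote by doubling it before the value is placed inside a quoted literal. The names escape_sql_string and build_isin_predicate are hypothetical, and DuckGuard's actual _escape_sql_string() may differ in detail.

```python
def escape_sql_string(value: str) -> str:
    """Escape a value for use inside a single-quoted SQL string literal.

    Hypothetical stand-in for an escaping helper: it doubles every single
    quote, which is the standard SQL escaping rule for string literals.
    """
    return value.replace("'", "''")


def build_isin_predicate(column: str, allowed: list[str]) -> str:
    """Build a `column IN (...)` predicate with every value safely quoted.

    The column name is double-quoted as an identifier (embedded double quotes
    are doubled); each value is escaped and wrapped as a string literal.
    """
    quoted_column = '"' + column.replace('"', '""') + '"'
    literals = ", ".join(f"'{escape_sql_string(v)}'" for v in allowed)
    return f"{quoted_column} IN ({literals})"


# A value carrying an injection attempt stays inside its own literal.
print(build_isin_predicate("status", ["active", "x'); DROP TABLE users; --"]))
# "status" IN ('active', 'x''); DROP TABLE users; --')
```

Where a driver supports bound parameters, passing values as parameters avoids manual escaping entirely; escaping matters for code paths that must assemble SQL text.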
Changed
- License — Switched from Elastic License 2.0 to Apache License 2.0 (OSI-approved)
- Classifier — Updated PyPI classifier to License :: OSI Approved :: Apache Software License
[3.1.0] — 2026-01-30
Enhanced profiler: wired 4 existing helper modules into AutoProfiler, added the duckguard profile CLI command, and made profiling thresholds configurable.
Added
Integrated Profiling Pipeline — AutoProfiler now leverages all 4 helper modules:
- PatternMatcher — 25+ built-in patterns (email, SSN, UUID, credit card, etc.) replace the previous 7 hardcoded patterns
- QualityScorer — Every column gets a quality score (0–100) and letter grade (A–F)
- DistributionAnalyzer (deep mode) — distribution type, skewness, kurtosis, normality test
- OutlierDetector (deep mode) — IQR-based outlier count and percentage
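The IQR rule behind the outlier count and percentage can be sketched with the standard library. This assumes Tukey's conventional 1.5 × IQR fences and the "inclusive" quartile method of statistics.quantiles; the function name iqr_outliers is hypothetical, and the multiplier and quartile estimator DuckGuard uses may differ.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> tuple[int, float]:
    """Return (outlier_count, outlier_percentage) using Tukey's IQR fences.

    Values below Q1 - k*IQR or above Q3 + k*IQR count as outliers.
    Hypothetical helper; needs at least two values for quartiles.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    count = sum(1 for v in values if v < low or v > high)
    return count, 100.0 * count / len(values)


data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
count, pct = iqr_outliers(data)
print(count, round(pct, 1))  # 1 11.1  (only 95 falls outside the fences)
```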
Percentile Statistics — median_value, p25_value, p75_value now included in column profiles.
Configurable Thresholds — null_threshold, unique_threshold, enum_max_values, pattern_sample_size, pattern_min_confidence
duckguard profile CLI Command:
duckguard profile data.csv
duckguard profile data.csv --deep
duckguard profile data.csv --format json -o profile.json
Changed
- ColumnProfile: 10 new optional fields (backward-compatible None defaults)
- ProfileResult: 2 new fields (overall_quality_score, overall_quality_grade)
- AutoProfiler delegates pattern detection to PatternMatcher
[3.0.0] — 2026-01-27
Major feature release — 23 new check types, enterprise-grade security, 100% backward compatible with 2.x.
Added
Conditional Expectations (5 check types) — validate only when a SQL condition is met:
- not_null_when, unique_when, between_when, isin_when, matches_when
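A conditional expectation such as not_null_when can be read as a filtered check: only rows that match the condition are tested. The sketch below expresses that as a single SQL count using Python's built-in sqlite3 module. The function name and the orders table are hypothetical, and DuckGuard's own query generation and condition validation are not shown.

```python
import sqlite3


def count_not_null_when_failures(
    conn: sqlite3.Connection, table: str, column: str, condition: str
) -> int:
    """Count rows where `condition` holds but `column` is NULL.

    Hypothetical illustration of a conditional not-null check. `table`,
    `column`, and `condition` are interpolated directly, so in this sketch
    they must come from trusted code, never from user input.
    """
    sql = f"SELECT COUNT(*) FROM {table} WHERE ({condition}) AND {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, shipped_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("shipped", "2026-01-02"), ("shipped", None), ("pending", None)],
)
# Only the shipped order without shipped_at violates the rule; the pending
# order's NULL is allowed because the condition does not apply to it.
print(count_not_null_when_failures(conn, "orders", "shipped_at", "status = 'shipped'"))  # 1
```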
Multi-Column Expectations (8 check types) — cross-column validation:
- expect_column_pair_satisfy, expect_columns_unique, expect_multicolumn_sum_to_equal
- Column comparison operators: column_a_gt_b, column_a_gte_b, column_a_lt_b, column_a_lte_b, column_a_eq_b
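The column comparison operators check a relation between two columns on every row. A minimal pure-Python sketch of that idea follows; the COMPARATORS table and column_pair_violations are hypothetical names, and skipping rows with a missing value is an assumption modeled on SQL, where a comparison with NULL is never true and never false.

```python
import operator
from typing import Any, Callable, Mapping, Sequence

# Hypothetical mapping from the operator suffixes above to Python comparisons.
COMPARATORS: dict[str, Callable[[Any, Any], bool]] = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
}


def column_pair_violations(
    rows: Sequence[Mapping[str, Any]], col_a: str, col_b: str, op: str
) -> list[int]:
    """Return indices of rows where `col_a <op> col_b` does not hold.

    Rows with None on either side are skipped rather than counted as failures.
    """
    compare = COMPARATORS[op]
    violations = []
    for i, row in enumerate(rows):
        a, b = row[col_a], row[col_b]
        if a is None or b is None:
            continue
        if not compare(a, b):
            violations.append(i)
    return violations


rows = [
    {"subtotal": 100, "total": 108},
    {"subtotal": 50, "total": 45},    # total below subtotal: violation
    {"subtotal": 20, "total": None},  # skipped: total is missing
]
print(column_pair_violations(rows, "total", "subtotal", "gte"))  # [1]
```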
Query-Based Expectations (6 check types) — custom SQL validation:
- expect_query_to_return_no_rows, expect_query_to_return_rows
- expect_query_result_to_equal, expect_query_result_to_be_between
- query_result_gt, query_result_lt
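expect_query_to_return_no_rows follows a common pattern: write SQL that selects the rows violating a rule, and treat an empty result as a pass. A minimal sketch with Python's built-in sqlite3 module; the function name, the employees table, and the sampling of offending rows are assumptions for illustration, not DuckGuard's API.

```python
import sqlite3


def query_returns_no_rows(
    conn: sqlite3.Connection, query: str, sample_size: int = 5
) -> tuple[bool, list[tuple]]:
    """Pass when `query` (which selects violating rows) returns nothing.

    Returns (passed, sample), where sample holds at most `sample_size`
    offending rows for a failure report.
    """
    sample = conn.execute(query).fetchmany(sample_size)
    return len(sample) == 0, sample


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("ana", 52000.0), ("bo", -10.0), ("cy", 61000.0)],
)
passed, sample = query_returns_no_rows(
    conn, "SELECT name, salary FROM employees WHERE salary < 0"
)
print(passed, sample)  # False [('bo', -10.0)]
```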
Distributional Checks (4 check types) — statistical tests:
- expect_distribution_normal, expect_distribution_uniform
- expect_ks_test, expect_chi_square_test
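A chi-square goodness-of-fit test compares observed category counts O with expected counts E through the statistic χ² = Σ (O − E)² / E. The pure-Python sketch below assumes the caller supplies expected counts and a critical value (7.815 is the 0.95 quantile of the chi-square distribution with 3 degrees of freedom). The function name is hypothetical; SciPy's scipy.stats.chisquare computes the same statistic together with a p-value.

```python
def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [48, 35, 15, 2]   # counts per category in the data
expected = [40, 30, 20, 10]  # counts implied by the reference distribution
stat = chi_square_statistic(observed, expected)
critical = 7.815  # chi-square 0.95 quantile, df = 4 categories - 1 = 3
print(f"chi2={stat:.3f}, reject={stat > critical}")  # chi2=10.083, reject=True
```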
Enhanced Profiling:
- DistributionAnalyzer — best-fit distribution, skewness, kurtosis, normality testing
- OutlierDetector — Z-score, IQR, Isolation Forest, LOF, consensus detection
- PatternMatcher — 25+ built-in patterns with confidence scoring
- QualityScorer — multi-dimensional quality grading (A–F)
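Of the outlier methods listed, the Z-score rule is the simplest: flag values whose distance from the mean exceeds a threshold number of standard deviations. The sketch assumes a threshold of 3 and the population standard deviation; both choices, and the name zscore_outliers, are assumptions rather than DuckGuard's settings.

```python
import statistics


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[float]:
    """Return values whose |z| = |x - mean| / std exceeds `threshold`.

    Uses the population standard deviation; a constant column has no
    outliers by definition.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


data = [9, 10, 11] * 6 + [10, 100]
print(zscore_outliers(data))  # [100]
```

One caveat motivates combining methods: an extreme value inflates the standard deviation it is measured against, so the largest possible population z-score in a sample of n values is (n − 1)/√n. With ten or fewer values, a threshold of 3 can never be exceeded.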
Security:
- Multi-layer SQL injection prevention
- QueryValidator + QuerySecurityValidator + ExpressionParser
- 80+ security tests, OWASP Top 10 compliance
- READ-ONLY enforcement, 30s timeout, 10K row limit
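The read-only enforcement, timeout, and row limit above are runtime guardrails that can be sketched with Python's built-in sqlite3 module. This is not DuckGuard's QueryValidator: it assumes SQLite's PRAGMA query_only for read-only mode, a progress handler that aborts a statement past a deadline, and fetchmany() to cap rows. The 30 s and 10,000-row defaults mirror the figures above; run_guarded_query is a hypothetical name.

```python
import sqlite3
import time


def run_guarded_query(
    conn: sqlite3.Connection,
    query: str,
    timeout_s: float = 30.0,
    max_rows: int = 10_000,
) -> tuple[list[tuple], bool]:
    """Run `query` read-only with a time budget and a row cap.

    Returns (rows, truncated). Raises sqlite3.OperationalError if the query
    tries to write or runs past the deadline.
    """
    # Engine-level read-only mode: write statements now fail. The pragma stays
    # on for this connection, which suits a dedicated validation connection.
    conn.execute("PRAGMA query_only = ON")
    deadline = time.monotonic() + timeout_s
    # SQLite calls the handler every 1000 VM instructions; returning a
    # non-zero value interrupts the running statement.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)
    try:
        cursor = conn.execute(query)
        rows = cursor.fetchmany(max_rows + 1)  # one extra row reveals truncation
    finally:
        conn.set_progress_handler(None, 0)
    return rows[:max_rows], len(rows) > max_rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
conn.commit()
print(run_guarded_query(conn, "SELECT x FROM t ORDER BY x", max_rows=3))
# ([(0,), (1,), (2,)], True)
```

Enforcing these limits in the engine complements, rather than replaces, validating the SQL text before it runs.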
Performance
Benchmarks on 1M rows: conditional 2.1s, multi-column 3.8s, query 2.5–7.2s, distributional 8.3s.
Dependencies
- scipy>=1.11.0 (optional) — pip install 'duckguard[statistics]'
- scikit-learn>=1.3.0 (optional) — pip install 'duckguard[profiling]'
[2.3.0] — 2025-01-25
Added
- Cross-dataset validation: exists_in(), references(), find_orphans(), matches_values()
- Dataset reconciliation: reconcile() with key matching, value comparison, tolerance
- Distribution drift: detect_drift() using Kolmogorov-Smirnov test
- Group-by checks: group_by().row_count_greater_than(), stats(), validate()
- Row count comparison: row_count_matches() between datasets
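The two-sample Kolmogorov-Smirnov test used for drift detection compares the empirical cumulative distribution functions (ECDFs) of two samples; its statistic D is the largest vertical gap between them. The pure-Python sketch below uses the large-sample critical value c(α)·√((n + m)/(n·m)) with c(0.05) ≈ 1.358. It is the textbook test rather than DuckGuard's detect_drift() implementation, and the function names are hypothetical; SciPy's scipy.stats.ks_2samp computes the same statistic along with a p-value.

```python
import bisect
import math


def ks_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Two-sample KS statistic D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    # Both ECDFs only step at observed values, so checking those suffices.
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / n
        f_b = bisect.bisect_right(b, x) / m
        d = max(d, abs(f_a - f_b))
    return d


def has_drifted(
    reference: list[float], current: list[float], c_alpha: float = 1.358
) -> bool:
    """Flag drift when D exceeds the asymptotic critical value.

    c_alpha = 1.358 corresponds to a significance level of 0.05.
    """
    n, m = len(reference), len(current)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


reference = [float(x) for x in range(100)]
shifted = [x + 30 for x in reference]
print(round(ks_statistic(reference, shifted), 3), has_drifted(reference, shifted))
# 0.3 True
```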
[2.2.1] — 2025-01-24
Fixed
- Resolved lint errors and fixed minor bugs
[2.2.0] — 2025-01-23
Added
- Initial public release with core features:
  - Data profiling and quality scoring
  - YAML-based validation rules
  - Semantic type detection and PII detection
  - Data contracts (generate, validate, diff)
  - Anomaly detection (z-score, IQR, baseline, KS test)
  - CLI interface with Rich output
  - pytest integration
  - Connectors: CSV, Parquet, JSON, Excel, S3, GCS, Azure, PostgreSQL, MySQL, SQLite, Snowflake, BigQuery, Redshift, SQL Server, Databricks, Oracle, MongoDB, Kafka
  - HTML/PDF report generation
  - Slack, Teams, Email notifications
  - Historical tracking and trend analysis