
Changelog

All notable changes to DuckGuard will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.


[3.2.0] — 2026-02-03

AI-powered data quality, improved semantic detection, and Apache 2.0 license.

Added

  • AI Module (duckguard.ai) — Native LLM integration for data quality:
      • explainer.explain() — Natural language quality summaries
      • rules_generator.suggest_rules() — AI-powered validation rule generation
      • fixer.suggest_fixes() — AI-suggested data quality fixes
      • natural_language.natural_rules() — Plain English validation rules
      • Multi-provider support: OpenAI, Anthropic, Ollama (local)
  • AI CLI Commands: duckguard explain, duckguard suggest, duckguard fix
  • Documentation Site — 34-page MkDocs Material site at xdatahubai.github.io/duckguard
  • NOTICE file — Apache 2.0 attribution
  • GitHub Templates — Bug report, feature request, and PR templates
  • SECURITY.md — Vulnerability reporting policy
  • Dependabot — Automated dependency updates
  • PEP 561: py.typed marker for static type checkers

Fixed

  • Semantic type detection — Name patterns now take priority over ambiguous value patterns; latitude/longitude require 4+ decimal places; added currency patterns for tax, shipping, unit_price
  • SQL injection prevention — Added _escape_sql_string() to column.py (matches, isin, _get_failed_rows) and conditional.py (isin_when, matches_when)
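
The escaping fix can be illustrated with a minimal sketch. The changelog names _escape_sql_string() but does not show its body, so escape_sql_string and build_isin_clause below are hypothetical stand-ins that assume the standard SQL rule of doubling single quotes inside a string literal; the actual DuckGuard implementation may differ.

```python
def escape_sql_string(value: str) -> str:
    """Hypothetical stand-in for DuckGuard's _escape_sql_string().

    Assumes the standard SQL rule: a single quote inside a string
    literal is written as two single quotes.
    """
    return value.replace("'", "''")


def build_isin_clause(column: str, values: list[str]) -> str:
    """Build an IN (...) clause from untrusted string values (illustrative only).

    Only the values are escaped here; the column name is assumed to be trusted.
    """
    quoted = ", ".join(f"'{escape_sql_string(v)}'" for v in values)
    return f'"{column}" IN ({quoted})'


clause = build_isin_clause("status", ["active", "o'brien"])
print(clause)
```

Where the database driver supports bound parameters, those are generally preferable to string escaping; escaping is shown here only because the changelog entry describes it.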

Changed

  • License — Switched from Elastic License 2.0 to Apache License 2.0 (OSI-approved)
  • Classifier — Updated PyPI classifier to License :: OSI Approved :: Apache Software License

[3.1.0] — 2026-01-30

Enhanced profiler: wired 4 existing helper modules into AutoProfiler, added duckguard profile CLI command, and made profiling thresholds configurable.

Added

Integrated Profiling Pipeline — AutoProfiler now leverages all 4 helper modules:

  • PatternMatcher — 25+ built-in patterns (email, SSN, UUID, credit card, etc.) replace the previous 7 hardcoded patterns
  • QualityScorer — Every column gets a quality score (0–100) and letter grade (A–F)
  • DistributionAnalyzer (deep mode) — distribution type, skewness, kurtosis, normality test
  • OutlierDetector (deep mode) — IQR-based outlier count and percentage
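
As a rough illustration of what an IQR-based outlier count and percentage involves, here is a minimal sketch using only the standard library. The function name iqr_outliers, the 1.5 multiplier, and the inclusive quartile method are assumptions for the example, not details taken from OutlierDetector.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> tuple[int, float]:
    """Return (outlier_count, outlier_percentage) using the IQR rule.

    A value is flagged when it lies more than k * IQR below the first
    quartile or above the third quartile. k = 1.5 is the conventional
    default and is an assumption here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    count = sum(1 for v in values if v < lower or v > upper)
    return count, 100.0 * count / len(values)


count, pct = iqr_outliers([10, 12, 11, 13, 12, 11, 10, 95])
print(count, pct)
```

The same quartiles are the 25th and 75th percentiles, so a profiler that already computes them can derive the outlier bounds at little extra cost.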

Percentile Statistics: median_value, p25_value, and p75_value are now included in column profiles.

Configurable Thresholds: null_threshold, unique_threshold, enum_max_values, pattern_sample_size, pattern_min_confidence

duckguard profile CLI Command:

duckguard profile data.csv
duckguard profile data.csv --deep
duckguard profile data.csv --format json -o profile.json

Changed

  • ColumnProfile: 10 new optional fields (backward-compatible None defaults)
  • ProfileResult: 2 new fields (overall_quality_score, overall_quality_grade)
  • AutoProfiler delegates pattern detection to PatternMatcher

[3.0.0] — 2026-01-27

Major feature release — 23 new check types, enterprise-grade security, 100% backward compatible with 2.x.

Added

Conditional Expectations (5 check types) — validate only when a SQL condition is met:

  • not_null_when, unique_when, between_when, isin_when, matches_when
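
The idea behind a conditional check such as not_null_when can be sketched as a single counting query: a row fails only if the condition holds and the column is NULL. The helper not_null_when_failures below is hypothetical, and it uses the standard-library sqlite3 module purely to stay self-contained; it is not DuckGuard's API or engine.

```python
import sqlite3


def not_null_when_failures(
    conn: sqlite3.Connection, table: str, column: str, condition: str
) -> int:
    """Count rows where `condition` holds but `column` is NULL.

    Hypothetical helper: `table`, `column`, and `condition` are interpolated
    directly, so they must come from trusted code, not user input.
    """
    sql = (
        f'SELECT COUNT(*) FROM "{table}" '
        f'WHERE ({condition}) AND "{column}" IS NULL'
    )
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, shipped_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("shipped", "2026-01-02"), ("shipped", None), ("pending", None)],
)

failures = not_null_when_failures(conn, "orders", "shipped_at", "status = 'shipped'")
print(failures)
```

The pending order with a NULL shipped_at does not count as a failure, because the condition excludes it from the check.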

Multi-Column Expectations (8 check types) — cross-column validation:

  • expect_column_pair_satisfy, expect_columns_unique, expect_multicolumn_sum_to_equal
  • Column comparison operators: column_a_gt_b, column_a_gte_b, column_a_lt_b, column_a_lte_b, column_a_eq_b

Query-Based Expectations (6 check types) — custom SQL validation:

  • expect_query_to_return_no_rows, expect_query_to_return_rows
  • expect_query_result_to_equal, expect_query_result_to_be_between
  • query_result_gt, query_result_lt
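
A query-based check in the spirit of expect_query_to_return_no_rows can be sketched as follows: the query selects the offending rows, and the check passes when none come back. The function query_returns_no_rows and its sample parameter are hypothetical, and sqlite3 stands in for the real engine so the example runs without extra dependencies.

```python
import sqlite3


def query_returns_no_rows(
    conn: sqlite3.Connection, query: str, sample: int = 5
) -> tuple[bool, list[tuple]]:
    """Return (passed, sample_rows); the check passes when `query` yields no rows.

    Only up to `sample` offending rows are fetched, for use in a failure report.
    """
    rows = conn.execute(query).fetchmany(sample)
    return len(rows) == 0, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 19.99), (2, -5.0), (3, 42.0)]
)

passed, bad_rows = query_returns_no_rows(
    conn, "SELECT id, total FROM orders WHERE total < 0"
)
print(passed, bad_rows)
```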

Distributional Checks (4 check types) — statistical tests:

  • expect_distribution_normal, expect_distribution_uniform
  • expect_ks_test, expect_chi_square_test
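
To make the statistical checks more concrete, the sketch below computes a Pearson chi-square statistic against a uniform expectation. It is one plausible basis for a uniformity or chi-square check, not a description of how DuckGuard implements them; the function name is hypothetical, and turning the statistic into a p-value would require a library such as scipy.

```python
from collections import Counter


def chi_square_uniform_statistic(values: list[str]) -> float:
    """Pearson chi-square statistic against equal expected counts per category.

    Assumption: only categories that actually occur are considered, so a
    category with zero observations would go unnoticed by this sketch.
    """
    counts = Counter(values)
    expected = len(values) / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


stat = chi_square_uniform_statistic(
    ["a"] * 30 + ["b"] * 20 + ["c"] * 25 + ["d"] * 25
)
print(stat)
```

With four categories there are three degrees of freedom, and the 5% critical value is about 7.815, so a statistic this small would not reject uniformity.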

Enhanced Profiling:

  • DistributionAnalyzer — best-fit distribution, skewness, kurtosis, normality testing
  • OutlierDetector — Z-score, IQR, Isolation Forest, LOF, consensus detection
  • PatternMatcher — 25+ built-in patterns with confidence scoring
  • QualityScorer — multi-dimensional quality grading (A–F)

Security:

  • Multi-layer SQL injection prevention
  • QueryValidator + QuerySecurityValidator + ExpressionParser
  • 80+ security tests, OWASP Top 10 compliance
  • READ-ONLY enforcement, 30s timeout, 10K row limit

Performance

Benchmarks on 1M rows: conditional 2.1s, multi-column 3.8s, query 2.5–7.2s, distributional 8.3s.

Dependencies

  • scipy>=1.11.0 (optional) — pip install 'duckguard[statistics]'
  • scikit-learn>=1.3.0 (optional) — pip install 'duckguard[profiling]'

[2.3.0] — 2025-01-25

Added

  • Cross-dataset validation: exists_in(), references(), find_orphans(), matches_values()
  • Dataset reconciliation: reconcile() with key matching, value comparison, tolerance
  • Distribution drift: detect_drift() using Kolmogorov-Smirnov test
  • Group-by checks: group_by().row_count_greater_than(), stats(), validate()
  • Row count comparison: row_count_matches() between datasets
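
Drift detection with the Kolmogorov-Smirnov test rests on the two-sample KS statistic, the largest gap between the empirical CDFs of a baseline and a current sample. The sketch below computes only that statistic in pure Python; ks_statistic is a hypothetical name, and detect_drift itself may compute a p-value and apply a threshold that this example does not cover.

```python
from bisect import bisect_right


def ks_statistic(baseline: list[float], current: list[float]) -> float:
    """Two-sample KS statistic: the maximum gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    gap = 0.0
    # The gap can only change at observed data points, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
shifted = ks_statistic([1, 2, 3, 4], [11, 12, 13, 14])
print(same, shifted)
```

A statistic of 0 means the samples have identical empirical distributions, while 1 means they do not overlap at all.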

[2.2.1] — 2025-01-24

Fixed

  • Resolved lint errors and fixed minor bugs

[2.2.0] — 2025-01-23

Added

  • Initial public release with core features:
      • Data profiling and quality scoring
      • YAML-based validation rules
      • Semantic type detection and PII detection
      • Data contracts (generate, validate, diff)
      • Anomaly detection (z-score, IQR, baseline, KS test)
      • CLI interface with Rich output
      • pytest integration
      • Connectors: CSV, Parquet, JSON, Excel, S3, GCS, Azure, PostgreSQL, MySQL, SQLite, Snowflake, BigQuery, Redshift, SQL Server, Databricks, Oracle, MongoDB, Kafka
      • HTML/PDF report generation
      • Slack, Teams, Email notifications
      • Historical tracking and trend analysis