Skip to content

Multi-Column Checks

Validate relationships between columns — date ranges, composite keys, cross-column arithmetic.

Quick Start

from duckguard import connect

data = connect("orders.csv")

# Date range: end must be after start
result = data.expect_column_pair_satisfy(
    column_a="end_date",
    column_b="start_date",
    expression="end_date >= start_date"
)
assert result.passed

# Composite primary key
result = data.expect_columns_unique(columns=["user_id", "session_id"])
assert result.passed

# Components must sum to 100%
result = data.expect_multicolumn_sum_to_equal(
    columns=["q1_pct", "q2_pct", "q3_pct", "q4_pct"],
    expected_sum=100.0
)
assert result.passed

Column Pair Expressions

expect_column_pair_satisfy() evaluates a SQL expression across two columns.

# Arithmetic constraint
result = data.expect_column_pair_satisfy(
    column_a="total",
    column_b="subtotal",
    expression="total = subtotal * 1.1"  # 10% markup
)

# With threshold — allow up to 5% violations
result = data.expect_column_pair_satisfy(
    column_a="end_date",
    column_b="start_date",
    expression="end_date >= start_date",
    threshold=0.95
)

Supported operators: >, <, >=, <=, =, !=, +, -, *, /, AND, OR

The expression parser validates syntax, checks for SQL injection, and scores complexity (max 50 by default).

Composite Uniqueness

Validate that a combination of columns forms a unique key:

# Two-column composite key
result = data.expect_columns_unique(
    columns=["user_id", "session_id"]
)

# Three-column composite key with threshold
result = data.expect_columns_unique(
    columns=["year", "month", "day"],
    threshold=0.99  # Allow 1% duplicates
)

# Inspect results
print(result.details["duplicate_combinations"])
print(result.details["uniqueness_rate"])

Note

At least 2 columns are required. For single-column uniqueness, use column.has_no_duplicates().

Multi-Column Sum

Check that columns sum to an expected value per row:

# Budget allocation check
result = data.expect_multicolumn_sum_to_equal(
    columns=["marketing", "sales", "r_and_d"],
    expected_sum=1000000,
    threshold=0.01  # Allow $0.01 rounding error
)

YAML Rules

All multi-column checks work in YAML rule files:

checks:
  _multicolumn:
    - column_pair_satisfy:
        column_a: end_date
        column_b: start_date
        expression: "end_date >= start_date"

    - multicolumn_unique:
        columns: [user_id, session_id]

    - multicolumn_sum:
        columns: [q1, q2, q3, q4]
        expected_sum: 100.0

Result Details

Every result includes detailed metadata:

result = data.expect_column_pair_satisfy(
    column_a="end_date",
    column_b="start_date",
    expression="end_date >= start_date"
)

print(result.details["violations"])       # Count of failing rows
print(result.details["total_rows"])       # Total rows checked
print(result.details["violation_rate"])   # Fraction that failed