Skip to content

API Reference

Core classes and functions exported by duckguard.

Entry Point

from duckguard import connect

data = connect("data.csv")

connect(source, *, table=None, schema=None, database=None, **options) → Dataset

Connect to any data source. Auto-detects format from file extension or connection string prefix.

Core Classes

Dataset

from duckguard import Dataset
Property / Method Returns Description
row_count int Number of rows
columns list[str] Column names
column_count int Number of columns
source str Source path
name str Dataset name
data.column_name Column Access column by attribute
data["column_name"] Column Access column by bracket
column(name) Column Access column by method
has_column(name) bool Check column exists
head(n=5) list[dict] First n rows
sample(n=10) list[dict] Sample n rows
execute_sql(sql) list[tuple] Run custom SQL
score(weights=None) QualityScore Quality score
freshness FreshnessResult Freshness info
is_fresh(max_age) bool Freshness check
group_by(columns) GroupedDataset Group for validation
reconcile(target, ...) ReconciliationResult Compare datasets
row_count_matches(other, tolerance) ValidationResult Compare row counts

Multi-Column Methods

Method Description
expect_column_pair_satisfy(column_a, column_b, expression, threshold) Column pair relationship
expect_columns_unique(columns, threshold) Composite key uniqueness
expect_multicolumn_sum_to_equal(columns, expected_sum, threshold) Sum constraint

Query Methods

Method Description
expect_query_to_return_no_rows(query, message) Query returns 0 rows
expect_query_to_return_rows(query, message) Query returns ≥1 row
expect_query_result_to_equal(query, expected, tolerance) Scalar result match
expect_query_result_to_be_between(query, min_value, max_value) Scalar in range

Column

Accessed via dataset.column_name or dataset["column_name"].

Property / Method Description
null_count, null_percent Null statistics
unique_count, unique_percent Uniqueness statistics
total_count Non-null count
min, max, mean, median Numeric statistics
between(min, max) Range validation
isin(values) Allowed values check
matches(pattern) Regex pattern check
has_no_duplicates() Uniqueness check
greater_than(value) Minimum check
less_than(value) Maximum check
not_null_when(condition) Conditional null check
unique_when(condition) Conditional uniqueness
between_when(min, max, condition) Conditional range
isin_when(values, condition) Conditional allowed values
matches_when(pattern, condition) Conditional pattern
expect_distribution_normal() Normal distribution test
expect_distribution_uniform() Uniform distribution test
expect_ks_test(distribution) KS test
expect_chi_square_test() Chi-square test

ValidationResult

from duckguard import ValidationResult
Property / Method Description
passed bool — check passed
actual_value Actual value found
expected_value Expected value
message Summary string
details Metadata dict
failed_rows list[FailedRow] — sample failures
total_failures Total failure count
summary() Human-readable summary
to_dataframe() Export to pandas DataFrame
get_failed_values() List of bad values
get_failed_row_indices() List of row indices

QualityScore

score = data.score()
score.overall       # 0-100
score.grade         # A, B, C, D, F
score.completeness  # Dimension score
score.uniqueness    # Dimension score
score.validity      # Dimension score
score.consistency   # Dimension score

Top-Level Functions

Function Description
connect(source) Connect to data source
profile(dataset) Profile dataset
load_rules(path) Load YAML rules
execute_rules(ruleset) Execute rules
generate_rules(dataset) Auto-generate rules
load_contract(path) Load data contract
validate_contract(contract, source) Validate against contract
generate_contract(source) Auto-generate contract
diff_contracts(old, new) Compare contracts
detect_type(dataset, column) Detect semantic type
detect_types_for_dataset(dataset) Detect all types
detect_anomalies(dataset) Anomaly detection
score(dataset) Quality score