Unity Catalog and Data Quality for Finance: A Governance Rollout

Mid‑market financial institutions can meet audit expectations without ballooning headcount by pairing Databricks Unity Catalog with data quality, policy‑as‑code, and clear ownership. This guide lays out a pragmatic 90‑day rollout covering governed access, lineage, scorecards, segregation of duties (SoD), and agentic runbooks, with the aim of reducing audit friction, raising trust, and speeding delivery.

1. Problem / Context

Financial institutions in the $50M–$300M range face a familiar squeeze: regulators expect bank‑grade data governance, while teams and budgets look more like a lean mid‑market operator. Payments, loan servicing, customer analytics, and regulatory reporting often run on fragmented data with inconsistent definitions. Lineage is scattered across spreadsheets and tribal knowledge. Access is provisioned manually, and masking rules differ by team. When auditors ask, “Who used what data to produce which report?” answers take weeks—and confidence suffers.

Databricks Unity Catalog promises a unifying layer for access control, lineage, and auditing across data and AI assets. But value only lands with a disciplined rollout that pairs Unity Catalog with data quality, policy‑as‑code, and clear ownership. This article offers a pragmatic, phased governance plan that mid‑market finance leaders can execute quickly—without boiling the ocean.

2. Key Definitions & Concepts

Unity Catalog: The Databricks governance layer for data, ML, and AI assets. It centralizes permissions, enables fine‑grained access (row/column), and captures lineage from notebooks, jobs, and dashboards.
Data domains and ownership: Groupings like Payments, Loans, and Customers, each with a named steward responsible for definitions, quality, and access policy.
Masking and row/column policies: Rules that redact or restrict sensitive attributes (e.g., PAN, SSN, account numbers) based on role, purpose, and jurisdiction.
Delta expectations and quality scorecards: Data quality checks attached to tables/pipelines (null rate, referential integrity, allowed value sets), rolled up into domain‑level scorecards for operational visibility.
Policy‑as‑code: Storing and versioning access policies, quality rules, and controls in code (e.g., Terraform or versioned policy files, deployed through CI/CD) to ensure consistency, peer review, and auditable changes.
Segregation of Duties (SoD): Preventing conflicts—e.g., the engineer who grants entitlements cannot certify their own access.
Agentic runbooks: Automated playbooks that detect incidents (failed quality checks, policy drift), triage the root cause, and orchestrate remediation and approvals.

3. Why This Matters for Mid‑Market Regulated Firms

Audit pressure: Examiners and internal audit expect reproducible lineage, evidence of approvals, and consistent access enforcement across the reports that feed SOX, GLBA, and PCI‑adjacent obligations as well as board metrics.
Cost and speed: Manual provisioning and ad‑hoc data fixes slow analytics and increase risk. Unified controls and quality reduce firefighting and duplicated effort.
Talent constraints: Lean data teams need automation—policy‑as‑code, reusable quality checks, and agentic runbooks—to keep up with demand without adding headcount.
Business trust: Branch operations, lending, finance, and risk leaders need reliable metrics. Quality scorecards and lineage build confidence in decisions tied to margin and risk.

4. Practical Implementation Steps / Roadmap

Follow a 90‑day, three‑phase rollout that makes Unity Catalog the backbone for access, lineage, and quality.

Phase 1 (Days 0–30): Inventory and foundation

Classify datasets into core domains: Payments, Loans, Customers.
Define ownership and stewardship per domain (named individuals, not committees).
Map sensitive attributes and masking rules (PII, PCI‑adjacent, KYC artifacts). Document purpose‑based access.
Stand up Unity Catalog with metastore, catalogs, schemas, and baseline permissions. Confirm that lineage is being captured for notebooks, jobs, and BI connections.

Owners: Data Governance Lead with Security and Platform.
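
The attribute-mapping step in Phase 1 can be bootstrapped with a small script. The following is a minimal sketch in plain Python, assuming column names follow a snake_case convention; the token lists, the `sensitivity` tag key, and the table name in the usage note are hypothetical and would come from your own data dictionary. The generated statements follow the Databricks SQL `ALTER TABLE ... ALTER COLUMN ... SET TAGS` form and should be reviewed by a steward before anything is applied.

```python
import re
from typing import Dict, List


def _token_pattern(*tokens: str) -> re.Pattern:
    """Match any token as a whole underscore-delimited part of a column name."""
    joined = "|".join(tokens)
    return re.compile(rf"(?:^|_)(?:{joined})(?:_|$)", re.IGNORECASE)


# Hypothetical rules: tag name -> pattern over column names.
SENSITIVITY_RULES: Dict[str, re.Pattern] = {
    "pii": _token_pattern("ssn", "dob", "email", "phone"),
    "pci_adjacent": _token_pattern("pan", "card_token"),
    "account": _token_pattern("account_number", "acct_no"),
}


def classify_columns(columns: List[str]) -> Dict[str, str]:
    """Return {column: tag} for columns that match a rule (first match wins)."""
    tags: Dict[str, str] = {}
    for col in columns:
        for tag, pattern in SENSITIVITY_RULES.items():
            if pattern.search(col):
                tags[col] = tag
                break
    return tags


def tag_statements(table: str, tags: Dict[str, str]) -> List[str]:
    """Render one SET TAGS statement per classified column for steward review."""
    return [
        f"ALTER TABLE {table} ALTER COLUMN {col} SET TAGS ('sensitivity' = '{tag}')"
        for col, tag in tags.items()
    ]
```

For example, `classify_columns(["loan_id", "borrower_ssn", "company_name"])` tags only `borrower_ssn`, because tokens are matched on underscore boundaries rather than as substrings. Name matching is a starting point for the inventory, not a substitute for steward review of column contents.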

Phase 2 (Days 31–60): Quality and governed access pilot

Implement Delta expectations for critical tables (e.g., payments ledger, loan master, customer KYC).
Build quality scorecards per domain; publish them to stakeholders weekly.
Add approval workflows for schema changes and policy updates.
Pilot governed access using row/column policies for a sensitive domain (e.g., mask account number and redact customer PII unless role = “Risk Analyst – Approved Purpose”).

Owners: Data Stewards, Data Engineering, Compliance.
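
To make the Phase 2 quality work concrete, here is a minimal sketch of how per-table checks can roll up into a domain scorecard. It is plain Python over in-memory rows, not the Databricks expectations API; the `Expectation` class, the loan status values, and the 0.99 threshold are illustrative assumptions. The three checks mirror the categories named earlier: null rate, allowed value sets, and referential integrity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Expectation:
    name: str
    check: Callable[[Row], bool]  # returns True when the row passes


def loan_expectations(known_customer_ids: set) -> List[Expectation]:
    """Hypothetical checks for a loan master table."""
    allowed_status = {"current", "delinquent", "charged_off", "paid_off"}
    return [
        Expectation("loan_id_not_null", lambda r: r.get("loan_id") is not None),
        Expectation("status_in_allowed_set", lambda r: r.get("status") in allowed_status),
        Expectation("customer_exists", lambda r: r.get("customer_id") in known_customer_ids),
    ]


def pass_rates(rows: List[Row], expectations: List[Expectation]) -> Dict[str, float]:
    """Fraction of rows passing each expectation, between 0.0 and 1.0."""
    rates: Dict[str, float] = {}
    for exp in expectations:
        if not rows:
            rates[exp.name] = 0.0  # treat an empty table as failing so it gets reviewed
            continue
        passed = sum(1 for row in rows if exp.check(row))
        rates[exp.name] = passed / len(rows)
    return rates


def domain_scorecard(rates: Dict[str, float], threshold: float = 0.99) -> Dict[str, object]:
    """Summarize which checks fall below the agreed threshold."""
    failing = sorted(name for name, rate in rates.items() if rate < threshold)
    return {"checks": len(rates), "failing": failing, "healthy": not failing}
```

The scorecard deliberately reports failing check names rather than a single blended score, so a steward can see what to fix instead of debating a number.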

Phase 3 (Days 61–90): Scale via code and close the audit loop

Convert policies to policy‑as‑code with version control and CI checks.
Run periodic entitlement reviews with SoD enforcement and recorded approvals.
Provide read‑only auditor access and pre‑packaged evidence (lineage diagrams, access logs, policy diffs).
Automate incident detection and remediation with agentic runbooks (e.g., quarantine a bad table, notify steward, open ticket, generate RCA template).

Owners: Platform Ops, Compliance, Internal Audit.
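
The runbook example in Phase 3 (quarantine, notify, ticket, RCA template) can be sketched as a single decision function. This is an illustrative outline only: the `Incident` class and `handle_quality_failure` function are hypothetical, and the actions are recorded as strings rather than executed, since the real integrations (table quarantine, paging, ticketing) depend on your platform.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    table: str
    failed_check: str
    steward: str
    actions: List[str] = field(default_factory=list)
    rca_template: str = ""


def handle_quality_failure(
    table: str,
    failed_check: str,
    pass_rate: float,
    steward: str,
    threshold: float = 0.99,
) -> Optional[Incident]:
    """Return a planned incident response, or None when the check is within threshold."""
    if pass_rate >= threshold:
        return None

    incident = Incident(table=table, failed_check=failed_check, steward=steward)
    # Each step is recorded so the sequence itself becomes audit evidence.
    incident.actions.append(f"quarantine {table}")
    incident.actions.append(f"notify {steward}")
    incident.actions.append(f"open ticket for {failed_check}")
    incident.rca_template = (
        f"Table: {table}\n"
        f"Failed check: {failed_check} (pass rate {pass_rate:.2%})\n"
        "Root cause:\n"
        "Remediation:\n"
        "Approved by:\n"
    )
    incident.actions.append("generate RCA template")
    return incident
```

Keeping the plan as data, separate from the code that executes it, lets an approver review the intended steps before anything touches production.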

5. Governance, Compliance & Risk Controls Needed

Access recertification: Quarterly entitlement reviews with steward and data owner approvals. Capture sign‑offs and exceptions.
Data loss prevention (DLP): Scan tables and notebooks for sensitive patterns; block exfiltration to unmanaged destinations.
Encryption and key rotation: Use managed keys with rotation schedules; keep key custody and audit logs aligned to policy.
Lineage everywhere: Require lineage on all critical reports; no “go‑live” without traceable sources, transforms, and consumers.
SoD enforcement: Separate roles for policy authors, approvers, and deployers; restrict production grants to a narrow operator set.
Change control: All policy and schema changes via pull requests with evidence of testing and approvals.
Vendor lock‑in mitigation: Express policies and quality rules as code and keep business definitions in a central, portable repo.
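
The SoD and change-control items above lend themselves to an automated gate in CI. The sketch below assumes each policy change carries an author, a set of approvers, and a deployer; the `PolicyChange` record and its field names are hypothetical and would map onto whatever your pull-request and deployment tooling exposes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PolicyChange:
    change_id: str
    author: str
    approvers: Tuple[str, ...]
    deployer: str


def sod_violations(change: PolicyChange) -> List[str]:
    """List every way a change breaks author/approver/deployer separation."""
    violations: List[str] = []
    if not change.approvers:
        violations.append("no approver recorded")
    if change.author in change.approvers:
        violations.append("author approved their own change")
    if change.deployer == change.author:
        violations.append("author deployed their own change")
    if change.deployer in change.approvers:
        violations.append("approver deployed the change")
    return violations
```

A CI job could fail the pipeline whenever `sod_violations` returns a non-empty list and attach the list to the change record, which gives auditors both the control and the evidence that it ran.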

6. ROI & Metrics

Mid‑market finance teams should baseline the current state during Phase 1 and track improvements after Phases 2 and 3.

Cycle time: Time to provision data access for a new analyst or use case. Target: 60–80% faster via standardized policies and approvals.
Data quality: Defect rate per thousand records in critical tables. Target: 30–50% reduction via Delta expectations and stewardship.
Audit readiness: Hours to compile evidence for a regulatory exam. Target: 50–70% reduction with lineage, policy‑as‑code, and packaged logs.
Incident MTTR: Time to detect and remediate data issues. Target: 40–60% faster with agentic runbooks.
Business impact (example): A specialty lender’s monthly delinquency report moved from 5 business days and frequent re‑runs to 2 days with near‑zero restatements. Result: faster reserve decisions and fewer executive escalations.
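
The arithmetic behind these targets is simple enough to standardize, so every team reports improvement the same way. The helpers below are a small sketch with hypothetical function names. Applied to the delinquency report example, moving from 5 business days to 2 is a 60% reduction in cycle time.

```python
def percent_reduction(baseline: float, current: float) -> float:
    """Percentage improvement relative to the Phase 1 baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) * 100 / baseline


def defects_per_thousand(defects: int, records: int) -> float:
    """Defect rate per thousand records for a critical table."""
    if records <= 0:
        raise ValueError("records must be positive")
    return defects * 1000 / records


def in_target_band(value: float, low: float, high: float) -> bool:
    """True when a measured improvement falls inside the target range."""
    return low <= value <= high


# Delinquency report example from the text: 5 business days down to 2.
report_improvement = percent_reduction(5, 2)  # 60.0
```

Measuring against a fixed Phase 1 baseline, rather than against the previous month, keeps the trend comparable across the whole rollout.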

7. Common Pitfalls & How to Avoid Them

Starting with technology, not ownership: Without named stewards, policies sprawl. Assign owners on day one.
One‑size‑fits‑all masking: Finance has nuanced roles and purposes. Use row/column policies keyed to role and use case, not just table‑level grants.
Quality checks without action: Expectations that fail silently erode trust. Wire failures to incident queues and remediation runbooks.
Skipping SoD: Letting engineers grant and certify their own access invites audit findings. Separate duties and record approvals.
No auditor workspace: If auditors cannot self‑serve read‑only evidence, you’ll burn weeks exporting screenshots. Provide curated, read‑only views and lineage.
Over‑customization: Hard‑coded, team‑specific rules are brittle. Centralize definitions and express policies as code for reuse.

8. 30/60/90-Day Start Plan

First 30 Days

Inventory and classify datasets into Payments, Loans, Customers; tag sensitivity levels.
Establish domain ownership and stewardship; define RACI across CDO sponsor, Stewards, Platform, Security, Compliance.
Stand up Unity Catalog with baseline structure and confirm lineage capture for critical pipelines and BI connectors.
Draft masking rules for PII and PCI‑adjacent fields; document purpose‑based access.
Baseline metrics: access SLAs, quality defect rates, audit evidence effort.

Days 31–60

Implement Delta expectations on priority tables; publish domain scorecards.
Introduce change approvals for schema/policy updates; capture evidence.
Pilot governed access with row/column policies in one sensitive domain (e.g., Loans) and measure access SLA and user satisfaction.
Begin entitlement review cadence; test SoD controls.

Days 61–90

Migrate policies to policy‑as‑code with CI checks and deployment gates.
Expand governed access patterns across domains; standardize masking macros.
Enable auditor read‑only workspace and automate evidence packaging.
Deploy agentic runbooks for incident triage and remediation; track MTTR improvement.
Conduct an access recertification; close exceptions and publish a final governance report.
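
The closing recertification can be driven from an entitlement export. The sketch below assumes a quarterly (90-day) review window and a flat record per grant; the `Entitlement` fields are hypothetical and would be populated from your access logs and approval records.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Entitlement:
    principal: str
    securable: str
    privilege: str
    certified_by: str
    last_certified: date


def review_findings(e: Entitlement, today: date, max_age_days: int = 90) -> List[str]:
    """Reasons this entitlement needs attention in the current review cycle."""
    findings: List[str] = []
    if (today - e.last_certified).days > max_age_days:
        findings.append("certification older than review window")
    if e.certified_by == e.principal:
        findings.append("self-certified access")
    return findings


def recertification_queue(
    entitlements: List[Entitlement], today: date
) -> List[Tuple[Entitlement, List[str]]]:
    """Entitlements with at least one finding, paired with their findings."""
    queue = []
    for e in entitlements:
        findings = review_findings(e, today)
        if findings:
            queue.append((e, findings))
    return queue
```

Anything left in the queue after the review closes is an open exception, which is the list the final governance report needs to account for.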

9. Industry‑Specific Considerations

GLBA and privacy: Treat customer PII with purpose‑limited access; log all queries that touch sensitive fields.
PCI adjacency: Even if you don’t store full PAN, enforce stricter controls around payment tokens and card‑present metadata.
Model risk: For credit and fraud models, keep lineage and approvals for features and training data; capture who approved the dataset and when.
Reconciliations: Tie financial reporting tables to golden sources and enforce referential integrity via expectations.
Jurisdictional rules: If operating across states, encode state‑specific masking (e.g., stricter SSN handling) into row/column policies.
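
The role- and state-based masking described above can be prototyped before it is encoded as policy. This sketch is plain Python for reasoning about the rules; in Unity Catalog the same logic would typically live in a column mask function attached to the table. The role name is the one used in the pilot example, and `STRICT_SSN_STATES` holds a placeholder code, not a statement about any real state's law.

```python
APPROVED_ROLE = "Risk Analyst – Approved Purpose"  # role name from the pilot example
STRICT_SSN_STATES = {"XX"}  # placeholder; fill in from your compliance team's guidance


def mask_account_number(value: str, role: str) -> str:
    """Show the full account number only to the approved role; otherwise last four."""
    if role == APPROVED_ROLE:
        return value
    visible = value[-4:] if len(value) > 4 else ""
    return "*" * (len(value) - len(visible)) + visible


def mask_ssn(ssn: str, role: str, customer_state: str) -> str:
    """Redact SSNs by default; apply stricter handling for designated states."""
    if role != APPROVED_ROLE:
        return "REDACTED"
    if customer_state in STRICT_SSN_STATES:
        # Even approved users see only the last four digits in strict states.
        return "***-**-" + ssn[-4:]
    return ssn
```

Writing the rules as small pure functions first makes them easy to unit test and to review with Compliance before they are translated into enforced policies.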

10. Conclusion / Next Steps

Unity Catalog can anchor a finance‑grade governance program—but only if paired with clear ownership, quality enforcement, and policy‑as‑code. A 90‑day rollout that starts small, proves governed access and quality, and then scales controls organization‑wide will reduce audit friction and restore trust in metrics.

Kriv AI, a governed AI and agentic automation partner for the mid‑market, helps teams move faster with governance‑as‑code starter kits, agentic entitlement reviews, quality monitor blueprints, and audit evidence packaging. Kriv AI also supports data readiness, MLOps, and workflow orchestration so lean teams can run governed AI at scale. If you’re exploring governed Agentic AI for your mid‑market organization, Kriv AI can serve as your operational and governance backbone.
