Software Engineering

What I Learned Building a Compliant Credit Scoring Pipeline (and how ZenML made it simple)

Marwan Zaarab
May 23, 2025
8 mins

Imagine you're responsible for a credit scoring system that evaluates thousands of loan applications daily. Your models directly impact people's financial opportunities. They determine who gets approved for loans, mortgages, and credit cards, and under what terms. When working properly, these algorithms can increase financial inclusion by making consistent, data-driven decisions. When flawed, they risk perpetuating or amplifying existing biases—as Goldman Sachs discovered when a single viral tweet exposed a 20× gender gap in Apple Card credit limits, triggering multi-year regulatory scrutiny.

Under the EU AI Act, these systems fall under the category of "high-risk AI" (Annex III), which triggers strict requirements for documentation, fairness, oversight, and risk management. The SCHUFA ruling in late 2023 tightened these requirements further. For financial institutions, this introduces a new compliance landscape that fundamentally changes how credit scoring models are built, deployed, and maintained.

This post walks through a demo project using ZenML to automate most EU AI Act compliance requirements.
For a deep dive into the EU AI Act itself and its February 2025 updates, check out Alex’s overview post. This blog focuses on the practical implementation of those requirements.

The Compliance Challenge in Banking

Banks have always needed to justify their models, but the AI Act raises the bar. It's no longer sufficient to demonstrate that your model works. You must prove how it works, why it works, and ensure it works fairly, securely, and accountably.

The documentation requirements are extensive: system descriptions, technical specs, development processes, risk procedures, validation results, and monitoring plans. All of it must stay current across the model lifecycle and remain audit-ready. With weekly retrains and hundreds of production models, manual compliance becomes unmanageable.

Non-compliance carries serious consequences: fines of up to €35 million or 7% of global turnover, not to mention the potential reputational damage. Yet 14% of financial organizations still track models in spreadsheets.

What’s needed is infrastructure that makes compliance a natural outcome of good engineering rather than a separate, burdensome process. In many ways, the EU AI Act simply codifies ML engineering best practices.

I built a credit-scoring pipeline that satisfies every requirement without spreadsheets or manual processes. What made this dramatically easier was ZenML’s lineage tracking, versioning, artifact management, and automatic logging, which turn compliance into a natural byproduct of a well-designed ML pipeline.

Architecture for Compliance: Three-Pipeline Design

The credit-scoring workflow is split into three ZenML pipelines:

  • Feature Engineering: Ingests raw loan-application data, profiles quality with WhyLogs, applies cleaning and encoding steps, and stores versioned feature sets for downstream use.
  • Model Training: Trains and tunes a LightGBMClassifier, logs hyperparameters, evaluates performance across demographic groups, and bundles the model with full lineage.
  • Deployment & Monitoring: Adds an approval gate, packages the model with its compliance artifacts (including Annex IV), and deploys it as a FastAPI service on Modal.

Let's dive into each pipeline to see how they address specific EU AI Act compliance requirements.

Feature Engineering Pipeline: Data Governance in Action

The first pipeline lays the groundwork for EU AI Act compliance by focusing on data governance. Article 10 emphasizes strong data quality controls, while Article 12 requires meticulous record-keeping for all data processing activities.

Python
from typing import List, Optional

from zenml import pipeline

# Step functions and name constants are imported from the project source.

@pipeline(name=FEATURE_ENGINEERING_PIPELINE_NAME)
def feature_engineering(
    dataset_path: str = "src/data/credit_scoring.csv",
    test_size: float = 0.2,
    normalize: bool = True,
    target: str = "TARGET",
    random_state: int = 42,
    sample_fraction: Optional[float] = None,
    sensitive_attributes: Optional[List[str]] = None,
):
    """End-to-end feature engineering pipeline with data governance controls."""
    # Load data with provenance tracking (Art. 10)
    df, whylogs_data_profile = ingest(
        dataset_path=dataset_path,
        sample_fraction=sample_fraction,
        target=target,
        sensitive_attributes=sensitive_attributes,
    )

    # Split dataset with documented rationale (Art. 10)
    train_df, test_df = data_splitter(
        df=df,
        test_size=test_size,
        target=target,
        random_state=random_state,
    )

    # Preprocess with transformation tracking (Art. 10)
    train_df_processed, test_df_processed, preprocess_pipeline = data_preprocessor(
        train_df=train_df,
        test_df=test_df,
        normalize=normalize,
        target=target,
    )

    return whylogs_data_profile, train_df_processed, test_df_processed, preprocess_pipeline

The pipeline incorporates several essential compliance features:

  • Data Provenance with SHA-256 Hashing: Ensures the exact data used for training is traceable and tamper-proof.
  • WhyLogs Profiling: Captures comprehensive data-quality profiles that make drift and anomalies detectable across runs.
  • Preprocessing Transparency: Each transformation is logged, enabling full visibility into how data is prepared.
  • Sensitive Attribute Identification: Flags protected demographic features for fairness assessments.

By wrapping these steps in ZenML's pipeline framework, we gain automatic lineage tracking, artifact versioning, and metadata persistence, which are all essential for meeting the record-keeping requirements of Article 12.
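
To make the provenance piece concrete, here's a minimal sketch of how SHA-256 hashing might be wired into an ingest step. It assumes a recent ZenML release where log_metadata is available, and the function and field names are illustrative rather than the project's actual code:

Python
import hashlib

import pandas as pd
from zenml import log_metadata, step

@step
def ingest_with_provenance(dataset_path: str) -> pd.DataFrame:
    """Illustrative ingest step that records a tamper-evident dataset hash."""
    # Hash the raw file bytes so the exact training data is traceable (Art. 10)
    with open(dataset_path, "rb") as f:
        dataset_sha256 = hashlib.sha256(f.read()).hexdigest()

    # Attach the hash to this step run; ZenML persists it alongside the lineage graph
    log_metadata({"dataset_sha256": dataset_sha256, "source_path": dataset_path})
    return pd.read_csv(dataset_path)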

Training Pipeline: Performance with Accountability

The second pipeline handles model training, evaluation, and risk assessment, directly addressing Articles 9 (Risk Management), 11 (Technical Documentation), and 15 (Accuracy) of the EU AI Act.

Python
from typing import Any, Dict, List, Optional

from zenml import pipeline

@pipeline(name=TRAINING_PIPELINE_NAME)
def training(
    train_df: Any = None,
    test_df: Any = None,
    target: str = "TARGET",
    hyperparameters: Optional[Dict[str, Any]] = None,
    protected_attributes: Optional[List[str]] = None,
    approval_thresholds: Optional[Dict[str, float]] = None,
    risk_register_path: str = "docs/risk/risk_register.xlsx",
    model_path: str = "models/model.pkl",
):
    """Training pipeline with integrated evaluation and risk assessment."""
    # Train model with documented hyperparameters (Art. 11)
    model = train_model(
        train_df=train_df,
        test_df=test_df,
        target=target,
        hyperparameters=hyperparameters,
    )

    # Evaluate with fairness metrics (Art. 15)
    evaluation_results, fairness_metrics = evaluate_model(
        model=model,
        test_df=test_df,
        target=target,
        protected_attributes=protected_attributes,
        approval_thresholds=approval_thresholds,
    )

    # Conduct risk assessment (Art. 9)
    risk_scores = risk_assessment(
        evaluation_results=evaluation_results,
        fairness_metrics=fairness_metrics,
        risk_register_path=risk_register_path,
    )

    return model, evaluation_results, risk_scores

The key compliance features here include:

  • Hyperparameter documentation for reproducibility (satisfying Article 11's technical documentation requirements)
  • Performance metrics across demographic groups (providing the transparency needed for Article 15 compliance)
  • Risk scoring with mitigation tracking (providing the structured risk management required by Article 9)

The evaluate_model step automatically calculates fairness metrics across protected attributes and generates alerts when potential bias is detected, which addresses the fairness requirements of Article 15.
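
For illustration, here's a hedged sketch of the kind of per-group evaluation such a step can perform, using fairlearn; the project's actual evaluate_model implementation may differ:

Python
from fairlearn.metrics import MetricFrame, demographic_parity_ratio, selection_rate
from sklearn.metrics import accuracy_score

def fairness_report(y_true, y_pred, sensitive_features):
    """Illustrative per-group fairness summary for one protected attribute."""
    by_group = MetricFrame(
        metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive_features,
    ).by_group

    # Ratios far below the common "four-fifths" (0.8) threshold indicate disparate impact
    di_ratio = demographic_parity_ratio(
        y_true, y_pred, sensitive_features=sensitive_features
    )
    return by_group, di_ratio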

Deployment Pipeline: Human Oversight and Documentation

The third pipeline manages model approval, deployment, and monitoring, addressing Articles 14, 17, and 18 of the EU AI Act. This pipeline demonstrates how meaningful human oversight can be integrated into automated deployment workflows.

Python
from typing import Annotated, Any

from zenml import pipeline

# MODEL_NAME, MODAL_ENVIRONMENT, and the other name constants are project imports.

@pipeline(name=DEPLOYMENT_PIPELINE_NAME)
def deployment(
    model: Annotated[Any, MODEL_NAME] = None,
    preprocess_pipeline: Annotated[Any, PREPROCESS_PIPELINE_NAME] = None,
    evaluation_results: Annotated[Any, EVALUATION_RESULTS_NAME] = None,
    risk_scores: Annotated[Any, RISK_SCORES_NAME] = None,
    environment: str = MODAL_ENVIRONMENT,
):
    """Deployment pipeline with human oversight and monitoring."""
    # Human approval gate (Art. 14)
    approved, approval_record = approve_deployment(
        evaluation_results=evaluation_results,
        risk_scores=risk_scores,
        approval_thresholds=DEFAULT_APPROVAL_THRESHOLDS,
    )

    # Stop here if the reviewer withheld approval
    if not approved:
        return None

    # Deploy with versioning (Art. 16)
    deployment_info = modal_deployment(
        model=model,
        preprocess_pipeline=preprocess_pipeline,
        environment=environment,
    )

    # Generate SBOM for security tracking (Art. 15)
    sbom = generate_sbom(model=model, preprocess_pipeline=preprocess_pipeline)

    # Setup monitoring (Art. 17)
    monitoring_plan = post_market_monitoring(
        model=model,
        evaluation_results=evaluation_results,
        deployment_info=deployment_info,
    )

    # Generate technical documentation (Art. 11)
    documentation_path = generate_annex_iv_documentation(
        evaluation_results=evaluation_results,
        risk_scores=risk_scores,
        deployment_info=deployment_info,
        sbom=sbom,
        monitoring_plan=monitoring_plan,
        approval_record=approval_record,
    )

    return documentation_path

The key compliance features include:

  • Human approval gate with documented rationale via the approve_deployment step (important for Article 14)
  • Software Bill of Materials generation
  • Post-market monitoring configuration
  • Annex IV documentation generation

The human oversight implementation is particularly important. The approve_deployment step requires explicit review and sign-off before a model can be deployed, creating an auditable record of human oversight. When the system detects issues requiring review, it sends a structured Slack notification using ZenML's Slack alerter integration.
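
As a hedged sketch of how such a gate can be built: ZenML's alerter abstraction exposes an ask method that blocks until a human responds in the configured Slack channel. The real approve_deployment step also produces an approval record; the names below are illustrative:

Python
from zenml import step
from zenml.client import Client

@step(enable_cache=False)
def approval_gate(auc: float, overall_risk: float) -> bool:
    """Illustrative human approval gate (Art. 14) backed by the stack's alerter."""
    alerter = Client().active_stack.alerter
    if alerter is None:
        return False  # fail closed when no alerter is configured

    question = (
        f"Deployment review requested\n"
        f"AUC: {auc:.3f} | Overall risk: {overall_risk:.2f}\n"
        f"Approve deployment?"
    )
    # ask() posts the message and waits for an approve/disapprove reply
    return alerter.ask(question)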

One such alert wasn't a false positive: the model exhibited severe disparate impact (a 0.161 ratio) across age groups, meaning it was systematically discriminating against certain age demographics. The Slack alert gave immediate visibility into this critical bias issue, ensuring it didn't get lost in terminal logs or overlooked during deployment cycles.

Following the EU AI Act's requirements surfaced a critical fairness issue that demanded immediate remediation and could easily have gone undetected without compliance monitoring. It shows how compliance infrastructure doubles as a quality assurance mechanism that protects both institutions and the people they serve.

The Documentation Engine: Automating Annex IV

Automating comprehensive technical documentation turned out to be one of the most technically challenging aspects of compliance. The EU AI Act's Annex IV requirements are extensive and specific, demanding everything from general system descriptions to detailed post-market monitoring plans.

These requirements become manageable when you understand how existing MLOps infrastructure can address them. Rather than building compliance systems from scratch, ZenML's native capabilities handle most requirements through features you're likely already using for good ML engineering practices.

| Article | Key Requirements | ZenML Implementation |
|---|---|---|
| Art. 9 Risk Management | Risk assessment, mitigation measures | Step output tracking, metadata logging |
| Art. 10 Data Governance | Data quality, representativeness | Artifact lineage, dataset versioning |
| Art. 11 Documentation | Complete system documentation | Pipeline run history, step metadata |
| Art. 12 Record-keeping | Automated event logging | Run history tracking, artifact storage |
| Art. 14 Human Oversight | Human review capabilities | Approval steps, model workflows |
| Art. 15 Accuracy | Performance metrics, robustness | Metrics tracking, evaluation artifacts |
| Art. 17 Post-market Monitoring | Deployed model monitoring | Artifact versioning, pipeline triggers |
| Art. 18 Incident Notification | Incident reporting mechanisms | Step outcome tracking, failure monitoring |

The generate_annex_iv_documentation step transforms scattered pipeline metadata into structured compliance packages. Rather than maintaining separate compliance documentation, the system leverages ZenML's automatic metadata capture and artifact store to generate complete technical documentation bundles.
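
Here's a hedged sketch of the core idea: pull run metadata through ZenML's client and render it with a Jinja template. The template name and fields are assumptions, not the project's actual schema:

Python
from jinja2 import Environment, FileSystemLoader
from zenml.client import Client

def render_annex_iv(run_name: str, template_dir: str = "templates") -> str:
    """Illustrative Annex IV renderer driven by pipeline-run metadata."""
    run = Client().get_pipeline_run(run_name)
    env = Environment(loader=FileSystemLoader(template_dir))
    template = env.get_template("annex_iv.md.j2")  # assumed template file

    # The same metadata that drives training also populates the documentation
    return template.render(
        run_id=str(run.id),
        pipeline_name=run.pipeline.name,
        step_statuses={name: s.status for name, s in run.steps.items()},
    )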

Each documentation package includes multiple artifacts that address specific compliance requirements:

| Artifact | Content Source | EU AI Act Coverage |
|---|---|---|
| annex_iv.md | ZenML metadata + Jinja templates | Art. 11 (Technical Documentation) |
| model_card.md | Performance metrics + fairness analysis | Art. 13 (Transparency) |
| evaluation_results.yaml | Model evaluation step outputs | Art. 15 (Accuracy) |
| risk_scores.yaml | Risk assessment step results | Art. 9 (Risk Management) |
| git_info.md | Repository commit information | Art. 12 (Record-keeping) |
| sbom.json | requirements.txt parsing | Art. 15 (Cybersecurity) |
| log_metadata.json | Pipeline execution logs | Art. 12 (Record-keeping) |
| README.md | Generated artifact index | Cross-article compliance mapping |

This automated approach eliminates the inconsistency and incompleteness that plague manual documentation processes. The system ensures documentation accurately reflects actual implementation since it's generated directly from the same metadata that drives model training and deployment.
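
As one concrete example from the table above, sbom.json can be produced by straightforward dependency parsing. Here's a minimal, hedged sketch; the project's actual generate_sbom step may emit a richer format such as CycloneDX:

Python
import json
from pathlib import Path

def generate_sbom(
    requirements_path: str = "requirements.txt", output_path: str = "sbom.json"
) -> dict:
    """Illustrative SBOM generator that records pinned dependencies (Art. 15)."""
    components = []
    for line in Path(requirements_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        components.append({"name": name, "version": version or "unpinned"})

    sbom = {"components": components}
    Path(output_path).write_text(json.dumps(sbom, indent=2))
    return sbom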

Benefits Beyond Compliance

While regulatory compliance is often perceived as a cost center, my experience implementing the EU AI Act requirements revealed several unexpected business benefits that extend far beyond regulatory checkbox exercises.

The documentation and validation requirements forced me to implement more rigorous testing protocols, resulting in models with better performance and reliability. Explicit fairness assessments led to more equitable predictions across demographic groups. Comprehensive validation procedures identified edge cases that might otherwise have been missed. Structured risk assessment practices helped prioritize model improvements based on actual impact and likelihood.

The Competitive Advantage of Compliance-Ready Infrastructure

The EU AI Act represents more than a regulatory burden. It's an opportunity to build better AI systems. Organizations that approach compliance as an infrastructure challenge rather than a documentation exercise will discover that regulatory requirements become competitive advantages.

ZenML's pipeline-based approach demonstrates how compliance can be embedded directly into development workflows, creating systems that are simultaneously more reliable, more transparent, and more trustworthy. As the AI Act's implementation timeline advances, the organizations that build these capabilities now will be positioned to innovate confidently within the regulatory framework.

Although this project focused on the financial sector, similar approaches apply to other high-stakes domains like the energy sector, where regulators like Ofgem are rolling out new AI oversight guidelines.

Try It Yourself

You can explore the CreditScorer project and experiment with it by following these steps:

  1. Clone the project.
  2. Follow the setup and installation instructions detailed in the project's README file.
  3. Run pipelines or launch the demo compliance dashboard:
Bash
streamlit run run_dashboard.py

The included Streamlit app offers an interactive UI with real-time visibility into EU AI Act compliance status, a summary of current risk levels and compliance metrics, and generated Annex IV documentation with export options.
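
If you want a feel for how such a dashboard hangs together, here's a heavily simplified, hypothetical sketch; the file paths and keys are assumptions, and the real run_dashboard.py is more complete:

Python
import streamlit as st
import yaml

st.title("EU AI Act Compliance Dashboard")

# Assumed output locations from the deployment pipeline
risk = yaml.safe_load(open("docs/risk/risk_scores.yaml"))
st.metric("Overall risk score", risk.get("overall", "n/a"))

with open("docs/annex_iv.md") as f:
    st.download_button("Export Annex IV", f.read(), file_name="annex_iv.md")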
