MLOps end-to-end Technical Blueprint

This page contains information related to upcoming products, features, and functionality. It is important to note that the information presented is for informational purposes only. Please do not rely on this information for purchasing or planning purposes. The development, release, and timing of any products, features, or functionality may be subject to change or delay and remain at the sole discretion of GitLab Inc.
Status Authors Coach DRIs Owning Stage Created
proposed a_akgun fdegier igor.drozdov sean_carroll devops modelops 2025-01-30

This blueprint describes the architecture of GitLab's end-to-end MLOps platform, designed to support the complete machine learning lifecycle from experimentation to production deployment. This initiative supports our SaaS instance and self-managed instances while maintaining our “single application” philosophy.

Summary

GitLab MLOps is an integrated platform that provides end-to-end machine learning lifecycle management capabilities within GitLab’s single application. It extends GitLab’s existing CI/CD and registry capabilities to support ML workflows from experimentation to production and observability.

Motivation

Organizations face several key challenges when operationalizing ML:

  1. Reproducibility: Data scientists struggle to track experiments and recreate results
  2. Collaboration: Disconnect between data science, engineering and governance teams slows development
  3. Deployment: Manual, error-prone processes for moving models to production
  4. Governance: Difficulty maintaining oversight of model development, deployment and impact

These challenges often result in:

  • Extended time-to-production for ML models
  • Inconsistent development practices
  • Security and compliance risks
  • Resource inefficiencies

Goals

  • Provide end-to-end ML lifecycle management integrated with existing GitLab DevOps workflows
  • Provide a Model Registry - a place to store model versions, runs, metadata and artifacts
  • Enable deployment of model versions from the model registry using CI/CD pipelines
  • Enable importing models from Vertex AI and Hugging Face into the GitLab model registry
  • Provide limited compatibility with the MLflow client for model experiments and the model registry

Non-Goals

  • Providing extensive computation resources for model training beyond GPU runners
  • Providing a model serving infrastructure
  • Implementing feature stores
  • Implementing data stores or becoming a DataOps platform
  • Developing a full-fledged MLflow server by achieving 100% MLflow API compatibility
  • Model monitoring and tracing

Proposal

GitLab will provide a comprehensive MLOps platform built on top of existing GitLab infrastructure, extending our CI/CD capabilities and using the package registry for artifact storage. The platform will support the full ML lifecycle through dedicated components while maintaining GitLab's single application philosophy.

Design and Implementation Details

Component Architecture

graph TB
    subgraph DevPhase["Development Phase"]
        direction TB
        A1[Experiment Tracking]:::ongoing
        A2[Model Registry]:::ongoing
        A3[GPU Runner Management]:::ongoing
        A4[Code repository]:::completed
    end

    subgraph CiCd["CI/CD Pipeline"]
        B4[Deployment Pipeline]:::new
    end


    A1 --> A2
    DevPhase --> CiCd
    A3 --> A1
    A4 --> A1
    CiCd --> DevPhase

    %% Define styles for different statuses
    classDef completed fill:#a3cfbb,stroke:#178344,color:black
    classDef new fill:#ffdebd,stroke:#ff8c00,color:black
    classDef ongoing fill:#b8d0ff,stroke:#0066cc,color:black

    %% Place the legend at the bottom with right alignment
    subgraph Legend[" "]
        direction LR
        L3[Ongoing]:::ongoing
        L2[New]:::new
        L1[Completed]:::completed
    end

    %% Position the legend at the bottom right
    style Legend fill:none,stroke:none

Diagram Notes

  • Code Repository: The Git repository, either remote or local.
  • Experiment Tracking: Code produces runs, artifacts, metrics, and so on. This metadata is stored centrally in Experiment Tracking.
  • Model Registry: Uses the Package Registry to store artifacts.
  • Deployment Pipeline: Triggered either from the Model Registry or by Git triggers.

Core Components

1. Experiment Tracking (existing feature)

The experiment management system will track ML training runs together with their parameters, metrics, and artifacts.
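
As a rough illustration of how a training script might report runs through the MLflow client, the sketch below builds the two environment variables that the MLflow client reads (`MLFLOW_TRACKING_URI` and `MLFLOW_TRACKING_TOKEN`) and logs one run. The endpoint path `/api/v4/projects/<project_id>/ml/mlflow`, the host name, the project ID, the experiment name, and the logged values are assumptions made for this example and are not specified in this blueprint.

```python
import os


def gitlab_mlflow_env(gitlab_url: str, project_id: int, token: str) -> dict[str, str]:
    """Build the environment variables that the MLflow client reads.

    Assumption: the project exposes an MLflow-compatible endpoint at
    <gitlab_url>/api/v4/projects/<project_id>/ml/mlflow.
    """
    base = gitlab_url.rstrip("/")
    return {
        "MLFLOW_TRACKING_URI": f"{base}/api/v4/projects/{project_id}/ml/mlflow",
        "MLFLOW_TRACKING_TOKEN": token,
    }


def log_example_run(env: dict[str, str]) -> None:
    """Log a single run. Requires the `mlflow` package to be installed."""
    # Export the settings first so the client picks them up.
    os.environ.update(env)

    import mlflow  # imported lazily so the helper above works without it

    mlflow.set_experiment("blueprint-demo")  # hypothetical experiment name
    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)  # placeholder value
        mlflow.log_metric("accuracy", 0.93)      # placeholder value


if __name__ == "__main__":
    # Hypothetical host, project ID, and token placeholder.
    env = gitlab_mlflow_env("https://gitlab.example.com/", 42, "<access-token>")
    print(env["MLFLOW_TRACKING_URI"])
```

Keeping the URL construction separate from the logging call means the configuration can be checked without the MLflow package or network access.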

2. Model Registry (existing feature)

Central repository for ML model management (see the Model registry docs), providing:

  • Model versioning and tagging (link to docs)
  • Model metadata and lineage tracking
  • Model approval workflows using GitLab labels for models and versions
  • Integration with CI/CD pipelines to allow training and deployment
  • Access control and security policies for the model registry, based on existing roles, custom roles, and model registry read and write permissions
  • Compatibility with MLflow client
  • Model cards with freeform markdown descriptions
  • Governance instruments
  • Storage of large data files in the model registry, for example alongside model version artifacts
  • Integration with GCP Vertex AI model registry
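
To make the interplay of versioning and label-based approval more concrete, here is a minimal in-memory sketch. It is not the registry's data model: the `ModelVersion` class, the `approved` label name, and the assumption that versions are plain `MAJOR.MINOR.PATCH` strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class ModelVersion:
    """Hypothetical stand-in for one model version in a registry."""
    name: str
    version: str  # assumed to be a plain MAJOR.MINOR.PATCH string
    labels: set[str] = field(default_factory=set)
    metadata: dict[str, str] = field(default_factory=dict)


def parse_semver(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers so versions sort numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def latest_approved(
    versions: Iterable[ModelVersion], approval_label: str = "approved"
) -> Optional[ModelVersion]:
    """Return the highest version carrying the approval label, if any."""
    approved = [v for v in versions if approval_label in v.labels]
    if not approved:
        return None
    return max(approved, key=lambda v: parse_semver(v.version))


if __name__ == "__main__":
    candidates = [
        ModelVersion("churn", "1.2.0", {"approved"}),
        ModelVersion("churn", "1.10.0", {"approved"}),
        ModelVersion("churn", "2.0.0", {"needs-review"}),
    ]
    chosen = latest_approved(candidates)
    print(chosen.version if chosen else "no approved version")
```

Comparing parsed integer tuples rather than raw strings keeps `1.10.0` ahead of `1.2.0`, which a lexical sort would get wrong.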

3. Connection to GPU resources (existing feature)

See the GPU runners docs.

  • Maintain compatibility with GitLab Runner
  • Ensure ease of use with GPU runners

4. Model Deployment

Automated model deployment pipeline:

  • Container-based deployment
  • Multivariate testing support
  • Canary deployments
  • Rollback capabilities
  • Environment management
  • Integration with GCP Vertex AI for deployment
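
The canary and rollback items can be illustrated with a small sketch. It assumes that traffic routing happens in whatever serving layer the model is deployed to (model serving infrastructure is a non-goal of this blueprint), and that a pipeline job compares error rates to decide on a rollback. The function names, the hash-based bucketing, and the tolerance value are illustrative assumptions, not part of the proposal.

```python
import hashlib


def pick_version(request_id: str, stable: str, canary: str, canary_percent: int) -> str:
    """Deterministically route a request to the stable or the canary version.

    The request ID is hashed into a bucket from 0 to 99, so the same ID
    always lands on the same version for a given percentage.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return canary if bucket < canary_percent else stable


def should_roll_back(
    canary_error_rate: float, stable_error_rate: float, tolerance: float = 0.01
) -> bool:
    """Signal a rollback when the canary is worse than stable beyond the tolerance."""
    return canary_error_rate > stable_error_rate + tolerance


if __name__ == "__main__":
    # Hypothetical version names and a 10% canary share.
    for request_id in ("user-1", "user-2", "user-3"):
        print(request_id, pick_version(request_id, "1.4.0", "1.5.0", 10))
```

Hashing the request ID instead of drawing a random number keeps routing sticky per caller, which makes canary metrics easier to attribute.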

5. API Clients

Integration Points

  1. GitLab CI/CD Integration

    • Provide training, evaluation and validation CI/CD templates for ML workflows using GitLab (GPU) runners.
    • Predefined variables for ML operations
    • ML-specific CI/CD stages
  2. Issue Tracking Integration

    • Model development issues
    • Approval workflows
  3. GitLab Package registry

    • Used for storage of model artifacts
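
One way the CI/CD integration could support reproducibility is by attaching pipeline metadata to each training run. The sketch below reads a few of GitLab's existing predefined CI/CD variables (`CI_COMMIT_SHA`, `CI_COMMIT_REF_NAME`, `CI_PIPELINE_ID`, `CI_JOB_ID`) and turns them into run tags. The tag names are hypothetical, and the ML-specific predefined variables mentioned above are not defined in this blueprint, so none are assumed here.

```python
import os
from typing import Mapping, Optional

# Maps a hypothetical tag name to an existing GitLab predefined CI/CD variable.
CI_TAG_SOURCES = {
    "gitlab.commit_sha": "CI_COMMIT_SHA",
    "gitlab.ref": "CI_COMMIT_REF_NAME",
    "gitlab.pipeline_id": "CI_PIPELINE_ID",
    "gitlab.job_id": "CI_JOB_ID",
}


def ci_run_tags(environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Collect CI metadata to attach to a training run as tags.

    Variables that are not set (for example, when running locally) are skipped.
    """
    env = os.environ if environ is None else environ
    return {tag: env[var] for tag, var in CI_TAG_SOURCES.items() if var in env}


if __name__ == "__main__":
    # Outside a CI job this typically prints an empty dict.
    print(ci_run_tags())
```

Inside a training job, the resulting dictionary could be passed to the experiment tracking client as run tags, so each run can be traced back to the commit and pipeline that produced it.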

Deployment Options

MLOps will support self-managed installations (including air-gapped environments), GitLab.com, and GitLab Dedicated.

Development Guidelines

No additional tooling is needed beyond the GDK. You might also need the MLflow client and the GitLab MLOps Python Client.

Out of scope

  • Full MLflow client compatibility
  • LLMOps
  • AgentOps
  • Model Governance, Security and Compliance
  • Container Registry Integration

Conclusion

This technical blueprint provides a framework for implementing a comprehensive MLOps platform within GitLab. The proposed architecture builds on GitLab's existing strengths while adding ML-specific capabilities that enable organizations to manage their ML workflows at scale.

Last modified April 28, 2025: Cleanup and reorg shortcodes (eef3c341)