Building an On-Premises AI Model Registry: Version Control for Machine Learning
How to design and implement a model registry that brings version control, lineage tracking, and reproducibility to your on-premises AI infrastructure.
Why Model Registries Matter On-Premises
When organizations run AI workloads in the cloud, managed services such as Amazon SageMaker Model Registry or Azure Machine Learning handle model versioning behind the scenes. On-premises teams rarely have this luxury. Instead, models live on shared filesystems, in S3-compatible buckets, or, worse, on individual developer machines. The result is predictable: teams lose track of which model version is running in production, cannot reproduce results from three months ago, and have no reliable way to roll back when a new deployment degrades performance.
A model registry solves this by acting as the single source of truth for every model artifact your organization produces. It stores not just the model weights, but the metadata that makes those weights meaningful: training data references, hyperparameters, evaluation metrics, lineage information, and deployment status. Think of it as Git for your machine learning artifacts — except the objects being tracked are often gigabytes rather than kilobytes.
Core Components of an On-Premises Model Registry
A well-designed model registry consists of four interconnected layers, each addressing a specific operational need:
Artifact storage: The physical layer where model files, weights, and associated assets are persisted. On-premises, this typically means an S3-compatible object store like MinIO or a distributed filesystem like CephFS. The key requirement is content-addressable storage: every artifact is identified by a cryptographic hash of its contents (SHA-256, for example), so you can verify integrity and detect tampering.
Metadata catalog: A structured database (PostgreSQL works well) that stores everything about a model except the model itself. This includes version numbers, training parameters, dataset fingerprints, evaluation scores, authorship, timestamps, and tags. The metadata catalog is what makes the registry searchable and auditable.
Lineage graph: A directed acyclic graph that traces how each model was produced. Which dataset was used? Which preprocessing pipeline transformed it? Which training run produced the weights? Lineage answers the question "why does this model behave the way it does?" and is essential for debugging production issues.
Lifecycle state machine: Models move through defined stages — development, staging, production, archived, deprecated. The registry enforces these transitions, ensuring that only models that pass validation gates can be promoted to production. This prevents ad-hoc deployments that bypass quality checks.
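For the artifact storage layer, content addressing can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed layout: the function names and the `sha256/<prefix>/<digest>` key scheme are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def content_address(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file.

    The file is read in chunks so multi-gigabyte model files never
    have to fit in memory.
    """
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def object_key(digest: str) -> str:
    """Hypothetical object-store key: fan out on the first two hex characters."""
    return f"sha256/{digest[:2]}/{digest}"


def verify(path: Path, expected_digest: str) -> bool:
    """Re-hash the file and compare with the digest recorded at registration."""
    return content_address(path) == expected_digest
```

Because the key is derived from the bytes, uploading the same artifact twice maps to the same object, and any modification after registration makes `verify` return `False`.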
Choosing the Right Tool for On-Premises Deployment
Several open-source tools can serve as the foundation for an on-premises model registry. The right choice depends on your existing infrastructure and team capabilities.
MLflow Model Registry is the most widely adopted option. It integrates natively with MLflow's experiment tracking and provides a REST API for programmatic access. MLflow can run entirely on-premises with a PostgreSQL backend and an S3-compatible artifact store. Its main limitation is that lineage tracking is relatively shallow: it connects models to runs, but does not trace data provenance through upstream pipelines without additional tooling.
DVC (Data Version Control) takes a Git-native approach. Model files are tracked alongside code using Git-compatible metadata files while the actual artifacts live in remote storage. DVC excels at reproducibility because every experiment is a commit, but core DVC does not include the lifecycle management features (staging, approval workflows) that production teams need, so those have to be built as custom automation.
ModelDB, which originated at MIT CSAIL, focuses on lineage and reproducibility with a strong query interface for comparing model versions. It is well-suited for research-oriented teams but requires more effort to integrate into production serving workflows.
For most enterprise on-premises deployments, MLflow provides the best balance of features, community support, and operational maturity. Pair it with MinIO for artifact storage and PostgreSQL for metadata, and you have a registry stack that runs entirely within your network perimeter.
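As a rough sketch of how that stack is wired together, the snippet below assembles the command line and environment for an MLflow tracking server backed by PostgreSQL and MinIO. Every hostname, credential, and bucket name is a placeholder, and the helper function names are invented for the example. The server flags and the `MLFLOW_S3_ENDPOINT_URL` variable come from MLflow itself, but check them against the version you deploy.

```python
import os

# All hostnames, credentials, and bucket names below are placeholders.
POSTGRES_URI = "postgresql://mlflow:change-me@postgres.internal:5432/mlflow"
MINIO_ENDPOINT = "http://minio.internal:9000"
ARTIFACT_BUCKET = "s3://model-registry"


def mlflow_server_command(host: str = "0.0.0.0", port: int = 5000) -> list[str]:
    """Build the argument list for `mlflow server`."""
    return [
        "mlflow", "server",
        "--backend-store-uri", POSTGRES_URI,         # metadata in PostgreSQL
        "--artifacts-destination", ARTIFACT_BUCKET,  # artifacts proxied to MinIO
        "--host", host,
        "--port", str(port),
    ]


def mlflow_server_env(access_key: str, secret_key: str) -> dict[str, str]:
    """Environment that points MLflow's S3 client at the MinIO endpoint."""
    env = dict(os.environ)
    env.update({
        "MLFLOW_S3_ENDPOINT_URL": MINIO_ENDPOINT,
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    })
    return env
```

The two results can be handed to `subprocess.run(mlflow_server_command(), env=mlflow_server_env(...), check=True)`. The server host also needs a PostgreSQL driver and `boto3` installed alongside MLflow.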
Implementing Lineage Tracking That Actually Works
Lineage tracking is the feature most teams intend to implement and most often get wrong. The common failure mode is treating lineage as documentation, something engineers fill in manually after training. Manually recorded lineage tends to be incomplete, and it drifts out of date as soon as someone reruns a job without updating the notes.
Effective lineage must be automatic and immutable. Every training run should programmatically log its inputs (dataset version, configuration file hash, base model reference), its environment (container image digest, library versions, hardware specifications), and its outputs (model artifact hash, evaluation metrics). This happens through instrumentation in your training pipelines, not through manual tagging.
A practical approach is to wrap your training entrypoints in a decorator or context manager that captures this information automatically. When a data scientist runs a training job, the wrapper logs the Git commit of the training code, the hash of the input dataset, the exact container image used, and all hyperparameters. The resulting model artifact is then registered with all of this provenance attached. No extra steps required from the practitioner.
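One way such a wrapper might look is the decorator below. It is a sketch under stated assumptions: the training function takes `(dataset_path, output_path, **hyperparams)` and returns a metrics dictionary, the job launcher sets a `CONTAINER_IMAGE_DIGEST` environment variable (a name invented here), and the record is written next to the artifact rather than sent to a real registry.

```python
import functools
import hashlib
import json
import os
import subprocess
from pathlib import Path


def _git_commit() -> str:
    """Current Git commit, or "unknown" when Git or a repository is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def _sha256(path: Path) -> str:
    # Reads the whole file at once; stream in chunks for large artifacts.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def track_lineage(train_fn):
    """Capture inputs, environment, and outputs around a training entrypoint."""
    @functools.wraps(train_fn)
    def wrapper(dataset_path: Path, output_path: Path, **hyperparams):
        record = {
            "code_commit": _git_commit(),
            "dataset_sha256": _sha256(dataset_path),
            # Assumed variable name; set by whatever launches the job.
            "container_image": os.environ.get("CONTAINER_IMAGE_DIGEST", "unknown"),
            "hyperparameters": hyperparams,  # must be JSON-serializable
        }
        metrics = train_fn(dataset_path, output_path, **hyperparams)
        record["model_sha256"] = _sha256(output_path)
        record["metrics"] = metrics
        # Stand-in for a registry call: write the record beside the artifact.
        lineage_path = output_path.with_name(output_path.name + ".lineage.json")
        lineage_path.write_text(json.dumps(record, indent=2, sort_keys=True))
        return record
    return wrapper
```

A data scientist only adds `@track_lineage` above the training function. The inputs are hashed before training starts, so the record describes what the run actually consumed rather than what someone remembered afterwards.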
Lineage becomes particularly valuable during incident response. When a production model starts producing unexpected outputs, you can trace back through the lineage graph to identify exactly what changed — the data, the code, the configuration, or the base model. Without lineage, this investigation becomes guesswork.
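If lineage is stored as a flat record per model version, the first step of that investigation can be as simple as a field-by-field comparison. The field names and values below are hypothetical.

```python
def lineage_diff(previous: dict, current: dict) -> dict:
    """Return {field: (old, new)} for every top-level lineage field that differs."""
    changed = {}
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed


# Hypothetical records for the last good version and the misbehaving one.
previous = {"code_commit": "a1b2c3", "dataset_sha256": "1111", "base_model": "resnet50"}
current = {"code_commit": "a1b2c3", "dataset_sha256": "2222", "base_model": "resnet50"}

print(lineage_diff(previous, current))
# {'dataset_sha256': ('1111', '2222')}
```

Here the code and base model are unchanged, which points the investigation at the dataset.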
Lifecycle Management and Promotion Gates
A model registry without lifecycle management is just a file store with extra metadata. The real operational value comes from enforcing promotion gates — automated checks that a model must pass before moving from one stage to the next.
A typical lifecycle looks like this: a newly trained model enters the registry in a Development stage. To move to Staging, it must pass automated evaluation against a held-out test set and meet minimum performance thresholds. To reach Production, it must pass integration tests in a staging environment, demonstrate that it does not regress on key metrics compared to the current production model, and receive approval from a designated reviewer.
Implement these gates as CI/CD pipeline stages triggered by registry events. When a model is registered, a webhook fires and kicks off the evaluation pipeline; if your registry does not emit webhooks, a short polling job that watches for new versions serves the same purpose. Results are written back to the registry as metadata. Promotion requests that do not meet the gate criteria are automatically rejected with a clear explanation of which checks failed.
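The lifecycle and its gates can be sketched as a small state machine. The transition table, the single `accuracy` metric, and the 0.9 threshold are assumptions chosen for illustration; a real deployment would plug in its own metrics and policy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Stage(Enum):
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"
    DEPRECATED = "deprecated"


# Assumed transition table; adjust to your own policy.
ALLOWED = {
    Stage.DEVELOPMENT: {Stage.STAGING, Stage.ARCHIVED},
    Stage.STAGING: {Stage.PRODUCTION, Stage.DEVELOPMENT, Stage.ARCHIVED},
    Stage.PRODUCTION: {Stage.DEPRECATED, Stage.ARCHIVED},
    Stage.DEPRECATED: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}


class PromotionRejected(Exception):
    def __init__(self, failures):
        super().__init__("; ".join(failures))
        self.failures = failures


@dataclass
class ModelVersion:
    name: str
    version: int
    stage: Stage = Stage.DEVELOPMENT
    metrics: dict = field(default_factory=dict)
    integration_tests_passed: bool = False
    approved_by: Optional[str] = None


def gate_failures(model, target, production_metrics=None, min_accuracy=0.9):
    """Return a list of human-readable reasons the promotion must be rejected."""
    if target not in ALLOWED[model.stage]:
        return [f"transition {model.stage.value} -> {target.value} is not allowed"]
    failures = []
    accuracy = model.metrics.get("accuracy")
    if target is Stage.STAGING and (accuracy is None or accuracy < min_accuracy):
        failures.append(f"accuracy {accuracy} is below threshold {min_accuracy}")
    if target is Stage.PRODUCTION:
        if not model.integration_tests_passed:
            failures.append("integration tests have not passed")
        baseline = (production_metrics or {}).get("accuracy")
        if baseline is not None and (accuracy is None or accuracy < baseline):
            failures.append(f"accuracy {accuracy} regresses from production {baseline}")
        if model.approved_by is None:
            failures.append("no reviewer approval recorded")
    return failures


def promote(model, target, production_metrics=None):
    """Move the model to `target`, or raise with every failed check listed."""
    failures = gate_failures(model, target, production_metrics)
    if failures:
        raise PromotionRejected(failures)
    model.stage = target
```

Collecting all failures before raising, rather than stopping at the first, gives the requester the full list of checks to fix in one round trip.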
This workflow eliminates the dangerous pattern of deploying models through ad-hoc scripts or manual file copies. Every production model has a traceable path from training to deployment, and that path includes evidence that quality standards were met.
Scaling the Registry for Multi-Team Environments
As more teams adopt the registry, access control and organization become critical. Implement namespace isolation so that each team manages its own models without risk of naming collisions or accidental overwrites. Within each namespace, enforce role-based access: data scientists can register and tag models, ML engineers can promote to staging, and only designated production operators can approve production deployments.
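A minimal sketch of namespace-scoped, role-based checks follows. The role and action names mirror the roles described above but are otherwise invented for the example; in practice the role assignments would come from your identity provider rather than a dictionary.

```python
# Role -> permitted actions, following the split described above.
ROLE_ACTIONS = {
    "data_scientist": {"register", "tag"},
    "ml_engineer": {"promote_to_staging"},
    "production_operator": {"approve_production"},
}


def qualified_name(namespace: str, model_name: str) -> str:
    """Prefix model names with the team namespace to avoid collisions."""
    return f"{namespace}/{model_name}"


def is_allowed(user_roles: dict, namespace: str, action: str) -> bool:
    """user_roles maps namespace -> set of role names held in that namespace."""
    roles = user_roles.get(namespace, set())
    return any(action in ROLE_ACTIONS.get(role, set()) for role in roles)


# Hypothetical user: data scientist in one team, ML engineer in another.
alice = {"fraud": {"data_scientist"}, "search": {"ml_engineer"}}

print(is_allowed(alice, "fraud", "register"))            # True
print(is_allowed(alice, "fraud", "promote_to_staging"))  # False
print(is_allowed(alice, "search", "promote_to_staging")) # True
```

Because roles are looked up per namespace, a permission granted in one team's space never carries over to another's.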
Storage scaling requires attention as model counts grow. Implement automatic archival policies that move models in deprecated or archived stages to cheaper storage tiers after a configurable retention period. Keep the metadata indefinitely — it is small and serves audit requirements — but move the large artifact files to cold storage.
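The selection step of such an archival policy might look like the sketch below. The record fields and the `hot`/`cold` tier labels are assumptions. Actually moving the objects depends on your storage backend and is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Stages whose artifacts are eligible for cheaper storage.
COLD_STAGES = {"archived", "deprecated"}


@dataclass
class ArtifactRecord:
    model: str
    version: int
    stage: str
    stage_changed_at: datetime  # timezone-aware
    storage_tier: str = "hot"


def due_for_cold_storage(records, retention: timedelta, now=None):
    """Return records that have sat in a cold-eligible stage for at least `retention`."""
    now = now or datetime.now(timezone.utc)
    return [
        r for r in records
        if r.stage in COLD_STAGES
        and r.storage_tier == "hot"
        and now - r.stage_changed_at >= retention
    ]
```

Only the artifact's tier changes. The metadata row stays in the catalog, so the version remains searchable and auditable after its weights move to cold storage.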
Finally, invest in a search and comparison interface. Teams need to quickly find the best-performing model for a given task, compare metrics across versions, and understand why one model outperforms another. A well-indexed metadata catalog with a simple query API (or even a lightweight web UI) dramatically reduces the friction of working with the registry and drives adoption across the organization.
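To make the query side concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for the PostgreSQL catalog. The two-table schema and the function names are assumptions, and `best_for_task` assumes that a higher metric value is better.

```python
import sqlite3

SCHEMA = """
CREATE TABLE model_versions (
    name    TEXT NOT NULL,
    version INTEGER NOT NULL,
    task    TEXT NOT NULL,
    stage   TEXT NOT NULL,
    PRIMARY KEY (name, version)
);
CREATE TABLE metrics (
    name    TEXT NOT NULL,
    version INTEGER NOT NULL,
    metric  TEXT NOT NULL,
    value   REAL NOT NULL,
    PRIMARY KEY (name, version, metric)
);
CREATE INDEX idx_versions_task ON model_versions (task);
"""


def open_catalog() -> sqlite3.Connection:
    """Create an empty in-memory catalog with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def best_for_task(conn, task: str, metric: str):
    """Return (name, version, value) of the top-scoring version, or None."""
    return conn.execute(
        """
        SELECT v.name, v.version, m.value
        FROM model_versions AS v
        JOIN metrics AS m ON m.name = v.name AND m.version = v.version
        WHERE v.task = ? AND m.metric = ?
        ORDER BY m.value DESC
        LIMIT 1
        """,
        (task, metric),
    ).fetchone()


def compare_versions(conn, name: str, version_a: int, version_b: int) -> dict:
    """Return {metric: {version: value}} for side-by-side comparison."""
    rows = conn.execute(
        "SELECT metric, version, value FROM metrics "
        "WHERE name = ? AND version IN (?, ?)",
        (name, version_a, version_b),
    ).fetchall()
    table = {}
    for metric, version, value in rows:
        table.setdefault(metric, {})[version] = value
    return table
```

The same two queries translate directly to PostgreSQL, and a lightweight web UI can be little more than a thin layer over them.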
Featured image by Albert Stoynov on Unsplash.