Automating Global Data Platform Workflows with Azure & Databricks
Challenge
The company was transitioning from a heterogeneous AWS-based data platform (SageMaker, Lambda, Redshift) to a unified Azure architecture centered on Databricks. While the cloud team provisioned base Azure resources and workspaces, the actual Databricks setup had to support a multi-region, multi-environment deployment model, with separate sandbox, non-production, and production environments per geographic region.
The company needed a robust, repeatable way to automate the creation and configuration of each workspace, with consistent access controls, data structures, and secrets across dozens of isolated environments.
Solution
Nubosas designed and implemented a GitOps-driven automation layer that provisioned and configured Databricks environments from the ground up, including:
- Workspace setup across all combinations of region and environment (e.g. EU-prod, US-sandbox)
- User group and access role configuration for each environment
- Catalog and schema creation aligned with a Medallion Architecture model (Bronze, Silver, Gold)
- Role-based permission templates tailored to data engineering, analytics, and operations teams
- Secure secret management via Azure Key Vault and Databricks secret scopes
- Azure Storage endpoints for SFTP-based ingestion
- Azure Data Factory pipelines for scheduled data ingestion from enterprise sources
- Azure DevOps (ADO) pipelines to manage deployments, Git repositories, and CI/CD operations for infrastructure
- Terraform state for infrastructure split by Databricks level (account, metastore, and workspace)
All configurations were codified, version-controlled, and deployable through pipelines, ensuring consistency and auditability.
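As a rough illustration of how such a region-by-environment matrix can be expressed in code, the Python sketch below enumerates workspace combinations, derives Medallion catalog names, and lists one Terraform state key per Databricks level. It is not the implementation used in this project: the region and environment identifiers, the naming patterns, the state key paths, and the names WorkspaceSpec, build_matrix, and state_keys are all hypothetical, and using one catalog per Medallion layer (rather than one schema per layer) is an assumption.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# Hypothetical identifiers; the case study only names EU-prod and US-sandbox as examples.
REGIONS = ["eu", "us"]
ENVIRONMENTS = ["sandbox", "nonprod", "prod"]
MEDALLION_LAYERS = ["bronze", "silver", "gold"]


@dataclass(frozen=True)
class WorkspaceSpec:
    """One isolated workspace, identified by region and environment."""
    region: str
    environment: str

    @property
    def name(self) -> str:
        # Assumed naming pattern: "<region>-<environment>", e.g. "eu-prod".
        return f"{self.region}-{self.environment}"

    def catalogs(self) -> List[str]:
        # Assumption: one catalog per Medallion layer in each workspace.
        return [
            f"{self.region}_{self.environment}_{layer}"
            for layer in MEDALLION_LAYERS
        ]


def build_matrix(regions: List[str], environments: List[str]) -> List[WorkspaceSpec]:
    """Return every region/environment combination, regions varying slowest."""
    return [WorkspaceSpec(r, e) for r, e in product(regions, environments)]


def state_keys(specs: List[WorkspaceSpec]) -> List[str]:
    """List one Terraform state key per Databricks level (assumed layout)."""
    keys = ["account/account.tfstate"]  # single account-level state
    # Assumption: one metastore state per region.
    keys += sorted({f"metastore/{s.region}.tfstate" for s in specs})
    # One state per workspace.
    keys += [f"workspace/{s.name}.tfstate" for s in specs]
    return keys


if __name__ == "__main__":
    specs = build_matrix(REGIONS, ENVIRONMENTS)
    for spec in specs:
        print(spec.name, spec.catalogs())
    for key in state_keys(specs):
        print(key)
```

With a single declarative list like this, adding a region or environment becomes a one-line change that a pipeline can expand into the full set of workspaces, catalogs, and state files.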
Results
- A fully reproducible and scalable setup supporting multiple regions and environments
- Strong separation of concerns and clear environment boundaries
- Faster onboarding of internal teams and accelerated migration from AWS
- Improved security and compliance posture via centralized secrets management
- Minimal manual intervention needed during provisioning or updates
Technologies
Azure, Databricks, Azure Key Vault, Azure DevOps (ADO), Azure Storage, Azure Data Factory, Terraform