Lead Data Engineer (Databricks)
Job Description:
Position: Lead Data Engineer
Contract Type: Fixed term / Contract
Contract Duration: 25 May 2026 to December 2026
Work Model: Hybrid (2–3 days a week)
Work Location: Sandton, Johannesburg, South Africa (Hybrid / Office-based as required)
Role Overview
We are seeking a Lead / Senior Data Engineer to design, build, and operate modern Databricks and Lakehouse data platforms that support advanced analytics, AI, and Generative AI use cases.
This role is a senior individual contributor position, operating within product-aligned, cross-functional squads. The successful candidate will deliver high-quality, governed, scalable data assets consumed by analytics platforms, machine learning models, and Generative AI solutions, including LLM- and agent-based systems.
Key Responsibilities
1. Databricks & Data Platform Engineering
- Design, build, and operate data solutions using Databricks, including:
  - Delta Lake
  - Databricks Jobs and Workflows
  - Unity Catalog
  - Notebooks and shared libraries
- Develop scalable, reliable Lakehouse architectures supporting analytics and AI workloads.
2. Data Enablement & Consumption
- Enable data consumption for:
  - Generative AI use cases (e.g. Retrieval-Augmented Generation, AI services, agent workflows)
  - Analytics and reporting platforms
  - Downstream operational and business systems
- Support feature-style and curated data access patterns required by AI and GenAI workloads.
3. Generative AI Data Enablement
- Build and maintain data pipelines that feed Generative AI applications, including:
  - Curated knowledge and reference datasets
  - Structured and semi-structured data sources
  - Metadata, lineage, and traceability for AI consumption
- Enable common GenAI data patterns such as:
  - Retrieval-Augmented Generation (RAG)
  - Contextual and prompt data preparation
  - Model input, output, and feedback data flows
4. Engineering Standards & Best Practices
- Develop production-grade data pipelines using:
  - Python
  - SQL
  - Apache Spark
- Implement automated testing, CI/CD, and deployment practices for data workloads.
- Ensure data solutions are:
  - Observable
  - Resilient
  - Performant
  - Cost-efficient
- Continuously improve data quality, reliability, and operational stability.
5. Collaboration & Ways of Working
- Act as a senior engineer within a cross-functional product squad.
- Collaborate closely with:
  - Product Owners
  - AI / Machine Learning Engineers
  - Analytics teams
  - Platform and security teams
- Provide engineering input into design discussions and delivery decisions.
- Support peer reviews and contribute to shared engineering standards.
- Provide mentorship and technical guidance, including involvement in AI Engineer development.
6. Risk, Governance & Run
- Ensure all data solutions comply with enterprise security, risk, and governance standards.
- Support the operational stability of data pipelines used by analytics and AI workloads.
- Participate in incident resolution and root cause analysis.
- Maintain appropriate technical documentation and runbooks.
Required Background & Experience:
- 10–15 years of industry experience in data engineering or related fields.
- 5+ years of experience operating as a Senior or Lead Data Engineer.
- Mandatory technical skills (with minimum experience):
  - Databricks (hands-on): 2+ years
  - Enterprise data lake / lakehouse architecture: 5+ years
  - Python: 5+ years
  - SQL: 5+ years
  - Apache Spark: 5+ years
  - Production-grade data platforms: 3+ years
  - Enterprise or regulated environments: 5+ years
Mandatory Skills Summary:
- Databricks
- Data lake and lakehouse architecture
- Python
- SQL
- Apache Spark
- Production-grade data platforms
- Enterprise or regulated environments
Desirable / Beneficial Skills:
- Experience enabling AI, ML, or Generative AI use cases from a data engineering perspective
- Familiarity with:
  - RAG data patterns
  - Feature-style or AI-serving datasets
  - Vector-based or embedding-ready data workflows
- Experience working in Agile, product-aligned squads
- Exposure to cloud-native data platforms such as AWS or Azure
Desired Skills Summary:
- AI, ML, or Generative AI
- RAG data patterns
- Feature-style or AI-serving datasets
- Vector or embedding-ready data workflows
- Cloud-native data platforms (AWS or Azure)