Job Overview
Lead a high-performing team of data engineers in building scalable and secure data solutions on Databricks and AWS.
* Develop and manage Databricks-based Lakehouse platforms (Delta Lake, Spark, MLflow)
* Integrate with various AWS services including S3, Glue, Lambda, and Step Functions
* Design and optimize large-scale ETL/ELT pipelines using Spark (Python/Scala); see the sketch after this list
* Automate infrastructure provisioning and deployment for repeatable, efficient operations
* Tune Spark jobs and cluster configurations for performance and reliability
* Implement robust security and governance controls using IAM, VPC, and Unity Catalog
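
For context on the kind of pipeline work this role involves, here is a minimal PySpark sketch of a Delta Lake ETL step reading from S3. The bucket, paths, table name, and column names are hypothetical and purely illustrative.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical locations and schema, for illustration only.
RAW_PATH = "s3://example-bucket/raw/orders/"  # assumed S3 landing zone
TARGET_TABLE = "lakehouse.silver.orders"      # assumed Unity Catalog table

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Read raw JSON from S3 and apply light cleanup.
raw = spark.read.json(RAW_PATH)
clean = (
    raw.dropDuplicates(["order_id"])           # assumes an order_id column
       .filter(F.col("order_total") > 0)       # assumes an order_total column
       .withColumn("ingested_at", F.current_timestamp())
       .withColumn("order_date", F.to_date("ingested_at"))
)

# Append to a Delta table, partitioned by date for scan efficiency.
(clean.write.format("delta")
      .mode("append")
      .partitionBy("order_date")
      .saveAsTable(TARGET_TABLE))
```

In production, an incremental ingest (e.g., Databricks Auto Loader) and a MERGE-based upsert would typically replace the plain append, but the shape of the job is the same.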
Required Expertise
* Production Databricks experience, with an emphasis on scalability and reliability
* Advanced knowledge of AWS services: S3, Glue, Lambda, VPC, IAM, EMR
* Proficiency in Python (PySpark), Scala, and SQL
* Expertise in CI/CD pipeline automation, Git workflows, and automated testing; a brief testing sketch follows this list
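
As a small example of the automated-testing expertise expected, here is a pytest-style unit test for a PySpark transformation run against a local Spark session. The `clean_orders` function is a hypothetical stand-in for real pipeline code, which would normally be imported from the pipeline package rather than defined in the test file.

```python
import pytest
from pyspark.sql import SparkSession

# Hypothetical transformation under test, defined inline for illustration.
def clean_orders(df):
    return df.dropDuplicates(["order_id"]).filter("order_total > 0")

@pytest.fixture(scope="session")
def spark():
    # Local session so the test runs in CI without a cluster.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def test_clean_orders_removes_duplicates_and_bad_rows(spark):
    rows = [(1, 10.0), (1, 10.0), (2, -5.0)]
    df = spark.createDataFrame(rows, ["order_id", "order_total"])
    result = clean_orders(df).collect()
    assert len(result) == 1
    assert result[0]["order_id"] == 1
```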
Key Qualifications
1. Proven track record of delivering projects with complex data engineering requirements
2. Ability to collaborate effectively with cross-functional teams
3. Excellent communication skills for conveying technical concepts to non-technical stakeholders