We are seeking a visionary leader to spearhead the development of scalable, secure, and high-performance data solutions on Databricks and AWS.
Key Responsibilities:
* Design and implement Databricks-based Lakehouse platforms (Delta Lake, Spark, MLflow) that drive business growth.
* Integrate with AWS services including S3, Glue, Lambda, and Step Functions to streamline data processing.
* Develop and optimize scalable ETL/ELT pipelines using Spark (Python/Scala) to ensure reliable data delivery (see the sketch after this list).
* Automate infrastructure provisioning with Terraform or CloudFormation to reduce manual effort.
* Tune the performance of Spark jobs and cluster configurations to maximize efficiency.
* Implement security and governance controls using IAM, VPC, and Unity Catalog to safeguard sensitive data.
* Lead a high-performing engineering team through Agile delivery cycles to achieve strategic objectives.
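
To give candidates a concrete sense of the day-to-day work, here is a minimal, illustrative PySpark sketch of the kind of ETL pipeline described above: reading raw data from S3 and writing a curated Delta Lake table. All bucket names, table names, and columns are hypothetical placeholders, not an actual project codebase.

```python
# Minimal illustrative sketch: raw S3 JSON -> curated Delta table.
# All paths, table names, and columns below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Read raw JSON events from S3 (hypothetical bucket/prefix).
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Light transformation: type casting, a derived partition column,
# and deduplication on the business key.
curated = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .dropDuplicates(["order_id"])
)

# Write as a date-partitioned Delta table for downstream consumers.
(
    curated.write.format("delta")
           .mode("overwrite")
           .partitionBy("order_date")
           .saveAsTable("analytics.orders_curated")
)
```
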
Required Skills and Qualifications:
* Proven expertise in data engineering, paired with the leadership skills to inspire and motivate teams.
* Extensive experience with Databricks in production environments to drive scalability and reliability.
* Advanced knowledge of AWS services, including S3, Glue, Lambda, VPC, IAM, and EMR, to integrate with existing infrastructure.
* Strong coding skills in Python (PySpark), Scala, and SQL to develop efficient data solutions.
* Expertise in CI/CD pipelines, Git-based workflows, and automated testing to ensure high-quality deliverables.
* Familiarity with data modeling and warehousing (e.g., Redshift, Postgres) to design optimized data architectures.
* Proficiency with orchestration and workflow tools (e.g., Airflow, Step Functions) to automate complex tasks (see the sketch after this list).
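
For illustration, below is a minimal, hypothetical Airflow sketch of the orchestration work mentioned above, using the Databricks provider to trigger a pre-configured job. The DAG ID, connection ID, and job ID are placeholders, and the `schedule` argument assumes Airflow 2.4+ (older versions use `schedule_interval`).

```python
# Hypothetical Airflow DAG that triggers an existing Databricks job daily.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # assumes Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    # Trigger a pre-configured Databricks job (connection and job ID are placeholders).
    run_etl = DatabricksRunNowOperator(
        task_id="run_orders_etl",
        databricks_conn_id="databricks_default",
        job_id=12345,
    )
```
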