Job Description

Company Overview
Citco is a global leader in financial services, delivering innovative solutions to some of the world's largest institutional clients. We harness the power of data to drive operational efficiency and informed decision-making. We are looking for a Tech Lead – Data Engineering with extensive Databricks expertise and AWS experience to lead mission-critical data initiatives.

Role Summary
As the Tech Lead – Data Engineering, you will be responsible for architecting, implementing, and optimizing end-to-end data solutions on Databricks (Spark, Delta Lake, MLflow, etc.) while integrating with core AWS services (S3, Glue, Lambda, etc.). You will lead a technical team of data engineers, ensuring best practices in performance, security, and scalability. This role requires a deep, hands-on understanding of Databricks internals and a track record of delivering large-scale data platforms in a cloud environment.

Key Responsibilities

Databricks Platform & Architecture
- Architect and maintain Databricks Lakehouse solutions using Delta Lake for ACID transactions and efficient data versioning.
- Leverage Databricks SQL Analytics for interactive querying and report generation.
- Manage cluster lifecycle (provisioning, sizing, scaling) and optimize Spark jobs for cost and performance.
- Implement structured streaming pipelines for near real-time data ingestion and processing.
- Configure and administer Databricks Repos, notebooks, and job scheduling/orchestration to streamline development workflows.

AWS Cloud Integration
- Integrate Databricks with AWS S3 as the primary data lake storage layer.
- Design and implement ETL/ELT pipelines using AWS Glue catalog, AWS Lambda, and AWS Step Functions where needed.
- Ensure proper networking configuration (VPC, security groups, private links) for secure and compliant data access.
- Automate infrastructure deployment and scaling using AWS CloudFormation or Terraform.

Data Pipeline & Workflow Management
- Develop and maintain scalable, reusable ETL frameworks using Spark (Python/Scala).
- Orchestrate complex workflows, applying CI/CD principles (Git-based version control, automated testing).
- Implement Delta Live Tables or similar frameworks to handle real-time data ingestion and transformations.
- Integrate with MLflow (if applicable) for experiment tracking and model versioning, ensuring data lineage and reproducibility.

Performance Tuning & Optimization
- Conduct advanced Spark job tuning (caching strategies, shuffle partitions, broadcast joins, memory optimization).
- Fine-tune Databricks clusters (autoscaling policies, instance types) to manage cost without compromising performance.
- Optimize I/O performance and concurrency for large-scale data sets.

Security & Governance
- Implement Unity Catalog or equivalent Databricks features for centralized governance, access control, and data lineage.
- Ensure compliance with industry standards (e.g., GDPR, SOC, ISO) and internal security policies.
- Apply IAM best practices across Databricks and AWS to enforce least-privilege access.

Technical Leadership & Mentorship
- Lead and mentor a team of data engineers, conducting code reviews, design reviews, and knowledge-sharing sessions.
- Champion Agile or Scrum development practices, coordinating sprints and deliverables.
- Serve as a primary technical liaison, working closely with product managers, data scientists, DevOps, and external stakeholders.

Monitoring & Reliability
- Configure observability solutions (e.g., Datadog, CloudWatch, Prometheus) to proactively identify performance bottlenecks.
- Set up alerting mechanisms for latency, cost overruns, and cluster health.
- Maintain SLAs and KPIs for data pipelines, ensuring robust data quality and reliability.

Innovation & Continuous Improvement
- Stay updated on Databricks roadmap and emerging data engineering trends (e.g., Photon, Lakehouse features).
- Evaluate new tools and technologies, driving POCs to improve data platform capabilities.
- Collaborate with business units to identify data-driven opportunities and craft solutions that align with strategic goals.

Qualifications

Educational Background
- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or equivalent experience.

Technical Experience
- Databricks Expertise: 5+ years of hands-on Databricks (Spark) experience, with a focus on building and maintaining production-grade pipelines.
- AWS Services: Proven track record with AWS S3, EC2, Glue, EMR, Lambda, Step Functions, and security best practices (IAM, VPC).
- Programming Languages: Strong proficiency in Python (PySpark) or Scala; SQL for analytics and data modeling.
- Data Warehousing & Modeling: Familiarity with RDBMS (e.g., Postgres, Redshift) and dimensional modeling techniques.
- Infrastructure as Code: Hands-on experience using Terraform or AWS CloudFormation to manage cloud infrastructure.
- Version Control & CI/CD: Git-based workflows (GitHub/GitLab), Jenkins or similar CI/CD tools for automated builds and deployments.

Leadership & Soft Skills
- Demonstrated experience leading a team of data engineers in a complex, high-traffic data environment.
- Outstanding communication and stakeholder management skills, with the ability to translate technical jargon into business insights.
- Adept at problem-solving, with a track record of quickly diagnosing and resolving data performance issues.

Certifications (Preferred)
- Databricks Certified Associate/Professional (e.g., Databricks Certified Professional Data Engineer).
- AWS Solutions Architect (Associate or Professional).