Overview
Role: Tech Lead – Data Engineering at The Citco Group Limited.
Citco is a global leader in financial services, delivering innovative solutions to some of the world’s largest institutional clients. We harness the power of data to drive operational efficiency and informed decision-making. We are looking for a Tech Lead – Data Engineering with extensive Databricks expertise and AWS experience to lead mission-critical data initiatives.
Responsibilities
* Databricks Platform & Architecture: Architect and maintain Databricks Lakehouse solutions using Delta Lake for ACID transactions and efficient data versioning. Use Databricks SQL for interactive querying and report generation. Manage cluster lifecycle (provisioning, sizing, scaling) and optimize Spark jobs for cost and performance. Implement Structured Streaming pipelines for near real-time data ingestion and processing. Configure and administer Databricks Repos, notebooks, and job scheduling/orchestration to streamline development workflows.
* AWS Cloud Integration: Integrate Databricks with AWS S3 as the primary data lake storage layer. Design and implement ETL/ELT pipelines using the AWS Glue Data Catalog, AWS Lambda, and AWS Step Functions where needed. Ensure proper networking configuration (VPC, security groups, PrivateLink) for secure and compliant data access. Automate infrastructure deployment and scaling using AWS CloudFormation or Terraform.
* Data Pipeline & Workflow Management: Develop and maintain scalable, reusable ETL frameworks using Spark (Python/Scala). Orchestrate complex workflows, applying CI/CD principles (Git-based version control, automated testing). Implement Delta Live Tables or similar frameworks to handle real-time data ingestion and transformations. Integrate with MLflow (if applicable) for experiment tracking and model versioning, ensuring data lineage and reproducibility.
* Performance Tuning & Optimization: Conduct advanced Spark job tuning (caching strategies, shuffle partitions, broadcast joins, memory optimization). Fine-tune Databricks clusters (autoscaling policies, instance types) to manage cost without compromising performance. Optimize I/O performance and concurrency for large-scale data sets.
* Security & Governance: Implement Unity Catalog or equivalent Databricks features for centralized governance, access control, and data lineage. Ensure compliance with industry standards (e.g., GDPR, SOC, ISO) and internal security policies. Apply IAM best practices across Databricks and AWS to enforce least-privilege access.
* Technical Leadership & Mentorship: Lead and mentor a team of data engineers, conducting code reviews, design reviews, and knowledge-sharing sessions. Champion Agile or Scrum development practices, coordinating sprints and deliverables. Serve as a primary technical liaison, working closely with product managers, data scientists, DevOps, and external stakeholders.
* Monitoring & Reliability: Configure observability solutions (e.g., Datadog, CloudWatch, Prometheus) to proactively identify performance bottlenecks. Set up alerting mechanisms for latency, cost overruns, and cluster health. Maintain SLAs and KPIs for data pipelines, ensuring robust data quality and reliability.
* Innovation & Continuous Improvement: Stay current with the Databricks roadmap and emerging data engineering trends (e.g., Photon, Lakehouse features). Evaluate new tools and technologies, driving POCs to improve data platform capabilities. Collaborate with business units to identify data-driven opportunities and craft solutions that align with strategic goals.
Qualifications
* Educational Background: Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or equivalent experience.
* Databricks Expertise: 5+ years of hands-on Databricks (Spark) experience, with a focus on building and maintaining production-grade pipelines.
* AWS Services: Proven track record with AWS S3, EC2, Glue, EMR, Lambda, Step Functions, and security best practices (IAM, VPC).
* Programming Languages: Strong proficiency in Python (PySpark) or Scala; SQL for analytics and data modeling.
* Data Warehousing & Modeling: Familiarity with RDBMS (e.g., Postgres, Redshift) and dimensional modeling techniques.
* Infrastructure as Code: Hands-on experience using Terraform or AWS CloudFormation to manage cloud infrastructure.
* Version Control & CI/CD: Git-based workflows (GitHub/GitLab), Jenkins or similar CI/CD tools for automated builds and deployments.
* Leadership & Soft Skills: Demonstrated experience leading a team of data engineers in a complex, high-traffic data environment. Outstanding communication and stakeholder management skills, with the ability to translate technical jargon into business insights. Adept at problem-solving, with a track record of quickly diagnosing and resolving data performance issues.
* Certifications (Preferred): Databricks certification at Associate or Professional level (e.g., Databricks Certified Data Engineer Professional). AWS Certified Solutions Architect (Associate or Professional).
Seniority level
* Mid-Senior level
Employment type
* Full-time
Job function
* Management and Manufacturing