We are seeking Principal/Senior Engineers to architect, build, and optimize large-scale big data platforms. The role focuses on distributed processing, storage systems, data governance, and streaming pipelines. You will lead the design and implementation of high-performance analytics environments using Spark, Hive, Impala, Iceberg, Kafka, Ozone, and related data platform technologies.
Responsibilities:
* Design, develop, and optimize distributed data pipelines for batch and streaming workloads.
* Manage and maintain data storage and governance frameworks, including Iceberg tables, Ozone object storage, and metadata management.
* Lead the deployment and tuning of Spark, Hive, and Impala clusters for high-performance analytics.
* Architect and implement streaming data pipelines using Kafka and other messaging platforms (a representative sketch follows this list).
* Ensure data reliability, consistency, and compliance across enterprise-scale platforms.
* Collaborate with data engineers, analysts, and infrastructure teams to optimize platform performance and scalability.
* Mentor and provide technical guidance to junior engineers, promoting best practices in data architecture and governance.
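
For illustration only, a minimal sketch of the kind of Kafka-to-Iceberg streaming pipeline this role covers. The catalog name (`demo`), topic (`orders`), broker address, schema, and Ozone paths are all placeholders, and the job assumes the Iceberg Spark runtime and Kafka connector packages are available on the cluster:

```python
# Illustrative only: Spark Structured Streaming job reading JSON events from
# Kafka and appending them to an Iceberg table stored on Ozone.
# All names (catalog "demo", topic "orders", brokers, paths) are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (
    SparkSession.builder
    .appName("orders-stream")
    # Hypothetical Iceberg catalog backed by an Ozone warehouse path.
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "ofs://ozone-service/warehouse")
    .getOrCreate()
)

# Placeholder event schema for the incoming JSON payloads.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("event_time", TimestampType()),
])

# Read from the placeholder Kafka topic and parse the JSON value column.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder brokers
    .option("subscribe", "orders")                     # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Append each micro-batch to the Iceberg table; the checkpoint makes the
# stream restartable with effectively exactly-once table commits.
query = (
    events.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "ofs://ozone-service/checkpoints/orders")
    .toTable("demo.analytics.orders")
)
query.awaitTermination()
```

In practice the checkpoint location, trigger interval, and table partitioning would be tuned to the throughput and latency targets of the workload.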
Requirements:
* Extensive experience with Spark, Hive, Impala, and Iceberg for batch and streaming analytics.
* Hands-on expertise in Kafka, Ozone, distributed storage, and cloud/on-premises data platforms.
* Strong understanding of data modeling, governance, and metadata management.
* Experience optimizing big data pipelines for throughput, latency, and resource efficiency (see the table-maintenance sketch after this list).
* Familiarity with data security, compliance, and access control frameworks.
* Proven experience in mentoring and leading technical teams.
* Excellent problem-solving skills and experience in large-scale enterprise environments.
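
Likewise for illustration only, a sketch of routine Iceberg table maintenance run from Spark, the kind of throughput and resource optimization referenced above: compacting small files and expiring old snapshots. The catalog, table, and threshold values are placeholders, and the CALL procedures assume the Iceberg SQL extensions and runtime are configured:

```python
# Illustrative only: routine Iceberg table maintenance driven from Spark SQL.
# Catalog "demo", table "analytics.orders", and thresholds are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-maintenance")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "ofs://ozone-service/warehouse")
    .getOrCreate()
)

# Compact small files so scans from Spark/Impala read fewer, larger files.
spark.sql("""
    CALL demo.system.rewrite_data_files(
        table => 'analytics.orders',
        options => map('target-file-size-bytes', '536870912')
    )
""")

# Expire snapshots older than a placeholder cutoff to bound metadata and
# storage growth, retaining a recent history for time travel and rollback.
spark.sql("""
    CALL demo.system.expire_snapshots(
        table => 'analytics.orders',
        older_than => TIMESTAMP '2024-01-01 00:00:00',
        retain_last => 50
    )
""")
```

Scheduling maintenance like this alongside streaming appends keeps scan performance and metadata size predictable as small files accumulate.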