Minimum qualifications:
- Bachelor's degree in Engineering, Computer Science, a related field, or equivalent practical experience.
- 6 years of experience working with client-side web technologies (e.g., HTML, CSS, JavaScript, or HTTP).
- Experience troubleshooting technical issues for internal/external partners or customers.

Preferred qualifications:
- Experience working directly with AI/ML computing hardware, including Graphics Processing Units (GPUs) or other accelerators.
- Experience working with large-scale distributed systems and with ML frameworks (e.g., TensorFlow, PyTorch).
- Familiarity with containerization and orchestration technologies like Kubernetes or Slurm in an on-premises or cloud environment.
- Familiarity with common solutions, design patterns, or best practices.
- Understanding of the AI/ML training and inference lifecycle.

About the job

Our Technical Solutions Engineers for AI Infrastructure own customer issues and provide specialized support to other teams. In this role, you will be part of a global team that provides 24/7 support to ensure customers can seamlessly deploy their Artificial Intelligence (AI) and Machine Learning (ML) workloads on AI Infrastructure products. When customers encounter technical issues, you will ensure we have the expertise, tools, and processes to resolve them.

You will troubleshoot technical problems with a mix of hardware and software debugging, networking, Linux system administration, coding/scripting, and updating documentation. You will contribute to our customers' success in the AI/ML space by making improvements to the product, internal tools, processes, and documentation. You will help drive business growth by recognizing and advocating for our customers' issues related to AI.

Google Cloud accelerates every organization's ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google's cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Responsibilities
- Develop an in-depth understanding of AI/ML workloads and underlying hardware architectures by troubleshooting, reproducing, and determining the root cause of customer-reported issues, and by building tools for faster diagnosis.
- Manage customers' problems through effective diagnosis, resolution, or implementation of new investigation tools to increase productivity for customer issues on AI/ML infrastructure.
- Act as a consultant and subject matter expert for internal stakeholders in Engineering, Sales, and customer organizations to resolve deployment and operational obstacles in AI infrastructure environments.
- Work closely with multiple Product and Engineering teams to find ways to improve the product, and interact with our Site Reliability Engineering (SRE) teams to drive high-quality production.
- Maintain availability for non-standard work hours or shifts, which may include weekends as needed.

To be considered for this role, you must complete the application process on our careers page.