
AI & HPC Infrastructure Engineer
at Accenture
Posted 6 days ago
Compensation: Not specified
City: Not specified
Country: Not specified
Currency: Not specified
Design and implement HPC and AI infrastructure solutions across on-prem, cloud, and hybrid environments, aligning architecture with performance and scalability needs. Deploy XPU-based clusters (CPU/GPU/accelerators) using schedulers such as Slurm, VM/K8s orchestration, and containerized platforms to deliver Metal as a Service (MaaS), GPUaaS, and AIaaS offerings. Optimize cluster performance, energy efficiency, and cost, while integrating AI/HPC platforms with existing IT systems, data pipelines, and security frameworks. Provide technical guidance and support to users for running HPC/AI workloads and large-scale models, and develop architecture diagrams, runbooks, and operational documentation. Travel may be required (25%–100%).
We Are:
The Global Infrastructure Engineering AI & HPC team is at the center of enabling infrastructure reinvention for the next era of digital solutions powered by AI and High-Performance Computing (HPC). We bring together deep technical expertise across cloud, on-prem, and hybrid environments to design, build, and operate accelerated infrastructure that powers high-performance workloads at scale. Our solutions enable some of our most strategic and mission-critical clients to unlock new levels of performance, efficiency, and innovation. Our remit spans the full lifecycle—from strategy and architecture through implementation and operations—driving modernization across the entire infrastructure stack. We collaborate across the ecosystem to harness emerging technologies, fuel growth, and transform industries. In this rapidly growing market, our team is leading the way in shaping how enterprises leverage AI and HPC to drive breakthrough innovation and reimagine what’s possible in infrastructure.
Key Responsibilities:
Design and implement HPC and AI infrastructure solutions, aligning system architecture and deployment roadmaps to industry-specific performance and scalability needs
Deploy, configure, and manage XPU-based clusters (CPU/GPU/accelerators) using schedulers such as Slurm, VM/K8s orchestration platforms, and containerized platforms in scalable designs to provide Metal as a Service (MaaS), GPUaaS, AIaaS, and other offerings
Optimize cluster performance, scalability, energy efficiency, and cost across on-premises, cloud, and hybrid environments
Integrate AI and HPC platforms with existing IT systems, data pipelines, and security frameworks
Monitor, troubleshoot, and tune infrastructure to ensure high availability, low-latency networking, and workload resiliency
Develop and maintain documentation including architecture diagrams, configuration baselines, and operational runbooks
Provide technical guidance and support to users, enabling efficient execution of HPC/AI workloads, large-scale models, and simulations
Travel may be required for this role. The amount of travel will vary from 25% to 100% depending on business need and client requirements.

