Industry not specified

Senior AI Compute Engineer – HPC, GPUs & Distributed Training

at Qualcomm

Junior | No visa sponsorship | Data Science/AI/ML

Posted 9 hours ago

Compensation: Not specified
City: Not specified
Country: Not specified

We are seeking a Senior AI Compute Engineer to develop, optimize, and scale compute infrastructure for training and deploying ML and GenAI workloads. This role focuses on GPU acceleration, distributed training frameworks, and high-performance compute systems. Responsibilities include designing GPU-accelerated training pipelines, implementing distributed training strategies (e.g., PyTorch Distributed or DeepSpeed), working with HPC clusters and multi-GPU systems, profiling and optimizing performance, and collaborating with ML and platform teams to deploy scalable compute solutions. You will also develop tools for monitoring, scheduling, and managing large-scale training jobs and optimize CUDA kernels, memory usage, and compute flows where needed.

Company:

Qualcomm India Private Limited

Job Area:

Engineering Group, Engineering Group > Hardware Engineering

Minimum Qualifications:

• Bachelor's degree in Computer Science, Electrical/Electronics Engineering, Engineering, or related field and 4+ years of Hardware Engineering or related work experience.
OR
• Master's degree in Computer Science, Electrical/Electronics Engineering, Engineering, or related field and 3+ years of Hardware Engineering or related work experience.
OR
• PhD in Computer Science, Electrical/Electronics Engineering, Engineering, or related field and 2+ years of Hardware Engineering or related work experience.

Key Responsibilities

Design and optimize GPU‑accelerated training pipelines for ML and LLM workloads.
Implement distributed training strategies using frameworks like PyTorch Distributed or DeepSpeed.
Work with HPC clusters, multi‑GPU systems, and parallel computing architectures.
Profile, optimize, and troubleshoot compute performance bottlenecks.
Collaborate with ML and platform teams to integrate scalable compute solutions.
Develop tools for monitoring, scheduling, and managing large‑scale training jobs.
Optimize CUDA kernels, memory usage, and compute flows where needed.

Minimum Qualifications:

Bachelor’s or Master’s in Computer Science, Computational Engineering, or similar.
Strong expertise in GPU computing, CUDA, or parallel processing.
3–8 years of experience working with ML model training environments.
Hands‑on experience with distributed training frameworks.
Solid understanding of computer architecture and performance optimization.
Strong analytical and problem-solving skills.
Hands-on experience with supervised and unsupervised learning techniques (e.g., classification, clustering, dimensionality reduction).
Experience with ML frameworks such as scikit-learn, TensorFlow, or PyTorch.

Preferred Qualifications:

Experience with multi‑node training, HPC clusters, or cloud GPU environments.
Experience in large-model development and training from scratch.
Familiarity with model parallelism, pipeline parallelism, or large‑scale DL training.
Experience with deep neural network architectures, including RNNs and Transformers.
Experience with GenAI, LLMs, and RAG optimization, as well as LLM fine-tuning and distillation.

Applicants: Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail disability-accomodations@qualcomm.com or call Qualcomm's toll-free number found here. Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able to participate in the hiring process. Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities. We will not respond here to requests for updates on applications or resume inquiries.)

Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.

To all Staffing and Recruiting Agencies: Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications.

If you would like more information about this role, please contact Qualcomm Careers.
