Deep Learning Performance Architect - Intern - 2026
at Nvidia
Compensation: Not specified
City: Shanghai
Country: China
Currency: Not specified
NVIDIA seeks a Deep Learning System Performance Architect intern to help model, analyze, and optimize DL performance on state-of-the-art hardware for LLM workloads. The role involves analyzing modern DL networks, developing analytical models to inform processor and system architecture for performance and efficiency, and specifying hardware/software configurations and metrics to evaluate performance, power, and accuracy. The intern will collaborate with architecture, software, and product teams to guide next-generation HW/SW directions. This is an intern position focusing on AI performance modeling, analysis, and optimization.
NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for a deep learning system performance architect intern to join our AI performance modeling, analysis, and optimization efforts. In this position, you will have the chance to work on DL performance modeling, analysis, and optimization on state-of-the-art hardware architectures for a variety of LLM workloads. You will contribute to our dynamic, technology-focused company.
What you’ll be doing:
Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities that influence the SW and architecture teams for NVIDIA's current and next-generation inference products.
Develop analytical models of state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.
Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.
What we need to see:
BS or higher degree in a relevant technical field (CS, EE, CE, Math, etc.).
Strong programming skills in Python, C, and C++.
Strong background in computer architecture.
Experience with performance modeling, architecture simulation, profiling, and analysis.
Prior experience with LLM or generative AI algorithms.
Ways to stand out from the crowd:
GPU computing and parallel programming models such as CUDA and OpenCL.
Architecture of or workload analysis on other deep learning accelerators.
Deep neural network training, inference, and optimization in leading frameworks (e.g., PyTorch, TensorRT-LLM, vLLM).
Open-source AI compilers (OpenAI Triton, MLIR, TVM, XLA, etc.).
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

