Tech Job Finder - Find Software, Technology Sales and Product Manager Jobs.

GPU Accelerator Returns Debug Engineer

at Advanced Micro Devices

Industry not specified

Mid Level | No visa sponsorship | Python

Posted 3 hours ago

Compensation: Not specified
Currency: Not specified
City: New York City
Country: United States

The role involves performing PCBA-level failure analysis on GPU accelerators to reproduce reported issues, isolate root causes, and drive corrective actions with cross-functional teams (design, firmware, validation, and manufacturing). You will develop and execute DoE-based tests to reproduce hard-to-find failures, build automation and tools to run tests, and document findings in the FA database, with complete customer-facing reports where needed. You will triage with contract manufacturers and AMD teams to converge on root cause, present findings to senior management, and help implement continuous improvements and test station setups for new products. Ideal candidates bring deep expertise in GPU architecture, PCBA diagnostics, and Python scripting, experience across Windows and Linux, and strong communication and documentation skills.

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

The Quality Returns Debug Team is looking for an experienced GPU PCBA Debug and Failure Analysis Engineer. You will work with our engineering functions to perform board-level (PCBA) failure analysis on customer and factory failures of GPU Accelerators: reproducing reported failures, isolating the cause of failure, and working closely with cross-functional teams, including design, validation, FW and manufacturing, to drive root cause analysis and corrective actions. Your contributions will directly impact product quality, reliability, and customer satisfaction.

THE PERSON:

The ideal candidate is a skilled engineer with a strong analytical mindset and hands-on approach to technical problem-solving. They excel in both collaborative and independent environments, demonstrating initiative, adaptability, and a drive to tackle new challenges in fast-paced settings. With experience in system integration and High Performance Computing, they bring a proactive attitude and the ability to manage multiple tasks with limited supervision. Their excellent communication skills support effective teamwork and documentation, while their curiosity and persistence enable them to deliver high-quality solutions through thorough failure analysis and repair.

KEY RESPONSIBILITIES:

Support internal and external requests to troubleshoot PCBA-level AMD GPU product failures for continuous yield and quality improvements, and provide customer quality support within expected timelines.
Develop and execute DOEs that run targeted tests to reproduce and isolate hard-to-find failures.
Develop automation and tools to run tests and analyze results and logs.
Perform triage and communicate with the contract manufacturer and/or internal AMD teams (such as Design, BIOS, firmware, memory, I/O, display, diagnostics, Test Engineering, Board operations, etc.) as needed to converge on failure reproduction efforts and root cause identification.
Document all findings in the FA database and create a complete failure analysis report for customer consumption as needed.
Present findings to key stakeholders, including senior management.
Implement ongoing continuous improvements to the failure analysis process and techniques, and create procedures describing the steps to follow.
Oversee the set-up of new products and test stations for Failure Analysis operations.

PREFERRED EXPERIENCE:

Deep expertise in GPU architecture, including debug, validation, and stress/functional test development.
Skilled in using lab equipment (oscilloscopes, logic analyzers, custom test tools) for hardware validation.
Strong background in PCBA diagnostics, failure analysis, and debug techniques, from NPI through production.
Proficient in Python, shell scripting, and working across Windows and Linux environments.
Solid understanding of firmware, drivers, and hardware interactions, with the ability to tune firmware as needed.
Extensive experience in hardware verification and system integration.
Familiarity with PCBA manufacturing processes and IPC-A-610 quality standards.
Hands-on experience assembling, installing, and configuring computer systems and servers.
Strong leadership, communication, documentation, and presentation skills.
Able to read schematics, interpret datasheets, identify components, and perform soldering/rework for debug.
Proficient in MS Excel for data analysis and reporting.
Knowledge of high-speed digital design, memory interfaces (HBM, GDDR), PCIe, and display outputs (DP, HDMI).
Experience with GPU data center infrastructure and AI/ML technologies is a plus.

ACADEMIC CREDENTIALS:

Bachelor’s or master’s degree in electrical or computer engineering preferred.

LOCATION:

Secaucus, NJ #LI-BS1

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

