Senior DevOps Engineer

at Nvidia

Tech Lead | No visa sponsorship | AWS/GCP/Azure DevOps

Posted 14 hours ago

Compensation: Not specified
Currency: Not specified
City: Not specified
Country: India

Senior DevOps Engineer to architect and scale Kubernetes-based development, compute and test environments across Windows and Linux. You will design end-to-end container management with Kubernetes, Docker and containerd, and automate operations for thousands of hosts with AI-assisted tooling. Collaborate with a fast-paced team to deploy new data center infrastructure and implement metrics, monitoring, and service automation.

NVIDIA is looking for an outstanding engineering architect to join its Software Infrastructure and Operations team. The position will be part of a fast-paced crew that develops and maintains sophisticated Kubernetes-based development, compute, and test environments for a multitude of platforms, including Windows and Linux. You will work with a team of passionate and skilled engineers who are continuously working to provide better tools to build and manage this infrastructure. With your help, we will forge the next generation of compute infrastructure, multiplying the power of the CPU, GPU, and DPU for the age of AI. We need a motivated, hardworking, and focused individual who has a real passion for operational excellence, infrastructure services, and automation.

What you’ll be doing:

  • Architect the scaling operation in our data centers. Deploy and support an end-to-end container management solution with Kubernetes, Docker, and containerd. Design solutions with service discovery, networking, monitoring, logging, and scheduling in Kubernetes.

  • Set up and manage end-to-end compute infrastructure using PaaS and IaaS services: tools, plugins, nodes, user management, backup, restore, monitoring, etc. Design and develop the AI tools needed to automate maintenance of 35,000+ hosts with only 12 support engineers.

  • Design and build sophisticated automations and AI-powered applications.

  • Draw on your depth in algorithms and your system software background!

  • Work in teams to deploy new data center infrastructure.

  • Plan and implement critical metrics tracking using various data analytics and data mining methods and dashboards.

  • Reuse AI techniques to extract useful signals about machines and jobs from the data generated!

  • Take part in prototyping, crafting and developing cloud infrastructure for Nvidia.

What we need to see:

  • Strong Kubernetes understanding and background, especially with on-premises setups, and extensive experience with Kubernetes components and subsystems.

  • Experience maintaining large-scale cloud/on-prem infrastructure applications using Kubernetes, Slurm, and OpenStack.

  • Proven programming background in Python/Golang/Java and/or relevant scripting languages.

  • Excellent debugging and analytical skills, and experience with databases, both SQL (MySQL) and NoSQL (Elasticsearch/MongoDB).

  • Proficient with configuration management tools like Ansible, Chef, and Puppet, and strong experience with Jenkins and/or other CI systems.

  • Hands-on experience with VMs, Docker, and Kubernetes clusters.

  • Experience with analytics/visualization tools like Kibana, Grafana, and Splunk; experience with monitoring systems such as Zabbix and/or Nagios is nice to have.

  • 10+ years of proven experience

  • Bachelor's or Master's degree, or equivalent experience, in CS, Software Engineering, or a related field.

Ways to stand out from the crowd:

  • Previous experience with DevOps/SRE teams

  • Thrives in a multi-tasking environment with constantly evolving priorities and documents work well

  • Outstanding collaboration skills across organizational boundaries, experience using and improving data centers and working with computer algorithms, and the ability to choose the best possible algorithms to meet the scaling challenge

  • Ability to divide complex problems into simple subproblems and then reuse available solutions to implement most of them

  • Experience with designing simple systems that can work reliably without needing much support
