Site Reliability Engineer - Vice President

at Goldman Sachs

Bulge Bracket Investment Banks

Tech Lead, No visa sponsorship, AWS/GCP/Azure DevOps

Posted 19 hours ago

Compensation
Not specified

Currency: Not specified

City
Hyderabad, Bengaluru
Country
India

Senior Site Reliability Engineer role responsible for ensuring availability, reliability, and scalability of critical platform applications and services. Lead architectural design, automation, incident response, capacity planning, and observability efforts across on-premises and multi-cloud environments. Provide technical vision, mentor senior engineers, and drive adoption of advanced SRE practices and tooling. Engage with cross-functional teams and executive stakeholders to embed reliability into application design and operations.

Engineering-L2-Hyderabad-Vice President-Software Engineering-Bengaluru/Hyderabad
Hyderabad, Telangana, India
Opportunity Overview
CORPORATE TITLE: Vice President
OFFICE LOCATION(S): Hyderabad
JOB FUNCTION: Software Engineering
DIVISION: Engineering Division

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run scalable, massively distributed, fault-tolerant systems. At Goldman Sachs, SRE is responsible for improving the availability and reliability of the firm’s most critical platform services and for ensuring they meet the requirements of our internal and external users. It is also responsible for the firmwide policies and standards focused on the firm’s digital resilience. We are looking for engineers who are motivated to collaborate with our businesses to build and run sustainable production systems that can evolve and adapt to changes in our fast-paced, global business environment.

The SRE team develops and maintains platforms and tools that help other Engineering teams at Goldman Sachs build and operate reliable and resilient systems. These systems span on-premises datacenters and multiple public cloud environments. The platforms we offer include central logging, monitoring, agents, and alerting. We also provide tools to drive adoption of, and improvements to, capacity planning, operational readiness assessments, production incident postmortems, SLIs / SLOs, and deployment automation, including canary releases.

The products and services we provide to our internal customers are used by thousands of engineers every day. We believe that reliability is the most important feature of any system, and we are devoted to giving our engineers the platforms and tools they need to build and operate reliable products.

Role Overview

As a Site Reliability Engineer (SRE) at Goldman Sachs, you will be a pivotal leader in ensuring the availability, reliability, and scalability of the firm's most critical platform applications and services. You will combine deep software and systems engineering expertise to architect, build, and run large-scale, massively distributed, fault-tolerant systems. This role involves providing technical leadership, mentoring senior engineers, and collaborating closely with internal teams and executive stakeholders to build and operate sustainable production systems that can adapt to our dynamic global business environment. You will drive a culture of continuous improvement, championing the adoption of advanced SRE principles and best practices across the organization.

Responsibilities

  • Strategic Reliability & Performance: Drive the strategic direction for availability, scalability, and performance of mission-critical applications and platform services, ensuring alignment with firm-wide objectives.
  • Architectural Leadership: Lead the design, build, and implementation of highly available, resilient, and scalable infrastructure and application architectures.
  • Advanced Automation & Tooling: Architect and develop sophisticated platforms, tools, and automation solutions to eliminate toil, optimize operational workflows, and enhance deployment processes across the enterprise.
  • Complex Incident Management & Post-Mortem Analysis: Lead critical incident response, conduct in-depth root cause analysis for systemic issues, and implement long-term preventative measures to significantly enhance system stability and resilience.
  • System Design & Capacity Planning: Partner with development teams to embed reliability into application design from inception, provide expert system design consulting, and lead comprehensive capacity planning initiatives for future growth.
  • Observability & Insights: Define and implement advanced monitoring, high-volume logging with multi-user query capabilities, and tracing strategies to provide deep, actionable insights into application performance, infrastructure health, and user experience.
  • Technical Vision & Mentorship: Provide technical vision, lead complex technical projects, conduct rigorous code reviews, enforce SDLC best practices, and actively mentor and develop senior and staff-level engineers.
  • Technology Evaluation & Adoption: Stay at the forefront of industry trends and advancements, evaluating and integrating cutting-edge tools and frameworks to significantly improve operational efficiency and reliability.
  • On-Call Leadership: Participate in and lead on-call rotations, providing expert guidance and hands-on support for critical system incidents.
Qualifications
  • Experience: Minimum of 10-15 years of hands-on experience in Site Reliability Engineering, with a proven track record in architecting, designing, building, and maintaining highly available, scalable, and fault-tolerant systems at an enterprise level.
  • Technical Proficiency:
    • Exceptional programming skills in one or more major languages such as Java, Python, or Go, with a focus on building robust, scalable software.
    • Extensive hands-on experience with cloud platforms (e.g., AWS, GCP) and deep expertise in containerization and orchestration technologies (e.g., Docker, Kubernetes).
    • Mastery of Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation) and configuration management tools (e.g., Puppet, Chef, Ansible).
    • Profound understanding of Linux internals, networking, distributed systems, and advanced system performance tuning.
    • Expertise in designing and implementing comprehensive monitoring, alerting, logging and tracing solutions (e.g., Prometheus, Grafana, ELK stack, Datadog, PagerDuty).
    • Deep experience with CI/CD tools and practices (e.g., Jenkins, GitLab, Maven).
    • Strong foundation in databases and distributed systems.
    • Exceptional problem-solving abilities and analytical skills, with a track record of resolving complex technical challenges.
  • Preferred Experience:
    • Experience with distributed databases such as Elasticsearch
    • Experience working with GCP BigQuery
    • Experience with messaging systems such as Kafka
  • Education: Advanced degree (Bachelor’s, Master’s, or PhD) in Computer Science or a related technical field involving coding and/or systems engineering, or equivalent practical experience.
  • Soft Skills: Superior communication, collaboration, and interpersonal skills, with the ability to influence technical direction, lead cross-functional initiatives, and effectively engage with global teams and executive leadership. Proven ability to work independently, manage multiple complex stakeholders, and drive significant organizational change.
© The Goldman Sachs Group, Inc., 2023. All rights reserved.
Goldman Sachs is an equal opportunity employer and does not discriminate on the basis of race, color, religion, sex, national origin, age, veteran status, disability, or any other characteristic protected by applicable law.


We Offer Best-In-Class Benefits
Healthcare & Medical Insurance
We offer a wide range of health and welfare programs that vary depending on office location. These generally include medical, dental, short-term disability, long-term disability, life, accidental death, labor accident and business travel accident insurance.
Holiday & Vacation Policies
We offer competitive vacation policies based on employee level and office location. We promote time off from work to recharge by providing generous vacation entitlements and a minimum of three weeks expected vacation usage each year.
Financial Wellness & Retirement
We assist employees in saving and planning for retirement, offer financial support for higher education, and provide a number of benefits to help employees prepare for the unexpected. We offer live financial education and content on a variety of topics to address the spectrum of employees’ priorities.
Health Services
We offer a medical advocacy service for employees and family members facing critical health situations, and counseling and referral services through the Employee Assistance Program (EAP). We provide Global Medical, Security and Travel Assistance and a Workplace Ergonomics Program. We also offer state-of-the-art on-site health centers in certain offices.
Fitness
To encourage employees to live a healthy and active lifestyle, some of our offices feature on-site fitness centers. For eligible employees we typically reimburse fees paid for a fitness club membership or activity (up to a pre-approved amount).
Child Care & Family Care
We offer on-site child care centers that provide full-time and emergency back-up care, as well as mother and baby rooms and homework rooms. In every office, we provide advice and counseling services, expectant parent resources and transitional programs for parents returning from parental leave. Adoption, surrogacy, egg donation and egg retrieval stipends are also available.
