Bulge Bracket Investment Banks

Principal Site Reliability Engineer

at J.P. Morgan

Tech Lead | No visa sponsorship | AWS/GCP/Azure DevOps

Posted 19 days ago

Compensation: Not specified
City: Seattle
Country: United States

Principal Site Reliability Engineer role focused on improving reliability, observability, and operational efficiency across Consumer and Community Banking infrastructure. You will design, implement, and manage infrastructure components, observability practices, and automation to support end-to-end software development and incident management. The role partners with cross-functional teams including Cybersecurity and Data to make data discoverable, reliable, and accessible, accelerating BI and AI/ML initiatives. It also leads medium-to-large projects and participates in support coverage for critical applications.

Location: Seattle, WA, United States

Join a globally recognized financial organization and advance your career by contributing to high-impact projects.

As a Principal Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking Infrastructure Platform Management team, you draw upon your advanced knowledge to identify new opportunities to influence critical incident management and improve the end-to-end lifecycle of software development for the firm. You will have the opportunity to manage, design, and implement infrastructure components to improve reliability and ensure operational efficiency. You will provide comprehensive data management solutions that make JPMorgan Chase's data discoverable, understandable, observable, reliable, accessible, and interoperable for authorized users, thereby accelerating Business Intelligence and AI/ML initiatives.

Job responsibilities

Identifies and solves problems of high complexity
Works with development teams throughout the Software Development Life Cycle to ensure sustainable software releases
Leads medium to large projects by bringing together the proper perspective, identifying roadblocks, and integrating feedback from team members and subject matter experts at the firm
Participates in support responsibilities for coverage of critical applications
Sees problems as opportunities to improve
Applies a wide range of tactics and strategies to guide internal executive decisions to achieve substantial goals
Implements innovative methods, techniques, and evaluation criteria for projects and people working on highly complex business issues

Required qualifications, capabilities, and skills

Formal training or certification in site reliability/software engineering concepts and 10+ years of applied experience
Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines
Ability to determine how systems relate to one another and use a breadth of tools to build automation that improves reliability for the firm
Understands and leads partnerships across job functions (e.g., Cybersecurity and Data) to develop efficient and developer-friendly systems
Engages team members and expresses complex ideas with an appropriate level of detail, while also providing constructive feedback
Expertise in monitoring tools (e.g., Prometheus, Grafana, Nagios) and logging systems (e.g., ELK stack, Splunk)
Ability to implement and manage observability practices to ensure system reliability
Proficiency in cloud platforms (e.g., AWS, Azure, Google Cloud) and their services
Experience in implementing SRE principles and practices to improve system reliability and availability
Proficiency in SQL, NoSQL databases, and data warehousing solutions
Experience leading complex projects supporting site reliability engineering design, scaling, resilience, and system performance assessments

Preferred qualifications, capabilities, and skills

Knowledge of data governance frameworks and best practices
Familiarity with data privacy regulations (e.g., GDPR, CCPA)
Skills in identifying and resolving performance bottlenecks
Experience with load testing and capacity planning

Provide site reliability expertise to influence critical incident management and improve the software development end-to-end lifecycle
