Tech Job Finder - Find Software, Technology Sales and Product Manager Jobs.

Site Reliability Engineer III- Data and AWS

at J.P. Morgan

Bulge Bracket Investment Banks

Mid Level | No visa sponsorship | AWS/GCP/Azure DevOps

Posted 7 days ago

Compensation: Not specified

Currency: £ (GBP)

City: Glasgow
Country: United Kingdom

Join JPMorgan Chase as a Site Reliability Engineer III within the AIML Data Platforms and Chief Data and Analytics Team. You will configure, monitor, and optimize data platform applications and their infrastructure on AWS, contributing to end-to-end reliability and scalability. The role emphasizes observability, automation, incident response, and collaboration with data science, engineering, and operations teams. This position is based in Glasgow, United Kingdom.

Location: GLASGOW, LANARKSHIRE, United Kingdom

There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.

As a Site Reliability Engineer III at JPMorgan Chase within the AIML Data Platforms and Chief Data and Analytics Team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.

Job responsibilities

  • Assists in operating and maintaining the managed AWS and Data platforms; provides day-to-day engineering and operational support to SRE and application teams under guidance.
  • Supports platform design, setup, and configuration; performs workspace administration, resource monitoring, and basic troubleshooting for data engineering, Data Science/ML, and application/integration teams.
  • Participates in evaluation activities with external vendors, startups, and internal teams; documents findings and recommendations for senior review.
  • Contributes to improvements in system observability, alerting, and capacity planning by building dashboards, updating runbooks, and implementing basic automation.
  • Collaborates with engineering and data teams to optimize infrastructure and deployment processes, focusing on automation and operational excellence; writes and maintains scripts or pipelines following standards.
  • Implements and troubleshoots software solutions; contributes to design and development tasks and escalates complex issues appropriately.
  • Writes secure, high-quality production code for features and fixes; performs basic peer reviews and debugs own code when needed.
  • Identifies recurring issues and proposes or implements automation and remediation steps to improve operational stability of applications and systems.
  • Contributes to a team culture of inclusion, respect, and continuous learning.
  • Applies Site Reliability Engineering best practices (e.g., SLIs/SLOs, error budgets, incident response) with direction from senior engineers to support reliability, scalability, and performance of data platforms.
  • Participates in incident response following established procedures; assists with root-cause analysis, postmortem documentation, and implementation of corrective actions.

Required qualifications, capabilities, and skills

  • Formal training or certification in software engineering concepts, plus applied experience.
  • Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform.
  • Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
  • Understanding of SRE principles, including SLIs, SLOs, error budgets, and incident management.
  • Experience with monitoring tools, automation frameworks, and CI/CD pipelines.
  • Experience writing Python applications or scripts and using automated unit testing frameworks.
  • Experience with Terraform development and an understanding of Terraform Enterprise.
  • Experience contributing to system design discussions, application development, testing, and supporting operational stability.
  • Familiarity with big data distributed compute frameworks such as Apache Spark, AWS Glue, and MapReduce.
  • Strong troubleshooting, analytical, and communication skills.
Preferred qualifications, capabilities, and skills
  • Familiarity with distributed systems and large-scale data processing.
  • Experience with AWS and Python.
  • Knowledge of containerization (Docker, Kubernetes) and orchestration.

 

Apply your skillsets to drive innovation and modernize the world's most complex and mission-critical systems