Bulge Bracket Investment Banks

Lead Software Engineer - Java / AWS

at J.P. Morgan

Tech Lead, No visa sponsorship, Java

Posted a month ago

Compensation: Not specified
City: Wilmington
Country: United States

Senior engineering role responsible for designing and delivering scalable, secure API-driven solutions on AWS using Java. Lead SRE-focused initiatives including observability, SLO/SLA design, incident management, performance engineering, and automation across CI/CD and IaC. Drive reliability improvements through testing, chaos engineering, capacity planning, and remediation programs while partnering with product and agile teams. Hands-on with Terraform, AWS services, monitoring tools (Datadog, CloudWatch, Prometheus, Grafana, etc.), and production readiness practices.

Location: Wilmington, DE, United States

We have an exciting and rewarding opportunity for you to take your software engineering career to the next level.

As a Lead Software Engineer at JPMorgan Chase within the Consumer and Community Banking technology team, you serve as a seasoned member of an agile team to design and deliver trusted market-leading technology products in a secure, stable, and scalable way. You are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm’s business objectives.

Job responsibilities

Engage with the development team throughout agile sprints to build software for reliability and scale, minimizing later refactoring or rework.
Identify application patterns and analytics in support of better service level objectives. Design automated software and product upgrades, change management, and release management solutions.
Deep experience operating services in the public cloud; strong grasp of SRE principles (SLIs/SLOs, error budgets, incident management, observability, and resilience patterns); hands-on with observability and incident tooling; proficiency with CI/CD and deployment strategies.
Perform year-over-year analysis of production issues (e.g., P1–P3) to identify top failure modes, recurrence patterns, and control gaps
Drive prioritized remediation programs across change/configuration, capacity/performance, dependency resilience, and code quality.
Troubleshoot priority and escalated incidents, facilitate blameless post-mortems, and ensure permanent closure of incidents and subsequent problem tasks.
Establish comprehensive automated functional testing with dependable regression suites integrated into CI/CD to gate releases; improve reliability and speed through robust test data; and include non-functional checks (performance, resilience, accessibility) in pre-prod and readiness reviews.
Implement demand forecasting, load testing, and performance engineering in pre-prod; validate scale assumptions before peak events.
Run game days and chaos experiments to validate failover, degraded-mode operation, and dependency timeouts.
Embed shift‑left quality and partner with Product to mature testing practices: co‑define clear acceptance criteria and Definition of Ready/Done, align coverage to critical user journeys, and track quality KPIs (defect escape rate, automated coverage on key paths, change failure rate) tied to service objectives and release readiness.
Cloud platform and automation

Required qualifications, capabilities, and skills

Formal training or certification in software engineering concepts and 5+ years of applied experience.
Experience in AWS-based API development using Java, with proficiency in RESTful API development and related tools such as Postman and Swagger/OpenAPI.
Proficient in using AWS services (e.g., Lambda, API Gateway, S3, EC2, IAM, EventBridge) to design and deploy API-driven solutions, with a focus on cost-efficiency, scalability, and performance.
Skilled in implementing Infrastructure as Code (IaC) using tools like Terraform.
Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
Experience monitoring AWS workloads with tools such as Datadog and CloudWatch.
An advanced understanding of site reliability culture and principles, with a track record of implementing site reliability within an application or platform and applying key SRE concepts such as SLOs and error budgets.
Advanced knowledge of and experience with observability across applications (metrics, tracing, SLOs), alerting, and telemetry collection, and the ability to design critical and golden-signal monitoring and dashboards.
Solid understanding of agile methodologies, including CI/CD, application resiliency, and security, with experience in developing, debugging, and maintaining code in a large corporate environment using modern programming and database querying languages.

Preferred qualifications, capabilities, and skills

Experience instituting production readiness standards and error-budget policies across multiple product teams.
Background in performance engineering and capacity planning for high-traffic, customer-facing systems
Thorough understanding of automated functional and regression testing and their integration into TrueCD
Cloud / SRE certifications
