
Lead Software Engineer - Java / AWS
at J.P. Morgan
Posted a month ago
- Compensation: Not specified
- City: Wilmington
- Country: United States
Currency: Not specified
Senior engineering role responsible for designing and delivering scalable, secure API-driven solutions on AWS using Java. Lead SRE-focused initiatives including observability, SLO/SLA design, incident management, performance engineering, and automation across CI/CD and IaC. Drive reliability improvements through testing, chaos engineering, capacity planning, and remediation programs while partnering with product and agile teams. Hands-on with Terraform, AWS services, monitoring tools (Datadog, CloudWatch, Prometheus, Grafana, etc.), and production readiness practices.
Location: Wilmington, DE, United States
We have an exciting and rewarding opportunity for you to take your software engineering career to the next level.
As a Lead Software Engineer at JPMorgan Chase within the Consumer and Community Banking technology team, you serve as a seasoned member of an agile team to design and deliver trusted market-leading technology products in a secure, stable, and scalable way. You are responsible for carrying out critical technology solutions across multiple technical areas within various business functions in support of the firm’s business objectives.
Job responsibilities
- Engage with development team throughout agile sprints to develop software for reliability and scale, ensuring minimal refactoring or changes
- Identify application patterns and analytics in support of better service level objectives. Design automated software and product upgrades, change management, and release management solutions.
- Apply deep experience operating services in public cloud and a strong grasp of SRE principles: SLIs/SLOs, error budgets, incident management, observability, and resilience patterns. Work hands-on with observability and incident tooling, CI/CD, and deployment strategies.
- Perform year-over-year analysis of production issues (e.g., P1–P3) to identify top failure modes, recurrence patterns, and control gaps
- Drive prioritized remediation programs across change/configuration, capacity/performance, dependency resilience, and code quality.
- Troubleshoot priority and escalation incidents, facilitate blameless post-mortems and ensure permanent closure of incidents and subsequent problem tasks.
- Establish comprehensive automated functional testing with dependable regression suites integrated into CI/CD to gate releases; improve reliability and speed through robust test data and include non-functional checks (performance, resilience, accessibility) in pre‑prod and readiness reviews.
- Implement demand forecasting, load testing, and performance engineering in pre-prod; validate scale assumptions before peak events.
- Run game days and chaos experiments to validate failover, degraded-mode operation, and dependency timeouts.
- Embed shift‑left quality and partner with Product to mature testing practices: co‑define clear acceptance criteria and Definition of Ready/Done, align coverage to critical user journeys, and track quality KPIs (defect escape rate, automated coverage on key paths, change failure rate) tied to service objectives and release readiness.
- Cloud platform and automation
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and 5+ years applied experience
- Experience in AWS-based API development using Java, with proficiency in RESTful API development and related tools such as Postman and Swagger/OpenAPI.
- Proficient in using AWS services (e.g., Lambda, API Gateway, S3, EC2, IAM, EventBridge) to design and deploy API-driven solutions, with a focus on cost-efficiency, scalability, and performance.
- Skilled in implementing Infrastructure as Code (IaC) using tools like Terraform.
- Proficiency in observability practices such as white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
- Experience monitoring AWS workloads with tools like Datadog and CloudWatch.
- An advanced understanding of site reliability culture and principles, with a track record of implementing site reliability within an application or platform using key SRE concepts such as SLOs and error budgets.
- Advanced knowledge of and experience with observability across applications (metrics, tracing, SLOs), alerting, and telemetry collection, and the ability to design monitoring and dashboards for critical and golden signals.
- Solid understanding of agile methodologies, including CI/CD, application resiliency, and security, with experience in developing, debugging, and maintaining code in a large corporate environment using modern programming and database querying languages.
Preferred qualifications, capabilities, and skills
- Experience instituting production readiness standards and error-budget policies across multiple product teams.
- Background in performance engineering and capacity planning for high-traffic, customer-facing systems
- Thorough understanding of automated functional and regression testing and their integration into TrueCD
- Cloud / SRE certifications




