
AWS Java Microservices Lead Software Engineer
at J.P. Morgan
Posted a day ago
- Compensation
- Not specified
- City
- Not specified
- Country
- United States
Currency: Not specified
Lead Software Engineer at JPMorganChase focusing on AWS-based Java microservices, delivering reliable, scalable services with strong SRE and observability practices. You will drive automation, testing, release management, incident response, and performance engineering across agile sprints. The role emphasizes cloud-native design, CI/CD, and close collaboration with product teams to mature testing and readiness for production.
Location: Wilmington, DE, United States
We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible.
As a Lead Software Engineer at JPMorganChase within the [insert LOB or sub LOB], you are an integral part of an agile team that works to enhance, build, and deliver trusted market-leading technology products in a secure, stable, and scalable way. As a core technical contributor, you are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm’s business objectives.
Job responsibilities
- Engage with the development team throughout agile sprints to build software for reliability and scale, so that minimal refactoring or rework is needed later
- Identify application patterns and analytics in support of better service level objectives. Design automated software and product upgrades, change management, and release management solutions.
- Apply deep experience operating services in public cloud and a strong grasp of SRE principles: SLIs/SLOs, error budgets, incident management, observability, and resilience patterns. Work hands-on with observability and incident tooling, and with CI/CD and deployment strategies.
- Perform year-over-year analysis of production issues (e.g., P1–P3) to identify top failure modes, recurrence patterns, and control gaps
- Drive prioritized remediation programs across change/configuration, capacity/performance, dependency resilience, and code quality.
- Troubleshoot priority and escalation incidents, facilitate blameless post-mortems and ensure permanent closure of incidents and subsequent problem tasks.
- Establish comprehensive automated functional testing with dependable regression suites integrated into CI/CD to gate releases; improve reliability and speed through robust test data and include non-functional checks (performance, resilience, accessibility) in pre‑prod and readiness reviews.
- Implement demand forecasting, load testing, and performance engineering in pre-prod; validate scale assumptions before peak events.
- Run game days and chaos experiments to validate failover, degraded-mode operation, and dependency timeouts.
- Embed shift‑left quality and partner with Product to mature testing practices: co‑define clear acceptance criteria and Definition of Ready/Done, align coverage to critical user journeys, and track quality KPIs (defect escape rate, automated coverage on key paths, change failure rate) tied to service objectives and release readiness.
- Cloud platform and automation
Required qualifications, capabilities, and skills
- Formal training or certification on software engineering concepts and 5+ years applied experience
- Experience in AWS-based API development using Java, with proficiency in RESTful API development and related tools such as Postman and Swagger/OpenAPI.
- Proficient in using AWS services (e.g., Lambda, API Gateway, S3, EC2, IAM, EventBridge) to design and deploy API-driven solutions, with a focus on cost-efficiency, scalability, and performance.
- Skilled in implementing Infrastructure as Code (IaC) using tools like Terraform.
- Proficiency in observability practices such as white-box and black-box monitoring, SLO alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
- Experience monitoring AWS workloads with tools such as Datadog and CloudWatch.
- An advanced understanding of site reliability culture and principles, with a track record of implementing site reliability within an application or platform and applying key SRE concepts such as SLOs and error budgets
- Advanced knowledge of and experience with observability capabilities across applications (metrics, tracing, SLOs), alerting, and telemetry collection, and the ability to design critical and golden-signal monitoring and dashboards
- Solid understanding of agile methodologies, including CI/CD, application resiliency, and security, with experience in developing, debugging, and maintaining code in a large corporate environment using modern programming and database querying languages.
- Experience instituting production readiness standards and error-budget policies across multiple product teams.
- Background in performance engineering and capacity planning for high-traffic, customer-facing systems
- Thorough understanding of automated functional and regression testing and their integration into TrueCD
- Cloud / SRE certifications

