Hedge Funds

Senior Site Reliability Engineer (SRE)

at Qube

Mid Level | No visa sponsorship | AWS/GCP/Azure DevOps

Posted 20 days ago

Compensation: Not specified
City: Not specified
Country: Not specified

Join the Platform team to improve reliability, observability, and operability for a growing engineering platform at Qube Research & Technologies. You will own the observability platform, build low-noise dashboards and alerts, improve incident detection and response, and define SLIs/SLOs to drive operational decisions. The role involves hands-on engineering to improve scalability and automation, applying Infrastructure as Code and developing tooling (Go preferred, Python acceptable). You will partner with service teams to deliver measurable reliability improvements while keeping long-term service ownership with those teams.

Qube Research & Technologies (QRT) is a global quantitative and systematic investment manager, operating in all liquid asset classes across the world. We are a technology- and data-driven group implementing a scientific approach to investing. Combining data, research, technology, and trading expertise has shaped our collaborative mindset, which enables us to solve the most complex challenges. QRT’s culture of innovation continuously drives our ambition to deliver high-quality returns for our investors.

You will join the Platform team focused on improving reliability and day-to-day operability for an actively used and growing engineering platform. The team works closely with software engineers and platform owners to improve observability, incident response, and reliability outcomes, while keeping long-term service ownership with the teams that build and run the services.

Your Future Role within QRT

You will:

Own the effectiveness of the observability platform, ensuring high-quality signals, alert fidelity, and ongoing suitability as the platform scales
Build and maintain actionable, low-noise dashboards and alerting across metrics and logs
Improve incident detection, response, and follow-up, ensuring corrective actions are implemented in systems, configuration, or automation
Define and apply SLIs and SLOs where they support operational decision-making
Improve reliability, scalability, and operability of core services through hands-on engineering changes
Identify recurring failure patterns and reduce manual operational work through automation and improved defaults
Apply Infrastructure as Code across observability and supporting systems
Develop tooling and automation in Go (preferred) or Python
Introduce shared patterns, defaults, and documentation that reduce repeated bespoke work
Partner with service-owning teams to deliver measurable reliability improvements without transferring long-term service ownership to SRE

Your Present Skillset

Strong practical experience applying Site Reliability Engineering principles in production environments
Strong Linux systems knowledge
Experience building and operating containerised workloads (Docker or Podman)
Strong development experience in Go (preferred) or Python
Strong experience querying and reasoning about metrics using PromQL
Hands-on experience with Grafana, including dashboarding and alerting
Experience deploying and operating centralised logging systems
Strong Infrastructure as Code experience
OpenTelemetry experience (metrics, logs, traces)
Terraform and/or Ansible experience, plus familiarity with CI/CD pipelines
Kubernetes and cloud-native platform experience
Exposure to datacentre services and compute/hardware-backed platforms
AWS infrastructure configuration and deployment experience
Evidence of reducing operational load and recurring incidents in growing systems

QRT is an equal opportunity employer. We welcome diversity as essential to our success. QRT empowers employees to work openly and respectfully to achieve collective success. In addition to professional achievement, we offer initiatives and programs that enable employees to achieve a healthy work-life balance.
