Senior Staff Engineer, Microservice Governance
at OKX
Posted 5 hours ago
- Compensation: Not specified
- City: Singapore
- Country: Singapore
- Currency: Not specified
Lead the design and long-term planning of OKX's unified Microservice governance platform, covering RPC, messaging, and gateways to create a stable, efficient service governance system. Spearhead research and development of adaptive governance algorithms such as adaptive rate limiting, circuit breaking, and load balancing based on system metrics. Conduct deep research into mainstream RPC frameworks (e.g., Dubbo3, gRPC) and guide customization and performance optimization, while ensuring high availability of core middleware like configuration centers (Apollo), service registries (etcd, Nacos, ZK), and distributed task schedulers. Promote platform-level release strategies such as canary, blue-green, and traffic dyeing to improve R&D delivery efficiency and system stability; build a resilience-oriented architecture with chaos engineering to identify vulnerabilities and continuously improve anti-fragility.
Who We Are
About the Team
What You’ll Be Doing
- Lead the top-level design and long-term planning of the company's unified Microservice governance system, covering various communication methods like RPC, messaging, and gateways, to build a stable, efficient, and intelligent service governance platform.
- Spearhead the R&D and implementation of adaptive governance algorithms, including but not limited to adaptive rate limiting based on queuing theory or Little's Law, adaptive circuit breaking based on error rates and response times, and adaptive load balancing based on node load and health status.
- Conduct in-depth research into the core mechanisms of mainstream RPC frameworks (e.g., Dubbo3, gRPC), including service discovery, load balancing, serialization protocols, and threading models, and lead the deep customization and performance optimization of these frameworks.
- Take charge of the architectural evolution and high-availability construction of core Middleware such as configuration centers (Apollo), service registries (etcd, Nacos, ZK), and distributed task schedulers (XXL-JOB, Argo Workflow), ensuring the ultimate stability of foundational services.
- Design and promote the platform-level implementation of advanced release strategies like canary release, blue-green deployment, and traffic dyeing to improve R&D delivery efficiency and the stability of system changes.
- Build a resilience-oriented architecture with a "desired state" mindset, introducing chaos engineering principles and tools to proactively identify system vulnerabilities and continuously enhance the system's anti-fragility.
- As a Middleware architect, provide expert-level guidance to business units on Microservice decomposition, high-availability architecture design, and performance/capacity planning.
What We Look For In You
- Bachelor's degree or higher in Computer Science or a related field, with 8+ years of R&D experience in Middleware or distributed systems.
- Proficient in Java or Golang, with a deep understanding of JVM/GC tuning or the Go runtime, and extensive experience in online troubleshooting and performance optimization.
- Systematic knowledge and profound practical experience in Microservice governance areas such as rate limiting, circuit breaking, degradation, isolation, retries, and load balancing, with in-depth research into the underlying algorithmic principles.
- In-depth understanding of the source code and design philosophy of at least one mainstream RPC framework (Dubbo3, gRPC) or service governance framework (Spring Cloud, Sentinel).
- Familiarity with the implementation principles of service registries/configuration centers like Nacos, etcd, and ZooKeeper, with a solid foundation in CAP/BASE theory and consensus algorithms like Raft/ZAB.
- Rich architectural design experience in the Middleware domain, capable of designing and implementing complex distributed systems from scratch (0 to 1).
- Proven experience and successful cases in areas like adaptive algorithms, full-link stress testing, or chaos engineering are highly preferred.
- Excellent abstraction skills and architectural thinking, adept at modeling complex problems and designing elegant, scalable systems.
Perks & Benefits
- Competitive total compensation package
- L&D programs and education subsidies for employees' growth and development
- Various team building programs and company events
- Wellness and meal allowances
- Comprehensive healthcare schemes for employees and dependants
- More that we'd love to tell you about along the way!

