Tech Job Finder - Find Software, Technology Sales and Product Manager Jobs.

Senior Engineer - SRE Engineer

at OKX

Industry not specified

Tech Lead | No visa sponsorship | AWS/GCP/Azure DevOps

Posted 5 hours ago

Compensation
Not specified

Currency: Not specified

City
Not specified
Country
Not specified

Join OKX as a Senior SRE Engineer responsible for keeping large-scale data platforms and services stable and highly available. You will optimize big data stacks (Alibaba Cloud DataWorks, AWS EMR, Databricks, Spark) and data warehouses, advance middleware reliability, and lead chaos engineering and incident response. You will optimize runtime environments (KVM, Docker, Kubernetes, JVM) and drive automation and infrastructure intelligence, collaborating with development teams to enable continuous product innovation. Requires 8+ years of experience in large-scale internet or cloud platform operations.

Who We Are

At OKX, we believe that the future will be reshaped by crypto, ultimately contributing to every individual's freedom. OKX began as a crypto exchange giving millions of people access to crypto trading and has over time become one of the largest platforms in the world. In recent years, we have developed one of the most connected Web3 wallets, used by millions to access decentralized crypto applications (dApps). OKX is a brand trusted by hundreds of large institutions seeking access to crypto markets on a reliable platform that seamlessly connects with global banking and payments. In the last year, OKX has expanded into new markets including Australia, Brazil, the Netherlands, Singapore and Turkey, with plans to launch in the US, Belgium and the UAE.
We are deeply committed to shaping a fairer, more transparent and accessible society through blockchain technology. This is why we publish proof of reserves monthly, and continue to ship new innovative security features.

What You’ll Be Doing:

  • Ensure the stability of, and optimize, big data platforms (Alibaba Cloud DataWorks, AWS EMR, AWS Databricks, Spark, Flink) and data warehouses (MaxCompute, Hologres, Hive, ClickHouse, StarRocks, etc.).
  • Deeply understand the architecture and principles of middleware (Kafka, Spring Cloud, Nacos, Apollo, Kong Gateway, etc.), ensuring high performance and availability.
  • Effectively optimize existing runtime environments (KVM, Docker, K8S, JVM, etc.) to ensure efficient resource utilization and stable service operation.
  • Understand network architecture and security, and provide guidance on infrastructure stability at the network and security layers to keep network communications secure, stable, and efficient.
  • Lead chaos engineering exercises, coordinating with business units to validate system robustness and recovery capabilities through simulated failure scenarios.
  • Participate in rapid response to and troubleshooting of system failures, and continuously optimize monitoring strategies to reduce system downtime and ensure service continuity and stability.
  • Drive infrastructure automation and intelligence to improve SRE work efficiency and quality.
  • Collaborate closely with development teams, providing technical support and advice on infrastructure to jointly promote continuous product improvement and innovation.

What We Look For In You:

  • Bachelor's degree or above in Computer Science or related field, with 8+ years of experience in large-scale internet or cloud computing platform development/SRE/operations.
  • In-depth understanding of big data platforms, data warehouses, middleware, runtime environments, and network technology principles and architectures, with rich practical experience and troubleshooting skills.
  • Proficient in Linux system management and optimization, familiar with scripting languages such as Shell/Python, able to write automation tools and scripts.
  • Familiar with container and cloud-native technologies like KVM, Docker, and K8S, including their architectures and principles, with extensive experience in handling common issues and failures.
  • Familiar with network protocols such as TCP/UDP/QUIC, proficient in using network commands like tcpdump, traceroute, and netstat, and tools like Wireshark, with extensive practical experience in troubleshooting common network issues.
  • Rich experience with Alibaba Cloud and AWS cloud products, from architecture to usage, with extensive practice in dealing with common issues and failures.
  • Experience in building service governance systems, architecture optimization, stability assurance, capacity management, activity support, and chaos engineering is preferred.
  • Strong sense of responsibility and team spirit, with excellent problem-solving and analytical skills.
  • Must have Chinese communication skills; proficiency in both Chinese and English communication is preferred.

Perks & Benefits

  • Competitive total compensation package
  • L&D programs and Education subsidy for employees' growth and development
  • Various team building programs and company events
  • Wellness and meal allowances
  • Comprehensive healthcare schemes for employees and dependants
  • More that we would love to tell you about during the process!
Disclaimer: Please note that Hong Kong is a group-level service hub, and OKX does not carry on a business of operating a virtual asset trading platform in Hong Kong.

Notice:
All official OKX vacancies are published on this website. While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. If in doubt, please apply directly through our official careers website.
Information collected and processed as part of the recruitment process of any job application you choose to submit is subject to OKX's Candidate Privacy Notice.
