
Site Reliability Engineer
at Fortinet
Posted 8 hours ago
Automate and improve the Lacework Cloud Security Platform by designing, building, and operating scalable cloud infrastructure and internal tooling. You will automate operational workflows, implement infrastructure as code, and develop monitoring to predict and prevent issues, while supporting Kubernetes, cloud services, and on-call rotations. The role emphasizes cross-team collaboration to improve reliability, scalability, and observability across the Lacework platform.
Location: Sunnyvale, CA, United States
At Fortinet, we strive to provide a supportive, collaborative environment where people are empowered to do the best work of their careers.
Our team members enjoy solving complex problems and obsess over getting the details right. We love what we do and are proud of our work to secure clouds and container environments for thousands of B2B customers worldwide.
Our team is growing, and we are looking for engineers with a passion for automation. You will play a key role in building, operating, and improving the Lacework Cloud Security Platform, the world's best real-time cloud-native threat detection system.
Our team develops and supports the infrastructure layers spanning our cloud accounts, network/connectivity, workload management, observability, and storage services. We build tooling that performs automated operations in order to scale the Lacework infrastructure and service. To be successful, you will design, define, develop, deploy, and operate internal tooling, APIs, and frameworks that streamline our workflows and automate our infrastructure.
The Role:
- Automate as much as is reasonable to significantly improve the operational efficiency of the Lacework platform.
- Design, build, and improve our infrastructure to enhance service scalability, resiliency, and efficiency across the company.
- Identify mission-critical problems and solve them via automation, tooling, communication, and informed design.
- Build and improve monitoring and instrumentation to predict future scalability or failure risks and resolve them before they become customer-facing issues.
- Facilitate company-wide visibility into key metrics, SLAs, and milestones so that scale and resiliency are a part of every conversation.
- Develop best practices alongside engineering/operations teams to improve the scalability and reliability of internal processes.
- Participate in an on-call rotation.
Minimum Qualifications:
- 3 years of DevOps/SRE experience with production systems (depending on level)
- Strong development and automation skills.
- Extensive experience with Infrastructure as Code (Terraform, etc.), as well as supporting tooling (Atlantis, ArgoCD, etc.)
- Extensive experience with Kubernetes and supporting tooling (Helm, operators, etc.)
- Extensive experience with a variety of cloud managed services and providers (AWS: EKS, EC2, S3, RDS, Secrets Manager, etc.)
- Experience building production-quality cloud infrastructure that enables reliable and rapid deployment of microservices with effective monitoring and built-in high availability and/or fault tolerance.
- Strong passion for using automation to create simple, repeatable dev and ops patterns that ensure a stable, reliable experience for customers.
- Strong cross-team communication skills.
- Experience with the building blocks of large-scale systems including load balancing, distributed/cloud computing, containers, instrumentation, and monitoring.
- Knowledge of cloud networking, including VPC configuration and cross-cloud connectivity.
- Familiarity with one or more programming languages (Python, Golang, etc.).
Preferred Qualifications:
- Experience with monitoring and observability systems and tools (Prometheus, Grafana, New Relic, Datadog, etc.)
- A belief that everything should be "as code"
- Experience with Java application servers and JVM configuration

