
Staff Distributed Framework Engineer, GenAI
at Adobe
Posted 15 hours ago
- Compensation: $173,500 – $331,050 USD
- City: Not specified
- Country: United States
Currency: $ (USD)
Lead the design and ownership of major components of a distributed framework, including execution abstractions, configuration, checkpointing, and cluster-side services. Design, implement, and operate large-scale distributed execution strategies across multi-node GPU environments for training and inference workloads, ensuring correctness, stability, and scalability. Improve reliability, autoscaling, fault tolerance, and Kubernetes integration to deliver cost-efficient platform infrastructure used to train, deploy, and operate models at scale. Collaborate with applied research and ML teams to enable safe production deployment and iteration on evolving requirements, delivering robust platform systems for multiple teams.
The Opportunity
Adobe Applied Science & Machine Learning (ASML) is seeking a Staff Distributed Framework Engineer to play a critical role in building and scaling the core distributed systems and platforms that support Adobe’s generative AI training and inference workloads.
In this role, you will serve as a senior technical owner for key components of our distributed execution frameworks and cluster‑side systems, translating product and research requirements into reliable, scalable, and secure platform infrastructure. Rather than focusing on model development, your work will enable multiple multimodal and video foundation models by strengthening the shared systems used to train, deploy, and operate models at scale.
You will operate at the intersection of applied research needs and large‑scale systems execution, ensuring that training and inference platforms are robust, reproducible, performant, and cost‑efficient across large GPU clusters. This role is ideal for a senior systems engineer who thrives on distributed systems problem solving, platform ownership, and operational excellence.
Job Responsibilities
- Distributed Framework Ownership: Own the design and implementation of major components of the distributed framework, including execution abstractions, configuration management, checkpointing, experiment and job lifecycle management, cluster‑side services, and training‑to‑inference handoff mechanisms.
- Large‑Scale Distributed Execution: Design, implement, and operate distributed execution strategies for large‑scale workloads, ensuring correctness, stability, and scalability across multi‑node GPU environments and shared training and inference clusters.
- Reliability, Autoscaling & Fault Tolerance: Improve the resilience of long‑running training jobs and inference services by strengthening resumability, state management, autoscaling behavior, and failure handling at both the framework and Kubernetes orchestration levels.
- Performance & Cost‑Aware Platform Design: Identify and address framework‑level inefficiencies related to memory usage, scheduling, communication, and execution orchestration across training and inference workloads, with a focus on resource efficiency and COGS reduction.
- Platform Enablement: Partner with applied research and ML teams to support evolving requirements, while owning the platform systems that allow models to be deployed, operated, and iterated on safely in production.
- Framework Integration & CI/CD: Collaborate with infrastructure and platform teams to integrate the distributed framework with scheduling, storage, monitoring, logging, CI/CD pipelines, security controls, and Kubernetes‑based deployment environments used in production systems.
What You’ll Need to Succeed
- Education: Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, or a related field, or equivalent practical experience.
- Strong Systems Engineering Skills: Proficiency in Python and one or more systems languages (e.g., C++, Go, Rust), with experience building and owning large‑scale, production‑critical distributed systems and custom resources for scaling deployments of large ML models.
- Distributed Systems Expertise: Deep understanding of synchronization, state management, fault tolerance, scheduling, and performance tradeoffs in distributed systems, including containerized and Kubernetes‑based environments.
- Platform & Operations Experience: Hands‑on experience operating complex systems in production, including kubectl, cluster management, and debugging at scale.
- Senior‑Level Execution: Demonstrated ability to independently own complex platform areas, drive cross‑team execution, and deliver reliable systems used by many teams.
Preferred Experience
- Experience building or operating distributed platforms that support ML training and/or inference workloads.
- Experience with Kubernetes‑native systems, autoscaling, and shared compute environments.
- Familiarity with CI/CD, automation, and reproducible deployment workflows.
- Experience improving platform reliability, security, and operational simplicity.
- Exposure to performance profiling and optimization in large‑scale distributed systems.
About Adobe
Adobe empowers everyone to create through innovative platforms and tools that unleash creativity, productivity and personalized customer experiences. Adobe’s industry-leading offerings including Adobe Acrobat Studio, Adobe Express, Adobe Firefly, Creative Cloud, Adobe Experience Platform, Adobe Experience Manager, and GenStudio enable people and businesses to turn ideas into impact, powered by AI and driven by human ingenuity.
Our 30,000+ employees worldwide are creating the future and raising the bar as we drive the next decade of growth. We’re on a mission to hire the very best and believe in creating a company culture where all employees are empowered to make an impact. At Adobe, we believe that great ideas can come from anywhere in the organization. The next big idea could be yours.
Let’s Adobe together
At Adobe, we believe in creating a company culture where all employees are empowered to make an impact. Learn more about Adobe life, including our values and culture, focus on people, purpose and community, Adobe for All, comprehensive benefits programs, the stories we tell, the customers we serve, and how you can help us advance our mission of empowering everyone to create.
Adobe is proud to be an Equal Employment Opportunity employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other protected characteristic.
Adobe aims to make our Careers website and recruiting process accessible to any and all users. If you have a disability or special need that requires accommodation to navigate our website or complete the application process, email accommodations@adobe.com or call +1 408-536-3015.
AI Use Guidelines for Interviews:
Our interviews are designed to reflect your own skills and thinking. The use of AI or recording tools during live interviews is not permitted unless explicitly invited by the interviewer or approved in advance as part of a reasonable accommodation. If these tools are used inappropriately or in a way that misrepresents your work, your application may not move forward in the process.
At Adobe, we empower employees to innovate with AI — and we look for candidates eager to do the same. As part of the hiring experience, we provide clear guidance on where AI is encouraged during the process and where it’s restricted during live interviews. See how we think about AI in the hiring experience.
Expected Pay Range:
Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. The U.S. pay range for this position is $173,500 - $331,050 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process. In California, the pay range for this position is $228,600 - $331,050.
At Adobe, for sales roles, starting salaries are expressed as total target compensation (TTC = base + commission), and short-term incentives are in the form of sales commission plans. For non-sales roles, starting salaries are expressed as base salary, and short-term incentives are in the form of the Annual Incentive Plan (AIP).
In addition, certain roles may be eligible for long-term incentives in the form of a new hire equity award.
State-Specific Notices:
California:
Fair Chance Ordinances
Adobe will consider qualified applicants with arrest or conviction records for employment in accordance with state and local laws and “fair chance” ordinances.
Colorado:
Application Window Notice
If this role is open to hiring in Colorado (as listed on the job posting), the application window will remain open until at least the date and time stated above in Pacific Time, in compliance with Colorado pay transparency regulations. If this role does not have Colorado listed as a hiring location, no specific application window applies, and the posting may close at any time based on hiring needs.
Massachusetts:
Massachusetts Legal Notice
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.

