About the Team
The Video & Edge division owns ByteDance's media content distribution infrastructure and technology platform, powering VOD, live streaming, real-time communication, image processing, and other rich-media services across all ByteDance products. The proven technologies and tools developed in building this infrastructure are also offered externally through Volcano Engine or BytePlus, providing video cloud products and services to customers across industries. Our mission is to deliver the lowest-cost, highest-quality, lowest-latency, and most secure and reliable rich-media content distribution solutions, helping our business partners reduce costs, boost efficiency, and achieve sustainable growth.

Responsibilities
- Own the architecture design and core capability development of intelligent tools and platforms for ByteDance's global Video & Edge business, including but not limited to AI Agent systems and AIOps platforms, covering end-to-end solution design, technology selection, and production delivery.
- Lead the design and development of an intelligent Agent framework built on Large Language Models (LLMs), leveraging Prompt Engineering, RAG, MCP, Skill, Function Calling, and related techniques to deliver end-to-end intelligent fault diagnosis, root cause analysis, automated remediation, and intelligent customer service, with a focus on continuously improving accuracy and reliability.
- Lead or deeply contribute to intelligent traffic management and AIOps platform initiatives in the audio/video and CDN domain, helping design and optimize systems that manage millions of machines and handle millions of concurrent requests, with accountability for system stability, performance, and scalability.
- Drive engineering best practices, including observability, SLO/SLI frameworks, change management, and capacity planning, to ensure high code quality and system reliability.
- Stay on top of industry trends and rapidly validate and productionize emerging technologies in AI Coding, AIOps, and related fields, empowering the team to improve overall R&D efficiency.