Tech Job Finder - Find Software, Technology Sales and Product Manager Jobs.

Principal Software Engineer

at Microsoft

Industry not specified

Tech Lead | No visa sponsorship | Data Engineering

Posted 5 hours ago

Compensation
Not specified

Currency: Not specified

City
Not specified
Country
Not specified

Lead the design and implementation of real-time streaming ETL and feature pipelines (Flink/Spark) to feed online stores, caches, and ML inference serving. Build and operate reliable messaging with Kafka/Pulsar and own data contracts, backfill workflows, and SLOs with strong observability. Optimize end-to-end performance and cost across compute, storage, and serving, and collaborate with applied scientists on feature/embedding definitions and validation. Ship CI/CD, testing, and incident response practices to maintain production quality.

Overview

Modern ads platforms run on always-on, real-time data: streaming events, feature computation, near-real-time aggregations, and low-latency serving to power ML models that operate at massive scale under strict freshness, cost, and reliability requirements.

Microsoft Ads builds and operates large-scale, latency-sensitive systems that serve billions of requests. We are looking for a Principal Software Engineer who is hands-on with production coding and system design to build the real-time data pipelines and feature/embedding materialization systems that feed online stores/caches and integrate tightly with ML inference serving.

This role is ideal for engineers who enjoy:

  • building robust streaming + ETL systems (correctness, idempotency, backfills, late data),
  • owning SLOs with strong observability and operational maturity,
  • and optimizing end-to-end performance and cost across compute, storage, and serving integrations.

Primary success metrics are freshness, correctness, latency, reliability, and cost in production.



Responsibilities
  • Design and implement real-time streaming ETL / feature pipelines (e.g., Flink or Spark Structured Streaming) that meet strict freshness and correctness constraints.
  • Build and operate reliable messaging and ingestion with Kafka/Pulsar (partitioning strategy, retries, ordering guarantees, DLQs, backpressure handling).
  • Own data contracts between producers, pipelines, and consumers: schema evolution, versioning, compatibility, validation, and safe rollout.
  • Implement production-grade backfill/replay workflows.
  • Define and meet SLOs using OpenTelemetry/Prometheus/Grafana for metrics, tracing, dashboards, alerting, and incident response readiness.
  • Integrate pipelines with online stores/caches and ML consumers (feature stores, embedding pipelines, LLM API calls, online/offline consistency patterns).
  • Partner with applied scientists on feature/embedding definitions, validation, and end-to-end quality measurement.
  • Optimize end-to-end performance and efficiency: CPU/memory/I/O, serialization, caching, network overhead, concurrency, and pipeline compute cost.
  • Contribute to serving/inference integrations where needed (e.g., Triton/ONNX Runtime/TensorRT) including batching and latency/cost tradeoffs.
  • Ship safely with CI/CD, automated testing (unit/integration/data quality), and operational playbooks/runbooks.


Qualifications

Required Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Electrical/Computer Engineering, or a related field, with 8+ years of related experience.
  • Strong programming skills in at least one of C++, C#, or Python.
  • Hands-on experience in one or more of the following:
    • Building and operating streaming data pipelines in production (Flink or Spark Structured Streaming),
    • Distributed systems engineering with strong reliability and operational rigor,
    • Messaging systems such as Kafka/Pulsar.
  • Experience operating services with Kubernetes/containers and production readiness practices (deployments, scaling, rollbacks).
  • Experience with observability stacks such as OpenTelemetry, Prometheus, Grafana.
  • Ability to debug complex production issues using logs/metrics/traces and performance profiling.
  • Strong communication and collaboration skills, with experience working across engineering, applied science/ML, and product/business stakeholders.

Preferred Qualifications:

  • Experience with feature stores, embedding pipelines, and online/offline consistency (freshness guarantees, correctness validation).
  • Experience with data lakehouse/table formats and optimizations (e.g., partitioning, compaction, and incremental processing).
  • Experience with GPU inference serving (Triton, ONNX Runtime/TensorRT) and performance techniques (batching, request shaping, tail-latency reduction).
  • Understanding of pipeline correctness patterns: idempotency, dedup, watermarking, late data, exactly-once vs at-least-once tradeoffs.
  • Background in cost/performance modeling, capacity planning, and reliability improvements for high-scale data platforms.
  • Experience in Ads/search/recommendations or other high-scale systems where freshness, latency, and cost are jointly optimized.

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.



Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
