
PySpark Data Engineer

at Citi


Mid Level · No visa sponsorship · Data Engineering

Posted 3 days ago

Compensation: Not specified
Currency: Not specified
City: Chennai
Country: India

Citi is seeking a Python/PySpark Data Engineer to design and implement data migration, profiling, and processing pipelines on large-scale data platforms. The role focuses on PySpark distributed processing, SQL queries (Oracle), JDBC integration, and real-time streaming. You'll collaborate with data architects, data engineers, and business stakeholders to translate requirements into robust data solutions while ensuring data quality and performance.

PySpark Data Engineer

Job Req Id: 26936201
Location(s): Chennai, Tamil Nadu, India
Job Type: Hybrid
Posted: Feb. 16, 2026

Discover your future at Citi

Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you’ll have the opportunity to grow your career, give back to your community and make a real impact.

Job Overview

We are seeking a highly motivated and intuitive Python Developer to join our dynamic team, focusing on critical data migration and profiling initiatives. The ideal candidate will be a self-starter with strong engineering principles, capable of designing and implementing robust solutions for handling large datasets and complex data flows. This role offers an exciting opportunity to work on challenging projects that drive significant impact within our data ecosystem.



Responsibilities:

  • Develop, test, and deploy high-quality Python code for data migration, data profiling, and data processing.
  • Design and implement scalable solutions for working with large and complex datasets, ensuring data integrity and performance.
  • Utilize PySpark for distributed data processing and analytics on large-scale data platforms.
  • Develop and optimize SQL queries for various database systems, including Oracle, to extract, transform, and load data efficiently.
  • Integrate Python applications with JDBC-compliant databases (e.g., Oracle) for seamless data interaction.
  • Implement data streaming solutions to process real-time or near real-time data efficiently.
  • Perform in-depth data analysis using Python libraries, especially Pandas, to understand data characteristics, identify anomalies, and support profiling efforts.
  • Collaborate with data architects, data engineers, and business stakeholders to understand requirements and translate them into technical specifications.
  • Contribute to the design and architecture of data solutions, ensuring best practices in data management and engineering.
  • Troubleshoot and resolve technical issues related to data pipelines, performance, and data quality.
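
As an illustration of the data-profiling work described above, here is a minimal sketch using Pandas; the column names and sample values are hypothetical, and a real pipeline would run this over data extracted from the source systems:

```python
# Minimal per-column profiling sketch with Pandas.
# Column names ("account_id", "balance") are invented for illustration.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, non-null count, null count, and distinct count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "nulls": df.isna().sum(),
        "distinct": df.nunique(),  # nunique() ignores NaN values
    })

if __name__ == "__main__":
    sample = pd.DataFrame({
        "account_id": [1, 2, 2, None],
        "balance": [100.0, 250.5, 250.5, 75.0],
    })
    print(profile(sample))
```

In practice the same counts can be computed at scale with the equivalent PySpark aggregations, keeping the profiling logic consistent between exploratory Pandas work and the distributed pipeline.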


Qualifications:

  • 4-7 years of relevant experience in the Financial Services industry
  • Strong Proficiency in Python: Excellent command of Python programming, including object-oriented principles, data structures, and algorithms.
  • PySpark Experience: Demonstrated experience with PySpark for big data processing and analysis.
  • Database Expertise: Proven experience working with relational databases, specifically Oracle, and connecting applications using JDBC.
  • SQL Mastery: Advanced SQL querying skills for complex data extraction, manipulation, and optimization.
  • Big Data Handling: Experience in working with and processing large datasets efficiently.
  • Data Streaming: Familiarity with data streaming concepts and technologies (e.g., Kafka, Spark Streaming) for processing continuous data flows.
  • Data Analysis Libraries: Proficient in using data analysis libraries such as Pandas for data manipulation and exploration.
  • Software Engineering Principles: Solid understanding of software engineering best practices, including version control (Git), testing, and code review.
  • Problem-Solving: Intuitive problem-solver with a self-starter mindset and the ability to work independently and as part of a team.
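
The extract-and-transform pattern the SQL and database bullets describe can be sketched with Python's built-in sqlite3 module standing in for an Oracle source; in production this would instead go through a JDBC connection (e.g. Spark's JDBC reader), and the "trades" table with its columns is invented for illustration:

```python
# Parameterized SQL extraction sketch. sqlite3 is a stand-in for Oracle;
# the "trades" table and its columns are hypothetical.
import sqlite3

def extract_large_trades(conn: sqlite3.Connection, min_amount: float):
    """Extract trades at or above a threshold, largest first, via a bound parameter."""
    cur = conn.execute(
        "SELECT trade_id, amount FROM trades "
        "WHERE amount >= ? ORDER BY amount DESC",
        (min_amount,),  # bound parameter avoids SQL injection and aids plan caching
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (trade_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO trades VALUES (?, ?)",
        [(1, 50.0), (2, 500.0), (3, 1200.0)],
    )
    print(extract_large_trades(conn, 100.0))
```

The same parameterized-query discipline carries over to JDBC prepared statements against Oracle, with the driver and connection string swapped in.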


Education:

  • Bachelor’s degree/University degree or equivalent experience


This job description provides a high-level review of the types of work performed. Other job-related duties may be assigned as required.

Preferred Skills & Qualifications (Good to Have):

  • Experience in developing and maintaining reusable Python packages or libraries for data engineering tasks.
  • Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and their data services.
  • Knowledge of data warehousing concepts and ETL/ELT processes.
  • Experience with CI/CD pipelines for automated deployment.

------------------------------------------------------

Job Family Group:

Technology

------------------------------------------------------

Job Family:

Applications Development

------------------------------------------------------

Time Type:

Full time

------------------------------------------------------

Most Relevant Skills

Please see the requirements listed above.

------------------------------------------------------

Other Relevant Skills

For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.

View Citi’s EEO Policy Statement and the Know Your Rights poster.
