
Data Engineering Senior Associate
PwC Service Delivery Center
Line of Service: Advisory
Industry/Sector: Not Applicable
Specialism: Advisory – Other
Management Level: Senior Associate
Job Description & Summary
At PwC, our people in data and analytics engineering focus on leveraging advanced technologies and techniques to design and develop robust data solutions for clients. They play a crucial role in transforming raw data into actionable insights, enabling informed decision-making and driving business growth.
In data engineering at PwC, you will focus on designing and building data infrastructure and systems to enable efficient data processing and analysis. You will be responsible for developing and implementing data pipelines, data integration, and data transformation solutions.
Minimum experience required: 4–7 years of experience with a programming language (any of Python, Scala, or Java; Python preferred), Apache Spark, ADF, Azure Databricks, Postgres, ETL (batch/streaming), and Git. Working knowledge of NoSQL and familiarity with Agile are desirable.
Required Qualification: BE / Master's in Design / B.Design / B.Tech / HCI Certification (preferred)
Job Description and Key Responsibilities
- Design, develop, and maintain robust, scalable ETL pipelines using tools like Apache Spark, Kafka, and other big data technologies.
- Data architecture design – Design scalable and reliable data architectures, including Lakehouse, hybrid batch/streaming, Lambda, and Kappa architectures.
- Demonstrate proficiency in Python, PySpark, Spark, and a solid understanding of design patterns (e.g., SOLID).
- Ingest, process, and store structured, semi-structured, and unstructured data from various sources.
- Cloud experience: Hands-on experience setting up data pipelines using cloud offerings (AWS, Azure, GCP).
- Optimize ETL processes to ensure scalability and efficiency.
- Work with various file formats, such as JSON, CSV, Parquet, and Avro.
- Possess deep knowledge of RDBMS, NoSQL databases, and CAP theorem principles.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and optimize data models for performance and scalability.
- Document data processes, architectures, and models comprehensively to facilitate cross-team understanding and maintenance.
- Implement and maintain CI/CD pipelines using tools like Docker, Kubernetes, and GitHub.
- Ensure data quality, integrity, and security across all systems and processes.
- Implement and monitor data governance best practices.
- Stay up-to-date with emerging data technologies and trends, and identify opportunities for innovation and improvement.
- Knowledge of other cloud data, integration, and orchestration platforms (Snowflake, Databricks, Azure Data Factory, etc.) is good to have.
GenAI Skills
- Leverage Large Language Models (LLMs) to generate and manage synthetic datasets for training AI models.
- Integrate Generative AI tools into data pipelines while critically analyzing and validating GenAI-generated solutions to ensure reliability and adherence to best practices.