
Data Engineering Senior Associate
PwC Service Delivery Center
Line of Service: Advisory
Industry/Sector: Not Applicable
Specialism: Advisory – Other
Management Level: Senior Associate
Job Description & Summary
At PwC, our people in data and analytics engineering focus on leveraging advanced technologies and techniques to design and develop robust data solutions for clients. They play a crucial role in transforming raw data into actionable insights, enabling informed decision-making and driving business growth.
In data engineering at PwC, you will focus on designing and building data infrastructure and systems to enable efficient data processing and analysis. You will be responsible for developing and implementing data pipelines, data integration, and data transformation solutions.
Minimum experience required: 4–7 years of experience with a programming language (any of Python, Scala, or Java; Python preferred), Apache Spark, ADF, Azure Databricks, Postgres, ETL (batch/streaming), and Git. Working knowledge of NoSQL and familiarity with Agile are desirable.
Required Qualification: BE / Master's in Design / B.Design / B.Tech / HCI Certification (preferred)
Job Description and Key Responsibilities
- Design, develop, and maintain robust, scalable ETL pipelines using tools like Apache Spark, Kafka, and other big data technologies.
- Data architecture design – Design scalable and reliable data architectures, including Lakehouse, hybrid batch/streaming, Lambda, and Kappa architectures.
- Demonstrate proficiency in Python, PySpark, Spark, and a solid understanding of design patterns (e.g., SOLID).
- Ingest, process, and store structured, semi-structured, and unstructured data from various sources.
- Cloud experience: Hands-on experience setting up data pipelines using cloud offerings (AWS, Azure, GCP).
- Optimize ETL processes to ensure scalability and efficiency.
- Work with various file formats, such as JSON, CSV, Parquet, and Avro.
- Possess deep knowledge of RDBMS, NoSQL databases, and CAP theorem principles.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and optimize data models for performance and scalability.
- Document data processes, architectures, and models comprehensively to facilitate cross-team understanding and maintenance.
- Implement and maintain CI/CD pipelines using tools like Docker, Kubernetes, and GitHub.
- Ensure data quality, integrity, and security across all systems and processes.
- Implement and monitor data governance best practices.
- Stay up-to-date with emerging data technologies and trends, and identify opportunities for innovation and improvement.
- Knowledge of other cloud data, integration, and orchestration platforms (Snowflake, Databricks, Azure Data Factory, etc.) is good to have.
GenAI Skills
- Leverage Large Language Models (LLMs) to generate and manage synthetic datasets for training AI models.
- Integrate Generative AI tools into data pipelines while critically analyzing and validating GenAI-generated solutions to ensure reliability and adherence to best practices.