Wadhwani Foundation - Senior Data Engineer - ETL (5-10 yrs)
Wadhwani Foundation
posted 1 week ago
Flexible timing
The Role Context :
We are seeking a highly motivated and detail-oriented individual to join our team as a Data Engineer with experience in designing, constructing, and maintaining the architecture and infrastructure necessary for data generation, storage, and processing.
Key Responsibilities :
- Data Architecture Design : Design, develop, and maintain scalable data pipelines and infrastructure for ingesting, processing, storing, and analyzing large volumes of data efficiently. This involves understanding business requirements and translating them into technical solutions.
- Data Integration : Integrate data from various sources such as databases, APIs, streaming platforms, and third-party systems, ensuring data is collected reliably and efficiently and that data quality and integrity are maintained throughout the process, in line with Ministry/government data standards.
- Data Modeling : Design and implement data models to organize and structure data for efficient storage and retrieval, using techniques such as dimensional modeling, normalization, and denormalization depending on the specific requirements of the project.
- Data Pipeline Development/ETL (Extract, Transform, Load) : Develop data pipelines/ETL processes to extract data from source systems, transform it into the desired format, and load it into target data systems. This involves writing scripts, using ETL tools, or building custom pipelines to automate the process and ensure data accuracy and consistency (a minimal sketch follows this list).
- Data Quality and Governance : Implement data quality checks and data governance policies to ensure data accuracy, consistency, and compliance with regulations. Design and track data lineage, and support data stewardship, metadata management, and the business glossary.
- Data Lakes and Warehousing : Design and maintain data lakes and data warehouses to store and manage structured data from relational databases, semi-structured data such as JSON or XML, and unstructured data such as text documents, images, and videos at any scale. Integrate with big data processing frameworks such as Apache Hadoop, Apache Spark, and Apache Flink, as well as with machine learning and data visualization tools.
- Data Security : Implement security practices, technologies, and policies designed to protect data from unauthorized access, alteration, or destruction throughout its lifecycle. This includes data access controls, encryption, data masking and anonymization, data loss prevention, and compliance with regulatory requirements such as DPDP and GDPR.
- Database Management : Administer and optimize databases, both relational and NoSQL, to manage large volumes of data effectively.
- Data Migration : Plan and execute data migration projects to transfer data between systems while ensuring data consistency and minimal downtime.
- Performance Optimization : Optimize data pipelines and queries for performance and scalability. Identify and resolve bottlenecks, tune database configurations, and implement caching and indexing strategies to improve data processing speed and efficiency.
- Collaboration : Collaborate with data scientists, analysts, and other stakeholders to understand their data requirements and provide them with access to the necessary data resources. Work closely with IT operations teams to deploy and maintain data infrastructure in production environments.
- Documentation and Reporting : Document data models, data pipelines/ETL processes, and system configurations, and provide training to other team members to ensure the sustainability and maintainability of data systems.
- Continuous Learning : Stay updated with the latest technologies and trends in data engineering and related fields. Participate in training programs, attend conferences, and engage with the data engineering community to enhance skills and knowledge.
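To make the pipeline development responsibility above concrete, here is a minimal ETL sketch in Python. The source URL, connection string, and table name are hypothetical placeholders, and pandas with SQLAlchemy is used purely for illustration; this is a sketch of the pattern, not a prescribed implementation for the role.

```python
# Minimal ETL sketch (illustrative only): extract JSON records from a
# hypothetical REST endpoint, clean them with pandas, and append them to a
# PostgreSQL table via SQLAlchemy. URL, DSN, and table name are placeholders.
import pandas as pd
import requests
from sqlalchemy import create_engine

SOURCE_URL = "https://example.org/api/records"                     # hypothetical source API
TARGET_DSN = "postgresql+psycopg2://user:pass@localhost:5432/dwh"  # hypothetical warehouse
TARGET_TABLE = "records_staging"                                   # hypothetical target table


def extract(url: str) -> pd.DataFrame:
    """Pull raw records from the source system."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return pd.DataFrame(response.json())


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Light cleanup: drop exact duplicates, normalise column names,
    and parse timestamps so downstream queries stay consistent."""
    df = df.drop_duplicates()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    if "created_at" in df.columns:
        df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
    return df


def load(df: pd.DataFrame, dsn: str, table: str) -> None:
    """Append the cleaned frame into the target table."""
    engine = create_engine(dsn)
    df.to_sql(table, engine, if_exists="append", index=False)


if __name__ == "__main__":
    load(transform(extract(SOURCE_URL)), TARGET_DSN, TARGET_TABLE)
```

In practice a job like this would be scheduled and orchestrated by a tool such as Apache Airflow or NiFi (listed under Desired Skills below), with the quality and lineage checks described under Data Quality and Governance layered on top.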
Desired Skills/ Competencies :
- Education : A Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or an equivalent field, with 5 to 10 years of experience.
- Database Management : Strong expertise in working with databases, such as SQL databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra).
- Big Data Technologies : Familiarity with big data technologies, such as Apache Hadoop, Spark, and related ecosystem components, for processing and analyzing large-scale datasets.
- ETL Tools : Experience with ETL tools (e.g., Apache NiFi, Apache Airflow, Talend Open Studio, Pentaho, IBM InfoSphere) for designing and orchestrating data workflows.
- Data Modeling and Warehousing : Knowledge of data modeling techniques and experience with data warehousing solutions (e.g., Amazon Redshift, Google BigQuery, Snowflake).
- Data Governance and Security : Understanding of data governance principles and best practices for ensuring data quality and security.
- Cloud Computing : Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services for scalable and cost-effective data storage and processing.
- Streaming Data Processing : Familiarity with real-time data processing frameworks (e.g., Apache Kafka, Apache Flink) for handling streaming data (see the sketch after this list).
- Familiarity with Python programming and prompt engineering.
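As a rough illustration of the streaming-data familiarity listed above, the sketch below consumes JSON events from Apache Kafka using the kafka-python client. The topic name, broker address, consumer group, and message fields are assumptions for illustration, not part of the role description.

```python
# Minimal streaming-consumption sketch using the kafka-python client.
# Topic, broker address, and message schema are illustrative assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ingest.events",                       # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="data-engineering-demo",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # A real pipeline would validate, enrich, and persist each record to the
    # lake or warehouse; this loop only marks where that logic would plug in.
    print(message.topic, message.offset, record.get("event_type"))
```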
KPIs :
- Data Pipeline Efficiency : Measure the efficiency of data pipelines in terms of data processing time, throughput, and resource utilization. KPIs could include average time to process data, data ingestion rates, and pipeline latency.
- Data Quality Metrics : Track data quality metrics such as completeness, accuracy, consistency, and timeliness of data. KPIs could include data error rates, missing values, data duplication rates, and data validation failures (see the sketch after this list).
- System Uptime and Availability : Monitor the uptime and availability of data infrastructure, including databases, data warehouses, and data processing systems. KPIs could include system uptime percentage, mean time between failures (MTBF), and mean time to repair (MTTR).
- Data Storage Efficiency : Measure the efficiency of data storage systems in terms of storage utilization, data compression rates, and data retention policies. KPIs could include storage utilization rates, data compression ratios, and data storage costs per unit.
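As a sketch of how a few of the data-quality KPIs above might be computed, the snippet below derives completeness, duplication rate, and row count from a pandas DataFrame. The sample columns are invented for illustration; real checks would follow the applicable data standards and feed a monitoring dashboard.

```python
# Sketch: compute a few of the data-quality KPIs listed above with pandas.
# Sample column names are invented; real checks would be driven by the
# data standards the pipelines must follow.
import pandas as pd


def quality_metrics(df: pd.DataFrame) -> dict:
    total_cells = df.size or 1          # avoid division by zero on empty frames
    return {
        # completeness: share of non-null cells across the whole frame
        "completeness": 1.0 - df.isna().sum().sum() / total_cells,
        # duplication rate: share of fully duplicated rows
        "duplication_rate": float(df.duplicated().mean()) if len(df) else 0.0,
        # row count, useful for tracking ingestion rates over time
        "row_count": len(df),
    }


if __name__ == "__main__":
    sample = pd.DataFrame({"id": [1, 2, 2, 4], "score": [0.9, None, None, 0.7]})
    print(quality_metrics(sample))
```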
Functional Areas: Software/Testing/Networking