Persistent Systems Senior Data Engineer Interview Questions and Answers

Updated 27 Sep 2024

15 Interview questions

A Senior Data Engineer was asked 10mo ago
Q. How do you merge two schemas in PySpark?
Ans. 

Merging two schemas in PySpark involves combining DataFrames with different structures into a unified format.

  • Use the `unionByName()` method, which matches columns by name rather than by position.

  • Example: df1.unionByName(df2, allowMissingColumns=True) merges df1 and df2, filling columns missing from either side with nulls (the `allowMissingColumns` parameter requires Spark 3.1 or later).

  • For schema evolution, use the `mergeSchema` option when reading from Parquet files.

  • Example: spark.read.option('mergeSchema...
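The union-by-name behaviour described above can be modelled without a Spark cluster. The following is a minimal plain-Python sketch, not Spark code: the helper `union_by_name` and the sample rows are hypothetical, and it only imitates what `unionByName(..., allowMissingColumns=True)` does to column alignment.

```python
from typing import Any

Row = dict[str, Any]


def union_by_name(left: list[Row], right: list[Row]) -> list[Row]:
    """Plain-Python model of union-by-name with missing columns allowed."""
    # Build the merged schema: columns in first-seen order, left side first.
    columns: list[str] = []
    for row in left + right:
        for name in row:
            if name not in columns:
                columns.append(name)
    # Align every row to the merged schema; absent columns become None (null).
    return [{name: row.get(name) for name in columns} for row in left + right]


# Hypothetical sample data: the two inputs share only the "id" column.
df1 = [{"id": 1, "name": "Asha"}]
df2 = [{"id": 2, "city": "Pune"}]
merged = union_by_name(df1, df2)
```

Each output row carries all three columns, with None standing in for the null that Spark would write.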

A Senior Data Engineer was asked 10mo ago
Q. What is SCD?
Ans. 

SCD stands for Slowly Changing Dimension, a concept in data warehousing to track changes in data over time.

  • SCD is used to maintain historical data in a data warehouse.

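  • The three most commonly cited types are Type 1, Type 2, and Type 3; other variants (such as Type 0, Type 4, and Type 6) also exist.

  • Type 1 SCD overwrites old data with new data, so no history is kept.

  • Type 2 SCD creates a new record for each change, preserving full history.

  • Type 3 SCD keeps the previous value alongside the current value in the same record, typically in an extra column.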

  • SCD is importan...
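As an illustration of Type 2, here is a minimal plain-Python sketch. The row layout (`valid_from`, `valid_to`, `is_current`) and the helper `apply_scd2` are assumptions made for this example; real warehouses usually implement the same idea with a MERGE statement.

```python
from datetime import date


def apply_scd2(dimension, key, new_attrs, effective):
    """Close the current row for `key` if its attributes changed, then add a new version."""
    for row in dimension:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == new_attrs:
                return  # nothing changed, so the current version stays open
            # Expire the old version instead of overwriting it.
            row["is_current"] = False
            row["valid_to"] = effective
    dimension.append({
        "key": key,
        "attrs": new_attrs,
        "valid_from": effective,
        "valid_to": None,
        "is_current": True,
    })


# Hypothetical customer 101 moves from Pune to Mumbai.
dim = []
apply_scd2(dim, 101, {"city": "Pune"}, date(2023, 1, 1))
apply_scd2(dim, 101, {"city": "Mumbai"}, date(2024, 6, 1))
```

After the second call the dimension holds two rows for key 101: the expired Pune row and the current Mumbai row.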

A Senior Data Engineer was asked 10mo ago
Q. What is the difference between repartition and coalesce?
Ans. 

Repartition can increase or decrease the number of partitions in a DataFrame, while coalesce is meant only for decreasing it.

  • Repartition always performs a full shuffle of data across the cluster and produces roughly evenly sized partitions.

  • Coalesce reduces the number of partitions by merging existing ones without a full shuffle, which makes it cheaper than repartition but can leave partitions unevenly sized.

  • Repartition is typi...

A Senior Data Engineer was asked 10mo ago
Q. What happens when we enforce schema?
Ans. 

Enforcing schema ensures that data conforms to a predefined structure and rules.

  • Ensures data integrity by validating incoming data against predefined schema

  • Helps in maintaining consistency and accuracy of data

  • Prevents data corruption and errors in data processing

  • Can lead to rejection of data that does not adhere to the schema
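The validate-and-reject behaviour can be sketched in plain Python. This is only a model under assumptions: the schema, the sample records, and the helper `enforce_schema` are hypothetical, and real engines differ in whether a non-conforming row is rejected, nulled out, or fails the whole load.

```python
# Hypothetical expected schema: column name -> required Python type.
EXPECTED_SCHEMA = {"id": int, "name": str, "amount": float}


def enforce_schema(records, schema):
    """Split records into those that match the schema exactly and those that do not."""
    accepted, rejected = [], []
    for record in records:
        same_columns = record.keys() == schema.keys()
        if same_columns and all(isinstance(record[c], t) for c, t in schema.items()):
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


rows = [
    {"id": 1, "name": "Asha", "amount": 10.5},
    {"id": "2", "name": "Ravi", "amount": 3.0},  # wrong type for id
    {"id": 3, "name": "Meera"},                  # missing column
]
accepted, rejected = enforce_schema(rows, EXPECTED_SCHEMA)
```

Only the first record is accepted; the other two land in the rejected list, where a pipeline could log or quarantine them.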

A Senior Data Engineer was asked 10mo ago
Q. Using two tables, how would you identify the different records resulting from different types of joins?
Ans. 

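To find the records that differ between join types, run each join on the same key and compare which rows appear.

  • Identify the key columns in both tables to join on.

  • Run INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN on those keys: INNER JOIN returns only matched rows, while the outer joins also return unmatched rows from one or both sides.

  • To isolate the unmatched rows, use an outer join with a WHERE clause that keeps rows where the other table's key IS NULL.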
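A small runnable sketch of this idea, using Python's built-in sqlite3 module. The tables `a` and `b` and their contents are hypothetical; the right-only rows are found with a second LEFT JOIN because older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

# Rows present in both tables (INNER JOIN).
matched = conn.execute(
    "SELECT a.id FROM a JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

# Rows only in a: LEFT JOIN, then keep rows where b had no match.
only_a = conn.execute(
    "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL"
).fetchall()

# Rows only in b: the same anti-join with the roles swapped.
only_b = conn.execute(
    "SELECT b.id FROM b LEFT JOIN a ON a.id = b.id WHERE a.id IS NULL"
).fetchall()
conn.close()
```

Here ids 2 and 3 match, id 1 exists only in `a`, and id 4 exists only in `b`; a FULL JOIN would return all four.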

A Senior Data Engineer was asked 10mo ago
Q. Find the top 5 countries with the highest population using Spark and SQL.
Ans. 

Use Spark and SQL to find the top 5 countries with the highest population.

  • Use Spark to load the data and perform data processing.

  • Use SQL queries to group by country and sum the population.

  • Order the results in descending order and limit to top 5.

  • Example: SELECT country, SUM(population) AS total_population FROM table_name GROUP BY country ORDER BY total_population DESC LIMIT 5
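The query above can be tried locally with Python's built-in sqlite3 module. The table name `cities` and the sample rows are assumptions for this sketch; in Spark, the same SQL text could be run through `spark.sql()` after registering the DataFrame as a temporary view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (country TEXT, city TEXT, population INTEGER)")
# Hypothetical sample data: one row per city, several cities per country.
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", [
    ("A", "a1", 50), ("A", "a2", 30),
    ("B", "b1", 70),
    ("C", "c1", 10), ("D", "d1", 20), ("E", "e1", 40), ("F", "f1", 5),
])

# Sum population per country, sort descending, keep the first five.
top5 = conn.execute("""
    SELECT country, SUM(population) AS total_population
    FROM cities
    GROUP BY country
    ORDER BY total_population DESC
    LIMIT 5
""").fetchall()
conn.close()
```

Country A comes first with a total of 80, and country F (total 5) is cut off by the LIMIT.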

A Senior Data Engineer was asked 10mo ago
Q. What is the best approach to determine if a data frame is empty?
Ans. 

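The check depends on the library: `len()` works for pandas DataFrames but not for PySpark DataFrames.

  • In pandas, `len(df) == 0` or `df.empty` tells you whether the data frame has no rows.

  • In PySpark, `len(df)` raises a TypeError; use `df.isEmpty()` (Spark 3.3 or later) or check `len(df.head(1)) == 0`, both of which avoid a full `count()`.

  • Example (pandas): if len(df) == 0: print('Data frame is empty')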

A Senior Data Engineer was asked 10mo ago
Q. How does DAG handle fault tolerance?
Ans. 

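In Spark, the DAG records the lineage of transformations, so failed tasks can be rerun and lost data can be recomputed.

  • Failed tasks are retried automatically, up to a configurable limit (`spark.task.maxFailures`, 4 by default), before the job is marked as failed.

  • Because the DAG keeps the lineage of each RDD, lost partitions are recomputed from their parent data rather than restored from replicas.

  • The DAG also preserves stage and task dependencies, so recomputation happens in the correct order.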

A Senior Data Engineer was asked 10mo ago
Q. How do you handle incremental data?
Ans. 

Incremental data is handled by identifying new data since the last update and merging it with existing data.

  • Identify new data since last update

  • Merge new data with existing data

  • Update data warehouse or database with incremental changes
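These steps can be sketched as a watermark-based load, using Python's built-in sqlite3 module. The table `target`, the `updated_at` column, and the helper `load_increment` are assumptions for illustration; the watermark is the latest change timestamp already loaded.

```python
import sqlite3


def load_increment(conn, source_rows, last_watermark):
    """Upsert rows newer than last_watermark and return the new watermark."""
    # ISO-8601 date strings compare correctly as plain strings.
    new_rows = [r for r in source_rows if r["updated_at"] > last_watermark]
    # INSERT OR REPLACE overwrites any existing row with the same primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO target (id, value, updated_at) "
        "VALUES (:id, :value, :updated_at)",
        new_rows,
    )
    conn.commit()
    return max((r["updated_at"] for r in new_rows), default=last_watermark)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT, updated_at TEXT)")
# Hypothetical source extract: only the second row is newer than the watermark.
source = [
    {"id": 1, "value": "old", "updated_at": "2024-01-01"},
    {"id": 2, "value": "new", "updated_at": "2024-02-01"},
]
watermark = load_increment(conn, source, "2024-01-15")
```

Storing the returned watermark and passing it to the next run means each run processes only rows changed since the previous one.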

A Senior Data Engineer was asked 10mo ago
Q. How do you decide on the number of cores and worker nodes?
Ans. 

Cores and worker nodes are decided based on the workload requirements and scalability needs of the data processing system.

  • Consider the size and complexity of the data being processed

  • Evaluate the processing speed and memory requirements of the tasks

  • Take into account the parallelism and concurrency needed for efficient data processing

  • Monitor the system performance and adjust cores and worker nodes as needed
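One way to turn these considerations into numbers is a back-of-the-envelope calculation. The sketch below encodes a common rule of thumb, not an official formula: the reserved core and gigabyte per node, the five cores per executor, the executor left free for the driver, and the 10% memory overhead are all assumptions to tune against real monitoring data. The cluster size is hypothetical.

```python
def size_executors(nodes, cores_per_node, mem_gb_per_node, cores_per_executor=5):
    """Rule-of-thumb executor sizing; every constant here is an assumption."""
    usable_cores = cores_per_node - 1   # reserve 1 core per node for the OS and daemons
    usable_mem = mem_gb_per_node - 1    # reserve 1 GB per node for the same reason
    executors_per_node = usable_cores // cores_per_executor
    total_executors = executors_per_node * nodes - 1  # leave one slot for the driver
    mem_per_executor = usable_mem / executors_per_node
    heap_gb = int(mem_per_executor * 0.9)  # keep roughly 10% aside for memory overhead
    return {
        "executors": total_executors,
        "cores_per_executor": cores_per_executor,
        "executor_memory_gb": heap_gb,
    }


# Hypothetical cluster: 10 worker nodes, each with 16 cores and 64 GB of RAM.
plan = size_executors(nodes=10, cores_per_node=16, mem_gb_per_node=64)
```

For this cluster the sketch suggests 29 executors with 5 cores and 18 GB each, a starting point to adjust after observing actual job behaviour.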

Persistent Systems Senior Data Engineer Interview Experiences

2 interviews found

Interview experience: 4 (Good)
Difficulty level: Moderate
Process Duration: Less than 2 weeks
Result: No response

I applied via Naukri.com and was interviewed in Aug 2024. There were 2 interview rounds.

Round 1 - Technical 

(12 Questions)

  • Q1. Tell me about yourself and your project
  • Ans. 

    I am a Senior Data Engineer with experience in developing data pipelines and optimizing data storage for various projects.

    • Developed data pipelines using Apache Spark for real-time data processing

    • Optimized data storage using technologies like Hadoop and AWS S3

    • Worked on a project to analyze customer behavior and improve marketing strategies

  • Answered by AI
  • Q2. What was your day-to-day job in your project?
  • Ans. 

    My day-to-day job in the project involved designing and implementing data pipelines, optimizing data workflows, and collaborating with cross-functional teams.

    • Designing and implementing data pipelines to extract, transform, and load data from various sources

    • Optimizing data workflows to improve efficiency and performance

    • Collaborating with cross-functional teams including data scientists, analysts, and business stakeholde...

  • Answered by AI
  • Q3. Spark Architecture
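  • Q4. How does the DAG handle fault tolerance?
  • Ans. 

    In Spark, the DAG records the lineage of transformations, so failed tasks can be rerun and lost data can be recomputed.

    • Failed tasks are retried automatically, up to a configurable limit (`spark.task.maxFailures`, 4 by default), before the job is marked as failed.

    • Because the DAG keeps the lineage of each RDD, lost partitions are recomputed from their parent data rather than restored from replicas.

    • The DAG also preserves stage and task dependencies, so recomputation happens in the correct order.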

  • Answered by AI
  • Q5. What is shuffling? How to Handle Shuffling?
  • Ans. 

    Shuffling is the process of redistributing data across partitions in a distributed computing environment.

    • Shuffling is necessary when data needs to be grouped or aggregated across different partitions.

    • It can be handled efficiently by minimizing the amount of data being shuffled and optimizing the partitioning strategy.

    • Techniques like partitioning, combiners, and reducers can help reduce the amount of shuffling in MapRed...

  • Answered by AI
  • Q6. What is the difference between repartition and coalesce?
  • Ans. 

    Repartition can increase or decrease the number of partitions in a DataFrame, while coalesce is meant only for decreasing it.

    • Repartition always performs a full shuffle of data across the cluster and produces roughly evenly sized partitions.

    • Coalesce reduces the number of partitions by merging existing ones without a full shuffle, which makes it cheaper than repartition but can leave partitions unevenly sized.

    • Repartition is typically...

  • Answered by AI
  • Q7. How do you handle Incremental data?
  • Ans. 

    Incremental data is handled by identifying new data since the last update and merging it with existing data.

    • Identify new data since last update

    • Merge new data with existing data

    • Update data warehouse or database with incremental changes

  • Answered by AI
  • Q8. What is SCD?
  • Ans. 

    SCD stands for Slowly Changing Dimension, a concept in data warehousing to track changes in data over time.

    • SCD is used to maintain historical data in a data warehouse.

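    • The three most commonly cited types are Type 1, Type 2, and Type 3; other variants (such as Type 0, Type 4, and Type 6) also exist.

    • Type 1 SCD overwrites old data with new data, so no history is kept.

    • Type 2 SCD creates a new record for each change, preserving full history.

    • Type 3 SCD keeps the previous value alongside the current value in the same record, typically in an extra column.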

    • SCD is important for...

  • Answered by AI
  • Q9. Scenario-based questions related to Spark
  • Q10. Two SQL coding questions and two Python coding questions, such as reversing a string
  • Ans. 

    Reverse a string using SQL and Python.

    • In SQL, use the REVERSE function, which is available in dialects such as SQL Server, MySQL, PostgreSQL, and Spark SQL.

    • In Python, use slicing with a step of -1 to reverse a string.

  • Answered by AI
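The Python slicing approach mentioned in Q10 looks like this; the function name and the sample word are chosen only for illustration.

```python
def reverse_string(text: str) -> str:
    """Return text reversed, using a slice with a step of -1."""
    return text[::-1]


reversed_word = reverse_string("spark")  # "kraps"
```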
  • Q11. Find top 5 countries with highest population in Spark and SQL
  • Ans. 

    Use Spark and SQL to find the top 5 countries with the highest population.

    • Use Spark to load the data and perform data processing.

    • Use SQL queries to group by country and sum the population.

    • Order the results in descending order and limit to top 5.

    • Example: SELECT country, SUM(population) AS total_population FROM table_name GROUP BY country ORDER BY total_population DESC LIMIT 5

  • Answered by AI
  • Q12. Using two tables find the different records for different joins
  • Ans. 

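    To find the records that differ between join types, run each join on the same key and compare which rows appear.

    • Identify the key columns in both tables to join on.

    • Run INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN on those keys: INNER JOIN returns only matched rows, while the outer joins also return unmatched rows from one or both sides.

    • To isolate the unmatched rows, use an outer join with a WHERE clause that keeps rows where the other table's key IS NULL.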

  • Answered by AI
Round 2 - One-on-one 

(7 Questions)

  • Q1. What is the Catalyst optimizer? How does it work?
  • Ans. 

    The Catalyst optimizer is the query optimization framework in Spark SQL; it improves performance by turning a query into an optimized execution plan.

    • Catalyst is primarily rule-based and also supports cost-based optimization.

    • It applies rules to optimize the logical query plan and then selects a physical plan to execute.

    • Its optimizations include predicate pushdown, constant folding, and join reordering.

    • By o...

  • Answered by AI
  • Q2. Tell me about the optimization you used in your project.
  • Ans. 

    Used query optimization techniques to improve performance in database queries.

    • Utilized indexing to speed up search queries.

    • Implemented query caching to reduce redundant database calls.

    • Optimized SQL queries by restructuring joins and subqueries.

    • Utilized database partitioning to improve query performance.

    • Used query profiling tools to identify and optimize slow queries.

  • Answered by AI
  • Q3. PySpark question on merging two schemas
  • Ans. 

    Merging two schemas in PySpark involves combining DataFrames with different structures into a unified format.

    • Use the `unionByName()` method, which matches columns by name rather than by position.

    • Example: df1.unionByName(df2, allowMissingColumns=True) merges df1 and df2, filling columns missing from either side with nulls (the `allowMissingColumns` parameter requires Spark 3.1 or later).

    • For schema evolution, use the `mergeSchema` option when reading from Parquet files.

    • Example: spark.read.option('mergeSchema', 't...

  • Answered by AI
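  • Q4. What is the best approach to finding whether the data frame is empty or not?
  • Ans. 

    The check depends on the library: `len()` works for pandas DataFrames but not for PySpark DataFrames.

    • In pandas, `len(df) == 0` or `df.empty` tells you whether the data frame has no rows.

    • In PySpark, `len(df)` raises a TypeError; use `df.isEmpty()` (Spark 3.3 or later) or check `len(df.head(1)) == 0`, both of which avoid a full `count()`.

    • Example (pandas): if len(df) == 0: print('Data frame is empty')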

  • Answered by AI
  • Q5. Spark Architecture
  • Q6. How do you decide on cores and worker nodes?
  • Ans. 

    Cores and worker nodes are decided based on the workload requirements and scalability needs of the data processing system.

    • Consider the size and complexity of the data being processed

    • Evaluate the processing speed and memory requirements of the tasks

    • Take into account the parallelism and concurrency needed for efficient data processing

    • Monitor the system performance and adjust cores and worker nodes as needed

  • Answered by AI
  • Q7. What happens when we enforce schema?
  • Ans. 

    Enforcing schema ensures that data conforms to a predefined structure and rules.

    • Ensures data integrity by validating incoming data against predefined schema

    • Helps in maintaining consistency and accuracy of data

    • Prevents data corruption and errors in data processing

    • Can lead to rejection of data that does not adhere to the schema

  • Answered by AI

Interview Preparation Tips

Topics to prepare for Persistent Systems Senior Data Engineer interview:
  • SQL
  • Pyspark
  • Python
  • Spark
  • Database
Interview preparation tips for other job seekers - Be prepared with Spark core concepts and SQL Coding

Interview experience: 4 (Good)
Difficulty level: Moderate
Process Duration: Less than 2 weeks
Result: Selected

I applied via Naukri.com and was interviewed before Jun 2023. There were 3 interview rounds.

Round 1 - One-on-one 

(2 Questions)

  • Q1. General questions
  • Q2. Questions about experience
Round 2 - Group Discussion 

Only reasoning-type questions.

Round 3 - Technical 

(2 Questions)

  • Q1. What is SSIS? How do we use it?
  • Ans. 

    SSIS stands for SQL Server Integration Services, a tool provided by Microsoft for data integration and workflow applications.

    • SSIS is a platform for building high-performance data integration and workflow solutions.

    • It allows you to create packages that move data from various sources to destinations.

    • SSIS includes a visual design interface for creating, monitoring, and managing data integration processes.

    • You can use SSIS ...

  • Answered by AI
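  • Q2. When do we use SSIS packages? What is the difference between Union and Merge?
  • Ans. 

    SSIS packages are used for ETL processes in SQL Server. Union All and Merge both stack rows from several inputs into one output; combining rows side by side on matching keys is done by Merge Join.

    • SSIS packages are used for Extract, Transform, Load (ETL) processes in SQL Server.

    • Union in SSIS is provided by the Union All transformation, which stacks rows from any number of inputs; the inputs do not need to be sorted.

    • Merge in SSIS also combines datasets vertically, but it accepts exactly two inputs, both sorted, and produces a sorted output.

    • Merge Join in SSIS combines datasets horizontally, matching rows on specified key columns.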

    • Union All in SSIS combines datasets vertica...

  • Answered by AI

What people are saying about Persistent Systems

damodharg
2w
works at
IBM
AI judging interviews: fair or totally flawed?
The first round was fine, but round two has me questioning the whole process. I answered scenario-based questions in Excel and gave verbal explanations too, but only the Excel entries were assessed by ChatGPT, ignoring my spoken context. Plus, I only had 15 seconds to respond!

This brings up some serious questions:
  • Can AI really evaluate without the full picture?
  • What's the point of the panel if AI makes the final call?
  • Why wasn't the use of AI disclosed? If I'd known AI was assessing me, I'd have used more technical terms and formatting.

We need a transparent process:
  • Guidelines for capturing all inputs.
  • Disclosure of AI tools and their scope.
  • A mix of human and AI judgment.
  • Feedback that reflects the full context.

One point in the feedback also surprised me: it suggested that the interviewer does not need what you say. I believe an interview is an interactive, face-to-face assessment. If AI is doing the assessing, run it as a virtual AI interview that can interview candidates in parallel in a single pass; that would save our time.

Interview questions from similar companies

I applied via Walk-in and was interviewed before Apr 2021. There were 3 interview rounds.

Round 1 - Aptitude Test 

Technical assessment - java

Round 2 - Technical 

(1 Question)

  • Q1. Core-java depth questions
Round 3 - HR 

(2 Questions)

  • Q1. What are your strengths and weaknesses?
  • Q2. Tell me about yourself.

Interview Preparation Tips

Interview preparation tips for other job seekers - It is good to prepare Java collections, thread concepts, and OOP concepts

I applied via Campus Placement and was interviewed before Nov 2020. There were 3 interview rounds.

Interview Questionnaire 

1 Question

  • Q1. Basic project description and database design

Interview Preparation Tips

Interview preparation tips for other job seekers - Basic level interview question

I applied via Campus Placement and was interviewed in Apr 2020. There were 4 interview rounds.

Interview Questionnaire 

3 Questions

  • Q1. Code for the pattern 1 12 123 1234 12345
  • Ans. 

    Code for the pattern 1 12 123 1234 12345

    • Use nested loops

    • Print numbers in ascending order

    • Add a line break after each row

  • Answered by AI
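The nested-loop approach described in Q1 can be sketched in Python as follows; the function name `number_pattern` is chosen only for illustration.

```python
def number_pattern(rows: int) -> list[str]:
    """Build the rows 1, 12, 123, ... using nested loops."""
    lines = []
    for row in range(1, rows + 1):        # outer loop: one pass per output row
        line = ""
        for n in range(1, row + 1):       # inner loop: digits 1..row in ascending order
            line += str(n)
        lines.append(line)
    return lines


pattern = number_pattern(5)
print("\n".join(pattern))  # one row per line
```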
  • Q2. Your project
  • Q3. Some maths questions

Interview Preparation Tips

Interview preparation tips for other job seekers - Prepare for pattern questions and know about sorting algorithms

I applied via Naukri.com and was interviewed before Sep 2020. There were 3 interview rounds.

Interview Questionnaire 

2 Questions

  • Q1. Multithreading in Java
  • Ans. 

    Multithreading in Java allows for concurrent execution of multiple threads within a single program.

    • Multithreading can improve performance by allowing multiple tasks to be executed simultaneously.

    • Java provides built-in support for multithreading through the Thread class and Runnable interface.

    • Synchronization is important to prevent race conditions and ensure thread safety.

    • Examples of multithreading in Java include GUI a...

  • Answered by AI
  • Q2. Custom LinkedList implementation, Spring Boot, microservices, Kafka, core Java, and DS questions

Interview Preparation Tips

Interview preparation tips for other job seekers - Keep practicing DS, this will boost your confidence.

I applied via Walk-in and was interviewed before Oct 2020. There were 3 interview rounds.

Interview Questionnaire 

1 Question

  • Q1. The questions are on the basics of C#, OOPS. It depends on the project.

Interview Preparation Tips

Interview preparation tips for other job seekers - Be strong on the basics.

I applied via Campus Placement and was interviewed before Oct 2019. There were 4 interview rounds.

Interview Questionnaire 

2 Questions

  • Q1. About programming skills, strengths, and weaknesses
  • Q2. Tell me about yourself
  • Ans. 

    I'm a passionate software engineer with a strong background in full-stack development and a love for solving complex problems.

    • Graduated with a degree in Computer Science from XYZ University.

    • Worked at ABC Corp, where I developed a web application that increased user engagement by 30%.

    • Proficient in languages like JavaScript, Python, and Java, with experience in frameworks such as React and Django.

    • Enjoy collaborating in a...

  • Answered by AI

Interview Preparation Tips

Interview preparation tips for other job seekers - Be bold enough to speak and prove your own skills in front of HR

I applied via Recruitment Consultant and was interviewed before Apr 2021. There were 3 interview rounds.

Round 1 - Aptitude Test 

Standard Aptitude questions

Round 2 - Coding Test 

Based on strings and arrays

Round 3 - One-on-one 

(1 Question)

  • Q1. Background questions and technical questions related to problem solved

Interview Preparation Tips

Interview preparation tips for other job seekers - Mindtree is one of the best companies; I really miss working there.
It focuses on your self-development and your career.

I applied via Naukri.com and was interviewed before Oct 2020. There were 5 interview rounds.

Interview Questionnaire 

1 Question

  • Q1. Basic questions about your job profile and about your previous project

Interview Preparation Tips

Interview preparation tips for other job seekers - Be confident, and if you don't know the answer to a question, say so instead of giving a wrong answer

Persistent Systems Interview FAQs

How many rounds are there in Persistent Systems Senior Data Engineer interview?
Persistent Systems interview process usually has 2-3 rounds. The most common rounds in the Persistent Systems interview process are One-on-one Round, Technical and Group Discussion.
How to prepare for Persistent Systems Senior Data Engineer interview?
Go through your CV in detail and study all the technologies mentioned in your CV. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at Persistent Systems. The most common topics and skills that interviewers at Persistent Systems expect are Python, Java, Kafka, Spark and HBase.
What are the top questions asked in Persistent Systems Senior Data Engineer interview?

Some of the top questions asked at the Persistent Systems Senior Data Engineer interview -

  1. What is the best approach to finding whether the data frame is empty or n...read more
  2. What is the difference between repartition and Coels...read more
  3. Two SQL Codes and Two Python codes like reverse a strin...read more

Overall Interview Experience Rating: 4/5, based on 2 interview experiences

Difficulty level: Moderate 100%

Duration: Less than 2 weeks 100%
Persistent Systems Senior Data Engineer Salary
based on 44 salaries
₹6.2 L/yr - ₹25.7 L/yr
22% less than the average Senior Data Engineer Salary in India

Persistent Systems Senior Data Engineer Reviews and Ratings

3.3/5, based on 3 reviews

Rating in categories

  • Skill development: 3.3
  • Work-life balance: 2.9
  • Salary: 2.9
  • Job security: 2.1
  • Company culture: 2.9
  • Promotions: 2.1
  • Work satisfaction: 2.1

  • Senior Software Engineer (4.7k salaries): ₹5.9 L/yr - ₹19.6 L/yr
  • Software Engineer (4.7k salaries): ₹4.7 L/yr - ₹11.3 L/yr
  • Lead Software Engineer (3.8k salaries): ₹9.2 L/yr - ₹17.5 L/yr
  • Lead Engineer (3.6k salaries): ₹13.7 L/yr - ₹25.4 L/yr
  • Project Lead (2.3k salaries): ₹21.4 L/yr - ₹36 L/yr

Compare Persistent Systems with

  • Cognizant: 3.7
  • TCS: 3.5
  • IBM: 3.9
  • LTIMindtree: 3.7