PySpark is the Python API for Apache Spark, a powerful open-source distributed computing system.
PySpark is used for processing large datasets in parallel across a cluster of computers.
It provides high-level APIs in Python for Spark programming.
PySpark allows seamless integration with other Python libraries like Pandas and NumPy.
Example: Using PySpark to perform data analysis and machine learning tasks on big data sets.
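To make that last point concrete, here is a minimal sketch (the dataframe contents and column names are invented for illustration) that runs a distributed aggregation in PySpark and hands the small result over to Pandas:

```python
from pyspark.sql import SparkSession

# Start a local Spark session; the appName is an arbitrary label for this sketch.
spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

# The heavy computation happens in Spark, distributed across the cluster...
df = spark.createDataFrame([(1, 2.0), (2, 3.0), (3, 4.0)], ["id", "value"])
avg_value = df.groupBy().avg("value")

# ...and a small result can be pulled into Pandas for local analysis.
print(avg_value.toPandas())

spark.stop()
```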
I applied via LinkedIn and was interviewed in Jan 2024. There was 1 interview round.
PySpark SQL is a module in Apache Spark that provides a SQL interface for working with structured data.
PySpark SQL allows users to run SQL queries on Spark dataframes.
It offers a more concise and user-friendly way to interact with data than traditional Spark RDDs.
Users can leverage the power of SQL for data manipulation and analysis within the Spark ecosystem.
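A minimal sketch of that workflow (the people dataframe below is invented): register a dataframe as a temporary view, then query it with plain SQL instead of chaining dataframe operations:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-sql-demo").getOrCreate()

# A toy dataframe; names and ages are made up for illustration.
df = spark.createDataFrame([("Alice", 34), ("Bob", 28)], ["name", "age"])

# Register the dataframe as a temporary view so SQL can reference it by name.
df.createOrReplaceTempView("people")

# Run an ordinary SQL query against the view.
adults = spark.sql("SELECT name FROM people WHERE age > 30")
adults.show()

spark.stop()
```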
To merge two dataframes with different schemas, use join operations or data transformation techniques.
Use a join (inner, outer, left, or right) depending on the requirement.
Perform data transformations to align the schemas before merging.
Tools like Apache Spark, Pandas, or SQL can merge dataframes with different schemas.
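In PySpark, the two common routes look like this (the dataframes and column names are made up; note that unionByName with allowMissingColumns requires Spark 3.1 or later):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("merge-demo").getOrCreate()

# Two dataframes with different schemas but a shared "id" key.
df1 = spark.createDataFrame([(1, "Alice")], ["id", "name"])
df2 = spark.createDataFrame([(1, "IN"), (2, "US")], ["id", "country"])

# Option 1: join on the shared key when rows describe the same entities.
joined = df1.join(df2, on="id", how="outer")

# Option 2: align schemas and union when rows should be stacked;
# allowMissingColumns=True fills columns absent from one side with null.
unioned = df1.unionByName(df2, allowMissingColumns=True)

joined.show()
unioned.show()

spark.stop()
```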
PySpark Streaming is a scalable and fault-tolerant stream processing engine built on top of Apache Spark.
PySpark Streaming allows for real-time processing of streaming data.
It provides high-level APIs in Python for creating streaming applications.
PySpark Streaming supports various data sources like Kafka, Flume, Kinesis, etc.
It enables windowed computations and stateful processing for handling streaming data.
Example: Consuming a Kafka topic of events and computing aggregates over sliding windows.
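A minimal sketch of a streaming word count, using the newer Structured Streaming API with a local socket source as a stand-in for Kafka (the host and port are placeholders for a local test feed such as `nc -lk 9999`):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# Read an unbounded stream of lines; in production you would typically
# use .format("kafka") with the appropriate broker options instead.
lines = (
    spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

# Split lines into words and maintain a running count per word.
counts = (
    lines.select(F.explode(F.split(F.col("value"), " ")).alias("word"))
    .groupBy("word")
    .count()
)

# Write the continuously updated counts to the console.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```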
I applied via Company Website and was interviewed in Jan 2024. There was 1 interview round.
Spark architecture includes a driver, a cluster manager, and worker nodes for distributed processing.
The driver program manages the execution of tasks on worker nodes.
The cluster manager is responsible for allocating resources and scheduling tasks across worker nodes.
Worker nodes execute the tasks and store data in memory or on disk for processing.
Example: In a Spark application, the driver program splits the job into tasks and schedules them on worker nodes through the cluster manager.
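A minimal sketch of how these pieces map to code: the script below is the driver program, and the master setting tells Spark which cluster manager to run executors under (here a local stand-in for testing; a real cluster would use something like "yarn" or a spark:// URL):

```python
from pyspark.sql import SparkSession

# This script is the driver program. "local[2]" runs Spark in-process with
# two worker threads, standing in for a real cluster manager.
spark = (
    SparkSession.builder
    .appName("architecture-demo")
    .master("local[2]")
    .getOrCreate()
)

# The driver builds the plan; the tasks run on the executors (worker side).
df = spark.range(1_000_000)
print(df.selectExpr("sum(id)").first()[0])

spark.stop()
```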
I applied via Recruitment Consultant and was interviewed before Jul 2023. There were 2 interview rounds.
Handling ADF pipelines involves designing, building, and monitoring data pipelines in Azure Data Factory.
Designing data pipelines using ADF UI or code
Building pipelines with activities like copy data, data flow, and custom activities
Monitoring pipeline runs and debugging issues
Optimizing pipeline performance and scheduling triggers
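A hedged sketch of triggering and monitoring a pipeline run from Python, assuming the azure-identity and azure-mgmt-datafactory packages are installed; the subscription, resource group, factory, and pipeline names below are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Authenticate and create a management client (subscription id is a placeholder).
credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Trigger a run of an existing pipeline.
run = adf_client.pipelines.create_run(
    "my-resource-group", "my-data-factory", "my-pipeline", parameters={}
)

# Poll the run status for monitoring and debugging.
status = adf_client.pipeline_runs.get(
    "my-resource-group", "my-data-factory", run.run_id
)
print(status.status)  # e.g. "InProgress", "Succeeded", "Failed"
```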
Technical questions on Hive, Spark (Scala), and Azure.
I applied via Walk-in and was interviewed before Jan 2021. There were 5 interview rounds.
Developed a web-based project management system for a construction company.
Used PHP and MySQL for backend development.
Implemented a responsive UI using Bootstrap and jQuery.
Incorporated features such as task assignment, progress tracking, and document management.
Conducted user testing and made improvements based on feedback.
Completed the project within the given timeline and budget.
I applied via Referral and was interviewed in Apr 2020. There were 5 interview rounds.
Normalization is a process of organizing data in a database to reduce redundancy and improve data integrity.
Normalization involves breaking down a table into smaller tables and establishing relationships between them.
There are different levels of normalization, with each level having specific rules to follow.
Normalization helps to prevent data inconsistencies and anomalies.
Examples of normalization include converting repeating groups into separate tables and removing redundant columns.
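As a rough illustration (the table and column names are invented), here is a PySpark sketch that normalizes a denormalized orders table by pulling the repeated customer details into their own table with a surrogate key:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("normalization-demo").getOrCreate()

# A denormalized table: customer details repeat on every order row.
orders_raw = spark.createDataFrame(
    [
        (1, "Alice", "alice@example.com", "Laptop"),
        (2, "Alice", "alice@example.com", "Mouse"),
        (3, "Bob", "bob@example.com", "Keyboard"),
    ],
    ["order_id", "customer_name", "customer_email", "product"],
)

# Normalize: customers get their own table with a surrogate key...
customers = (
    orders_raw.select("customer_name", "customer_email")
    .distinct()
    .withColumn("customer_id", F.monotonically_increasing_id())
)

# ...and orders reference customers by key instead of repeating their details.
orders = orders_raw.join(
    customers, on=["customer_name", "customer_email"]
).select("order_id", "customer_id", "product")

customers.show()
orders.show()

spark.stop()
```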
I applied via Approached by Company and was interviewed before Jul 2021. There were 2 interview rounds.
Basic programming questions
I applied via Naukri.com and was interviewed in Mar 2021. There were 3 interview rounds.
I applied via Campus Placement and was interviewed before Feb 2021. There were 4 interview rounds.
Technical MCQs on Computer Science concepts (programming, databases, etc.)
Flow chart aptitude round
I applied via Referral and was interviewed before Jan 2021. There were 3 interview rounds.
| Role | Reported salaries | Salary range |
| Senior Software Engineer | 572 | ₹20 L/yr - ₹33.1 L/yr |
| Senior Consultant | 412 | ₹20.5 L/yr - ₹35 L/yr |
| Consultant | 291 | ₹12.4 L/yr - ₹23 L/yr |
| Software Engineer | 218 | ₹5.3 L/yr - ₹15 L/yr |
| Senior Software Developer | 153 | ₹17.9 L/yr - ₹30 L/yr |