Big Data Engineer
Big Data Engineer Interview Questions
165 results found
Interview Questions
Q1. Difference between partitioning and bucketing; types of joins in Spark; optimization techniques in Spark; broadcast variables and broadcast joins; difference between ORC and Parquet; difference between RDD and DataFrame; architecture of the project, and day-to-day activities and responsibilities.
Q2. SQL questions: remove duplicate records; find the 5th highest salary department-wise.
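One possible way to approach both SQL tasks is sketched below, using Python's built-in sqlite3 module so that it runs without a database server. The `employee` table, its columns, and the sample rows are hypothetical, and the correlated subquery is only one of several common answers (a DENSE_RANK window function is another).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, invented for illustration only.
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
rows = [
    ("a", "eng", 100), ("b", "eng", 90), ("c", "eng", 80),
    ("d", "eng", 70), ("e", "eng", 60), ("f", "eng", 50),
    ("a", "eng", 100),  # exact duplicate of the first row
    ("g", "ops", 40),   # this department has fewer than 5 salaries
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

# Remove duplicates: keep only the lowest rowid of each identical record.
conn.execute("""
    DELETE FROM employee
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM employee GROUP BY name, dept, salary
    )
""")


def nth_highest_salary_by_dept(conn, n):
    """Return (dept, salary) rows where salary is the n-th highest distinct one."""
    query = """
        SELECT DISTINCT e.dept, e.salary
        FROM employee e
        WHERE (
            SELECT COUNT(DISTINCT e2.salary)
            FROM employee e2
            WHERE e2.dept = e.dept AND e2.salary > e.salary
        ) = ?
        ORDER BY e.dept
    """
    # A salary is n-th highest when exactly n - 1 distinct salaries exceed it.
    return conn.execute(query, (n - 1,)).fetchall()


row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
fifth = nth_highest_salary_by_dept(conn, 5)
print(row_count, fifth)
```

Departments with fewer than five distinct salaries simply produce no row in this sketch.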
Q3. The same kind of questions at the same level of difficulty, but a different technical interviewer conducted the round.
Q4. First round.
Spark: why and how to use accumulators; why and how to use repartition and coalesce; how Spark generates and executes code; which operation executes where (for example, df2=df1.filter.map.collection, then df2.map); Spark modules; why out-of-memory exceptions and similar issues occur and how to resolve them; when to use a broadcast join (asked indirectly); a group-by and count example in Spark; the project and its data flow (where the data comes from and where it goes).
Hive: bucketing and partitioning, why and how.
Python: memory management; tuple versus list; a program to find the minimum number of swaps to sort an array; how to catch and handle specific errors such as KeyError using try and except.
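For the two Python coding items in this question, a minimal sketch follows. It assumes the array holds distinct values and counts swaps through cycle decomposition, which is one common approach rather than necessarily the one the interviewer expected. The function names and the `salaries` dictionary are hypothetical.

```python
def min_swaps_to_sort(values):
    """Minimum swaps to sort a list of distinct values (cycle decomposition)."""
    # indexed[i] = (original index, value) of the element that belongs at i.
    indexed = sorted(enumerate(values), key=lambda pair: pair[1])
    visited = [False] * len(values)
    swaps = 0
    for start in range(len(values)):
        if visited[start] or indexed[start][0] == start:
            continue  # already handled, or already in its sorted position
        cycle_len = 0
        pos = start
        while not visited[pos]:
            visited[pos] = True
            pos = indexed[pos][0]
            cycle_len += 1
        # A cycle of k elements needs k - 1 swaps.
        swaps += cycle_len - 1
    return swaps


def lookup_salary(salaries, name):
    """Return the salary for name, or None when the key is missing."""
    try:
        return salaries[name]
    except KeyError:
        # Only a missing key is handled here; other errors still propagate.
        return None


print(min_swaps_to_sort([4, 3, 2, 1]))       # 2
print(lookup_salary({"asha": 10}, "ravi"))   # None
```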
Q5. Python list-based questions; Python dictionary-based questions.
Q6. Second round, Spark: how to handle upserts in Spark.
Q7. Worst HR and a waste of time. Don't treat it as a final offer. Keep a backup offer even if you get an offer here.
Q8. Can't share specifics, but expect questions ranging from the basics to the advanced level. The main point is that your understanding of the basics has to be very good.
Q9. Spark memory optimisation techniques.
Q10. Hadoop serialisation techniques.
Interview Questions
Q1. Most questions were on technologies mentioned in the resume: PySpark, SQL, databases, Hadoop, Spark. What are the optimisation techniques in Spark? What is a DAG? Hadoop architecture and Spark architecture. Questions on Hive: can we apply indexing in Hive? Can we define a primary key in Hive? Does Hive support ACID properties? Questions on Databricks, Azure services, and ADF. Join operations in PySpark, window functions in PySpark, how to join two Spark DataFrames on a common column that has a different name in each, broadcast variables, map-side joins, serialisation and deserialisation. Magic commands in a Databricks notebook, how to schedule a Databricks notebook, how to import variables from another notebook into a master notebook, how Databricks differs from other cloud platforms, and why Databricks? Azure Synapse Analytics, what a runtime is in Databricks, and how to set up a cluster in Databricks. What is MapReduce in Hadoop? What is the default block size in HDFS, and can we change it? What is the default size of a partition in Hive? How do you read a CSV in pandas, and how do you extract specific columns in pandas?
Q2. Most questions were about the project: how do you handle incremental pipelines; basic PySpark questions on optimisation; a Python coding question from DSA (medium level); SQL views, CTEs, window functions, and procedures; Azure services.
Q3. Basics of Python, data structures in Python, SQL basics.
Interview Questions
Q1. Basic Big Data architecture and coding questions
Q2. Difference between external and internal tables.
Q3. Spark, Hadoop, and Scala basic and advanced questions; SQL query. 1) What are repartition and coalesce? 3) Window functions.
Q7. 2) What is Spark architecture?
Q8. What is the difference between tuples and lists?
Q9. Basic questions, and questions to gauge how quickly you grasp concepts.
Q10. Partitioning and bucketing.
Interview Questions
Q1. If we have streaming data coming through Kafka and Spark, how will you handle fault tolerance?
Q2. If I have a large dataset to load that will not fit into memory, how will you load the file?
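One general idea behind answers to the large-file question is to stream the data in bounded chunks instead of reading it all at once. The sketch below illustrates that with the standard library only. The `amount` column, the chunk size, and the helper names are hypothetical, and in Spark the engine handles the equivalent splitting into partitions for you.

```python
import csv
import io
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size rows from a CSV line iterator."""
    reader = csv.DictReader(lines)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def total_amount(lines, chunk_size=2):
    """Aggregate a column while holding only one chunk in memory at a time."""
    total = 0
    for chunk in iter_chunks(lines, chunk_size):
        total += sum(int(row["amount"]) for row in chunk)
    return total


# An in-memory stand-in for a large file; a real file object works the same way.
sample = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")
chunk_sizes = [len(chunk) for chunk in iter_chunks(sample, 2)]
print(chunk_sizes)  # [2, 2, 1]
```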
Q3. What are the core components of Spark?
Q5. What is Hive architecture?
Q6. What is vectorization in ?
Q7. What is a partition in Hive?
Q8. What are functions in SQL?
Q9. Explain RANK, DENSE_RANK, and ROW_NUMBER.
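The difference between the three ranking functions is easiest to see on data with ties. The sketch below emulates them in plain Python over values ordered from highest to lowest. The function name and sample values are hypothetical, and this imitates the SQL semantics rather than calling a real SQL engine.

```python
def ranking_functions(values):
    """Return (value, row_number, rank, dense_rank) tuples, highest value first."""
    ordered = sorted(values, reverse=True)
    result = []
    rank = 0
    dense = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # RANK jumps to the row position, leaving gaps
            dense += 1         # DENSE_RANK increases by one, leaving no gaps
            previous = value
        result.append((value, row_number, rank, dense))
    return result


for row in ranking_functions([300, 200, 200, 100]):
    print(row)
# (300, 1, 1, 1)
# (200, 2, 2, 2)
# (200, 3, 2, 2)
# (100, 4, 4, 3)
```

ROW_NUMBER is always unique, RANK repeats on ties and then skips, and DENSE_RANK repeats on ties without skipping.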
Q10. Do we have to do vectorization?
Interview Questions
Q1. The last round was managerial. It was not meant to test my knowledge but to find out which fields interest me, which projects I have built, and an overview of them.
Q2. The technical interview covered technical questions as well as logical and aptitude testing; knowledge of current affairs is good to have.
Q3. It was just a telephonic round, meant to collect some basic details about you.
Interview Questions
Q1. First, the interviewer should know the answer to what he wants to ask before asking the candidate. Even if you give the correct answer, he can't recognize it. Interviewers are like that nowadays. 2021 07 03 12:30
Q2. Given col1 with the values 100, 100, 200, 200, 300, 400, 400, 400, get the rank using PARTITION BY col1. Output:
col1  rank
100   1
100   1
200   1
200   1
300   1
400   1
400   1
400   1
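Every row gets rank 1 here because each partition contains only one distinct col1 value, so all rows in a partition tie. The sketch below emulates a rank computed per partition in plain Python to show this. It assumes the ordering column is also col1, which the question does not state, and the function name is hypothetical.

```python
from itertools import groupby


def rank_within_partitions(rows, partition_key, order_key):
    """Emulate RANK() OVER (PARTITION BY ... ORDER BY ...) on a list of rows."""
    output = []
    ordered = sorted(rows, key=partition_key)
    for _, group in groupby(ordered, key=partition_key):
        partition = sorted(group, key=order_key)
        rank = 0
        previous = object()  # sentinel that never equals a real value
        for position, row in enumerate(partition, start=1):
            current = order_key(row)
            if current != previous:
                rank = position  # ranking restarts inside every partition
                previous = current
            output.append((row, rank))
    return output


col1 = [100, 100, 200, 200, 300, 400, 400, 400]
# Assumption: partitioning and ordering both use col1.
same_key = rank_within_partitions(col1, lambda v: v, lambda v: v)
print(same_key)
```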
Interview Questions
Q1. Difference between Internal and External table in Hive
Q2. Explain the Hadoop architecture.
Q3. If there is something useful that I can say in one line, it would be: learn all you can. Don't just mug things up. Take your time and delve deep into the concepts.
Q4. You can be asked anything and everything. This is where your learning matters.
Interview Questions
Q1. Project explanation; internal vs external tables.
Q2. Mostly Hive and Scala questions in the first round, which consisted of 43 theory questions and one coding question.
Q4. Spark architecture, optimization techniques, Hive, SQL query.
Q5. Python programming for lists and strings.
Interview Questions
Q1. Check whether a Fibonacci number is present within a particular range (100 - 200).
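A minimal sketch of one way to answer this: generate Fibonacci numbers until they exceed the upper bound and collect those that fall inside the range. The function name is hypothetical, and the bounds are treated as inclusive, which is an assumption.

```python
def fibonacci_in_range(low, high):
    """Return the Fibonacci numbers n with low <= n <= high, in order."""
    found = []
    a, b = 0, 1
    while a <= high:
        if a >= low:
            found.append(a)
        a, b = b, a + b
    return found


print(fibonacci_in_range(100, 200))        # [144]
print(bool(fibonacci_in_range(100, 200)))  # True
```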
Q2. Convert a list of dictionaries to CSV in Python.
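A possible answer uses `csv.DictWriter` from the standard library. The sketch below writes to an in-memory buffer so it can be run as-is. The helper name and the sample records are hypothetical, and collecting the header from all records in first-seen order is a design choice, not a requirement.

```python
import csv
import io


def dicts_to_csv(records, fieldnames=None):
    """Serialize a list of dictionaries to CSV text."""
    if fieldnames is None:
        # Collect column names in first-seen order across all records.
        fieldnames = []
        for record in records:
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing keys are filled with restval
    return buffer.getvalue()


text = dicts_to_csv([{"name": "a", "age": 1}, {"name": "b", "city": "x"}])
print(text)
```

To write to disk instead, the same writer can be given a file opened with `open(path, "w", newline="")`.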
Q3. What is partitioning in Hive?
Q4. Simple array questions. Given an integer array, find a pair with the given total.
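A common single-pass approach to the pair question keeps a set of values seen so far and checks whether the complement of the current value is already in it. The sketch below assumes any one matching pair is an acceptable answer; the function name is hypothetical.

```python
def find_pair_with_sum(numbers, target):
    """Return one pair of values that adds up to target, or None."""
    seen = set()
    for number in numbers:
        complement = target - number
        if complement in seen:
            return (complement, number)
        seen.add(number)
    return None


print(find_pair_with_sum([2, 7, 11, 15], 9))  # (2, 7)
```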
Interview Questions
Q1. Find the number of pairs that sum to a target.
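Counting pairs, rather than finding one, can be sketched with a frequency map. The version below assumes pairs are identified by index positions (i < j), so repeated values count separately. The question does not say which definition was intended, and the function name is hypothetical.

```python
from collections import Counter


def count_pairs_with_sum(numbers, target):
    """Count index pairs (i, j) with i < j and numbers[i] + numbers[j] == target."""
    seen = Counter()
    pairs = 0
    for number in numbers:
        # Every earlier occurrence of the complement forms one new pair.
        pairs += seen[target - number]
        seen[number] += 1
    return pairs


print(count_pairs_with_sum([1, 5, 7, -1, 5], 6))  # 3
```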
Q2. Moderate binary search problems on arrays, such as ones that increase and then decrease.
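A typical problem of this kind is finding the peak of an array that increases and then decreases. The sketch below assumes a non-empty array that is strictly increasing up to the peak and strictly decreasing after it; the function name is hypothetical.

```python
def find_peak_index(values):
    """Return the index of the maximum in an increasing-then-decreasing list."""
    low, high = 0, len(values) - 1
    while low < high:
        mid = (low + high) // 2
        if values[mid] < values[mid + 1]:
            low = mid + 1  # still on the rising side, so the peak is to the right
        else:
            high = mid     # on the falling side, so the peak is at mid or left of it
    return low


print(find_peak_index([1, 3, 8, 12, 4, 2]))  # 3
```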
Q3. Smallest subarray having a given target sum.
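One reading of this question is: find the length of the shortest contiguous subarray whose sum equals the target exactly. Under the assumption that all elements and the target are positive integers, a sliding window solves it in linear time, as sketched below. Both the exact-sum interpretation and the function name are assumptions.

```python
def smallest_subarray_with_sum(numbers, target):
    """Length of the shortest contiguous run summing to target, or None.

    Assumes every element of numbers is a positive integer.
    """
    best = None
    left = 0
    window_sum = 0
    for right, value in enumerate(numbers):
        window_sum += value
        # Shrink from the left while the window overshoots the target.
        while window_sum > target and left <= right:
            window_sum -= numbers[left]
            left += 1
        if window_sum == target and left <= right:
            length = right - left + 1
            if best is None or length < best:
                best = length
    return best


print(smallest_subarray_with_sum([1, 2, 3, 4, 5], 9))  # 2
```

With negative numbers the window sum is no longer monotonic, so this approach would need to be replaced, for example by prefix sums with a hash map.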
Q4. System design for a web surfing utility.
Q5. PySpark optimizations and theoretical questions.