Hadoop Developer Interview Questions and Answers

Updated 29 Jun 2025

Asked in HSBC Group

5d ago

Q. How do you ingest a CSV file into a Spark DataFrame and write it to a Hive table?

Ans.

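Read the CSV file into a Spark DataFrame, then save the DataFrame as a Hive table.

  • Create a SparkSession object with Hive support enabled (enableHiveSupport())

  • Read the CSV file using the SparkSession.read.csv() method, which returns a DataFrame

  • Optionally create the Hive table first using the SparkSession.sql() method; saveAsTable() creates the table if it does not already exist

  • Write the DataFrame to the Hive table using the DataFrame.write.saveAsTable() method, choosing an append or overwrite mode if the table already exists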
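The steps above might be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names `build_hive_session` and `ingest_csv_to_hive`, the `header`/`inferSchema` options, and the `overwrite` mode are illustrative choices, and the code assumes a cluster with a configured Hive metastore.

```python
def build_hive_session(app_name="csv_to_hive"):
    """Create a SparkSession that can read and write Hive tables."""
    # Imported here so this module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # needed for saveAsTable() to target the Hive metastore
        .getOrCreate()
    )


def ingest_csv_to_hive(spark, csv_path, table_name, mode="overwrite"):
    """Read a CSV file into a DataFrame and save it as a Hive table.

    csv_path and table_name are hypothetical values supplied by the caller.
    """
    # header=True treats the first row as column names;
    # inferSchema=True asks Spark to guess column types (an extra pass over the data).
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # saveAsTable() creates the table if it does not exist;
    # mode controls what happens when it already does.
    df.write.mode(mode).saveAsTable(table_name)
    return df
```

A call such as `ingest_csv_to_hive(build_hive_session(), "/data/customers.csv", "staging.customers")` would then load the file and create or replace the table; both the path and the table name are made up for illustration.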

Asked in LTIMindtree

5d ago

Q. Describe the architecture of Spark. What is lazy evaluation? What is the difference between the repartition and coalesce functions?

Ans.

Spark architecture, lazy evaluation, and repartition vs coalesce.

  • Spark architecture consists of a driver program, a cluster manager, and worker nodes that run executors

  • Lazy evaluation is a feature of Spark where transformations are only recorded, not executed, until an action is called

  • Repartition performs a full shuffle of data across partitions, while coalesce merges existing partitions and avoids a full shuffle

  • Repartition can increase or decrease the number of partitions, while coalesce only decreases it

  • Repartition is a costly operation while coalesce...
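The ideas above can be illustrated with a small pure-Python model. This is only a sketch of the concepts, not Spark's implementation: `LazyPipeline`, `repartition`, and `coalesce` are hypothetical stand-ins, partitions are modelled as plain lists, and the round-robin and merge strategies are simplifying assumptions.

```python
class LazyPipeline:
    """Records transformations and runs them only when an action is called."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new pipeline, executes nothing.
        return LazyPipeline(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyPipeline(self._data, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is the recorded plan executed.
        out = iter(self._data)
        for kind, fn in self._steps:
            out = map(fn, out) if kind == "map" else filter(fn, out)
        return list(out)


def repartition(partitions, n):
    """Full shuffle: every record may move; n may be larger or smaller."""
    out = [[] for _ in range(n)]
    records = [r for part in partitions for r in part]
    for i, record in enumerate(records):
        out[i % n].append(record)  # round-robin redistribution
    return out


def coalesce(partitions, n):
    """Merge whole partitions without redistributing records; never increases the count."""
    if n >= len(partitions):
        return [list(p) for p in partitions]  # asking for more is a no-op
    out = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        out[i % n].extend(part)  # existing partitions are appended intact
    return out
```

In this model, `repartition` touches every record individually, which is why it is the more expensive of the two, while `coalesce` only glues existing partitions together and can leave them uneven in size.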

Asked in LTIMindtree

4d ago

Q. What is MapReduce? What are the advantages of Spark over Hadoop?

Ans.

MapReduce is a programming model and software framework for processing large amounts of data in parallel on a cluster.

  • MapReduce is used for distributed processing of big data

  • It consists of two phases: Map and Reduce

  • Map phase processes input data and produces intermediate key-value pairs

  • Reduce phase takes the output of the Map phase, grouped by key during the shuffle, and combines the values for each key

  • MapReduce is fault-tolerant and highly scalable

  • Example: Word count program in MapReduce
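The word count example mentioned above can be sketched in plain Python to show the data flow. This is a single-process illustration of the programming model only, not Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are chosen for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

On a real cluster the map and reduce calls would run in parallel on different nodes, with the framework handling the shuffle between them.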

Asked in HSBC Group

2d ago

Q. What is the difference between a Managed table and an External table in Hive?

Ans.

For managed tables Hive owns both the metadata and the data, which is stored in Hive's warehouse directory; for external tables Hive owns only the metadata.

  • Both kinds of table are created in Hive; an external table is declared with CREATE EXTERNAL TABLE and usually points at a LOCATION outside the warehouse directory.

  • Managed tables are physically stored in Hive's warehouse directory while external tables are not.

  • Dropping a managed table deletes both its metadata and its data, while dropping an external table removes only the metadata and leaves the data files in place.

  • Managed tables suit data whose whole lifecycle Hive controls, while external tables suit data that is shared with or produced by other tools.

  • Example...
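The difference in drop behaviour can be shown with a toy model. This is purely illustrative and is not how Hive is implemented: `ToyMetastore`, its dictionaries, and the paths used with it are hypothetical stand-ins for the metastore and for files on HDFS.

```python
class ToyMetastore:
    """Toy model of managed vs external table drop semantics."""

    def __init__(self):
        self.tables = {}   # table name -> {"location": str, "external": bool}
        self.storage = {}  # location -> rows (stand-in for data files)

    def create_table(self, name, location, external=False):
        """Register table metadata; keep any data already at the location."""
        self.tables[name] = {"location": location, "external": external}
        self.storage.setdefault(location, [])

    def drop_table(self, name):
        """Always remove metadata; remove data only for managed tables."""
        meta = self.tables.pop(name)
        if not meta["external"]:
            self.storage.pop(meta["location"], None)
```

Dropping a managed table in this model clears its location, whereas dropping an external table leaves the rows where they were, ready to be picked up by a new table definition.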

Asked in HSBC Group

3d ago

Q. What is the role of a boundary query in Sqoop?

Ans.

A boundary query in Sqoop supplies the minimum and maximum values of the split column, which Sqoop uses to divide an import into ranges for its mappers.

  • By default, Sqoop finds the boundaries by running SELECT MIN(col), MAX(col) on the --split-by column

  • The --boundary-query option replaces that default with a custom query that returns the two boundary values

  • It is useful when importing large datasets where the default boundary query is slow

  • To import only the rows where a column falls within a specific range, the usual options are --where or a free-form --query
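How boundary values turn into per-mapper ranges can be sketched as follows. This is a simplified illustration for an integer split column, not Sqoop's actual splitter code; `compute_splits`, `split_conditions`, and the column name `id` are hypothetical.

```python
def compute_splits(lo, hi, num_mappers):
    """Divide the inclusive range [lo, hi] into at most num_mappers contiguous ranges."""
    total = hi - lo + 1
    size, extra = divmod(total, num_mappers)
    splits, start = [], lo
    for i in range(num_mappers):
        # The first `extra` ranges take one additional value each.
        width = size + (1 if i < extra else 0)
        if width == 0:
            break  # fewer values than mappers
        end = start + width - 1
        splits.append((start, end))
        start = end + 1
    return splits


def split_conditions(column, lo, hi, num_mappers):
    """Render each range as the WHERE condition one mapper would use."""
    return [
        f"{column} >= {start} AND {column} <= {end}"
        for start, end in compute_splits(lo, hi, num_mappers)
    ]
```

Here `lo` and `hi` play the role of the two values returned by the boundary query, so a cheaper or more accurate boundary query directly changes the ranges each mapper receives.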

Asked in EPAM Systems

3d ago

Q. Explain the architecture of Hive, the types of Hive tables, the file formats in Hive, and dynamic partitioning in Hive.

Ans.

Hive architecture, table types, file formats, and dynamic partitioning.

  • Hive architecture consists of metastore, driver, compiler, and execution engine.

  • Hive tables can be of two types: managed tables and external tables.

  • File formats supported by Hive include TextFile, SequenceFile, ORC, and Parquet.

  • Dynamic partitioning lets Hive create partitions automatically from the values of the partition columns in the inserted data.
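Dynamic partitioning, as described in the last point above, can be pictured with a short sketch. It only models the idea of deriving partitions from the data and is not Hive code; `dynamic_partition`, the table directory, and the column names are assumptions, while the `column=value` directory naming follows Hive's convention.

```python
from collections import defaultdict


def dynamic_partition(rows, partition_col, table_dir="/warehouse/sales"):
    """Group rows into partition directories derived from the data itself."""
    partitions = defaultdict(list)
    for row in rows:
        # The partition column is encoded in the directory name,
        # so it is not stored again inside the data rows.
        data = {k: v for k, v in row.items() if k != partition_col}
        path = f"{table_dir}/{partition_col}={row[partition_col]}"
        partitions[path].append(data)
    return dict(partitions)
```

With static partitioning the caller would have to name each partition value up front; here a new partition appears whenever a new value shows up in the input.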

Asked in HSBC Group

2d ago

Q. What is the top command in shell scripting?

Ans.

The top command is a Linux utility that displays the system's running processes in real time.

  • Displays the processes running on the system

  • Updates the list of processes in real-time

  • Provides information on CPU usage, memory usage, and process IDs

  • Can be used to monitor system performance and identify resource-intensive processes

Asked in EPAM Systems

5d ago

Q. Joins and window functions in Spark, partition vs coalesce, and performance optimization techniques

Ans.

The question is about joins, window functions, partition vs coalesce, and performance optimization techniques in Spark.

  • Spark can execute joins using strategies such as broadcast hash join, shuffle hash join, and sort-merge join.

  • Window functions in Spark allow us to perform calculations across a group of rows that are related to the current row.

  • Partitioning in Spark can be done based on columns or keys, and it affects the performance of operations such as joins and aggregations...
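Two of the ideas above, a window function and a broadcast join, can be sketched in plain Python. These are conceptual models only, not Spark's API or execution engine; `row_number_over`, `broadcast_hash_join`, and the column names used with them are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def row_number_over(rows, partition_by, order_by, descending=False):
    """Number rows within each partition, like ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)."""
    out = []
    keyed = sorted(rows, key=itemgetter(partition_by))
    for _, group in groupby(keyed, key=itemgetter(partition_by)):
        ordered = sorted(group, key=itemgetter(order_by), reverse=descending)
        for n, row in enumerate(ordered, start=1):
            # Every input row is kept; the window only adds a column.
            out.append({**row, "row_number": n})
    return out


def broadcast_hash_join(large_rows, small_rows, key):
    """Inner join that builds a hash table from the small side and streams the large side."""
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    # The large side is never shuffled; each row just probes the lookup table.
    return [
        {**small, **large}
        for large in large_rows
        for small in lookup.get(large[key], [])
    ]
```

The join sketch shows why broadcasting a small table is a common optimization: only the small side has to be copied to every worker, and the large side stays where it is.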

Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2025 Info Edge (India) Ltd.
