Data Analyst Intern

100+ Data Analyst Intern Interview Questions and Answers

Updated 27 Apr 2025

Q1. Water Jug Problem Statement

You have two water jugs with capacities X and Y liters respectively, both initially empty. You also have an infinite water supply. The goal is to determine if it is possible to measu...read more

Ans.

The Water Jug Problem involves determining if a target measurement can be achieved using two jugs of different capacities and specific operations.

  • Start by filling one jug and transferring water between the jugs to reach the target measurement.

  • Consider all possible combinations of filling, emptying, and transferring water between the jugs.

  • Keep track of the states of both jugs and the amount of water in each jug during the operations.

  • If the target measurement is reached, return...read more
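The state-tracking approach above can be sketched as a breadth-first search over (jug 1, jug 2) fill levels. This is a minimal sketch, not a reference solution: the problem statement is truncated, so the names `can_measure` and `target` and the success condition (the target sits in either jug, or in both combined) are assumptions.

```python
from collections import deque


def can_measure(x: int, y: int, target: int) -> bool:
    """Breadth-first search over jug states (a, b).

    Assumption: the target counts as measured when it is held in
    jug 1, in jug 2, or in both jugs combined.
    """
    start = (0, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if target in (a, b, a + b):
            return True
        pour_ab = min(a, y - b)  # amount that fits when pouring jug 1 into jug 2
        pour_ba = min(b, x - a)  # amount that fits when pouring jug 2 into jug 1
        next_states = (
            (x, b), (a, y),              # fill one jug
            (0, b), (a, 0),              # empty one jug
            (a - pour_ab, b + pour_ab),  # pour jug 1 -> jug 2
            (a + pour_ba, b - pour_ba),  # pour jug 2 -> jug 1
        )
        for state in next_states:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


print(can_measure(3, 5, 4))
print(can_measure(2, 6, 5))
```

The search visits at most (X + 1) × (Y + 1) states, so it terminates even when the target is unreachable.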

Q2. Insertion Sort in a Linked List

Given a singly linked list with 'N' nodes containing integer values, your task is to sort the list using insertion sort and output the sorted list.

Insertion Sort is an algorithm...read more

Ans.

Implement insertion sort on a singly linked list to sort the elements in-place.

  • Iterate through the linked list and for each node, find its correct position in the sorted part of the list

  • Adjust pointers to insert the node in the correct position

  • Repeat this process until all nodes are sorted
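The steps above can be sketched as follows. The `Node` class and the `from_list` / `to_list` helpers are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def insertion_sort(head: Optional[Node]) -> Optional[Node]:
    """Sort a singly linked list by relinking nodes (no new nodes are created)."""
    dummy = Node(0)  # sentinel placed in front of the sorted part
    current = head
    while current is not None:
        following = current.next  # remember the rest of the unsorted part
        prev = dummy
        # Walk the sorted part to find the last node with value <= current.value.
        while prev.next is not None and prev.next.value <= current.value:
            prev = prev.next
        # Splice current in after prev.
        current.next = prev.next
        prev.next = current
        current = following
    return dummy.next


def from_list(values):
    """Build a linked list from a Python list (helper for the example)."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def to_list(head):
    """Collect node values into a Python list (helper for the example)."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


print(to_list(insertion_sort(from_list([4, 2, 1, 3]))))
```

Using `<=` in the inner loop keeps equal values in their original order, so the sort is stable. The running time is O(N²) in the worst case.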

Data Analyst Intern Interview Questions and Answers for Freshers

Q3. What is the difference between loc and iloc in data science, and can you explain what dashboards are?
Ans.

loc is label-based indexing while iloc is integer-position-based indexing in pandas. Dashboards are visual tools for data analysis.

  • loc is used for selecting rows and columns by labels

  • iloc is used for selecting rows and columns by integer position

  • Dashboards are visual representations of data for easy analysis and decision-making
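For the loc and iloc part of the answer, a small pandas sketch may help. The DataFrame contents and the labels "a", "b", "c" are made up for illustration.

```python
import pandas as pd

# Hypothetical data with string row labels so labels and positions differ.
df = pd.DataFrame(
    {"name": ["Asha", "Ben", "Chen"], "score": [82, 91, 77]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "score"]   # row label "b", column label "score"
by_position = df.iloc[1, 1]       # second row, second column

label_slice = df.loc["a":"b"]     # label slices include the end label
position_slice = df.iloc[0:2]     # position slices exclude the end position

print(by_label, by_position)
print(list(label_slice.index), list(position_slice.index))
```

Both slices select the same two rows here, but for different reasons: the label slice stops at "b" inclusively, while the position slice stops before position 2.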

Q4. Can you explain the different types of keys in a database and their properties?
Ans.

Different types of keys in a database include primary key, foreign key, unique key, and composite key.

  • Primary key: uniquely identifies each record in a table, must be unique and not null.

  • Foreign key: establishes a link between two tables, ensures referential integrity.

  • Unique key: ensures that all values in a column are unique.

  • Composite key: combination of two or more columns to uniquely identify a record.


Q5. What do you mean by MTD? How do you create it?

Ans.

MTD stands for Month-to-Date. It refers to the period from the beginning of the current month up to the present date.

  • MTD is a common term used in financial and business reporting to track performance within a specific month.

  • To create MTD, you would sum up the data from the beginning of the month up to the current date.

  • For example, if you are calculating MTD sales for January 2022 on January 15th, you would sum up all sales data from January 1st to January 15th.

  • MTD is often us...read more
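The calculation described above can be sketched in a few lines. The function name `month_to_date_total` and the sales figures are hypothetical; only the January 15th cut-off comes from the example in the answer.

```python
from datetime import date


def month_to_date_total(records, as_of):
    """Sum amounts dated from the first day of as_of's month through as_of."""
    month_start = as_of.replace(day=1)
    return sum(amount for day, amount in records if month_start <= day <= as_of)


# Hypothetical (date, amount) sales records.
sales = [
    (date(2021, 12, 31), 50.0),   # previous month: excluded
    (date(2022, 1, 1), 100.0),
    (date(2022, 1, 10), 250.0),
    (date(2022, 1, 15), 75.0),
    (date(2022, 1, 20), 300.0),   # after the as-of date: excluded
]

mtd = month_to_date_total(sales, date(2022, 1, 15))
print(mtd)
```

In a BI tool or SQL the same idea is a filter on the date column between the start of the month and the reporting date, followed by a sum.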

Q6. You have 3 jars each with labels: one labeled 'Apples', one labeled 'Oranges', and one labeled 'Apples and Oranges'. However, all the jars are labeled incorrectly. You can pick one fruit from each jar. How can...

read more
Ans.

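Pick one fruit from the jar labeled 'Apples and Oranges'. Because every label is wrong, that jar holds only one kind of fruit, and the fruit you draw tells you which; the other two jars follow by elimination.

  • Pick a fruit from the jar labeled 'Apples and Oranges'. Suppose it is an apple: that jar contains only apples.

  • The jar labeled 'Oranges' cannot hold only oranges (its label is wrong) and cannot hold only apples (that jar is already identified), so it holds apples and oranges.

  • The remaining jar, labeled 'Apples', must hold only oranges. If the first fruit is an orange instead, swap the roles of apples and oranges.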


Q7. What is your understanding of data, and how important is it in an organization?

Ans.

Data is information collected and stored for analysis and decision-making purposes in an organization.

  • Data is raw facts and figures that need to be processed to provide meaningful information.

  • It is crucial for organizations to make informed decisions, identify trends, and improve performance.

  • Examples of data in an organization include sales figures, customer demographics, and website traffic.

  • Data can be structured (in databases) or unstructured (like text documents or social ...read more

Q8. Write a query to find the department-wise highest salary in an organization.

Ans.

Group the employee rows by department and take the maximum salary within each group.

  • Use GROUP BY clause to group data by department

  • Use MAX() function to find highest salary in each department

  • Join the tables if necessary to get department information
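A runnable sketch of the GROUP BY and MAX() approach, using Python's built-in sqlite3 module. The `employees` table, its columns, and the sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Asha', 'Sales', 50000),
  ('Ben',  'Sales', 65000),
  ('Chen', 'IT',    80000),
  ('Dev',  'IT',    72000);
""")

# One output row per department, carrying that department's top salary.
rows = conn.execute("""
SELECT department, MAX(salary) AS highest_salary
FROM employees
GROUP BY department
ORDER BY department
""").fetchall()

print(rows)
```

If department names live in a separate table, join it to `employees` on the department key before grouping.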

Q9. What is Data Modelling? Types of Data Models

Ans.

Data modeling is the process of creating a visual representation of data structures and relationships.

  • Data modeling involves defining the structure of data, its storage, and how it will be accessed and manipulated.

  • Types of data models include conceptual, logical, and physical models.

  • Conceptual models focus on high-level business concepts and relationships.

  • Logical models define the structure of the data without considering how it will be implemented in a database system.

  • Physic...read more

Q10. Write a Python code to check if the given sentence is a palindrome or not.

Ans.

A sentence is a palindrome if it reads the same forwards and backwards once spaces and letter case are ignored.

  • Remove all spaces and convert to lowercase

  • Reverse the string and compare with original

  • If both are same, then it is a palindrome
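The three steps above translate directly into a short function. The name `is_palindrome` is chosen for this sketch; it removes whitespace and ignores case, as the answer describes, and leaves punctuation in place.

```python
def is_palindrome(sentence: str) -> bool:
    """Return True if the sentence reads the same both ways,
    ignoring whitespace and letter case."""
    cleaned = "".join(sentence.split()).lower()  # drop whitespace, lowercase
    return cleaned == cleaned[::-1]              # compare with the reversed string


print(is_palindrome("Never odd or even"))
print(is_palindrome("Data analyst"))
```

To ignore punctuation as well, one option is to keep only characters for which `str.isalnum()` is true before comparing.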

Q11. What is an RDBMS? What are the objects in a database? What is the difference between OLAP and OLTP? What is a view? What is an index? What are functions and stored procedures? What are constraints? What are foreign keys?

Ans.

RDBMS is a relational database management system. Objects in a database include tables, views, indexes, functions, stored procedures, constraints, and foreign keys. OLAP is for data analysis while OLTP is for transaction processing.

  • RDBMS stands for Relational Database Management System

  • Objects in a database include tables, views, indexes, functions, stored procedures, constraints, and foreign keys

  • OLAP (Online Analytical Processing) is used for data analysis and reporting

  • OLTP (...read more

Q12. How do you create separate lines in standard output using C++?

Ans.

In C++, output is split across lines by writing a newline to the stream, either the '\n' character or the std::endl manipulator.

  • Separate lines display different outputs or messages in a clear and organized manner.

  • Use std::endl or the '\n' character to move to the next line; std::endl also flushes the stream, so '\n' is usually the cheaper choice.

  • For example, cout << "Hello" << endl; displays Hello and moves to the next line for the next output.

Q13. How did you perform data analysis on your project?

Ans.

I start by defining the problem, collecting relevant data, cleaning and organizing the data, performing analysis using statistical methods and tools, and finally interpreting and presenting the results.

  • Define the problem statement and objectives of the analysis

  • Collect relevant data from various sources

  • Clean and organize the data to ensure accuracy and consistency

  • Perform analysis using statistical methods and tools such as Excel, Python, or R

  • Interpret the results and present f...read more

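Q14. List all the languages in SQL and explain them.

Ans.

SQL itself is usually divided into sub-languages: DDL (data definition), DML (data manipulation), DQL (data query), DCL (data control), and TCL (transaction control). The list below covers SQL, vendor dialects, and database systems built on it.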

  • SQL (Structured Query Language) is a standard language for managing relational databases

  • T-SQL (Transact-SQL) is a proprietary extension of SQL used by Microsoft SQL Server

  • PL/SQL (Procedural Language/Structured Query Language) is Oracle Corporation's proprietary extension of SQL

  • MySQL is an open-source relational database management system that uses SQL

  • PostgreSQL is an open-source object-relational database management system that...read more

Q15. How does a doubly linked list work? What is the difference between a singly linked list and a doubly linked list?

Ans.

A doubly linked list is a data structure where each node contains a reference to both the previous and the next node.

  • In a singly linked list, each node contains a reference to the next node only, while in a doubly linked list, each node contains references to both the previous and next nodes.

  • Doubly linked lists allow traversal in both directions, so a node can be removed or a neighbor inserted in constant time given a reference to that node, at the cost of one extra pointer per node.

  • Example: In a doubly linked list, a node might have pointe...read more
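A minimal sketch of the structure, showing the two references per node and why removal is cheap. The class and method names are chosen for this illustration.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None  # reference to the previous node
        self.next = None  # reference to the next node


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        """Add a node at the tail and return it."""
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def remove(self, node):
        """Unlink a node in O(1); no traversal is needed because
        the node already knows both of its neighbors."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def forward(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out

    def backward(self):
        out, node = [], self.tail
        while node is not None:
            out.append(node.value)
            node = node.prev
        return out


dll = DoublyLinkedList()
first, middle, last = dll.append(1), dll.append(2), dll.append(3)
print(dll.forward(), dll.backward())
dll.remove(middle)
print(dll.forward(), dll.backward())
```

In a singly linked list the same removal needs a walk from the head to find the predecessor, which is O(N).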

Q16. What is a HashMap in Java?

Ans.

Hash map is a data structure that stores key-value pairs and allows fast retrieval of values based on keys.

  • Hash map uses hashing to store and retrieve values based on keys

  • It allows one null key and any number of null values

  • It is not synchronized and not thread-safe

  • Example: HashMap<String, Integer> map = new HashMap<>();

  • map.put("apple", 1); int value = map.get("apple");

Q17. How do you perform data analysis on any of your project?

Q18. What is the difference between a super key and a foreign key?

Ans.

Super key is a set of attributes that uniquely identifies a record, while foreign key is a reference to a primary key in another table.

  • Super key is a combination of one or more attributes that uniquely identifies a record in a table.

  • Foreign key is a field in a table that refers to the primary key of another table.

  • Super key can have additional attributes that are not necessary for uniqueness.

  • Foreign key establishes a relationship between two tables.

  • Example: In a database of st...read more

Q19. What is the difference between a 4-stroke and a 2-stroke engine?

Ans.

4-stroke engines have 4 strokes per cycle, while 2-stroke engines have 2 strokes per cycle.

  • 4-stroke engines are more fuel-efficient and produce less pollution than 2-stroke engines.

  • 2-stroke engines are simpler and lighter than 4-stroke engines.

  • 4-stroke engines have separate intake, compression, power, and exhaust strokes, while 2-stroke engines combine intake and compression, and power and exhaust strokes.

  • Examples of 4-stroke engines include those found in cars, while example...read more

Q20. Define your dataset and what difficulties have you faced while preparing your model?

Ans.

The dataset consists of customer purchase history and demographic information. Difficulties faced include data cleaning and missing values.

  • Dataset includes customer ID, purchase amount, purchase date, age, gender, and location.

  • Difficulties faced include handling missing values in the age and location columns.

  • Data cleaning involved removing duplicates and outliers to ensure accurate analysis.

  • Normalization and standardization of data for model preparation.

Q21. Explain how to define an outlier using a boxplot analysis.

Ans.

Outliers in a boxplot are defined as data points that fall below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.

  • Calculate the interquartile range (IQR) by subtracting Q1 from Q3.

  • Identify the lower bound as Q1 - 1.5*IQR and the upper bound as Q3 + 1.5*IQR.

  • Any data points below the lower bound or above the upper bound are considered outliers.

  • For example, if Q1 = 10 and Q3 = 20, then IQR = 20 - 10 = 10, so the lower bound = 10 - 1.5*10 = -5 and the upper bound = 20 + 1.5*10 = 35.
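The rule can be sketched as two small functions. The quartiles are passed in rather than computed, because different tools use slightly different quantile methods; the function names and sample values are hypothetical.

```python
def iqr_bounds(q1, q3):
    """Return the (lower, upper) fences of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_bounds(q1, q3)
    return [v for v in values if v < lower or v > upper]


print(iqr_bounds(10, 20))
print(find_outliers([-8, 0, 12, 18, 34, 40], q1=10, q3=20))
```

With Q1 = 10 and Q3 = 20 the fences are -5 and 35, so only -8 and 40 are flagged in the sample list.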

Q22. What are joins? Types of joins.

Ans.

Joins are used to combine rows from two or more tables based on a related column between them.

  • Types of joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.

  • INNER JOIN returns rows when there is at least one match in both tables.

  • LEFT JOIN returns all rows from the left table and the matched rows from the right table.

  • RIGHT JOIN returns all rows from the right table and the matched rows from the left table.

  • FULL JOIN returns rows when there is a match in one of the tabl...read more
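A small sketch contrasting INNER JOIN and LEFT JOIN, run through Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are invented for the example. RIGHT and FULL JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 120), (1, 80), (3, 45);
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
SELECT c.name, o.amount
FROM customers c
INNER JOIN orders o ON o.customer_id = c.id
ORDER BY c.name, o.amount
""").fetchall()

# LEFT JOIN: every customer; order columns are NULL when nothing matches.
left_rows = conn.execute("""
SELECT c.name, o.amount
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
ORDER BY c.name, o.amount
""").fetchall()

print(inner_rows)
print(left_rows)
```

Ben has no orders, so he is missing from the INNER JOIN result and appears with a NULL amount (None in Python) in the LEFT JOIN result.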

Q23. What is Power BI, why do we use it, and what is Power Query in Power BI?

Ans.

Power BI is a business analytics tool used to visualize and analyze data. Power Query is a data transformation and shaping tool in Power BI.

  • Power BI is a powerful business intelligence tool developed by Microsoft.

  • It allows users to connect to various data sources, transform and clean the data, and create interactive visualizations and reports.

  • Power BI enables data analysts to gain insights and make data-driven decisions.

  • Power Query is a data transformation and shaping tool wi...read more

Q24. Memory management and hash map in java

Ans.

Memory management and hash map are important concepts in Java programming.

  • Memory management is the process of allocating and deallocating memory in a program.

  • Java uses automatic memory management through garbage collection.

  • Hash map is a data structure that stores key-value pairs and uses hashing to retrieve values efficiently.

  • Java's HashMap class implements the Map interface and provides constant-time performance for basic operations such as get and put, assuming the hash function spreads keys evenly across buckets.

  • It is important to properly manage memory ...read more

Q25. Estimate the number of paper cups used in one day in an office.

Ans.

Approximately 500 paper cups may be used in a day in an average office.

  • Consider the number of employees in the office

  • Think about the average number of hot beverage drinkers

  • Factor in the number of meetings and events held in the office

  • Take into account the availability of reusable cups

  • Estimate based on personal experience or observation

Q26. What are SQL joins and window functions?

Ans.

SQL joins are used to combine rows from two or more tables based on a related column between them. Window functions perform calculations across a set of table rows that are related to the current row.

  • SQL joins are used to retrieve data from multiple tables based on a related column between them (e.g. INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN).

  • Window functions are used to perform calculations on a set of rows related to the current row (e.g. ROW_NUMBER(), RANK(), LAG(), LEA...read more
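To make the window-function part concrete, here is a sketch that ranks sales reps within their region using RANK() with PARTITION BY. It assumes a Python build whose bundled SQLite is version 3.25 or later (the first release with window functions); the `sales` table and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (rep TEXT, region TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('Asha', 'North', 300), ('Ben', 'North', 500),
  ('Chen', 'South', 200), ('Dev', 'South', 200), ('Esha', 'South', 100);
""")

# Unlike GROUP BY, a window function keeps every input row and adds a
# value computed over the row's partition (here: its region).
rows = conn.execute("""
SELECT rep, region, amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS region_rank
FROM sales
ORDER BY region, region_rank, rep
""").fetchall()

for row in rows:
    print(row)
```

Chen and Dev tie at 200, so both receive rank 1 and the next rank in that region is 3; ROW_NUMBER() would instead number them 1 and 2.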

Q27. Which technical tools do you use for analysis and presentation purposes?

Ans.

I use tools like Python, SQL, Tableau for data analysis and visualization.

  • Python for data cleaning and analysis

  • SQL for querying databases

  • Tableau for creating interactive visualizations

Q28. Can you describe a time when you had to think creatively?

Ans.

I creatively solved a data visualization challenge by using unconventional tools to present insights effectively.

  • Faced a challenge in visualizing complex data for a presentation.

  • Used unconventional tools like Tableau and R to create interactive dashboards.

  • Incorporated storytelling techniques to make data relatable and engaging.

  • Received positive feedback for clarity and creativity in presentation.

Q29. How would you extract data from a table, given a specific time interval?

Ans.

Use SQL query with WHERE clause to pull data from a table based on a time interval.

  • Use SQL query with SELECT statement to specify the columns you want to retrieve.

  • Add a WHERE clause with the condition for the time interval, using appropriate date/time functions.

  • Example: SELECT * FROM table_name WHERE timestamp_column BETWEEN 'start_time' AND 'end_time';
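A runnable sketch of a time-interval filter using Python's built-in sqlite3 module. The `events` table, its `created_at` column, and the timestamps are assumptions. The sketch uses a half-open range (>= start, < end) instead of BETWEEN, which includes both endpoints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER, created_at TEXT);
INSERT INTO events VALUES
  (1, '2024-03-01 09:00:00'),
  (2, '2024-03-01 12:30:00'),
  (3, '2024-03-01 18:00:00'),
  (4, '2024-03-02 00:00:00');
""")

# ISO-8601 text timestamps sort chronologically, so plain comparison works.
# Parameters (?) keep the interval out of the SQL string itself.
rows = conn.execute(
    "SELECT id FROM events "
    "WHERE created_at >= ? AND created_at < ? "
    "ORDER BY id",
    ("2024-03-01 12:00:00", "2024-03-02 00:00:00"),
).fetchall()

print(rows)
```

The half-open form avoids counting a row that falls exactly on the boundary in two adjacent intervals.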

Q30. How would you run a business on a state level to maximize your sales?

Ans.

To maximize sales on a state level, focus on market research, targeted marketing strategies, strong customer service, and strategic partnerships.

  • Conduct market research to understand the local consumer behavior and preferences

  • Implement targeted marketing strategies based on the research findings

  • Provide excellent customer service to build loyalty and attract repeat business

  • Form strategic partnerships with local businesses or organizations to expand reach and customer base

Q31. What is the SQL query to retrieve employees whose names start with the letter 'A'?

Ans.

Use SQL's SELECT statement with a WHERE clause to filter employees whose names start with 'A'.

  • Use the SELECT statement to specify the columns you want to retrieve.

  • Use the WHERE clause to filter results based on conditions.

  • Utilize the LIKE operator with 'A%' to match names starting with 'A'.

  • Example query: SELECT * FROM employees WHERE name LIKE 'A%';
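The example query can be tried against a throwaway SQLite database from Python. The sample names are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT);
INSERT INTO employees VALUES ('Asha'), ('Ben'), ('Amit'), ('Lara');
""")

# 'A%' means: the letter A followed by any sequence of characters.
rows = conn.execute(
    "SELECT name FROM employees WHERE name LIKE 'A%' ORDER BY name"
).fetchall()

print(rows)
```

'Lara' contains the letter a but does not start with it, so it is not returned. Case sensitivity of LIKE varies by database; in SQLite it is case-insensitive for ASCII letters by default.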

Q32. What is the difference between a primary key and a candidate key?

Ans.

Primary key uniquely identifies a record while candidate key can also uniquely identify a record but may not be chosen as primary key.

  • Primary key is a column or set of columns that uniquely identifies each record in a table

  • Candidate key is a column or set of columns that can also uniquely identify each record in a table

  • A table can have multiple candidate keys but only one primary key

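  • A primary key cannot contain NULL values. In relational theory candidate keys are also non-null, although many SQL databases let a column with only a UNIQUE constraint hold NULLs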

  • Example: In a tabl...read more

Q33. What is the difference between a calculated column and a measure?

Ans.

Calculated columns are static values calculated at the row level, while measures are dynamic values calculated at the aggregated level.

  • Calculated columns are computed for each row in a table, while measures are computed for the entire dataset or a subset of data.

  • Calculated columns are stored in the data model, while measures are calculated on the fly based on user interactions.

  • Examples of calculated columns include age calculated from birthdate, while examples of measures inc...read more

Q34. Are you familiar with any data visualization tools, and if so, which ones?

Ans.

I am familiar with several data visualization tools, including Tableau, Power BI, and Matplotlib for Python.

  • Tableau: Great for creating interactive dashboards and visualizations.

  • Power BI: Integrates well with Microsoft products and offers robust reporting features.

  • Matplotlib: A Python library for creating static, animated, and interactive visualizations.

  • Seaborn: Built on Matplotlib, it provides a high-level interface for drawing attractive statistical graphics.

Q35. How would you articulate your findings to an audience without a technical background?

Ans.

I would simplify complex data, use visuals, and relate findings to real-world scenarios for better understanding.

  • Use simple language: Avoid jargon and technical terms. For example, instead of saying 'regression analysis', say 'we looked at how one thing affects another'.

  • Utilize visuals: Present data through charts and graphs. For instance, a pie chart can show market share percentages clearly.

  • Tell a story: Frame findings in a narrative. For example, 'Our analysis shows that c...read more

Q36. Which Excel functions do you use most frequently for data analysis?

Ans.

I frequently use Excel functions like VLOOKUP, SUMIF, and PivotTables for efficient data analysis and reporting.

  • VLOOKUP: Used to search for a value in the first column of a range and return a value in the same row from a specified column. Example: =VLOOKUP(A2, B2:D10, 3, FALSE)

  • SUMIF: Helps in summing values based on a specific condition. Example: =SUMIF(A2:A10, "Sales", B2:B10) sums values in B2:B10 where A2:A10 equals "Sales".

  • PivotTables: Essential for summarizing large data...read more

Q37. What are the differences between hard margin and soft margin in support vector machines?

Ans.

Hard margin SVMs require perfect separation, while soft margin SVMs allow some misclassifications for better generalization.

  • Hard margin SVMs assume data is linearly separable without errors.

  • Soft margin SVMs introduce a penalty for misclassified points, allowing for some errors.

  • Example: Hard margin is used when data is clean, like classifying well-separated flowers.

  • Example: Soft margin is useful in noisy datasets, like distinguishing between healthy and diseased plants.

Q38. What interests you about data analytics, and what do you hope to gain from this internship?

Ans.

I'm fascinated by data analytics for its ability to uncover insights and drive decision-making, and I hope to gain practical experience.

  • I enjoy transforming raw data into meaningful insights, like analyzing patient data to improve healthcare outcomes.

  • I am excited about using tools like Python and SQL to manipulate data and create visualizations that tell a story.

  • I hope to learn how to apply statistical methods to real-world problems, such as predicting trends in customer beha...read more

Q39. What is a loss function in the context of machine learning?

Ans.

A loss function quantifies the difference between predicted and actual outcomes in machine learning models.

  • Measures model performance: A lower loss indicates a better model fit.

  • Types of loss functions: Common examples include Mean Squared Error (MSE) for regression and Cross-Entropy Loss for classification.

  • Guides optimization: Loss functions are minimized during training to improve model accuracy.

  • Example: In a regression task, MSE calculates the average squared difference bet...read more
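The two loss functions named above can be written out in plain Python. This is a sketch with hypothetical function names; the cross-entropy version is the binary form and assumes predicted probabilities strictly between 0 and 1.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def binary_cross_entropy(labels, probabilities):
    """Average negative log-likelihood for 0/1 labels.

    Assumes every probability is strictly between 0 and 1.
    """
    if len(labels) != len(probabilities):
        raise ValueError("inputs must have the same length")
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, probabilities)
    )
    return -total / len(labels)


print(mean_squared_error([3, 5, 2], [2, 5, 4]))
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

In the MSE call the errors are 1, 0, and -2, so the loss is (1 + 0 + 4) / 3. Squaring makes large errors count disproportionately, which is why MSE is sensitive to outliers.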

Q40. What is regression analysis, and how is it used in data analysis?

Ans.

Regression analysis is a statistical method for modeling relationships between variables to predict outcomes.

  • Used to identify relationships between dependent and independent variables.

  • Common types include linear regression, logistic regression, and polynomial regression.

  • Example: Predicting house prices based on features like size, location, and number of bedrooms.

  • In healthcare, it can predict patient outcomes based on treatment variables.

  • Helps in making informed decisions by ...read more
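As an illustration of the simplest case, the sketch below fits a straight line y = slope * x + intercept by ordinary least squares. The function name `fit_line` and the data points are made up; real projects would normally use a statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable.

    Returns (slope, intercept) of the best-fitting straight line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
print(slope * 5 + intercept)  # prediction for x = 5
```

The slope estimates how much the dependent variable changes per unit of the independent variable, which is the relationship regression analysis is used to quantify.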

Q41. Tell me about the projects you've worked on and how analytics was used in them.

Ans.

I have worked on projects involving customer segmentation, sales forecasting, and sentiment analysis using analytics.

  • Customer segmentation: Used clustering algorithms to group customers based on their behavior and demographics.

  • Sales forecasting: Utilized time series analysis to predict future sales trends and optimize inventory management.

  • Sentiment analysis: Applied natural language processing techniques to analyze customer feedback and sentiment towards products or services.

Q42. How many MB did your latest project consume, and what are your weaknesses?

Ans.

The latest project consumed approximately 500 MB of data.

  • The latest project consumed 500 MB of data.

  • It is important to track data consumption for future optimization.

  • Weaknesses can include lack of experience with certain tools or techniques.

  • Weaknesses can also include difficulty in time management or communication.

Q43. How would you manage missing data within a dataset?

Ans.

Managing missing data involves identifying, analyzing, and applying appropriate techniques to handle gaps in datasets.

  • Identify missing data: Use functions like isnull() in Python to locate missing values.

  • Remove missing data: If the missing data is minimal, consider dropping rows or columns (e.g., df.dropna()).

  • Impute missing values: Replace missing values with mean, median, or mode (e.g., df.fillna(df.mean(numeric_only=True)) for numeric columns).

  • Use predictive modeling: Employ algorithms to predict and fill in m...read more
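The identify, drop, and impute steps above can be sketched with pandas. The DataFrame and its `age` and `city` columns are invented for the example; median and mode are just one reasonable choice of fill values.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [25.0, None, 35.0, 40.0],
    "city": ["Pune", "Mumbai", None, "Pune"],
})

# Identify: count missing values per column.
missing_per_column = df.isnull().sum()

# Remove: drop every row that has at least one missing value.
dropped = df.dropna()

# Impute: median for the numeric column, most frequent value for the text column.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

print(missing_per_column)
print(len(dropped))
print(filled)
```

Dropping halves this small dataset, which is why imputation is often preferred when many rows have only one or two gaps.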

Q44. What is the difference between structured and unstructured data?

Ans.

Structured data is organized and easily searchable, while unstructured data is unorganized and requires more effort to analyze.

  • Structured data is typically stored in databases with a predefined schema (e.g., SQL databases).

  • Unstructured data includes formats like text, images, and videos that do not have a specific structure (e.g., social media posts).

  • Examples of structured data: customer information in a CRM system, sales records in a spreadsheet.

  • Examples of unstructured data...read more

Q45. What is the difference between DELETE, DROP, and TRUNCATE?

Ans.

Delete removes specific rows from a table, drop removes entire table structure, and truncate removes all rows from a table.

  • DELETE is a DML command; DROP and TRUNCATE are DDL commands.

  • DELETE can be rolled back inside a transaction. Whether DROP and TRUNCATE can be rolled back depends on the database: PostgreSQL and SQL Server allow it inside a transaction, while MySQL and Oracle commit them implicitly.

  • DELETE fires row-level DELETE triggers; TRUNCATE does not fire them, and DROP removes the table together with its triggers.

  • Example: DELETE FROM table_name WHERE condition;

  • Example: DROP TABLE table_name;

  • Examp...read more

Q46. What types of graphs are available in Tableau, and what are the general applications of the various chart types?

Ans.

Tableau offers various types of graphs like bar charts, line charts, scatter plots, etc. for visualizing data.

  • Bar charts: used to compare different categories or show trends over time

  • Line charts: show trends over time or relationships between variables

  • Scatter plots: show relationships between two numerical variables

  • Pie charts: show parts of a whole or percentages

  • Heat maps: show data density or relationships in a matrix format

Q47. How does data science play a vital role in the contemporary world?

Ans.

Data science plays a vital role in the contemporary world by enabling businesses to make data-driven decisions, improving healthcare outcomes, enhancing customer experiences, and driving innovation.

  • Data science helps businesses make informed decisions by analyzing large datasets to identify trends and patterns.

  • In healthcare, data science is used to predict disease outbreaks, personalize treatment plans, and improve patient outcomes.

  • Data science is crucial for enhancing customer ex...read more

Q48. List all languages available in SQL.

Ans.

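SQL's own sub-languages are usually grouped as DDL, DML, DQL, DCL, and TCL. The items below are database systems that implement SQL rather than languages within it.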

  • MySQL

  • Oracle

  • PostgreSQL

  • Microsoft SQL Server

  • SQLite

Q49. What is exception handling in Java?

Ans.

Exception handling in Java allows for the handling of errors and exceptions that may occur during program execution.

  • Java provides try-catch blocks to handle exceptions.

  • The try block contains the code that may throw an exception.

  • The catch block catches and handles the thrown exception.

  • Multiple catch blocks can be used to handle different types of exceptions.

  • The finally block is optional and is executed regardless of whether an exception occurs or not.

  • Exceptions can also be thr...read more

Q50. How would you describe data analytics in your own words?

Ans.

Data analytics is the process of examining data sets to draw conclusions, identify patterns, and support decision-making.

  • Involves collecting, processing, and analyzing data to uncover insights.

  • Uses statistical methods to interpret data trends, e.g., sales growth analysis.

  • Can involve data visualization tools like Tableau to present findings clearly.

  • Supports business decisions, such as optimizing marketing strategies based on customer behavior.

  • In healthcare, it can analyze pati...read more

Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.

Follow us
  • Youtube
  • Instagram
  • LinkedIn
  • Facebook
  • Twitter