100+ Junior Data Analyst Interview Questions and Answers
Q1. What is the main difference between data mining and data analysis?
Data mining involves discovering patterns and relationships in large datasets, while data analysis focuses on interpreting and drawing insights from data.
Data mining is the process of extracting useful information from large datasets.
Data analysis involves examining and interpreting data to draw conclusions and make informed decisions.
Data mining uses techniques like clustering, classification, and association to discover patterns and relationships.
Data analysis involves tech…
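Q2. How do you use `PARTITION BY` and `ORDER BY` in window functions?
PARTITION BY is used to divide the result set into partitions, while ORDER BY is used to sort the rows within each partition in window functions.
PARTITION BY groups rows that share the same values in the specified columns, but unlike GROUP BY it does not collapse them into one row per group
ORDER BY sets the order of rows within each partition, which matters for running totals and ranking functions
Example: SELECT column1, column2, SUM(column3) OVER (PARTITION BY column1 ORDER BY column2) AS total FROM table_name
Because ORDER BY is present, this SUM returns a running total of column3 within each column1 partition; without ORDER BY, every row in a partition would get the partition's grand total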
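A minimal sketch of the running-total pattern above, using Python's built-in `sqlite3` module as a stand-in database. The `sales` table and its columns are hypothetical, and window functions require SQLite 3.25 or newer (bundled with current Python releases).

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 1, 100), ("East", 2, 50), ("West", 1, 70), ("West", 2, 30)],
)

# PARTITION BY restarts the window for each region; ORDER BY month makes
# SUM() accumulate month by month, i.e. a running total per region.
rows = conn.execute(
    """
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
    """
).fetchall()

for row in rows:
    print(row)
# ('East', 1, 100, 100)
# ('East', 2, 50, 150)
# ('West', 1, 70, 70)
# ('West', 2, 30, 100)
```

Note that the West total starts again from 70: the partition boundary resets the sum.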
Junior Data Analyst Interview Questions and Answers for Freshers
Q3. What is SQL, and why is it important in data analytics?
SQL is a programming language used for managing and analyzing data in relational databases.
SQL stands for Structured Query Language
It is used to retrieve, manipulate, and analyze data stored in relational databases
SQL is important in data analytics as it allows analysts to query databases to extract relevant information for analysis
It helps in filtering, sorting, and aggregating data to generate insights
Examples of SQL commands include SELECT, INSERT, UPDATE, and DELETE
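Q4. Difference between Adverse Event and Adverse Reaction with example.
An adverse event is any untoward medical occurrence in a patient given a medicinal product, whether or not it is caused by the product, while an adverse reaction is an adverse event for which a causal relationship with the medicine is at least a reasonable possibility.
An adverse event may be due to the underlying disease, a procedure, or chance, whereas an adverse reaction is suspected to be caused by the medication.
Both adverse events and adverse reactions can be expected or unexpected; "unexpected" means the event is not consistent with the product's known safety information.
Example of adverse event: a patient develops a fever after surgery (this may have nothing to do with any drug the patient is taking). Example of adverse reaction: a patient develops a rash after taking a medication.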
Q5. What is the difference between `WHERE` and `HAVING` clauses?
WHERE clause is used to filter rows before grouping, while HAVING clause is used to filter groups after grouping.
WHERE clause is used with SELECT, UPDATE, DELETE statements to filter rows based on a condition
HAVING clause is used with a SELECT statement that has GROUP BY, to filter groups based on a condition, typically on an aggregate such as COUNT(*) or SUM()
WHERE clause is applied before the data is grouped, while HAVING clause is applied after the data is grouped
Example: SELECT * FROM table_name WHERE column_name = 'value';
Example: SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name HAVING COUNT(*) > 1;
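To see the order of operations, here is a small sketch with Python's `sqlite3` module and a hypothetical `orders` table: WHERE first removes small orders row by row, then HAVING keeps only customers with at least two remaining orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 20), ("ann", 5), ("ann", 40), ("bob", 15), ("bob", 8), ("cy", 50)],
)

# WHERE drops individual rows (amount < 10) before grouping;
# HAVING then keeps only groups whose aggregate passes the test.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount >= 10
    GROUP BY customer
    HAVING COUNT(*) >= 2
    ORDER BY customer
    """
).fetchall()
print(rows)  # [('ann', 2, 60)]
```

Bob placed two orders, but one of them is removed by WHERE before grouping, so his group fails the HAVING test.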
Q6. Explain the main steps involved in data analysis.
Data analysis involves several steps including data collection, data cleaning, data exploration, data modeling, and data visualization.
Data collection: Gathering relevant data from various sources.
Data cleaning: Removing any errors, inconsistencies, or missing values from the data.
Data exploration: Analyzing the data to understand its characteristics and identify patterns or trends.
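Data modeling: Applying statistical or machine learning techniques to build models and make predictions.
Data visualization: Presenting the results in charts, dashboards, or reports so the findings can be communicated.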
Q7. Explain the difference between `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL OUTER JOIN`.
These are the main types of SQL joins, used to combine rows from two or more tables based on a related column between them.
INNER JOIN: Returns only the rows that have matching values in both tables.
LEFT JOIN: Returns all rows from the left table and the matched rows from the right table; right-side columns are NULL where there is no match.
RIGHT JOIN: Returns all rows from the right table and the matched rows from the left table; left-side columns are NULL where there is no match.
FULL OUTER JOIN: Returns all rows from both tables, matching them where possible and filling in NULLs on whichever side has no match.
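The four join types differ only in which unmatched rows they keep. The pure-Python sketch below makes that explicit with a nested-loop join over two hypothetical row lists (customers and orders), using `None` in place of SQL NULL. Real databases use faster join algorithms and do not guarantee row order.

```python
# Hypothetical rows: customers (id, name) and orders (customer_id, item).
left = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
right = [(1, "pen"), (1, "ink"), (4, "cup")]

def join(left, right, how):
    """Return (left_key, left_val, right_key, right_val) tuples for one join type."""
    result = []
    matched_right = set()  # indexes of right rows that found a partner
    for lk, lv in left:
        found = False
        for i, (rk, rv) in enumerate(right):
            if lk == rk:
                result.append((lk, lv, rk, rv))
                matched_right.add(i)
                found = True
        # LEFT and FULL keep unmatched left rows, padded with None (NULL).
        if not found and how in ("left", "full"):
            result.append((lk, lv, None, None))
    # RIGHT and FULL keep unmatched right rows, padded with None (NULL).
    if how in ("right", "full"):
        for i, (rk, rv) in enumerate(right):
            if i not in matched_right:
                result.append((None, None, rk, rv))
    return result

for how in ("inner", "left", "right", "full"):
    print(how, join(left, right, how))
```

INNER gives the 2 matched rows, LEFT adds Bob and Cy (4 rows), RIGHT adds the unmatched order for customer 4 (3 rows), and FULL adds both (5 rows).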
Q8. What is indexing, what are its types, and what is it used for?
Indexing is a technique used to optimize data retrieval in databases by creating indexes on one or more columns.
Types of indexing include clustered and non-clustered indexes
A clustered index determines the physical order in which the table's rows are stored, so a table can have only one
Non-clustered indexes create a separate structure to store the index key and a pointer to the actual data
Indexes speed up lookups, filtering, joins, and sorting in SELECT queries, at the cost of extra storage and slower INSERT, UPDATE, and DELETE operations
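One way to watch an index change how a query runs is SQLite's `EXPLAIN QUERY PLAN`, shown here through Python's `sqlite3` module. The `customers` table and the index name are hypothetical. In SQLite the table itself is a B-tree keyed by rowid (similar in spirit to a clustered index), and `CREATE INDEX` builds a separate B-tree that points back to rows, much like a non-clustered index. The exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", "Pune" if i % 2 else "Delhi") for i in range(1000)],
)

def plan(sql):
    # Each EXPLAIN QUERY PLAN row has the human-readable detail in column 3.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id, city FROM customers WHERE email = 'user42@example.com'"
before = plan(query)  # full table scan, e.g. "SCAN customers"
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
after = plan(query)   # index lookup, e.g. "SEARCH customers USING INDEX idx_customers_email (email=?)"
print(before)
print(after)
```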
Q9. What kinds of cases have you handled? Explain briefly.
Handled cases include data cleaning, analysis, visualization and reporting for various industries.
Data cleaning and analysis for a retail company to identify sales trends
Visualization of customer behavior for a telecommunications company
Reporting on website traffic for an e-commerce business
Data analysis for a healthcare provider to improve patient outcomes
Cleaning and analyzing survey data for a non-profit organization
Q10. Explain the difference between `TRUNCATE`, `DELETE`, and `DROP` commands.
TRUNCATE removes all rows from a table, DELETE removes specific rows, and DROP deletes the entire table structure.
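TRUNCATE is usually faster than DELETE because it deallocates whole data pages instead of logging each row deletion.
DELETE is slower than TRUNCATE because it logs each deleted row, but it can use a WHERE clause to remove only some rows.
DROP removes the entire table structure along with all data.
DELETE can be rolled back inside a transaction; whether TRUNCATE and DROP can be rolled back depends on the database (they can inside a transaction in PostgreSQL and SQL Server, but both cause an implicit commit in MySQL and Oracle).
Example: TRUNCATE TABLE table_name;
Example: DELETE FROM table_name WHERE condition;
Example: DROP TABLE table_name;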
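A small sketch of the three behaviours using Python's `sqlite3` module and a hypothetical `orders` table. SQLite has no TRUNCATE statement, so an unqualified `DELETE FROM orders` (which SQLite optimizes for emptying a table) stands in for it here; in other databases the rollback behaviour of TRUNCATE differs as described above.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# the explicit BEGIN/ROLLBACK below are fully under our control.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
cur.executemany("INSERT INTO orders (status) VALUES (?)",
                [("open",), ("cancelled",), ("open",)])

def row_count():
    return cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# DELETE with a WHERE clause removes only the matching rows.
cur.execute("DELETE FROM orders WHERE status = 'cancelled'")
after_delete = row_count()  # 2

# Emptying the table inside a transaction can be rolled back.
cur.execute("BEGIN")
cur.execute("DELETE FROM orders")
emptied = row_count()       # 0
cur.execute("ROLLBACK")
restored = row_count()      # 2

# DROP TABLE removes the table definition itself, not just its rows.
cur.execute("DROP TABLE orders")
tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(after_delete, emptied, restored, tables)  # 2 0 2 []
```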
Q11. Explain window functions like `ROW_NUMBER()`, `RANK()`, and `DENSE_RANK()`.
Window functions like ROW_NUMBER(), RANK(), and DENSE_RANK() number the rows within each partition according to the ORDER BY in their OVER clause; they differ in how they treat ties.
ROW_NUMBER() assigns a unique sequential integer starting from 1 to each row within a partition, even when rows tie
RANK() gives tied rows the same rank and then skips ahead, leaving gaps in the ranking (e.g., 1, 2, 2, 4)
DENSE_RANK() gives tied rows the same rank without leaving gaps (e.g., 1, 2, 2, 3)
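The difference only shows up when there is a tie. This sketch runs all three functions over a hypothetical `scores` table with a tie at 85, using Python's `sqlite3` module (window functions need SQLite 3.25+). `name` is added to ROW_NUMBER()'s ORDER BY so that its numbering of the tied rows is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Asha", 90), ("Ben", 85), ("Chen", 85), ("Dev", 80)],
)

rows = conn.execute(
    """
    SELECT name, score,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

for row in rows:
    print(row)
# ('Asha', 90, 1, 1, 1)
# ('Ben', 85, 2, 2, 2)
# ('Chen', 85, 3, 2, 2)
# ('Dev', 80, 4, 4, 3)
```

Dev is ranked 4 by RANK() (the tie consumed rank 3) but 3 by DENSE_RANK().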
Q12. What is a foreign key in the context of relational databases?
A foreign key in relational databases is a field that links two tables together, establishing a relationship between them.
A foreign key in one table points to the primary key in another table
It ensures referential integrity by enforcing relationships between tables
Foreign keys help maintain data consistency and prevent orphaned records
Example: In a database with tables for 'orders' and 'customers', the 'customer_id' in the 'orders' table would be a foreign key linking to the 'customer_id' primary key of the 'customers' table.
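A minimal sketch of referential integrity in action, using the orders/customers example with Python's `sqlite3` module. Table layouts are hypothetical; note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER REFERENCES customers (customer_id)
       )"""
)
conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # OK: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("Rejected:", exc)  # e.g. "FOREIGN KEY constraint failed"
```

The second order would have been an orphaned record, so the database refuses it.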
Q13. What is the difference between Data Definition Language (DDL) and Data Manipulation Language (DML)?
DDL is used to define the structure of database objects, while DML is used to manipulate data within those objects.
DDL is used to create, modify, and delete database objects such as tables, indexes, and views.
DML is used to insert, update, retrieve, and delete data within those database objects.
DDL statements include CREATE, ALTER, DROP, TRUNCATE, etc.
DML statements include SELECT, INSERT, UPDATE, DELETE, etc.
DDL changes the structure of the database, while DML changes the contents (the data) stored in it.
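Q14. What are malaria, drug allergy, hypertension, diabetes, obesity, GERD, gout, and hyperlipidemia? What is agar agar? Name some pigments.
Malaria is a mosquito-borne infectious disease caused by Plasmodium parasites.
A drug allergy is an immune-mediated adverse reaction to a medication.
Hypertension is persistently high blood pressure.
Diabetes is a metabolic disorder in which blood sugar levels stay too high.
Obesity is excess body fat, commonly defined in adults as a body mass index (BMI) of 30 or more.
GERD (gastroesophageal reflux disease) is a condition in which stomach acid repeatedly flows back into the esophagus.
Gout is a form of inflammatory arthritis caused by uric acid crystals depositing in the joints.
Hyperlipidemia is abnormally high levels of lipids, such as cholesterol or triglycerides, in the blood.
Agar agar is a gelatinous substance derived from red seaweed, widely used to solidify microbiological culture media.
Pigment names refer to various coloring agents…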
Q15. What is the difference between C and C++? What is the use of website testing?
C is a procedural programming language while C++ is an object-oriented programming language.
C++ is an extension of C with added features like classes, inheritance, and polymorphism.
C++ is used for developing software applications, games, and operating systems.
Website testing is the process of checking the functionality, usability, and performance of a website.
It involves testing the website's links, forms, navigation, and compatibility with different devices and browsers.
Webs…
Q16. Merge two sorted linked lists: create a linked list class from scratch, then add a method for generating a linked list.
Merge two sorted linked lists by creating a linked list class and method to generate linked lists from scratch.
Create a Node class with data and next pointer
Create a LinkedList class with methods to insert nodes and merge two lists
Iterate through both lists, comparing the current nodes and taking the smaller value each time; when one list runs out, append the remaining nodes of the other
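The steps above can be sketched in Python as follows. The method names `from_values`, `append`, `to_list`, and `merge_sorted` are illustrative choices, not a required interface. This version copies values into a new list and leaves the inputs untouched; relinking the existing nodes instead would avoid the extra allocations.

```python
class Node:
    """A single linked-list node holding a value and a pointer to the next node."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None  # kept so append() is O(1)

    @classmethod
    def from_values(cls, values):
        """Generate a linked list from any iterable of values."""
        lst = cls()
        for value in values:
            lst.append(value)
        return lst

    def append(self, data):
        node = Node(data)
        if self.head is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def to_list(self):
        out, node = [], self.head
        while node:
            out.append(node.data)
            node = node.next
        return out

    @staticmethod
    def merge_sorted(a, b):
        """Merge two sorted linked lists into a new sorted list in O(len(a) + len(b))."""
        merged = LinkedList()
        x, y = a.head, b.head
        while x and y:
            # Take the smaller current value; <= keeps the merge stable.
            if x.data <= y.data:
                merged.append(x.data)
                x = x.next
            else:
                merged.append(y.data)
                y = y.next
        # At most one list still has nodes left; copy them over in order.
        rest = x or y
        while rest:
            merged.append(rest.data)
            rest = rest.next
        return merged


a = LinkedList.from_values([1, 3, 5])
b = LinkedList.from_values([2, 4, 6, 8])
print(LinkedList.merge_sorted(a, b).to_list())  # [1, 2, 3, 4, 5, 6, 8]
```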
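Q17. What is the difference between Power BI and Tableau? What is a calculated field in Tableau? What is the difference between data blending and data joining?
Power BI and Tableau are both popular data visualization tools, but they have some key differences in terms of features and functionality.
Power BI is a Microsoft product, while Tableau was developed by Tableau Software, which Salesforce acquired in 2019.
Power BI is often considered easier for beginners and integrates closely with other Microsoft products, while Tableau is known for more flexible, advanced visualization capabilities.
Tableau has a feature called Calculated Field which allows users to create new fields based on existing data, while Power BI achieves the same with calculated columns and measures written in DAX.
Data joining combines tables at the row level, typically within a single data source, before aggregation; data blending queries separate data sources independently, aggregates each, and then combines the results on a common linking field, behaving like a left join from the primary data source.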
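Q18. How do you find null (blank) values in a given Excel sheet?
Blank values in an Excel sheet can be found by using filters, Go To Special, or functions like ISBLANK and COUNTBLANK.
Apply a filter and select (Blanks) in a column's drop-down to show only the empty cells, or use Home > Find & Select > Go To Special > Blanks to select them all at once
Use =ISBLANK(A2) to test a single cell (it returns TRUE if the cell is empty) and =COUNTBLANK(A2:A100) to count the empty cells in a range
Watch for cells that only look empty: a cell containing a space is not counted by either function, and ISBLANK returns FALSE for a formula that returns "", although COUNTBLANK counts it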
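Q19. Give a practical application of VLOOKUP on given data.
VLOOKUP searches for a key value in the first column of a table and returns a value from another column in the same row.
Use VLOOKUP to find a student's grade based on their student ID in a table of student data, e.g. =VLOOKUP(A2, Students!A:C, 3, FALSE), where A2 holds the ID, column A of a hypothetical Students sheet holds the IDs, 3 returns the third column (the grade), and FALSE requests an exact match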
VLOOKUP can be used to retrieve a customer's contact information based on their customer ID
It can also be used to look up product prices based on product codes in a pricing table
Q20. What SQL commands do you know?
I am familiar with basic SQL commands such as SELECT, INSERT, UPDATE, DELETE, JOIN, and GROUP BY.
SELECT: Retrieve data from a database table
INSERT: Add new records to a table
UPDATE: Modify existing records in a table
DELETE: Remove records from a table
JOIN: Combine rows from two or more tables based on a related column
GROUP BY: Group rows that have the same values into summary rows
Q21. What are the results of a LEFT JOIN, RIGHT JOIN, and CROSS JOIN?
Left Join includes all records from the left table and matching records from the right table. Right Join includes all records from the right table and matching records from the left table. Cross Join pairs every record in one table with every record in the other.
Left Join: Includes all records from the left table and matching records from the right table, with NULLs in the right-side columns where there is no match.
Right Join: Includes all records from the right table and matching records from the left table, with NULLs in the left-side columns where there is no match.
Cross Join: Returns the Cartesian product of the two tables, so a 3-row table crossed with a 4-row table gives 12 rows.
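A quick sketch of the Cartesian-product behaviour with Python's `sqlite3` module; the `sizes` and `colors` tables are hypothetical. A typical use is generating every product variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sizes (size TEXT)")
conn.execute("CREATE TABLE colors (color TEXT)")
conn.executemany("INSERT INTO sizes VALUES (?)", [("S",), ("M",), ("L",)])
conn.executemany("INSERT INTO colors VALUES (?)", [("red",), ("blue",)])

# CROSS JOIN has no ON clause: every size is paired with every color.
rows = conn.execute(
    "SELECT size, color FROM sizes CROSS JOIN colors ORDER BY size, color"
).fetchall()
print(len(rows))  # 6 = 3 sizes x 2 colors
print(rows)
```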
Q22. What is row-level security, and what are the 4 roles in Power BI?
Row-level security in Power BI allows restricting access to specific rows of data based on user roles.
Row-level security in Power BI is used to control access to data at the row level based on user roles.
Roles in Power BI define the level of access users have to data and reports.
The four workspace roles in Power BI are Admin, Member, Contributor, and Viewer; these are separate from the security roles you define for row-level security.
By setting up row-level security, users can only see the data that is relevant to their role.
Row-level security is implemented by defining roles with DAX filter expressions in Power BI Desktop (Modeling > Manage roles) and then assigning users or groups to those roles in the Power BI service.
Q23. What is the difference between a credit note and a debit note?
Credit note is issued to reduce the amount payable by a customer, while debit note is issued to increase the amount payable by a customer.
A credit note is issued when a customer has been overcharged or has returned goods, reducing the amount owed.
Debit note is issued when a customer has been undercharged or additional goods/services have been provided, resulting in an increase of the amount owed.
A credit note decreases the accounts receivable balance, while a debit note increases it.
Q24. What is data validation?
Data validation is the process of ensuring that data is accurate, complete, and consistent.
Data validation involves checking data for errors, inconsistencies, and anomalies.
It helps to ensure data quality and reliability.
Validation can be done through various techniques such as range checks, format checks, and cross-field validation.
Examples of data validation include verifying that a phone number has the correct number of digits or that a date is in the correct format.
Data v…
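The format and range checks described above might look like this in Python. The field names (`phone`, `signup_date`, `age`) and the specific rules (10-digit phone, ISO date, age 0 to 120) are hypothetical and would be adjusted to the real dataset.

```python
import re
from datetime import datetime

def validate(record):
    """Return a list of validation errors for one record (empty if valid)."""
    errors = []
    # Format check: exactly 10 digits after stripping spaces and dashes.
    digits = re.sub(r"[\s-]", "", record.get("phone", ""))
    if not re.fullmatch(r"\d{10}", digits):
        errors.append("phone must have 10 digits")
    # Format check: the date must be a real calendar date in YYYY-MM-DD form.
    try:
        datetime.strptime(record.get("signup_date", ""), "%Y-%m-%d")
    except ValueError:
        errors.append("signup_date must be a valid YYYY-MM-DD date")
    # Range check: age must be a plausible integer.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age must be an integer between 0 and 120")
    return errors

print(validate({"phone": "98765-43210", "signup_date": "2024-02-30", "age": 130}))
# ['signup_date must be a valid YYYY-MM-DD date', 'age must be an integer between 0 and 120']
```

Collecting every error per record, rather than stopping at the first, makes it easier to report data-quality problems in bulk.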
Q25. What are the seriousness criteria for cases? Explain congenital anomaly.
Congenital anomaly refers to a physical or structural abnormality present at birth.
Under ICH E2A, a case is serious if it results in death, is life-threatening, requires or prolongs hospitalization, results in persistent or significant disability/incapacity, is a congenital anomaly/birth defect, or is otherwise medically important.
Some congenital anomalies may be minor and have little impact on health, while others can be life-threatening.
Examples of congenital anomalies include heart defects, cleft lip and palate, and neural tube defects.
Congenital anomalies can be caused by genetic factors, environmental factors, or a combination of both.
Early…
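Q26. What are bookmarks, and what are they used for?
Bookmarks are digital markers used to quickly navigate to specific sections or pages within a document or website.
Bookmarks allow users to easily access important or frequently visited sections of a document or website.
They are commonly used in web browsers to save specific web pages for quick access.
Bookmarks can also be used in PDF documents to mark important pages or sections for easy reference.
In Power BI, a bookmark captures the current state of a report page (filters, slicers, sort order, and which visuals are shown) so users can return to that view or use bookmarks for navigation and storytelling.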
Q27. Difference between list and tuple in Python
A list is a mutable, ordered collection of items, while a tuple is an immutable, ordered collection of items in Python.
List is defined using square brackets [] while tuple is defined using parentheses ().
Elements in a list can be changed or modified while elements in a tuple cannot be changed.
Lists are typically used for collections of similar items while tuples are used for fixed collections of items.
Example: list_example = [1, 2, 3] and tuple_example = (4, 5, 6)
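A short demonstration of the mutability difference, reusing the example values above:

```python
list_example = [1, 2, 3]
tuple_example = (4, 5, 6)

list_example[0] = 10       # lists can be modified in place
list_example.append(4)
print(list_example)        # [10, 2, 3, 4]

try:
    tuple_example[0] = 40  # tuples do not support item assignment
    tuple_mutation_failed = False
except TypeError as exc:
    tuple_mutation_failed = True
    print("TypeError:", exc)  # 'tuple' object does not support item assignment

# Because tuples are immutable (and here contain only hashable items),
# they can be used as dictionary keys; lists cannot.
labels = {(0, 0): "origin"}
print(labels[(0, 0)])      # origin
```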
Q28. Tuple is immutable, while list is mutable.
Tuple is immutable, list is mutable in Python.
Tuple elements cannot be changed once assigned, while list elements can be modified.
Tuple uses parentheses () and list uses square brackets [] for declaration.
Example: tuple_example = (1, 2, 3) vs list_example = [1, 2, 3]
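Q29. Define solicited report and spontaneous report.
A solicited report comes from an organized data collection system, while a spontaneous report is an unsolicited, voluntary report of a suspected adverse reaction.
Solicited reports arise from organized data collection such as clinical trials, registries, post-authorization studies, patient support programs, or surveys.
Spontaneous reports are unsolicited communications from healthcare professionals or consumers to a company, regulatory authority, or other organization.
Solicited reports are collected for a specific purpose, as part of the program or study that gathers them.
Spontaneous reports usually describe suspected adverse reactions seen during routine use of a marketed product.
Examples of solicited reports include adverse events reported in clinical trials and patient support programs.
Examples of spontaneous reports include a physician or patient voluntarily reporting a suspected side effect of a marketed drug to the manufacturer or to a regulatory authority.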
Q30. Difference between UNION and UNION ALL
UNION combines result sets and removes duplicates; UNION ALL combines them without removing duplicates.
Union combines result sets and removes duplicates
Union all combines result sets without removing duplicates
Union is slower than Union all as it involves removing duplicates
Union all is faster than Union as it does not remove duplicates
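A side-by-side sketch with Python's `sqlite3` module; the two store tables are hypothetical, and "ink" appears in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_a (product TEXT)")
conn.execute("CREATE TABLE store_b (product TEXT)")
conn.executemany("INSERT INTO store_a VALUES (?)", [("pen",), ("ink",)])
conn.executemany("INSERT INTO store_b VALUES (?)", [("ink",), ("cup",)])

# UNION de-duplicates the combined rows; UNION ALL keeps every row.
union = conn.execute(
    "SELECT product FROM store_a UNION SELECT product FROM store_b ORDER BY product"
).fetchall()
union_all = conn.execute(
    "SELECT product FROM store_a UNION ALL SELECT product FROM store_b ORDER BY product"
).fetchall()
print(union)      # [('cup',), ('ink',), ('pen',)]
print(union_all)  # [('cup',), ('ink',), ('ink',), ('pen',)]
```

Both operators require the SELECT statements to return the same number of columns.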
Q31. Explain what data cleansing is
Data cleansing is the process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in datasets.
Data cleansing involves identifying and handling missing values in datasets.
It also includes removing duplicate records or entries.
Data cleansing may involve correcting spelling mistakes or formatting issues in data.
It helps improve data quality and reliability for analysis and decision-making.
Example: Removing rows with missing values, standardizing d…
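A minimal pure-Python sketch of the cleansing steps listed above (missing values, formatting, duplicates). The records and the rules (drop incomplete rows, trim and title-case text) are hypothetical; with real data the rules would come from the dataset's requirements.

```python
# Hypothetical raw records with typical problems: inconsistent case and
# whitespace, a duplicate once formatting is fixed, and a missing value.
raw = [
    {"name": " Ann ", "city": "delhi"},
    {"name": "Bob", "city": "Pune "},
    {"name": "ann", "city": "Delhi"},
    {"name": "Cy", "city": None},
]

def clean(records):
    seen = set()
    cleaned = []
    for rec in records:
        # Drop rows with missing required values.
        if rec["name"] is None or rec["city"] is None:
            continue
        # Standardize formatting: trim whitespace and use title case.
        row = (rec["name"].strip().title(), rec["city"].strip().title())
        # Remove duplicates (checked after standardizing, so " Ann " == "ann").
        if row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned

print(clean(raw))  # [('Ann', 'Delhi'), ('Bob', 'Pune')]
```

The order of steps matters: de-duplicating before standardizing would miss the " Ann "/"ann" duplicate.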
Q32. What are facts and dimensions?
Facts are measurable data points, while dimensions provide context to the facts by categorizing and organizing them.
Facts are quantitative data that can be measured or counted.
Dimensions provide context to the facts by categorizing and organizing them.
In a sales database, the fact could be the total revenue generated, while dimensions could include product category, region, and time period.
Q33. What made you choose the data analyst role?
Passion for uncovering insights from data and making data-driven decisions.
Fascination with numbers and patterns
Desire to solve complex problems
Interest in using data to drive business decisions
Ability to communicate findings effectively
Q34. What are the phases of clinical research?
There are four phases of clinical research: Phase 1, Phase 2, Phase 3, and Phase 4.
Phase 1: Focuses on safety and dosage in a small group of healthy volunteers.
Phase 2: Expands to a larger group to see if the treatment is effective.
Phase 3: Compares the new treatment to standard treatments in a larger group.
Phase 4: Post-marketing studies to monitor the treatment's long-term effects.
Q35. What programming languages do you know?
I know Python, SQL, and R.
Proficient in Python for data analysis and visualization
Experience with SQL for data querying and manipulation
Familiarity with R for statistical analysis and modeling
Q36. What is a pivot table? Describe it.
A pivot table is a data summarization tool used to condense and aggregate large datasets.
Pivot tables allow users to quickly analyze and manipulate large amounts of data.
They can be used to group data by categories and display summarized information.
Users can easily change the layout of the table to view data from different perspectives.
Pivot tables are commonly used in spreadsheet programs like Microsoft Excel and Google Sheets.
For example, a sales team could use a pivot table…
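The aggregation a pivot table performs can be sketched in plain Python: pick a row field, a column field, and a value to summarize. The sales rows here are hypothetical; in a spreadsheet you would drag Region to Rows, Product to Columns, and Amount to Values (Sum).

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount)
sales = [
    ("East", "Pens", 100),
    ("East", "Ink", 40),
    ("West", "Pens", 70),
    ("East", "Pens", 30),
    ("West", "Ink", 20),
]

# Rows = region, columns = product, values = SUM(amount).
pivot = defaultdict(lambda: defaultdict(int))
for region, product, amount in sales:
    pivot[region][product] += amount

products = sorted({p for _, p, _ in sales})
print("Region", *products)
for region in sorted(pivot):
    print(region, *(pivot[region][p] for p in products))
# Region Ink Pens
# East 40 130
# West 20 70
```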
Q37. What is a SUSAR? Name some regulatory authorities.
SUSAR stands for Suspected Unexpected Serious Adverse Reaction. Regulatory authorities include FDA, EMA, MHRA, etc.
SUSAR refers to adverse reactions that are unexpected, serious, and suspected to be caused by a drug or medical product
Regulatory authorities such as FDA (Food and Drug Administration), EMA (European Medicines Agency), MHRA (Medicines and Healthcare products Regulatory Agency) oversee reporting and monitoring of SUSARs
Reporting SUSARs is crucial for ensuring the safety of clinical trial participants.
Q38. What are pharmacovigilance and an adverse event?
Pharmacovigilance is the science and activities related to the detection, assessment, understanding, and prevention of adverse effects or any other drug-related problems.
Pharmacovigilance involves monitoring and evaluating the safety of pharmaceutical products.
Adverse events are any undesirable experience associated with the use of a medical product.
Examples of adverse events include side effects, allergic reactions, and medication errors.
Pharmacovigilance aims to improve patient safety.
Q39. Difference between Append and Merge
Append adds rows to a dataset, while Merge combines datasets based on a common key.
Append adds rows to the bottom of a dataset, increasing the number of observations.
Merge combines datasets based on a common key, such as a unique identifier or variable.
Appending is useful for adding new data, while merging is useful for combining related datasets.
Example: Appending a new month of sales data to an existing dataset. Merging customer information with sales data based on customer ID.
Q40. Difference between Duplicate & Reference
Duplicate refers to an exact copy, while reference is a pointer to the original object.
Duplicate is a separate copy of the original data, while reference points to the original data.
Changing a duplicate does not affect the original, but changing a reference does.
Duplicates consume more memory than references.
Example: Duplicate - making a photocopy of a document. Reference - sharing a link to a document.
Example: Duplicate - cloning a hard drive. Reference - creating a shortcut to a file.
Q41. Difference between cross join and cross apply
Cross join combines every row from the first table with every row from the second table, while cross apply applies a table-valued function to each row of the first table.
Cross join results in a Cartesian product of the two tables.
Cross apply is used to invoke a table-valued function for each row of the first table.
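Neither CROSS JOIN nor CROSS APPLY takes an ON clause; the difference is that the right-hand side of CROSS APPLY can reference columns of the current left row (like a correlated subquery), and left rows for which it returns no rows are dropped (OUTER APPLY keeps them). CROSS APPLY is available in SQL Server and Oracle 12c and later; standard SQL offers LATERAL joins for the same purpose.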
Q42. Difference between a Power BI report and a dashboard
A Power BI report is a collection of interactive visualizations that can span multiple pages, while a dashboard is a single page (canvas) of tiles showing key metrics and KPIs.
A report can contain multiple pages, each with its own visualizations, all built on a single dataset (semantic model).
Dashboards are created in the Power BI service and show key metrics and KPIs on a single page for quick insights; their tiles can be pinned from several reports.
Reports are more detailed and allow for in-depth analysis, while Dashboards provide a high-level overview.
Reports are typically used for detailed analysis and sharing…
Q43. Difference between a candidate key and a compound key
A candidate key is any minimal set of columns that uniquely identifies each record in a table, while a compound key is a key made up of multiple columns that together uniquely identify each record.
A candidate key can be a single column or several columns; a compound key always has more than one column.
Any candidate key can be chosen as the primary key, and a compound key can serve as the primary key as long as the combination of its columns is unique, even if each column on its own is not.
Example: In a table of students, student ID can be a candidate key, while a compound key of stu…
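A sketch of a compound key enforced as a primary key, using Python's `sqlite3` module. The `enrollments` table is hypothetical: neither `student_id` nor `course_id` is unique on its own, but the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The (student_id, course_id) pair forms a compound primary key.
conn.execute(
    """CREATE TABLE enrollments (
           student_id INTEGER,
           course_id  TEXT,
           PRIMARY KEY (student_id, course_id)
       )"""
)
conn.execute("INSERT INTO enrollments VALUES (1, 'SQL101')")
conn.execute("INSERT INTO enrollments VALUES (1, 'PY201')")   # same student, new course: OK
conn.execute("INSERT INTO enrollments VALUES (2, 'SQL101')")  # same course, new student: OK

try:
    conn.execute("INSERT INTO enrollments VALUES (1, 'SQL101')")  # repeated pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```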
Q44. What is ETL?
ETL stands for Extract, Transform, Load. It is a process used to extract data from various sources, transform it into a consistent format, and load it into a data warehouse for analysis.
Extract: Data is extracted from multiple sources such as databases, files, APIs, etc.
Transform: Data is cleaned, standardized, and transformed into a consistent format suitable for analysis.
Load: The transformed data is loaded into a data warehouse or database for further processing and analysis.
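A toy end-to-end ETL run, as a sketch only: an in-memory CSV string stands in for the source system and an in-memory SQLite database stands in for the warehouse. Column names, cleaning rules, and the `fact_orders` table are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a CSV source (an in-memory string standing in
# for a file, API response, or source database).
raw_csv = """order_id,amount,country
1, 250.50 ,in
2,99,IN
3,,us
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: trim whitespace, convert types, standardize country codes,
# and skip rows with a missing amount.
clean = [
    (int(r["order_id"]), float(r["amount"].strip()), r["country"].strip().upper())
    for r in rows
    if r["amount"].strip()
]

# Load: write the cleaned rows into a warehouse table (SQLite stands in here).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, amount REAL, country TEXT)")
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", clean)

result = warehouse.execute(
    "SELECT country, SUM(amount) FROM fact_orders GROUP BY country"
).fetchall()
print(result)  # [('IN', 349.5)]
```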
Q45. What is a join?
Join is a SQL operation used to combine rows from two or more tables based on a related column between them.
Join is used to retrieve data from multiple tables based on a related column.
Common types of joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Example: SELECT * FROM table1 INNER JOIN table2 ON table1.column = table2.column;
Q46. What is Power BI?
Power BI is a business analytics tool by Microsoft that provides interactive visualizations and business intelligence capabilities.
Developed by Microsoft
Allows users to create interactive visualizations and reports
Integrates with various data sources such as Excel, SQL databases, and cloud services
Enables data exploration and sharing insights with stakeholders
Offers features like dashboards, data connections, and data preparation
Q47. What is SQL?
SQL is a programming language used for managing and manipulating relational databases.
SQL stands for Structured Query Language
It is used to communicate with databases to perform tasks such as querying data, updating data, and creating tables
Common SQL commands include SELECT, INSERT, UPDATE, DELETE
Example: SELECT * FROM table_name WHERE condition;
Q48. What data types are available in SQL?
The available data types in SQL include numeric, character, date/time, and boolean types.
Numeric data types include integer, decimal, and floating-point types.
Character data types include char, varchar, and text types.
Date/time data types include date, time, datetime, and timestamp types.
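Boolean data type represents true or false values; it is native in PostgreSQL and available in MySQL (where BOOLEAN is a synonym for TINYINT(1)), while SQL Server uses BIT instead.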
Q49. What joins are available in SQL?
Joins are used to combine rows from two or more tables based on related columns.
INNER JOIN: Returns records that have matching values in both tables.
LEFT JOIN: Returns all records from the left table and the matched records from the right table.
RIGHT JOIN: Returns all records from the right table and the matched records from the left table.
FULL JOIN: Returns all records from both tables, with NULLs where a row has no match on the other side.
CROSS JOIN: Returns the Cartesian product of the two tables.
Q50. Convert a decimal number to its binary representation.
Convert the decimal number to binary using the repeated-division (remainder) method.
Start by dividing the decimal number by 2 and noting down the remainder.
Continue dividing the quotient by 2 until the quotient is 0.
The remainders obtained in reverse order will give the binary representation.
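The division-and-remainder steps above translate directly into a short function. This sketch handles non-negative integers only; Python's built-in `bin()` gives the same digits with a `0b` prefix and is handy for checking.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary using repeated division by 2."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # note down the remainder
        n //= 2                        # continue with the quotient
    # The remainders, read in reverse order, are the binary digits.
    return "".join(reversed(remainders))

print(to_binary(13))  # 1101  (13 -> 6 r1, 6 -> 3 r0, 3 -> 1 r1, 1 -> 0 r1)
```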