I appeared for an interview in Sep 2024, where I was asked the following questions.
Creating a Spark data pipeline to monitor online devices involves data ingestion, processing, and real-time analytics.
1. Data Ingestion: Use Spark Streaming to ingest data from sources like Kafka or MQTT where device status updates are published.
2. Data Processing: Transform the incoming data using Spark's DataFrame API to filter and aggregate the number of online devices.
3. Real-time Analytics: Utilize Spark Structure...
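The processing step can be illustrated without a cluster. The sketch below is plain Python, not Spark. It assumes a hypothetical event schema of (device_id, status, timestamp), keeps only the latest status per device, and counts the devices whose latest status is "online". That is the aggregation a streaming job would maintain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusEvent:
    # Hypothetical event schema; a real Kafka/MQTT payload may differ.
    device_id: str
    status: str      # assumed values: "online" or "offline"
    timestamp: int   # event time, e.g. epoch seconds


def count_online(events):
    """Count online devices, using only each device's most recent event."""
    latest = {}
    for ev in events:
        prev = latest.get(ev.device_id)
        # Keep the newest event per device, so late (older) events are ignored.
        if prev is None or ev.timestamp > prev.timestamp:
            latest[ev.device_id] = ev
    return sum(1 for ev in latest.values() if ev.status == "online")


events = [
    StatusEvent("d1", "online", 100),
    StatusEvent("d2", "online", 101),
    StatusEvent("d1", "offline", 105),
    StatusEvent("d3", "online", 103),
    StatusEvent("d2", "offline", 99),  # arrives late with an older timestamp
]
print(count_online(events))  # 2
```

In Spark Structured Streaming the same idea would roughly become a grouped aggregation over the stream, with a watermark to bound how late events may arrive. The exact query depends on the source schema.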
LLMs can generate scripts, ideas, and captions for engaging YouTube Shorts content.
Script Generation: LLMs can create concise scripts based on trending topics, e.g., a 60-second summary of a popular movie.
Content Ideas: They can suggest creative concepts for Shorts, like '5 Quick Tips for Healthy Eating' or 'Top 3 Travel Destinations'.
Caption and Hashtag Suggestions: LLMs can generate catchy captions and relevant hasht...
I applied via a recruitment consultant and was interviewed in Jun 2022. There were 2 interview rounds.
SQL query to find duplicate emails in a table named person
Use GROUP BY on the email column with COUNT(*) to count how often each email occurs
Keep only the emails whose count is greater than 1, using a HAVING clause
Example: SELECT email, COUNT(*) FROM person GROUP BY email HAVING COUNT(*) > 1;
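The query can be checked against a throwaway database. This sketch uses Python's built-in sqlite3 module with an in-memory `person` table. The `id` column and the sample rows are illustrative assumptions.

```python
import sqlite3

# In-memory database with a small illustrative "person" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# Group rows by email and keep only groups that occur more than once.
rows = conn.execute(
    "SELECT email, COUNT(*) FROM person GROUP BY email HAVING COUNT(*) > 1"
).fetchall()
print(rows)  # [('a@b.com', 2)]
```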
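SQL query to find the ids of dates with a higher temperature than the previous day in the weather table
Use a self join to compare each date's temperature with the previous day's
Match each row to the row dated exactly one day earlier (for example with a date-difference condition) rather than relying on row order, so gaps in the dates do not cause wrong comparisons
Select the ids where the temperature is higher than the previous day's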
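A runnable sketch of the self join, using sqlite3 and SQLite's `date()` function to find "one day earlier". The column names (`id`, `recordDate`, `temperature`) and the ISO date strings are assumptions modeled on the common version of this question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather (id INTEGER PRIMARY KEY, recordDate TEXT, temperature INTEGER)"
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [(1, "2015-01-01", 10), (2, "2015-01-02", 25),
     (3, "2015-01-03", 20), (4, "2015-01-04", 30)],
)

# w1 is the current day; w2 is the row dated exactly one day earlier.
query = """
SELECT w1.id
FROM weather AS w1
JOIN weather AS w2
  ON w2.recordDate = date(w1.recordDate, '-1 day')
WHERE w1.temperature > w2.temperature
ORDER BY w1.id
"""
ids = [row[0] for row in conn.execute(query)]
print(ids)  # [2, 4]
```

Joining on the date itself means a missing day simply produces no match, instead of comparing against the wrong neighbouring row.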
I was approached by the company and was interviewed before Sep 2021. There were 3 interview rounds.
Explain dynamic programming with memoization
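Dynamic programming with memoization solves a problem recursively while caching each subproblem's answer, so every subproblem is computed only once. Here is a minimal sketch using the standard-library `functools.lru_cache` on the Fibonacci recurrence. Fibonacci is an illustrative choice, not necessarily the problem asked in the interview.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """n-th Fibonacci number; each fib(k) is computed once, then read from the cache."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(50))  # 12586269025
```

Without the cache the same recursion takes exponential time. With it, there are only n + 1 distinct calls, so the running time is linear.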
I applied via Referral and was interviewed in Sep 2023. There was 1 interview round.
I applied via Campus Placement and was interviewed in Sep 2024. There was 1 interview round.
Decision tree is a tree-like model of decisions and their possible consequences, while random forest is an ensemble learning method that builds multiple decision trees and merges them together.
Decision tree is a flowchart-like structure where each internal node represents a decision based on an attribute, each branch represents the outcome of the decision, and each leaf node represents a class label.
Random forest is a ...
I appeared for an interview in Apr 2025, where I was asked the following questions.
Dropout is a regularization technique applied only during training; it is typically turned off at test time so that the full network is used for prediction.
Dropout randomly sets a fraction of input units to zero during training to prevent overfitting.
During test time, dropout is usually turned off, allowing the full network to make predictions.
This ensures that all neurons contribute to the output, providing a more accurate representation...
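A small sketch of train-time versus test-time behaviour. It uses "inverted dropout", the variant common in modern frameworks: survivors are scaled up by 1 / (1 - p) during training, so nothing needs rescaling at test time. The classic formulation instead scales activations by (1 - p) at test time. The function and its list-based interface are hypothetical.

```python
import random


def dropout(x, p, training, rng=None):
    """Inverted dropout over a list of activations.

    Training: zero each unit with probability p and scale survivors by
    1 / (1 - p), so each activation's expected value is unchanged.
    Test time (training=False): return the input unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(x)
    if rng is None:
        rng = random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


acts = [1.0] * 8
print(dropout(acts, p=0.5, training=True, rng=random.Random(0)))  # a mix of 0.0 and 2.0
print(dropout(acts, p=0.5, training=False))                       # unchanged
```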
Random Forest mitigates overfitting by averaging multiple decision trees, enhancing generalization and robustness.
Ensemble Learning: Combines predictions from multiple trees to reduce variance.
Bootstrap Aggregating: Each tree is trained on a random subset of data, promoting diversity.
Feature Randomness: Randomly selects features for splitting, preventing dominance of any single feature.
Example: In a dataset with noise,...
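A toy sketch of the two sources of randomness described above: a bootstrap sample per tree and a randomly chosen feature per tree, combined by majority vote. Depth-1 trees (stumps) stand in for full decision trees to keep it short. A real random forest grows deeper trees and samples a feature subset at every split, as scikit-learn's `RandomForestClassifier` does. The data points are made up.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Depth-1 decision tree: the single threshold on `feature` with the fewest errors."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for polarity in (0, 1):
            # Predict `polarity` when the value is <= t, the other class otherwise.
            errors = sum(
                (polarity if row[feature] <= t else 1 - polarity) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, polarity = best
    return lambda row: polarity if row[feature] <= t else 1 - polarity


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging plus feature randomness, with stumps standing in for full trees."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        feature = rng.randrange(n_features)          # random feature for this tree
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return trees


def predict(trees, row):
    """Majority vote across all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


X = [(1.0, 1.2), (0.8, 1.0), (1.3, 0.7), (1.1, 1.4),
     (5.0, 5.2), (4.8, 5.5), (5.3, 4.9), (5.1, 5.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, (0.5, 0.5)), predict(forest, (6.0, 6.0)))  # 0 1
```

Each individual tree sees a slightly different dataset. Averaging their votes smooths out the quirks any single tree picks up, which is the variance reduction the bullets describe.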
For high imbalance problems, choose models like Random Forest or XGBoost, and use techniques like SMOTE for better performance.
Use ensemble methods like Random Forest or Gradient Boosting (e.g., XGBoost) for better handling of imbalanced data.
Consider using resampling techniques such as SMOTE (Synthetic Minority Over-sampling Technique) to balance the dataset.
Evaluate models using metrics like F1-score, precision, reca...
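Accuracy is misleading on imbalanced data, which is why precision, recall and F1 are preferred. This plain-Python sketch computes them for a made-up 95/5 split. The helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 95 negatives and 5 positives (illustrative data).
y_true = [0] * 95 + [1] * 5

# A model that always predicts the majority class: high accuracy, useless recall.
always_negative = [0] * 100
print(classification_metrics(y_true, always_negative))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}

# A model that flags 10 cases, 4 of them real positives: lower accuracy, better F1.
flagging = [1] * 6 + [0] * 89 + [1] * 4 + [0]
print(classification_metrics(y_true, flagging))
```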
Transformers are advanced neural network architectures that excel in processing sequential data, particularly in NLP tasks.
Self-Attention Mechanism: Allows the model to weigh the importance of different words in a sentence, e.g., in 'The cat sat on the mat', 'cat' and 'sat' are closely related.
Positional Encoding: Since transformers don't have a built-in sense of order, positional encodings are added to input embedding...
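One common choice is the sinusoidal scheme from the original Transformer paper: even dimensions use sine and odd dimensions use cosine, at geometrically spaced frequencies. Learned position embeddings are an alternative. A minimal pure-Python sketch follows; the function name is hypothetical.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal encodings: sin on even dimensions, cos on odd dimensions."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


pe = positional_encoding(seq_len=4, d_model=6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Each row is added elementwise to the token embedding at that position, so the embedding width must equal `d_model`.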
Query, Key, and Value are components of self-attention mechanisms in neural networks, enabling context-aware representations.
In self-attention, each input element is transformed into three vectors: Query, Key, and Value.
The Query vector represents the current element's focus, while the Key vector represents the context of other elements.
The attention score is computed by taking the dot product of the Query and Key vect...
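For a single query, the computation is short: take dot products with every key, divide by sqrt(d_k) as in the standard scaled dot-product formulation, apply softmax, and return the weighted sum of the values. The vectors below are toy numbers, not taken from a real model.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = len(query)
    # Score each key by its dot product with the query, scaled by sqrt(d_k).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted sum of the value vectors.
    dim_v = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend(query, keys, values)
print(weights, output)  # keys 0 and 2 match the query, so they get the larger weights
```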
Self-attention allows models to weigh the importance of different words in a sequence when processing them, enhancing context understanding.
Self-attention computes a weighted representation of input sequences, focusing on relevant parts.
It uses three vectors: Query (Q), Key (K), and Value (V) to determine attention scores.
For each word, the model calculates how much attention to pay to every other word in the sequence.
...
Multi-head attention enhances model performance by focusing on different parts of input data simultaneously.
Improves natural language processing tasks like translation and summarization.
Used in image processing for tasks like object detection and segmentation.
Facilitates recommendation systems by analyzing user preferences from multiple perspectives.
Enhances speech recognition by focusing on different phonetic features...
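Mechanically, multi-head self-attention runs several attention computations in parallel. Each head has its own Q/K/V projections of the same input sequence, and every token attends to every other token. The heads' outputs are concatenated and mixed by an output projection. In this pure-Python sketch, random weights stand in for learned ones, so it shows the shape of the computation rather than a trained model. All names are hypothetical.

```python
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(Q, K, V):
    """Each query row attends over all key rows (scaled dot-product)."""
    d_k = len(Q[0])
    out = []
    for q in Q:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K])
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(X, num_heads, rng):
    d_model = len(X[0])
    assert d_model % num_heads == 0, "d_model must divide evenly across heads"
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Each head has its own projections (random here; learned in a real model).
        W_q, W_k, W_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
        heads.append(self_attention(matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)))
    # Concatenate the heads' outputs per token, then apply an output projection.
    concat = [sum((head[t] for head in heads), []) for t in range(len(X))]
    W_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, W_o)


rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # 5 tokens, d_model = 8
out = multi_head_attention(X, num_heads=2, rng=rng)
print(len(out), len(out[0]))  # 5 8
```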
Degrees of freedom in Chi-Square distribution indicate the number of independent values in a statistical calculation.
Degrees of freedom (df) = number of categories - 1 in a Chi-Square goodness-of-fit test; for a test of independence on an r × c contingency table, df = (r - 1)(c - 1).
Example: For a test with 5 categories, df = 5 - 1 = 4.
In goodness-of-fit tests, df helps determine the critical value for hypothesis testing.
Higher degrees of freedom lead to a more accurate approximation of the Chi-Square distributio...
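A worked goodness-of-fit example in plain Python, using made-up counts from 60 rolls of a die. The statistic is the sum of (observed - expected)^2 / expected, and df is the number of categories minus 1.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic and df for a goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # categories - 1 (no parameters estimated from the data)
    return stat, df


# 60 rolls of a die; a fair die expects 10 of each face (illustrative counts).
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
stat, df = chi_square_gof(observed, expected)
print(stat, df)
```

Here the statistic is 4.2 with df = 5. The 5% critical value for 5 degrees of freedom is about 11.07, so these counts give no evidence that the die is unfair.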
Linear Regression uses statistical methods to model the relationship between variables, predicting outcomes based on input features.
Linear regression assumes a linear relationship between the dependent and independent variables.
The model can be represented as: Y = β0 + β1X1 + β2X2 + ... + βnXn + ε, where Y is the dependent variable.
The coefficients (β) are estimated using the least squares method, minimizing the sum of...
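For a single predictor, least squares has a closed form: β1 = Sxy / Sxx and β0 = mean(y) - β1·mean(x). With several predictors, the matrix form (XᵀX)⁻¹Xᵀy generalizes this. A minimal sketch with illustrative data:

```python
def fit_simple_ols(x, y):
    """Least-squares estimates for y = b0 + b1 * x (one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept: the line passes through the means
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_ols(x, y)
print(round(b0, 2), round(b1, 2))  # 1.09 1.97
```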
LLN and CLT are statistical theorems that describe the behavior of sample averages as sample size increases.
LLN (Law of Large Numbers) states that as the sample size increases, the sample mean converges to the population mean.
CLT (Central Limit Theorem) states that the distribution of the sample mean approaches a normal distribution as sample size increases, regardless of the population's distribution.
Both theorems are...
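Both theorems are easy to see in a seeded simulation. For the LLN, the mean of many die rolls settles near 3.5. For the CLT, means of samples drawn from a skewed exponential distribution cluster around the population mean, with spread close to σ/√n, and about 95% of them fall within 1.96 standard errors, as a normal distribution predicts. The sample sizes are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)

# LLN: the mean of many fair-die rolls approaches the population mean 3.5.
rolls = [rng.randint(1, 6) for _ in range(100_000)]
lln_mean = statistics.fmean(rolls)

# CLT: means of n draws from a skewed exponential distribution (mean 1, sd 1).
n, trials = 50, 5_000
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)
]
clt_mean = statistics.fmean(sample_means)   # theory: about 1.0
clt_sd = statistics.stdev(sample_means)     # theory: about 1 / sqrt(50), i.e. 0.141
# For a normal distribution, about 95% of values lie within 1.96 standard errors.
within = sum(abs(m - 1.0) <= 1.96 / math.sqrt(n) for m in sample_means) / trials

print(lln_mean, clt_mean, clt_sd, within)
```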
Covariance measures the directional relationship between two variables, while correlation quantifies the strength and direction of that relationship.
Covariance can take any value between -∞ and +∞, while correlation ranges from -1 to +1.
Positive covariance indicates that two variables move in the same direction, while negative covariance indicates they move in opposite directions.
Correlation standardizes covariance, ma...
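A quick illustration of why standardization matters. If height is converted from metres to centimetres, the covariance grows 100-fold, but the correlation does not change. The height/weight numbers are made up.

```python
import math


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 65.0, 68.0, 72.0, 85.0]
height_cm = [h * 100 for h in height_m]

cov_m, cov_cm = covariance(height_m, weight_kg), covariance(height_cm, weight_kg)
corr_m, corr_cm = correlation(height_m, weight_kg), correlation(height_cm, weight_kg)
print(cov_m, cov_cm)    # the second is 100 times the first
print(corr_m, corr_cm)  # equal, and within [-1, 1]
```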
Linear regression relies on several key assumptions for valid results, including linearity, independence, and homoscedasticity.
Linearity: The relationship between the independent and dependent variables should be linear. For example, predicting weight based on height.
Independence: Observations should be independent of each other. For instance, data collected from different individuals should not influence each other.
Ho...
I appeared for an interview before May 2024, where I was asked the following questions.
Designing an experiment to validate a recommendation engine involves A/B testing, metrics, and user feedback for effectiveness.
A/B Testing: Split users into two groups, one using the new engine and the other using the old one, to compare performance metrics.
Key Metrics: Measure click-through rates, conversion rates, and user engagement to assess the effectiveness of the new engine.
User Feedback: Collect qualitative fee...
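One common way to judge whether the new engine's click-through rate really differs from the old one's is a two-proportion z-test. This is a sketch with hypothetical counts. The normal CDF is written using `math.erf`, so no statistics library is required.

```python
import math


def two_proportion_z_test(clicks_a, users_a, clicks_b, users_b):
    """Two-sided z-test for a difference in click-through rate between A and B."""
    p_a, p_b = clicks_a / users_a, clicks_b / users_b
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts: A = current engine (10% CTR), B = new engine (13% CTR).
z, p = two_proportion_z_test(clicks_a=200, users_a=2000, clicks_b=260, users_b=2000)
print(round(z, 2), round(p, 4))
```

The sample size per group should be fixed before the test starts. Stopping as soon as p dips below 0.05 inflates the false-positive rate.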
Some of the top questions asked at the Times Internet Data Scientist interview -
Senior Software Engineer: 149 salaries, ₹20.8 L/yr - ₹37 L/yr
Product Manager: 105 salaries, ₹20 L/yr - ₹32.7 L/yr
Software Developer: 94 salaries, ₹11.3 L/yr - ₹20 L/yr
Manager: 69 salaries, ₹13.3 L/yr - ₹23.8 L/yr
Accounts Manager: 63 salaries, ₹5.2 L/yr - ₹10.4 L/yr
Amazon
Flipkart
Indiamart Intermesh
BigBasket