Deloitte Data Scientist Interview Questions
The bias-variance trade-off is the balance between underfitting and overfitting in machine learning models.
Bias refers to the error introduced by approximating a real-world problem with an overly simple model, leading to underfitting.
Variance refers to the model's sensitivity to fluctuations in the training data, leading to overfitting.
Finding the right balance between bias and variance is crucial for creating a model that generalizes well to unseen data, as the sketch below illustrates.
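A minimal numpy-only sketch of the trade-off on made-up synthetic data: a low-degree polynomial underfits (high bias, both errors high), a high-degree one overfits (high variance, low train error but worse test error).

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 3, 30))
y_train = np.sin(x_train) + rng.normal(0, 0.2, 30)  # noisy samples of sin(x)
x_test = np.sort(rng.uniform(0, 3, 30))
y_test = np.sin(x_test) + rng.normal(0, 0.2, 30)

for degree in (1, 3, 12):
    coefs = np.polyfit(x_train, y_train, degree)  # fit a polynomial of this degree
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    # degree 1 underfits (both errors high); degree 12 typically overfits
    # (train error low, test error higher); degree 3 sits near the sweet spot
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```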
No, decision trees in a random forest are different due to the use of bootstrapping and feature randomization.
Decision trees in a random forest are trained on different subsets of the data through bootstrapping.
Each decision tree in a random forest also considers only a random subset of features at each split.
The final prediction in a random forest is made by aggregating the predictions of all individual decision trees (majority vote for classification, averaging for regression); see the sketch below.
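A short sketch of those three points with scikit-learn (an assumed library choice) on synthetic data: bootstrap resampling, per-split feature subsampling, and vote aggregation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# bootstrap=True: each tree is trained on a bootstrap sample of the rows;
# max_features='sqrt': each split considers only a random subset of features.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                            max_features='sqrt', random_state=42).fit(X, y)

# The forest aggregates its trees' predictions (majority vote here).
votes = [int(tree.predict(X[:1])[0]) for tree in rf.estimators_]
print("first 10 tree votes:", votes[:10], "-> forest says:", rf.predict(X[:1])[0])
```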
Handling a class-imbalanced dataset involves techniques like resampling, using different algorithms, adjusting class weights, and using ensemble methods.
Use resampling techniques like oversampling the minority class or undersampling the majority class.
Try using different algorithms that are less sensitive to class imbalance, such as Random Forest or XGBoost.
Adjust class weights in the model to give more importance to the minority class (a sketch of two of these options follows).
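A sketch of class weighting and minority oversampling on a hypothetical imbalanced dataset, assuming scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# roughly 95/5 imbalanced toy dataset
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

# Option 1: class weights make minority-class errors cost more during training
clf = LogisticRegression(class_weight='balanced').fit(X, y)

# Option 2: oversample the minority class up to the majority count
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, replace=True,
                      n_samples=int((y == 0).sum()), random_state=0)
X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print("class counts before:", np.bincount(y), "after:", np.bincount(y_bal))
```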
Precision is the ratio of correctly predicted positive observations to the total predicted positives, while recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.
Precision focuses on the accuracy of positive predictions, while recall focuses on the proportion of actual positives that were correctly identified.
Precision = TP / (TP + FP), Recall = TP / (TP + FN)
High precision means few false positives; high recall means few false negatives (both are computed in the sketch below).
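The two formulas computed by hand from a confusion matrix and cross-checked against scikit-learn, using made-up labels for illustration:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # hypothetical ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical model output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("precision =", tp / (tp + fp))       # TP / (TP + FP)
print("recall    =", tp / (tp + fn))       # TP / (TP + FN)

# scikit-learn should report the same values
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```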
Pipeline flow is the process of moving data through a series of interconnected stages or steps in a systematic manner.
Pipeline flow involves the sequential movement of data from one stage to another, with each stage performing a specific task or transformation.
It helps automate and streamline data processing, making it more efficient and scalable.
Examples of pipeline flow include chaining data preprocessing, feature transformation, and model training steps; a minimal sketch follows.
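As one concrete reading of "pipeline flow", here is a minimal scikit-learn Pipeline (an assumed framework; the idea is framework-agnostic) chaining a preprocessing stage into a model stage:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each stage transforms the data and hands the result to the next stage.
pipe = Pipeline([
    ('scale', StandardScaler()),      # stage 1: preprocessing
    ('model', LogisticRegression()),  # stage 2: estimator
])
pipe.fit(X_tr, y_tr)                  # runs every stage in order
print("test accuracy:", pipe.score(X_te, y_te))
```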
I am looking for a competitive salary based on industry standards and my experience.
Research industry standards for Data Scientist salaries
Consider my level of experience and skills when determining salary expectations
Be open to negotiation based on the overall compensation package offered
Homoscedasticity refers to the assumption that the variance of errors is constant across all levels of the independent variable.
Homoscedasticity is a key assumption in linear regression analysis.
It indicates that the residuals (errors) have constant variance.
If the residuals exhibit a pattern where the spread of points increases or decreases as the predicted values increase, it violates the assumption of homoscedasticity; one way to test this is sketched below.
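A check sketched with statsmodels (an assumed tool) on synthetic data whose error spread grows with x, so the Breusch-Pagan test should flag the violation:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2 * x + rng.normal(0, 0.5 * x)   # error variance grows with x (heteroscedastic)

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# Breusch-Pagan: a small p-value is evidence against constant variance
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)   # expect p << 0.05 here
```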
Deep learning neural networks are a type of artificial neural network with multiple layers, used for complex pattern recognition.
Deep learning neural networks consist of multiple layers of interconnected nodes, allowing for more complex patterns to be learned.
They are capable of automatically learning features from data, eliminating the need for manual feature engineering.
Examples include Convolutional Neural Networks (CNNs) for image data and Recurrent Neural Networks (RNNs) for sequential data; a minimal multi-layer sketch follows.
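A minimal multi-layer network sketch, assuming TensorFlow/Keras is available (any framework would do); stacking hidden layers is what makes the network "deep":

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two hidden layers: each added layer lets the network compose features
# from the previous layer into more complex patterns.
model = keras.Sequential([
    layers.Input(shape=(20,)),                # 20 input features
    layers.Dense(64, activation='relu'),      # hidden layer 1
    layers.Dense(64, activation='relu'),      # hidden layer 2
    layers.Dense(1, activation='sigmoid'),    # binary classification output
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```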
Stemming and lemmatization are techniques used in natural language processing to reduce words to their base or root form.
Stemming is a process of reducing words to their base form by removing suffixes.
Lemmatization is a process of reducing words to their base form by considering the context and part of speech.
Stemming is faster but may not always produce a valid word, while lemmatization is slower but produces valid words; the sketch below compares the two.
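A quick comparison with NLTK (an assumed library; the wordnet corpus must be downloaded once), showing that stems can be non-words while lemmas stay valid:

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
# one-time setup: import nltk; nltk.download('wordnet')

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ['studies', 'running', 'flies']:
    print(word,
          '-> stem:', stemmer.stem(word),                  # e.g. 'studies' -> 'studi'
          '| lemma:', lemmatizer.lemmatize(word, pos='v')) # e.g. 'studies' -> 'study'
```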
Multicollinearity can be measured using a correlation matrix, the variance inflation factor (VIF), or eigenvalue analysis.
Calculate the correlation matrix to identify highly correlated variables.
Use the variance inflation factor (VIF) to quantify the extent of multicollinearity.
Check for near-zero eigenvalues of the correlation matrix (equivalently, a high condition number), which indicate multicollinearity.
Consider using dimensionality reduction techniques like principal component analysis (PCA); a VIF sketch follows.
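A VIF sketch with statsmodels (assumed tooling) on synthetic data where x2 is deliberately built from x1, so its VIF should come out large:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)                        # independent of the others
X = add_constant(pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3}))

# Rule of thumb: VIF above roughly 5-10 signals problematic multicollinearity
for i in range(1, X.shape[1]):                   # skip the constant column
    print(X.columns[i], round(variance_inflation_factor(X.values, i), 2))
```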
I applied after being approached by the company and was interviewed in May 2024. There were 3 interview rounds.
DSA questions, general coding-language questions, and questions based on my previous experience were asked.
Machine Learning, Generative AI, and Deep Learning interview questions, plus 2 coding problems based on algorithms.
I appeared for an interview in Mar 2025, where I was asked the following questions.
Feature selection methods help identify the most relevant variables for predictive modeling, improving model performance and interpretability.
Filter methods: Use statistical tests (e.g., Chi-square, ANOVA) to select features based on their relationship with the target variable.
Wrapper methods: Evaluate subsets of features by training a model (e.g., Recursive Feature Elimination) to find the best combination.
Embedded methods: Perform selection as part of model training (e.g., Lasso regularization, tree-based feature importances); the sketch below shows a filter and a wrapper method.
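A sketch of a filter method and a wrapper method side by side, assuming scikit-learn and its built-in breast-cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: rank each feature by an ANOVA F-test against the target
filt = SelectKBest(f_classif, k=5).fit(X, y)

# Wrapper: recursively drop the weakest features according to a model
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5).fit(X, y)

print("filter picks :", filt.get_support().nonzero()[0])
print("wrapper picks:", rfe.support_.nonzero()[0])
```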
I applied via Campus Placement and was interviewed in Apr 2024. There were 2 interview rounds.
A 1-hour test with 3 Python programming questions.
I applied via Referral and was interviewed in Mar 2024. There was 1 interview round.
Default values in Python classes and functions, explained with examples.
Default values allow functions to be called with fewer arguments.
Example: def greet(name='Guest'): return f'Hello, {name}!' returns 'Hello, Guest!' if no name is provided.
In classes, default values can be set in the __init__ method.
Example: class Person: def __init__(self, name='John'): self.name = name sets the default name to 'John'.
Default mutable arguments (like lists) can lead to unexpected shared state across calls, as the sketch below shows.
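The pitfall and its usual fix, runnable as-is:

```python
def append_bad(item, bucket=[]):      # the default list is created once, at def time
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):   # conventional fix: default to None
    if bucket is None:
        bucket = []                   # fresh list on every call
    bucket.append(item)
    return bucket

print(append_bad(1), append_bad(2))   # [1, 2] [1, 2]  <- state leaks between calls
print(append_good(1), append_good(2)) # [1] [2]
```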
I applied via Company Website and was interviewed in Jan 2024. There was 1 interview round.
It was good; basic questions about my projects and coding were asked.
The duration of the Deloitte Data Scientist interview process can vary, but it typically takes less than 2 weeks to complete (based on 17 interview experiences).
Deloitte salaries by role:

| Role | Salaries reported | Salary range |
|---|---|---|
| Consultant | 39.8k | ₹10.1 L/yr - ₹21.5 L/yr |
| Senior Consultant | 24.7k | ₹16.5 L/yr - ₹33.2 L/yr |
| Analyst | 16.5k | ₹5 L/yr - ₹12 L/yr |
| Assistant Manager | 11.2k | ₹12 L/yr - ₹22 L/yr |
| Manager | 7.9k | ₹24.5 L/yr - ₹43.5 L/yr |