HSBC Group Data Scientist Interview Questions and Answers

Updated 15 Dec 2024

9 Interview questions

A Data Scientist was asked 7mo ago
Q. How can we decide to choose Linear Regression for a business problem?
Ans. 

Linear Regression is chosen for its simplicity, interpretability, and effectiveness in modeling linear relationships in data.

  • Linear relationship: Use when the relationship between independent and dependent variables is linear, e.g., predicting sales based on advertising spend.

  • Continuous outcome: Suitable for predicting continuous outcomes, like house prices based on features like size and location.

  • Interpretability...
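One way to check whether a linear model is a reasonable choice is to fit a least-squares line and look at how much variance it explains. The sketch below is a minimal illustration with made-up advertising and sales figures; `fit_line` and `r_squared` are hypothetical helper names, not part of any library.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: sales are exactly 2 * spend + 5
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]

intercept, slope = fit_line(ad_spend, sales)
print(intercept, slope, r_squared(ad_spend, sales, intercept, slope))
```

An R-squared close to 1 on held-out data, together with residuals that show no pattern, suggests the linear form is adequate; real business data will rarely fit this cleanly.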

A Data Scientist was asked 7mo ago
Q. How do embeddings work in vector databases?
Ans. 

Embeddings in vector databases represent data points as dense vectors for efficient similarity search and retrieval.

  • Embeddings map items such as words, documents, or images to points in a continuous vector space, so that similar items lie close together.

  • For example, Word2Vec represents words as vectors that capture semantic relationships.

  • Vector databases store these embeddings and index them for fast (often approximate) nearest-neighbor search.

  • Applications include r...
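The retrieval step can be sketched as a brute-force cosine-similarity search. The three-dimensional vectors and the `index` dictionary below are invented for illustration; production vector databases typically use approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, index, k=1):
    """Brute-force search: rank every stored vector by similarity to the query."""
    ranked = sorted(index, key=lambda name: cosine_similarity(query, index[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions.
index = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.9],
}
query = [0.88, 0.79, 0.12]  # pretend embedding of a related word
print(nearest(query, index, k=2))
```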

A Data Scientist was asked 7mo ago
Q. Explain the ARIMA model.
Ans. 

ARIMA is a statistical model for forecasting time series data. It captures trend through differencing; seasonality requires the seasonal extension, SARIMA.

  • ARIMA stands for AutoRegressive Integrated Moving Average.

  • It combines three components: AR (AutoRegressive), I (Integrated), and MA (Moving Average).

  • AR component uses past values to predict future values.

  • I component involves differencing the data to make it stationary.

  • MA component models the error of the predicti...
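The "I" and "AR" parts can be illustrated with a simplified ARIMA(1,1,0)-style sketch: difference the series once, fit a single autoregressive coefficient by least squares, then undo the differencing to forecast. This is not a full ARIMA fit (there is no MA term and no maximum-likelihood estimation, which a library such as statsmodels would handle), and the series and function names are made up.

```python
def difference(series):
    """First-order differencing (the 'I' step with d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(values):
    """Least-squares estimate of phi in v[t] = phi * v[t-1] + error (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den

def forecast_next(series):
    """One-step forecast: predict the next difference, then add it back."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    next_diff = phi * diffs[-1]
    return series[-1] + next_diff

series = [100, 102, 105, 109, 114, 120]  # hypothetical series with a trend
print(difference(series))  # [2, 3, 4, 5, 6]
print(forecast_next(series))
```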

A Data Scientist was asked 7mo ago
Q. What is a token, and what are the token limits for open-source LLMs?
Ans. 

Tokens are units of text processed by LLMs, with limits varying by model, affecting input/output length.

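  • A token can be as short as one character or as long as a whole word: a common word such as 'cat' is typically one token, while a rarer string such as 'chatGPT' is usually split into several subword tokens.

  • Context limits depend on the architecture: older open models accept 512 to 4096 tokens, and many newer ones accept considerably more.

  • For example, BERT is limited to 512 tokens, GPT-2 to 1024, and Llama 2 to 4096; GPT-3, which is not open source, handled 2048 tokens in its original release.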

  • Exceedin...
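To make the idea of subword tokens and a context limit concrete, here is a toy greedy longest-match tokenizer. The vocabulary is invented for illustration; real tokenizers (BPE, WordPiece, and similar) learn their vocabularies from data and will split text differently.

```python
# Made-up vocabulary; real tokenizers learn theirs from a training corpus.
VOCAB = {"cat", "chat", "G", "P", "T", "s", " ", "the"}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split; unknown characters become single-character tokens."""
    tokens, i = [], 0
    max_len = max(len(v) for v in vocab)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens

def truncate(tokens, limit):
    """Keep only the first `limit` tokens, as a fixed context window would."""
    return tokens[:limit]

print(tokenize("cat"))      # ['cat']
print(tokenize("chatGPT"))  # ['chat', 'G', 'P', 'T']
print(truncate(tokenize("the cats chat"), 4))
```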

A Data Scientist was asked 7mo ago
Q. What is the difference between a Regression problem and a Time Series problem?
Ans. 

Regression predicts continuous outcomes; time series analyzes data points over time for trends and patterns.

  • Regression focuses on relationships between variables (e.g., predicting house prices based on features).

  • Time series analyzes data collected at regular intervals (e.g., stock prices over time).

  • Regression can be used for static datasets, while time series requires temporal ordering.

  • In regression, predictors ca...
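The role of temporal ordering can be sketched as follows: a time series is turned into supervised rows using lagged values, and the train/test split is made chronologically instead of at random. The prices and helper names are hypothetical.

```python
def make_lag_features(series, n_lags):
    """Build (features, target) rows where features are the previous n_lags values."""
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows

def chronological_split(rows, train_fraction=0.8):
    """Split without shuffling so the test rows lie strictly after the training rows."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Hypothetical daily prices
prices = [101, 103, 102, 105, 107, 110, 108, 111, 115, 117, 116, 120]
rows = make_lag_features(prices, n_lags=2)
train, test = chronological_split(rows)
print(len(train), len(test))
print(train[0])  # ([101, 103], 102)
```

A random shuffle would be fine for an ordinary regression dataset, but here it would let the model train on observations that come after the ones it is tested on.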

A Data Scientist was asked 7mo ago
Q. What are the advantages of LSTM over RNN?
Ans. 

LSTMs handle long-term dependencies better than vanilla RNNs by mitigating the vanishing gradient problem.

  • LSTMs, themselves a type of RNN, use memory cells to carry information across long sequences, whereas vanilla RNNs tend to lose earlier information.

  • They employ gates (input, output, forget) to control the flow of information, enhancing learning.

  • LSTMs are better suited for tasks like language modeling and time series prediction where context is crucial.

  • For example, in...
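The gating mechanism can be sketched with a single LSTM time step on scalar values. The weights are hand-picked for illustration (a real network learns them): the forget gate is held near 1 and the input gate near 0, so the cell state survives many steps almost unchanged.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step with scalar input, hidden state, and cell state.

    `w` maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        wx, wh, b = w[name]
        return activation(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # proposed new content

    c = f * c_prev + i * g             # additive update of the memory cell
    h = o * math.tanh(c)
    return h, c

# Hypothetical weights: forget gate close to 1, input gate close to 0.
weights = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x in [0.5, -0.3, 0.8] * 10:  # 30 time steps
    h, c = lstm_step(x, h, c, weights)
print(c)  # still close to the initial cell state of 1.0
```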

A Data Scientist was asked
Q. What is the difference between CNN and MLP?
Ans. 

CNN is used for image recognition while MLP is used for general classification tasks.

  • CNN uses convolutional layers to extract features from images while MLP uses fully connected layers.

  • CNN is better suited for tasks that require spatial understanding like object detection while MLP is better for tabular data.

  • For image inputs, a CNN typically has far fewer parameters than an MLP because convolutional layers share weights across spatial positions.

  • CNN can handle input of var...
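The effect of weight sharing on parameter counts can be checked with simple arithmetic. The layer sizes below are hypothetical choices for a 28x28 grayscale image.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out

def conv2d_params(kernel_h, kernel_w, channels_in, channels_out):
    """Convolutional layer: each filter is reused at every spatial position."""
    return kernel_h * kernel_w * channels_in * channels_out + channels_out

mlp_layer = dense_params(28 * 28, 128)  # 784 * 128 + 128 = 100480
cnn_layer = conv2d_params(3, 3, 1, 32)  # 3 * 3 * 1 * 32 + 32 = 320
print(mlp_layer, cnn_layer)
```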

A Data Scientist was asked 11mo ago
Q. Central Limit Theorem
Ans. 

The Central Limit Theorem states that the sampling distribution of the sample mean of independent, identically distributed observations with finite variance approaches a normal distribution as the sample size increases.

  • The Central Limit Theorem is essential in statistics as it allows us to make inferences about a population based on a sample.

  • It states that regardless of the shape of the population distribution, the sampling distribution of the sample mean will be approximately normally dist...
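A small simulation can make the theorem tangible. The sketch below draws repeated samples from a skewed Exponential(1) population (mean 1, standard deviation 1) and shows that the sample means cluster around 1 with a spread near 1 / sqrt(n). The population, seed, and trial count are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)

def sample_means(n, trials=2000):
    """Means of `trials` samples of size n from an Exponential(1) population."""
    return [mean([random.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

for n in (2, 30, 200):
    means = sample_means(n)
    # Theory: centre stays near 1, spread shrinks roughly like 1 / sqrt(n)
    print(n, round(mean(means), 2), round(stdev(means), 3))
```

Plotting a histogram of `means` for each n would show the shape moving from skewed towards a bell curve.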

A Data Scientist was asked
Q. Feature selection methods
Ans. 

Feature selection methods help in selecting the most relevant features for building predictive models.

  • Feature selection methods aim to reduce the number of input variables to only those that are most relevant.

  • Common methods include filter methods, wrapper methods, and embedded methods.

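  • Examples include Recursive Feature Elimination (RFE, a wrapper method) and Lasso regression (an embedded method). Principal Component Analysis (PCA) is often mentioned alongside them, but it is feature extraction rather than selection, because it builds new components instead of keeping a subset of the original features.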
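A filter method can be sketched by scoring each feature independently against the target and keeping the best ones. Here the score is the absolute Pearson correlation; the dataset and the `select_top_k` helper are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_top_k(features, target, k):
    """Filter method: score each feature on its own, keep the k highest scores."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical housing data
features = {
    "size_sqft": [800, 1000, 1200, 1500, 1800],
    "age_years": [30, 5, 22, 12, 18],
    "rooms": [2, 3, 3, 4, 5],
}
price = [160, 205, 238, 300, 365]
print(select_top_k(features, price, k=2))
```

Filter methods are cheap but ignore interactions between features, which is where wrapper and embedded methods come in.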

HSBC Group Data Scientist Interview Experiences

6 interviews found

Interview experience: 4 (Good)
Difficulty level: Moderate
Process Duration: Less than 2 weeks
Result: No response

I applied via Referral and was interviewed in Nov 2024. There were 2 interview rounds.

Round 1 - Technical 

(4 Questions)

  • Q1. Types of Chunking in data preparation in RAG
  • Q2. How Embedding works in Vector Databases
  • Q3. Explain ARIMA model
  • Q4. How can we decide to choose Linear Regression for a business problem
Round 2 - Technical 

(4 Questions)

  • Q1. What is a token, and what is its limit for open-source LLMs
  • Q2. Difference of a Regression and Time Series problem
  • Q3. Advantage of LSTM over RNN
  • Q4. Performance Metrics for Logistic Regression

Interview experience: 4 (Good)
Difficulty level: -
Process Duration: -
Result: -
Round 1 - Technical 

(1 Question)

  • Q1. Asked about ML algorithms

Data Scientist Interview Questions & Answers

DEVANG RATHOD

posted on 25 Aug 2024

Interview experience: 3 (Average)
Difficulty level: -
Process Duration: -
Result: -
Round 1 - Aptitude Test 

(1 Question)

  • Q1. Central Limit Theorem
Interview experience: 3 (Average)
Difficulty level: Moderate
Process Duration: Less than 2 weeks
Result: Not Selected

I applied via Referral and was interviewed before May 2023. There was 1 interview round.

Round 1 - Technical 

(2 Questions)

  • Q1. Self Intro and projects discussion
  • Q2. Feature selection methods

I applied via Approached by Company and was interviewed before Sep 2021. There were 3 interview rounds.

Round 1 - Resume Shortlist 
Round 2 - Technical 

(1 Question)

  • Q1. Projects and Data Science concepts
Round 3 - Technical 

(1 Question)

  • Q1. Python and coding skills

Interview Preparation Tips

Interview preparation tips for other job seekers - Be thorough with the concepts: ML, statistics, NLP

I applied via Recruitment Consultant and was interviewed before Aug 2021. There was 1 interview round.

Round 1 - Technical 

(1 Question)

  • Q1. Difference between CNN and MLP

Interview Preparation Tips

Interview preparation tips for other job seekers - Brush up on basic statistics. Also prepare at least two or three ML algorithms for the interview.

Interview questions from similar companies

I applied via Walk-in and was interviewed in Mar 2020. There was 1 interview round.

Interview Questionnaire 

10 Questions

  • Q1. What is R square, and how is it different from adjusted R square
  • Ans. 

    R square is a statistical measure that represents the proportion of the variance in the dependent variable explained by the independent variables.

    • R square is a value between 0 and 1, where 0 indicates that the independent variables do not explain any of the variance in the dependent variable, and 1 indicates that they explain all of it.

    • It is used to evaluate the goodness of fit of a regression model.

    • Adjusted R square t...

  • Answered by AI
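The difference between the two measures is easiest to see numerically. In this sketch the predictions are made up, and the same R square is adjusted for different numbers of predictors to show the penalty growing.

```python
def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalizes R square for every predictor added to the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

y_true = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
y_pred = [3.5, 4.5, 7.5, 8.5, 11.5, 12.5]  # hypothetical model output

r2 = r_squared(y_true, y_pred)
print(r2)
print(adjusted_r_squared(r2, n_samples=6, n_predictors=1))
print(adjusted_r_squared(r2, n_samples=6, n_predictors=3))
```

With the fit held fixed, adding predictors lowers the adjusted value, which is why it is preferred for comparing models of different sizes.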
  • Q2. Explain what you understand by the terms WOE and IV. What is their importance? What are their advantages and disadvantages?
  • Ans. 

    WOE (Weight of Evidence) and IV (Information Value) are metrics used for feature selection and assessing predictive power in models.

    • WOE replaces each category or bin of a variable with the log of the ratio between its share of good outcomes and its share of bad outcomes, which gives a continuous value suitable for modeling.

    • IV quantifies the predictive power of a feature by measuring the separation between the good and bad outcomes.

    • For example, if a feature has an IV of 0.3, it indicates strong predictive po...

  • Answered by AI
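A minimal sketch of the calculation, using one common convention (log of good share over bad share) and invented counts of good and bad loans per income band. Bins with a zero count would need smoothing before taking the logarithm, which this sketch does not handle.

```python
import math

def woe_iv(bins):
    """bins maps a bin label to (good_count, bad_count).

    WOE_i = ln(share of goods in bin i / share of bads in bin i)
    IV    = sum over bins of (good share - bad share) * WOE_i
    """
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    woe, iv = {}, 0.0
    for label, (good, bad) in bins.items():
        good_share = good / total_good
        bad_share = bad / total_bad
        woe[label] = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe[label]
    return woe, iv

# Hypothetical counts of good and bad loans per income band
bins = {"low": (100, 60), "medium": (300, 30), "high": (600, 10)}
woe, iv = woe_iv(bins)
print(woe)
print(iv)
```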
  • Q3. What are variable reducing techniques
  • Ans. 

    Variable reducing techniques are methods used to identify and select the most relevant variables in a dataset.

    • Variable reducing techniques help in reducing the number of variables in a dataset.

    • These techniques aim to identify the most important variables that contribute significantly to the outcome.

    • Some common variable reducing techniques include feature selection, dimensionality reduction, and correlation analysis.

    • Fea...

  • Answered by AI
  • Q4. Which test is used in logistic regression to check the significance of the variable
  • Ans. 

    The Wald test is used in logistic regression to check the significance of the variable.

    • The Wald test statistic is the ratio of the estimated coefficient to its standard error.

    • Under the null hypothesis that the coefficient is zero, this ratio is approximately standard normal, so its square follows a chi-square distribution with one degree of freedom.

    • A small p-value indicates that the variable is significant.

    • For example, in Python, the statsmodels library reports the Wald z statistic and p-value for each coefficient in the summary of a logistic regression model.

  • Answered by AI
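Given a coefficient and its standard error, the Wald statistic and its two-sided p-value can be computed with the standard library alone. The coefficient values below are made up.

```python
import math

def wald_test(coef, std_err):
    """Wald z statistic and two-sided p-value for H0: coefficient = 0."""
    z = coef / std_err
    # Two-sided tail probability of a standard normal variable;
    # equivalently, z**2 is compared with a chi-square with 1 degree of freedom.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical estimates and standard errors
print(wald_test(0.8, 0.25))  # large |z|, small p-value
print(wald_test(0.1, 0.30))  # small |z|, large p-value
```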
  • Q5. How to check multicollinearity in Logistic regression
  • Ans. 

    Multicollinearity in logistic regression can be checked using a correlation matrix and the variance inflation factor (VIF).

    • Calculate the correlation matrix of the independent variables and check for high correlation coefficients.

    • Calculate the VIF for each independent variable and check for values above a rule-of-thumb threshold, commonly 5 or 10.

    • Consider removing one of the highly correlated variables or variables with high VIF to address multicollinear...

  • Answered by AI
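The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the others. With only two predictors this R² is just their squared correlation, which keeps the sketch short; the general case needs a regression per predictor. The data and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """VIF in the special case of exactly two predictors: 1 / (1 - r**2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors
income = [30, 45, 50, 62, 80, 95]
spend = [28, 47, 49, 65, 78, 97]  # nearly a copy of income
age = [25, 52, 31, 47, 29, 44]

print(vif_two_predictors(income, spend))  # far above 10
print(vif_two_predictors(income, age))    # close to 1
```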
  • Q6. Difference between bagging and boosting
  • Ans. 

    Bagging and boosting are ensemble methods used in machine learning to improve model performance.

    • Bagging involves training multiple models independently on bootstrap samples of the training data (random samples drawn with replacement) and then combining their predictions through averaging or voting.

    • Boosting involves iteratively training models on the same dataset, with each subsequent model focusing on the samples that were misclassified by the previous model.

    • Bagging reduc...

  • Answered by AI
  • Q7. Explain the logistic regression process
  • Ans. 

    Logistic regression is a statistical method used to analyze and model the relationship between a binary dependent variable and one or more independent variables.

    • It is a type of regression analysis used for predicting the outcome of a categorical dependent variable based on one or more predictor variables.

    • It uses a logistic function to model the probability of the dependent variable taking a particular value.

    • It is commo...

  • Answered by AI
  • Q8. Explain Gini coefficient
  • Ans. 

    Gini coefficient measures the inequality among values of a frequency distribution.

    • Gini coefficient ranges from 0 to 1, where 0 represents perfect equality and 1 represents perfect inequality.

    • It is commonly used to measure income inequality in a population.

    • A Gini coefficient of 0.4 or higher is considered to be a high level of inequality.

    • Gini coefficient can be calculated using the Lorenz curve, which plots the cumulati...

  • Answered by AI
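  • Q9. Difference between CHAID and CART
  • Ans. 

    CHAID and CART are both decision-tree algorithms; they differ mainly in how they choose and shape splits.

    • CHAID (Chi-squared Automatic Interaction Detection) selects splits with chi-square tests of independence, while CART (Classification and Regression Trees) uses Gini impurity for classification and variance reduction for regression.

    • CHAID can split a node into more than two branches, while CART always produces binary splits.

    • CHAID stops growing when no split is statistically significant, while CART typically grows a large tree and prunes it back with cost-complexity pruning.

    • CHAID works on categorical predictors and bins continuous ones first, while CART handles continuous predictors directly.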

  • Answered by AI
  • Q10. How do you check for outliers in a variable, and what treatment should you use for them?
  • Ans. 

    Outliers can be detected using statistical methods like box plots, z-score, and IQR. Treatment can be removal or transformation.

    • Use box plots to visualize outliers

    • Calculate z-scores and flag data points whose absolute z-score is greater than 3

    • Calculate the IQR and flag data points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR

    • Transform data using log or square root to reduce the impact of outliers

  • Answered by AI
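Both detection rules can be sketched with the standard library. The data are made up, and the z-score version uses the population standard deviation; with small samples a single extreme value inflates the standard deviation, so the IQR rule is often the more robust of the two.

```python
from statistics import mean, pstdev, quantiles

def iqr_outliers(values, k=1.5):
    """Flag points below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, threshold=3.0):
    """Flag points whose absolute z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical data with one extreme value
data = [12, 13, 12, 14, 13, 15, 14, 13, 12, 14, 13, 95]
print(iqr_outliers(data))     # [95]
print(zscore_outliers(data))  # [95]
```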

Interview Preparation Tips

Interview preparation tips for other job seekers - Explain the concept properly, if not able to explain properly then take a pause and try again with some examples. Be confident.

Interview experience: 5 (Excellent)
Difficulty level: -
Process Duration: -
Result: -
Round 1 - Technical 

(2 Questions)

  • Q1. How do you define model Gini?
  • Ans. 

    Model Gini measures how well a classification model ranks positive cases above negative ones.

    • Model Gini is calculated as twice the area between the ROC curve and the diagonal line (random model), which equals 2 * AUC - 1.

    • It ranges from 0 (no better than random) to 1 (perfect ranking), with higher values indicating better model performance; negative values mean the ranking is worse than random.

    • An AUC of 0.5, that is a Gini of 0, indicates a model that is no better than random guessing.

    • Commonly used in credit ...

  • Answered by AI
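The relationship Gini = 2 * AUC - 1 can be checked with a small pairwise AUC computation. The labels and scores are invented, and the pairwise loop is only practical for small datasets.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(labels, scores):
    """Model Gini derived from the area under the ROC curve."""
    return 2 * auc(labels, scores) - 1

labels = [1, 1, 1, 0, 0, 0]
print(gini(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]))  # perfect ranking -> 1.0
print(gini(labels, [0.5] * 6))                        # uninformative scores -> 0.0
```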
  • Q2. How do you train an XGBoost model
  • Ans. 

    XGBoost model is trained by specifying parameters, splitting data into training and validation sets, fitting the model, and tuning hyperparameters.

    • Specify parameters for XGBoost model such as learning rate, max depth, and number of trees

    • Split data into training and validation sets using train_test_split function

    • Fit the XGBoost model on training data using fit method

    • Tune hyperparameters using techniques like grid search...

  • Answered by AI

Interview experience: 3 (Average)
Difficulty level: -
Process Duration: -
Result: -
Round 1 - Coding Test 

Python coding question and ML question

Round 2 - Technical 

(1 Question)

  • Q1. ML questions from resume + general
Round 3 - One-on-one 

(1 Question)

  • Q1. Techno managerial round
Interview experience: 4 (Good)
Difficulty level: Moderate
Process Duration: Less than 2 weeks
Result: Not Selected

I applied via LinkedIn and was interviewed in Jul 2024. There were 3 interview rounds.

Round 1 - Assignment 

Assignment on credit risk

Round 2 - Technical 

(1 Question)

  • Q1. Hyperparameter tuning
Round 3 - Technical 

(1 Question)

  • Q1. Case study for problem solving

HSBC Group Interview FAQs

How many rounds are there in HSBC Group Data Scientist interview?
HSBC Group interview process usually has 1-2 rounds. The most common rounds in the HSBC Group interview process are Technical, Resume Shortlist and Aptitude Test.
How to prepare for HSBC Group Data Scientist interview?
Go through your CV in detail and study all the technologies mentioned in your CV. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at HSBC Group. The most common topics and skills that interviewers at HSBC Group expect are Clinical SAS Programming, Data Analysis, Data Domain, Data Quality and Data Science.

Overall Interview Experience Rating: 3.5/5, based on 4 interview experiences
Difficulty level: Moderate 100%
Duration: Less than 2 weeks 100%

HSBC Group Data Scientist Salary
based on 96 salaries
₹10.7 L/yr - ₹27.1 L/yr
22% more than the average Data Scientist Salary in India

HSBC Group Data Scientist Reviews and Ratings

3.9/5, based on 12 reviews

Rating in categories

Skill development: 3.5
Work-life balance: 4.4
Salary: 3.4
Job security: 4.4
Company culture: 4.2
Promotions: 3.1
Work satisfaction: 3.6
Assistant Manager (2.8k salaries): ₹5.5 L/yr - ₹13.2 L/yr
Manager (2.2k salaries): ₹14 L/yr - ₹24.1 L/yr
Senior Software Engineer (1.8k salaries): ₹13.2 L/yr - ₹24 L/yr
Assistant Vice President (1.7k salaries): ₹25 L/yr - ₹43 L/yr
Software Engineer (1.5k salaries): ₹7.8 L/yr - ₹14 L/yr
