I applied via LinkedIn and was interviewed in Sep 2024. There were 2 interview rounds.
Set up an ETL flow for data in a Lakehouse using Databricks
Connect Databricks to the Lakehouse storage (e.g. Azure Data Lake Storage)
Define ETL process using Databricks notebooks or jobs
Extract data from Lake House, transform as needed, and load into target destination
Monitor and schedule ETL jobs for automated data processing (a sketch of the flow follows below)
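A minimal PySpark sketch of such a flow, assuming a Databricks workspace with access to an ADLS container; the storage path, column names, and target table are all hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # pre-created as `spark` in Databricks

# Extract: read raw files landed in the Lakehouse storage (hypothetical path)
raw = spark.read.json("abfss://raw@mydatalake.dfs.core.windows.net/orders/")

# Transform: deduplicate, drop bad rows, derive a date column
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write to a curated Delta table; the notebook can then be scheduled
# and monitored as a Databricks Job
clean.write.format("delta").mode("overwrite").saveAsTable("curated.orders")
```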
I handle failures in ADF Pipelines by setting up monitoring, alerts, retries, and error handling mechanisms.
Implement monitoring to track pipeline runs and identify failures
Set up alerts to notify when a pipeline fails
Configure retries for transient failures
Use failure dependency paths, ADF's equivalent of try/catch, to route exceptions to handling activities
Utilize Azure Monitor to analyze pipeline performance and troubleshoot issues (a configuration sketch follows below)
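For illustration, retries in ADF are configured per activity through its policy block. The sketch below shows the relevant fields as Python dicts mirroring exported pipeline JSON; the activity names are hypothetical:

```python
# Fragment of an exported ADF pipeline definition, shown as a Python dict.
# The "policy" block drives automatic retries on transient failures.
copy_activity = {
    "name": "CopyStagingToCurated",    # hypothetical activity name
    "type": "Copy",
    "policy": {
        "retry": 3,                    # re-run the activity up to 3 times
        "retryIntervalInSeconds": 60,  # wait 60s between attempts
        "timeout": "0.02:00:00",       # give up after 2 hours
    },
}

# ADF has no literal Try/Catch activity: the "catch" is an activity wired to
# this one with dependencyConditions ["Failed"], e.g. a Web activity that
# posts an alert or triggers an Azure Monitor action group.
on_failure_alert = {
    "name": "NotifyOnFailure",
    "type": "Web",
    "dependsOn": [
        {"activity": "CopyStagingToCurated", "dependencyConditions": ["Failed"]}
    ],
}
```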
Yes, I have worked on developing a Data Validation Framework to ensure data accuracy and consistency.
Developed automated data validation scripts to check for data accuracy and consistency
Implemented data quality checks to identify and resolve data issues
Utilized tools like SQL queries, Python scripts, and Azure Data Factory for data validation (one such check is sketched after this answer)
Worked closely with data stakeholders to define validation rules and requirem...
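A minimal sketch of the kind of check such a framework runs, using PySpark; the table and column names are assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def null_counts(df, columns):
    """Count NULLs per column; any non-zero count is a validation failure."""
    row = df.select(
        [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in columns]
    ).first()
    return {c: row[c] for c in columns}

def counts_match(source_df, target_df):
    """After a straight copy, source and target row counts should agree."""
    return source_df.count() == target_df.count()

# Hypothetical usage against a curated table
orders = spark.table("curated.orders")
report = null_counts(orders, ["order_id", "order_date"])
assert all(v == 0 for v in report.values()), f"Null check failed: {report}"
```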
SQL query to fetch the top 3 revenue-generating products from a Sales table
Use the SELECT statement to retrieve data from the Sales table
Use the GROUP BY clause to group the data by Product
Use the ORDER BY clause to sort the revenue in descending order
Use the LIMIT clause (TOP in T-SQL) to fetch only the top 3 revenue-generating products; the full query is sketched below
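Putting those clauses together (run here through spark.sql, though the query itself is plain SQL; the sales table and its columns are assumptions):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed schema: sales(product, revenue)
top_products = spark.sql("""
    SELECT product,
           SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY product
    ORDER BY total_revenue DESC
    LIMIT 3   -- T-SQL/Synapse would use SELECT TOP 3 instead
""")
top_products.show()
```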
SQL query to fetch customers who have not transacted in the last 30 days but did so earlier
Use a subquery to find customers who transacted more than 30 days ago
Use NOT IN or NOT EXISTS to exclude customers who transacted in the last 30 days (a worked query follows this answer)
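A worked version of that query using NOT EXISTS (again via spark.sql; the transactions schema is an assumption):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed schema: transactions(customer_id, txn_date). The NOT EXISTS clause
# removes anyone active in the last 30 days; the outer filter keeps only
# customers who transacted at some earlier point.
lapsed = spark.sql("""
    SELECT DISTINCT t.customer_id
    FROM transactions t
    WHERE t.txn_date < date_sub(current_date(), 30)
      AND NOT EXISTS (
          SELECT 1
          FROM transactions r
          WHERE r.customer_id = t.customer_id
            AND r.txn_date >= date_sub(current_date(), 30)
      )
""")
lapsed.show()
```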
Dynamic Content in Azure Data Factory lets values be computed and passed between activities at run time.
Dynamic Content can be used to pass values between activities, such as feeding one activity's output into another activity's input.
Expressions can be used within Dynamic Content to manipulate data or build dynamic values.
Dynamic Content can be used in various ADF components like datasets, linked services, and activities (an expression example follows this answer).
For...
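To make this concrete, here is a hypothetical fragment of pipeline JSON (shown as a Python dict) that uses a Dynamic Content expression to capture a Copy activity's output; the activity and variable names are assumptions:

```python
# The @-prefixed string is a Dynamic Content expression evaluated at run time:
# it reads the rowsCopied field from the output of a Copy activity named
# CopyOrders and stores it in a pipeline variable.
set_row_count = {
    "name": "CaptureRowCount",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "rowsCopied",
        "value": "@string(activity('CopyOrders').output.rowsCopied)",
    },
    "dependsOn": [
        {"activity": "CopyOrders", "dependencyConditions": ["Succeeded"]}
    ],
}
```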
I have applied optimization techniques like partitioning, caching, and cluster sizing in Databricks projects.
Utilized partitioning to improve query performance by limiting the amount of data scanned
Implemented caching to store frequently accessed data in memory for faster retrieval
Adjusted cluster sizing based on workload requirements to optimize cost and performance (a PySpark sketch of these techniques follows below)
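A short PySpark sketch of the partitioning and caching techniques; the paths and column names are assumptions, and cluster sizing itself lives in cluster configuration rather than code:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.read.format("delta").load("/mnt/lake/events")  # hypothetical path

# Partitioning: writing by a commonly filtered column lets later queries on
# that column skip unrelated files entirely
(events.write.format("delta")
       .partitionBy("event_date")
       .mode("overwrite")
       .save("/mnt/lake/events_partitioned"))

# Caching: pin a frequently reused DataFrame in memory across actions
hot = (spark.read.format("delta")
            .load("/mnt/lake/events_partitioned")
            .filter("event_date >= '2024-01-01'"))
hot.cache()
hot.count()  # first action materializes the cache; later actions reuse it

# Cluster sizing is not a code-level setting: it is tuned on the cluster/job
# definition (e.g. autoscaling bounds chosen from observed workload).
```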
A distributed table in Synapse is a table whose rows are spread across multiple compute nodes for parallel processing.
Distributed tables in Synapse are divided into distributions to optimize query performance.
There are three distribution types: hash, round-robin, and replicated (a DDL sketch follows this answer).
Hash distribution is ideal for joining large tables on a common key, Round-robin distribution evenly distributes data...
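For illustration, the distribution type is chosen when the table is created in the dedicated SQL pool. Below is a sketch of that DDL issued from Python via pyodbc; the server, credentials, and table are hypothetical:

```python
import pyodbc

# Hypothetical connection to a Synapse dedicated SQL pool
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;DATABASE=dw;UID=loader;PWD=<secret>"
)

# Hash-distributing on the join key co-locates matching rows across the
# pool's 60 distributions, so large joins on customer_id avoid data movement
conn.execute("""
    CREATE TABLE dbo.FactSales (
        sale_id     BIGINT NOT NULL,
        customer_id BIGINT NOT NULL,
        amount      DECIMAL(18, 2)
    )
    WITH (DISTRIBUTION = HASH(customer_id), CLUSTERED COLUMNSTORE INDEX)
""")
conn.commit()
```

ROUND_ROBIN is the default and suits staging tables, while REPLICATE copies a small dimension table to every compute node so joins against it need no data movement.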
You can load data from Databricks to Synapse using PolyBase or Azure Data Factory.
Use PolyBase to load the data by creating an external table in Synapse that points to the files Databricks wrote in the data lake.
Alternatively, use Azure Data Factory to copy the data by building a pipeline with Databricks as the source and Synapse as the sink (a connector-based sketch follows this answer).
Ensure proper permissions and connectivity between ...
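One more option worth sketching is the Synapse connector that ships with Databricks, which stages data in ADLS and loads it into the dedicated SQL pool via PolyBase/COPY behind the scenes; every name below is hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("curated.orders")  # hypothetical source table in Databricks

# The connector stages the data as files under tempDir, then has the dedicated
# SQL pool ingest them with PolyBase/COPY
(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;database=dw")
   .option("dbTable", "dbo.FactOrders")
   .option("tempDir", "abfss://staging@mydatalake.dfs.core.windows.net/synapse-tmp")
   .option("forwardSparkAzureStorageCredentials", "true")
   .mode("append")
   .save())
```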
Yes, I have worked on real-time data processing projects using technologies like Apache Kafka and Spark Streaming.
Implemented real-time data pipelines using Apache Kafka for streaming data ingestion
Utilized Spark Streaming for processing and analyzing real-time data
Worked on monitoring and optimizing the performance of real-time data processing systems (a streaming sketch follows below)
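A minimal sketch of such a pipeline, shown here with Structured Streaming (Spark's current streaming API, rather than the older DStream-based Spark Streaming the answer names); the broker address and topic are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Ingest: subscribe to a Kafka topic; Kafka delivers key/value as binary
events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical
               .option("subscribe", "orders")
               .load()
               .selectExpr("CAST(value AS STRING) AS body", "timestamp"))

# Process: a simple per-minute event count (a real job would first parse
# `body` into typed columns with from_json)
counts = (events
          .withWatermark("timestamp", "2 minutes")
          .groupBy(F.window("timestamp", "1 minute"))
          .count())

# Sink: console for demonstration; production jobs write to Delta or Kafka
query = (counts.writeStream
               .outputMode("append")
               .format("console")
               .start())
query.awaitTermination()
```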
Some of the top questions asked at the Insight Global Technologies Azure Data Engineer interview are covered above, based on 3 interview experiences.
| Role | Salaries reported | Salary range |
| --- | --- | --- |
| DevOps Engineer | 23 | ₹14 L/yr - ₹22 L/yr |
| Data Engineer | 5 | ₹9.5 L/yr - ₹25 L/yr |
| Senior Data Engineer | 5 | ₹20.2 L/yr - ₹42.4 L/yr |
| Senior QA Engineer - Software Testing | 5 | ₹32 L/yr - ₹35 L/yr |
| DBA Administrator | 5 | ₹16 L/yr - ₹17 L/yr |
Randstad
First Advantage
Experis IT
Pyramid IT Consulting