Cognizant
SQL tricky questions often test your understanding of complex queries and data manipulation techniques; a worked example follows this list.
Understand JOIN types: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN.
Use GROUP BY and HAVING clauses to aggregate data effectively.
Be familiar with window functions like ROW_NUMBER(), RANK(), and DENSE_RANK().
Practice writing subqueries and common table expressions (CTEs) for better readability.
Know how to handle...
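As a concrete illustration, here is a sketch of one classic tricky question, top salary per department, combining a CTE with ROW_NUMBER(). The employees table and the SparkSession named spark are assumptions for this example:

```python
# Hypothetical 'employees' table; assumes an active SparkSession named `spark`.
# Classic pattern: highest-paid employee per department via a CTE + window function.
result = spark.sql("""
    WITH ranked AS (
        SELECT department, name, salary,
               ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
        FROM employees
    )
    SELECT department, name, salary
    FROM ranked
    WHERE rn = 1
""")
result.show()
```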
Spark supports several join types, including inner, left (outer), right (outer), and full outer joins; a sketch of each follows this list.
Inner join: Returns only the rows that have matching keys in both datasets.
Full outer join: Returns all rows from both datasets, with nulls where there is no match on the other side.
Left join: Returns all rows from the left dataset and the matched rows from the right dataset.
Right join: Returns all rows from the right dataset and the matched rows from the left dataset.
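A minimal PySpark sketch of these join types, using two tiny hypothetical DataFrames:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-demo").getOrCreate()

# Hypothetical sample data for illustration only.
emp = spark.createDataFrame([(1, "Ana"), (2, "Raj"), (3, "Li")], ["dept_id", "name"])
dept = spark.createDataFrame([(1, "Sales"), (2, "HR"), (4, "IT")], ["dept_id", "dept"])

emp.join(dept, "dept_id", "inner").show()  # only dept_id 1 and 2 match
emp.join(dept, "dept_id", "left").show()   # all emp rows; null dept for Li
emp.join(dept, "dept_id", "right").show()  # all dept rows; null name for IT
emp.join(dept, "dept_id", "full").show()   # all rows from both sides
```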
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing.
It stores data in Parquet format and uses Apache Spark for processing.
Delta Lake ensures data reliability and data quality by providing schema enforcement and data versioning.
It supports time travel, allowing you to query or restore earlier versions of a table; a short sketch follows.
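A minimal sketch of writing a Delta table and reading an earlier version (time travel); the path is a placeholder, and a Spark session with the delta-spark extensions (or Databricks) is assumed:

```python
# Assumes a SparkSession with Delta Lake enabled; the path is a placeholder.
df = spark.range(5)
df.write.format("delta").mode("overwrite").save("/tmp/demo_delta")

# Time travel: read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/demo_delta")
v0.show()
```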
To connect to S3 from Databricks, you can use the platform's built-in S3 support over the s3a:// file system; a sketch follows this list.
Configure access with an instance profile/IAM role, or supply AWS credentials for the S3A file system
Provide the S3 bucket path (s3a://bucket/path) when reading or writing data
You can also browse S3 data using the dbutils.fs file system API in Databricks
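A sketch of one common approach (explicit S3A credentials) in a Databricks notebook; the bucket name and keys are placeholders, and in practice an instance profile or Unity Catalog external location is generally preferred over hard-coded keys:

```python
# Placeholders: replace with real credentials/bucket, or use an instance profile instead.
spark.conf.set("fs.s3a.access.key", "<AWS_ACCESS_KEY_ID>")
spark.conf.set("fs.s3a.secret.key", "<AWS_SECRET_ACCESS_KEY>")

df = spark.read.json("s3a://my-bucket/events/")  # read data directly from S3
print(dbutils.fs.ls("s3a://my-bucket/"))         # Databricks file system API
```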
XCom in Airflow is a way for tasks to exchange messages or small amounts of data.
XCom allows tasks to communicate with each other by passing small pieces of data
It can be used to share information between tasks in a DAG
XCom can be used to pass information like task status, results, or any other data
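A minimal sketch using Airflow's TaskFlow API (recent Airflow 2.x), where a task's return value is pushed to XCom and passed to the next task; the DAG and values are hypothetical:

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def xcom_demo():
    @task
    def extract():
        return {"row_count": 42}  # return value is pushed to XCom automatically

    @task
    def report(stats: dict):
        print(f"rows processed: {stats['row_count']}")  # value pulled from XCom

    report(extract())

xcom_demo()
```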
CDC stands for Change Data Capture, a process of identifying and capturing changes made to data in a database.
CDC is used to track changes in data over time, allowing for real-time data integration and analysis.
It captures inserts, updates, and deletes made to data, providing a historical record of changes.
CDC is commonly used in data warehousing, data replication, and data integration processes.
Examples of CDC to...
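As one concrete pattern, a CDC feed is often applied to a Delta table with MERGE. This sketch assumes delta-spark, a hypothetical customers table, and a hypothetical cdc_events DataFrame with an op column marking inserts/updates/deletes:

```python
from delta.tables import DeltaTable

# `customers` table and `cdc_events` DataFrame (with an `op` column) are hypothetical.
target = DeltaTable.forName(spark, "customers")
(target.alias("t")
    .merge(cdc_events.alias("s"), "t.id = s.id")
    .whenMatchedDelete(condition="s.op = 'D'")        # apply captured deletes
    .whenMatchedUpdateAll(condition="s.op = 'U'")     # apply captured updates
    .whenNotMatchedInsertAll(condition="s.op = 'I'")  # apply captured inserts
    .execute())
```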
A list in Python is a collection of items that are ordered and mutable.
Lists are created using square brackets []
Items in a list can be of different data types
Lists can be modified by adding, removing, or changing items
Example: my_list = [1, 'apple', True]
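A short sketch of list mutation:

```python
my_list = [1, 'apple', True]  # ordered, mixed types
my_list.append(3.14)          # add an item
my_list[0] = 99               # change an item in place (mutable)
my_list.remove('apple')       # remove an item
print(my_list)                # [99, True, 3.14]
```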
Coalesce reduces the number of partitions of a DataFrame without a full shuffle, while repartition can increase or decrease the partition count and always shuffles.
Coalesce merges existing partitions in place, so it avoids moving data between executors where possible
Repartition performs a full shuffle to redistribute data evenly, and it is the way to increase the number of partitions
Coalesce is therefore more efficient for reducing partitions, though it can leave partition sizes uneven
Repartition is typically used f...
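A quick sketch of the difference; the partition counts are illustrative and depend on the cluster:

```python
df = spark.range(1_000_000)
print(df.rdd.getNumPartitions())   # e.g. 8, depends on the cluster

fewer = df.coalesce(2)             # merges existing partitions, no full shuffle
more = df.repartition(16)          # full shuffle, evenly redistributed
print(fewer.rdd.getNumPartitions(), more.rdd.getNumPartitions())  # 2 16
```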
Partitioning divides data into smaller chunks for better organization and performance, while bucketing groups data into a fixed number of buckets based on a column value.
Partitioning divides data into smaller subsets based on the values of a column or key.
Bucketing distributes rows into a fixed number of buckets, typically by hashing a column.
Partitioning is commonly used in distributed systems for better data organization and query performance.
Bucketing...
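A sketch of both in PySpark; the DataFrame and column names are hypothetical, and note that bucketBy requires saving the output as a table:

```python
# `df` with `country` and `customer_id` columns is hypothetical.
(df.write
   .partitionBy("country")       # one directory per country value
   .mode("overwrite")
   .parquet("/tmp/sales_by_country"))

(df.write
   .bucketBy(8, "customer_id")   # hash customer_id into 8 buckets
   .sortBy("customer_id")
   .mode("overwrite")
   .saveAsTable("sales_bucketed"))
```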
flatMap flattens the results of a function that can return several elements per input, while map transforms each element one-to-one.
flatMap applies a function that may return zero or more elements per input and flattens the results into a single collection.
map applies a function to each element and returns exactly one output element per input.
flatMap is commonly used in functional languages and data APIs such as Scala and JavaScript.
map is a higher-order function that applies a given function to each element of a collection; a short illustration follows.
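A minimal PySpark illustration of the difference:

```python
rdd = spark.sparkContext.parallelize(["hello world", "hi"])

print(rdd.map(lambda line: line.split(" ")).collect())
# [['hello', 'world'], ['hi']]  -- one output per input element

print(rdd.flatMap(lambda line: line.split(" ")).collect())
# ['hello', 'world', 'hi']      -- results flattened into one sequence
```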
The aptitude test covered quantitative aptitude, logical reasoning, and reading comprehension.
Optimization techniques in Spark improve performance and efficiency of data processing.
Partitioning data to distribute workload evenly
Caching frequently accessed data in memory
Using broadcast variables for small lookup tables
Avoiding shuffling operations whenever possible
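A sketch of two of these techniques, caching and a broadcast join; the paths and column names are placeholders:

```python
from pyspark.sql import functions as F

# Cache a DataFrame that several downstream actions reuse.
events = spark.read.parquet("/tmp/events").cache()   # placeholder path

# Broadcast a small lookup table so the join avoids shuffling the large side.
lookup = spark.read.parquet("/tmp/country_codes")    # placeholder path
joined = events.join(F.broadcast(lookup), "country_code")
```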
Tuning operations in Databricks involves optimizing performance and efficiency of data processing tasks.
Use cluster configuration settings to allocate resources efficiently
Optimize code by minimizing data shuffling and reducing unnecessary operations
Leverage Databricks Auto Optimize to automatically tune performance
Monitor job performance using Databricks Runtime Metrics and Spark UI
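For example, Auto Optimize can be enabled per Delta table via table properties; the table name here is a placeholder:

```python
# Databricks Delta table properties enabling optimized writes and auto-compaction.
spark.sql("""
    ALTER TABLE sales SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```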
I applied via 'Approached by Company' and was interviewed in Jun 2024. There was one interview round.
Spark optimization techniques used in the project
Partitioning data to optimize parallel processing
Caching frequently accessed data to reduce computation time
Using broadcast variables for efficient data sharing across nodes
Optimizing shuffle operations to minimize data movement
Tuning memory and CPU settings for better performance
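A sketch of the configuration side of this tuning; the values are illustrative, not recommendations:

```python
# Shuffle parallelism is a common first knob to tune.
spark.conf.set("spark.sql.shuffle.partitions", "200")

# Executor memory/CPU are usually fixed at cluster or submit time, e.g.:
#   spark-submit --executor-memory 8g --executor-cores 4 ...
```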
I applied via Campus Placement
I am passionate about working with data and enjoy the challenges and opportunities that come with being a data engineer.
I have a strong background in data engineering and enjoy working with data processing technologies such as Hadoop, Spark, and Kafka.
I find data engineering to be a dynamic and evolving field that allows me to continuously learn and grow my skills.
I am excited about the impact that data engineering can...
I applied via Naukri.com and was interviewed in Jan 2024. There was 1 interview round.
Word count in Spark, and the difference between flatMap and map
Spark is a distributed computing framework for big data processing
flatMap is used to split each input string into words
map is used to transform each word into a key-value pair for counting
The difference: map returns exactly one output element per input, while flatMap can return zero or more elements and flattens them into a single collection. A worked example follows.
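Putting it together, a classic RDD word count; the input path and the SparkSession named spark are assumptions:

```python
lines = spark.sparkContext.textFile("/tmp/input.txt")  # placeholder path
counts = (lines
          .flatMap(lambda line: line.split())  # split each line into words
          .map(lambda word: (word, 1))         # pair each word with a count of 1
          .reduceByKey(lambda a, b: a + b))    # sum the counts per word
print(counts.take(10))
```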
50 MCQs covering Python and SQL.
The duration of the Cognizant Data Engineer interview process can vary, but typically it takes less than 2 weeks to complete (based on 39 interview experiences).
Role | Salaries reported | Salary range
Associate | 71.1k | ₹5.3 L/yr - ₹13.6 L/yr
Programmer Analyst | 56k | ₹3.5 L/yr - ₹7.3 L/yr
Senior Associate | 55.8k | ₹10 L/yr - ₹23.6 L/yr
Senior Processing Executive | 30.1k | ₹2.2 L/yr - ₹6.5 L/yr
Technical Lead | 18.6k | ₹6 L/yr - ₹21.5 L/yr
TCS
Infosys
Wipro
Accenture