Cognizant
Coalesce reduces the number of partitions without shuffling data, while repartition reshuffles data to create a specific number of partitions.
Coalesce is used to reduce the number of partitions without shuffling data
Repartition is used to increase or decrease the number of partitions by shuffling data
Coalesce is more efficient when reducing partitions as it avoids shuffling
Repartition is useful when you need to explici...
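To make the distinction concrete, here is a toy model in plain Python rather than Spark itself. The functions `coalesce` and `repartition` below are hypothetical stand-ins that work on a list of lists; they only imitate the data-movement pattern (merging whole partitions versus redistributing every record) and are not Spark's implementation. In PySpark the corresponding calls are `df.coalesce(n)` and `df.repartition(n)`.

```python
from typing import List

Partitions = List[List[int]]


def coalesce(parts: Partitions, n: int) -> Partitions:
    """Merge whole partitions into at most n groups; records stay with their partition."""
    if n >= len(parts):
        # Without a shuffle, coalesce cannot increase the partition count.
        return [list(p) for p in parts]
    merged: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        merged[i % n].extend(part)  # the whole partition moves together
    return merged


def repartition(parts: Partitions, n: int) -> Partitions:
    """Redistribute every record round-robin across n new partitions (a full shuffle)."""
    out: Partitions = [[] for _ in range(n)]
    records = [rec for part in parts for rec in part]
    for i, rec in enumerate(records):
        out[i % n].append(rec)
    return out


# Hypothetical skewed input: one large partition and three tiny ones.
skewed = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]
coalesced = coalesce(skewed, 2)
reshuffled = repartition(skewed, 2)
print([len(p) for p in coalesced])   # [7, 2]
print([len(p) for p in reshuffled])  # [5, 4]
```

In this sketch the coalesced result keeps the original skew because partitions are only glued together, while the repartitioned result is balanced at the cost of touching every record.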
DataFrame is a higher-level abstraction built on top of RDD, providing more structure and optimization capabilities.
DataFrames are distributed collections of data organized into named columns, similar to tables in a relational database.
RDDs are lower-level abstractions representing a collection of objects distributed across a cluster, with no inherent structure.
DataFrames provide optimizations like query optimization a...
SQL for calculating year-on-year growth percentage, grouped by year.
Group by year first to get one aggregated value per year
Use the LAG function, ordered by year, to get the previous year's value
Calculate the growth percentage using the formula: ((current year value - previous year value) / previous year value) * 100
The earliest year has no previous value, so its growth percentage is NULL
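A minimal sketch of such a query, run against an in-memory SQLite database from Python so it can be tried without a cluster. The `sales` table, its `year` and `amount` columns, and the sample rows are assumptions made up for illustration; window functions such as LAG need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and sample data, not taken from the interview question.
conn.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(2021, 60.0), (2021, 40.0), (2022, 120.0), (2023, 90.0)],
)

query = """
WITH yearly AS (
    -- Step 1: one total per year
    SELECT year, SUM(amount) AS total
    FROM sales
    GROUP BY year
)
SELECT
    year,
    total,
    -- Step 2: compare each year's total with the previous year's total
    (total - LAG(total) OVER (ORDER BY year)) * 100.0
        / LAG(total) OVER (ORDER BY year) AS growth_pct
FROM yearly
ORDER BY year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (2021, 100.0, None)
# (2022, 120.0, 20.0)
# (2023, 90.0, -25.0)
```

The same query shape should carry over to Spark SQL, where LAG is also available as a window function.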
SQL query to find the second highest rank in a dataset
Use the ORDER BY clause to sort the ranks in descending order
Use the LIMIT and OFFSET clauses to skip the highest rank and retrieve the second highest rank; add DISTINCT so that a tie for first place is not returned as the second highest
Example: SELECT DISTINCT rank FROM dataset ORDER BY rank DESC LIMIT 1 OFFSET 1
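One thing worth checking with the LIMIT/OFFSET approach is how it behaves when the top value is tied. The sketch below uses an in-memory SQLite database with made-up rows; the column name is quoted as `"rank"` because some SQL dialects treat RANK as a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE dataset ("rank" INTEGER)')
# Hypothetical data with a tie for the highest value.
conn.executemany("INSERT INTO dataset VALUES (?)", [(50,), (50,), (40,), (30,)])

# Without DISTINCT, OFFSET 1 skips only one of the two tied rows.
plain = conn.execute(
    'SELECT "rank" FROM dataset ORDER BY "rank" DESC LIMIT 1 OFFSET 1'
).fetchone()

# With DISTINCT, duplicates collapse first, so OFFSET 1 lands on the next value.
distinct = conn.execute(
    'SELECT DISTINCT "rank" FROM dataset ORDER BY "rank" DESC LIMIT 1 OFFSET 1'
).fetchone()

print(plain[0], distinct[0])  # 50 40
```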
To connect Google Cloud Platform with Apache Spark, tools like Dataproc, Cloud Storage, and BigQuery can be used.
Use Google Cloud Dataproc to create managed Spark and Hadoop clusters on GCP.
Store data in Google Cloud Storage and access it from Spark applications.
Utilize Google BigQuery for querying and analyzing large datasets directly from Spark.
Optimization techniques in Apache Spark improve performance and efficiency.
Partitioning data to distribute work evenly
Caching frequently accessed data in memory
Using broadcast variables for small lookup tables
Optimizing shuffle operations by reducing data movement
Applying predicate pushdown to filter data early
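As an illustration of the broadcast idea from the list above, here is a toy model in plain Python, not Spark code. The function `broadcast_join` and the sample data are hypothetical; the point is only that every partition of the large dataset is joined locally against the same small lookup table, so no record of the large dataset has to move. In PySpark the corresponding hint is `pyspark.sql.functions.broadcast`.

```python
from typing import Dict, List, Tuple

Order = Tuple[str, int]            # (country_code, amount)
Joined = Tuple[str, int, str]      # (country_code, amount, country_name)


def broadcast_join(partitions: List[List[Order]], lookup: Dict[str, str]) -> List[List[Joined]]:
    """Inner-join each partition against the small lookup; partitions are never mixed."""
    return [
        [(code, amount, lookup[code]) for code, amount in part if code in lookup]
        for part in partitions
    ]


# Hypothetical large dataset split into two partitions, plus a small lookup table.
orders = [[("IN", 10), ("US", 20)], [("IN", 5), ("XX", 1)]]
countries = {"IN": "India", "US": "United States"}

joined = broadcast_join(orders, countries)
print(joined)
```

The unmatched code `XX` is dropped, as in an inner join, and the output keeps the same two partitions as the input.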
Orchestrating code in GCP typically means using Cloud Composer to schedule and manage workflows, with services such as Cloud Dataflow running the processing steps.
Use Cloud Composer to create, schedule, and monitor workflows using Apache Airflow
Utilize Cloud Dataflow for real-time data processing and batch processing tasks
Use Cloud Functions for event-driven serverless functions
Leverage Cloud Scheduler for job scheduling
Integrate with other GCP services like BigQ...
Coalesce reduces the number of partitions without a full shuffle, while repartition shuffles data to either increase or decrease the number of partitions. Cache and persist both keep an RDD or DataFrame around for reuse: cache() uses the default storage level, while persist() accepts an explicit storage level such as memory only, memory and disk, or disk only.
Coalesce is used to reduce the number of partitions without shuffling data, while repartition is used to increase or decrease the number of partitions by shuffling data.
Coalesce is more efficient when reducing partitions as it avoids shuffling...
I applied via Walk-in and was interviewed in Nov 2024. There were 3 interview rounds.
I applied via Company Website and was interviewed before Oct 2020. There were 3 interview rounds.
I applied via Company Website and was interviewed before Feb 2020. There was 1 interview round.
I applied via LinkedIn and was interviewed before Jul 2021. There were 2 interview rounds.
Easy logical questions
Basic quantitative aptitude questions
Easy level coding questions
Counting frequency of alphabets
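For the letter-frequency question, one possible answer in Python is sketched below. The function name `letter_frequency` and the choice to ignore case and non-letter characters are assumptions, since the original question wording is not given.

```python
from collections import Counter


def letter_frequency(text: str) -> dict:
    """Count alphabetic characters, ignoring case, digits, spaces, and punctuation."""
    return dict(Counter(ch for ch in text.lower() if ch.isalpha()))


freq = letter_frequency("Hello, World!")
print(freq)
```

Note that `str.isalpha` also accepts non-ASCII letters; an interviewer who wants only a to z may expect an explicit range check instead.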
Redux has no built-in way to dispatch several actions in one call; this is usually achieved with middleware or custom logic.
Middleware like redux-thunk or redux-saga can be used to dispatch multiple actions in response to a single action.
Reducers must stay pure and cannot dispatch actions themselves, but several reducers can respond to the same action type and each update its own slice of state.
For example, a single 'ADD_ITEM' action can trigger multiple actions like 'UPDATE_TOTAL', 'UPDATE_HISTORY', etc.
M...
I applied via Campus Placement and was interviewed before Jun 2020. There were 3 interview rounds.
I applied via Job Portal and was interviewed before Dec 2019. There was 1 interview round.
I applied via Company Website and was interviewed before Jun 2020. There was 1 interview round.
I applied via Campus Placement and was interviewed before Mar 2020. There were 5 interview rounds.
Top questions asked at the Cognizant PySpark Developer interview, based on 2 interview experiences
Associate | 73.1k salaries | ₹5.3 L/yr - ₹12.5 L/yr
Programmer Analyst | 56.2k salaries | ₹3.5 L/yr - ₹7.3 L/yr
Senior Associate | 52.9k salaries | ₹10.5 L/yr - ₹23.5 L/yr
Senior Processing Executive | 29.8k salaries | ₹2.2 L/yr - ₹6.5 L/yr
Technical Lead | 19k salaries | ₹6 L/yr - ₹21.3 L/yr