i
Cognizant
Filter interviews by
Deleting duplicate rows in SQL
Use the DISTINCT keyword in SELECT statement to retrieve unique rows
Use GROUP BY clause to group rows with same values and then use aggregate functions to select one row
Use the ROW_NUMBER() function to assign a unique number to each row and then delete the rows with duplicate numbers
To find process id in Linux, use the command 'ps -aux | grep
Open the terminal
Type 'ps -aux' to list all running processes
Use 'grep
The process id (PID) will be listed in the second column
To remove header and trailer from a sequential data file in Datastage.
Use Sequential File stage in Datastage.
Set the 'Skip Rows' property to the number of header rows to be skipped.
Set the 'Trailer Rows' property to the number of trailer rows to be skipped.
Use a Transformer stage to remove any remaining header or trailer rows.
Use the 'Remove' function in the Transformer stage to remove the rows.
Spark OutOfMemory (OOM) issues occur when the application exceeds memory limits, causing failures in processing data.
Increase executor memory: Use the configuration 'spark.executor.memory' to allocate more memory to executors.
Optimize data partitioning: Use 'repartition()' or 'coalesce()' to manage the number of partitions effectively.
Use broadcast variables: For large lookup tables, use 'sc.broadcast()' to reduce...
What people are saying about Cognizant
Reading data from a .log file and extracting columns with a specific regex.
Use Python's built-in 're' module to define the regex pattern.
Open the .log file using Python's 'open' function.
Iterate through each line of the file and extract the desired columns using the regex pattern.
Store the extracted data in a data structure such as a list or dictionary.
Calculate the count and profit from data over the last four years using SQL or data analysis tools.
Use SQL queries like 'SELECT COUNT(*)' to get the total count of records.
To calculate profit, use 'SUM(profit_column)' grouped by year.
Example SQL: 'SELECT YEAR(date_column), COUNT(*), SUM(profit_column) FROM sales WHERE date_column >= DATE_SUB(CURDATE(), INTERVAL 4 YEAR) GROUP BY YEAR(date_column);'
Consider filte...
SORT BY, ORDER BY, CLUSTER BY, and DISTRIBUTE BY are SQL clauses used for data sorting and partitioning.
SORT BY is used to sort the result set in ascending or descending order based on one or more columns.
ORDER BY is used to sort the result set in ascending or descending order based on one or more columns. It is similar to SORT BY but can be used with other clauses like LIMIT and OFFSET.
CLUSTER BY is used to group...
RDS, VA, DF, VS, and DS are all acronyms related to data engineering.
RDS stands for Relational Database Service, a managed database service by AWS.
VA stands for Virtual Assistant, a software program that can assist with tasks.
DF stands for Dataflow, a managed service by Google Cloud for data processing.
VS stands for Virtual Server, a server that runs on a virtual machine.
DS stands for Datastore, a NoSQL document d...
Small file problem refers to the issue of having a large number of small files in a storage system.
Small files can cause inefficiencies in storage and processing.
Solutions include consolidating small files into larger ones or using a different storage system.
Examples include Hadoop's SequenceFile format and Amazon S3's object size optimization.
SQL queries with window functions
Window functions perform calculations across a set of rows that are related to the current row
Common window functions include ROW_NUMBER, RANK, DENSE_RANK, and NTILE
Window functions are used with the OVER() clause to define the window or subset of rows to perform the calculation on
To kill a job in Datastage
Stop the job manually from the Director client
Terminate the job from the command line using the dsjob command
Kill the job process from the operating system level
Delete the job from the Datastage repository
SQL queries with window functions
Window functions perform calculations across a set of rows that are related to the current row
Common window functions include ROW_NUMBER, RANK, DENSE_RANK, and NTILE
Window functions are used with the OVER() clause to define the window or subset of rows to perform the calculation on
SORT BY, ORDER BY, CLUSTER BY, and DISTRIBUTE BY are SQL clauses used for data sorting and partitioning.
SORT BY is used to sort the result set in ascending or descending order based on one or more columns.
ORDER BY is used to sort the result set in ascending or descending order based on one or more columns. It is similar to SORT BY but can be used with other clauses like LIMIT and OFFSET.
CLUSTER BY is used to group data...
Spark OutOfMemory (OOM) issues occur when the application exceeds memory limits, causing failures in processing data.
Increase executor memory: Use the configuration 'spark.executor.memory' to allocate more memory to executors.
Optimize data partitioning: Use 'repartition()' or 'coalesce()' to manage the number of partitions effectively.
Use broadcast variables: For large lookup tables, use 'sc.broadcast()' to reduce memo...
Small file problem refers to the issue of having a large number of small files in a storage system.
Small files can cause inefficiencies in storage and processing.
Solutions include consolidating small files into larger ones or using a different storage system.
Examples include Hadoop's SequenceFile format and Amazon S3's object size optimization.
RDS, VA, DF, VS, and DS are all acronyms related to data engineering.
RDS stands for Relational Database Service, a managed database service by AWS.
VA stands for Virtual Assistant, a software program that can assist with tasks.
DF stands for Dataflow, a managed service by Google Cloud for data processing.
VS stands for Virtual Server, a server that runs on a virtual machine.
DS stands for Datastore, a NoSQL document databa...
I applied via Recruitment Consultant and was interviewed in Sep 2021. There were 3 interview rounds.
I applied via Recruitment Consultant and was interviewed before Jun 2020. There were 4 interview rounds.
Reading data from a .log file and extracting columns with a specific regex.
Use Python's built-in 're' module to define the regex pattern.
Open the .log file using Python's 'open' function.
Iterate through each line of the file and extract the desired columns using the regex pattern.
Store the extracted data in a data structure such as a list or dictionary.
Calculate the count and profit from data over the last four years using SQL or data analysis tools.
Use SQL queries like 'SELECT COUNT(*)' to get the total count of records.
To calculate profit, use 'SUM(profit_column)' grouped by year.
Example SQL: 'SELECT YEAR(date_column), COUNT(*), SUM(profit_column) FROM sales WHERE date_column >= DATE_SUB(CURDATE(), INTERVAL 4 YEAR) GROUP BY YEAR(date_column);'
Consider filtering ...
Optimizations for data engineering
Use indexing to speed up queries
Partition data to improve query performance
Use caching to reduce data retrieval time
Optimize data storage format for faster processing
Use parallel processing to speed up data processing
Optimize network bandwidth usage
Use compression to reduce storage and network usage
Answering how to read JSON in Python.
Use the json module to load and parse JSON data
Use the json.loads() method to load JSON data from a string
Use the json.load() method to load JSON data from a file
Access JSON data using keys or indexes
Use the json.dumps() method to convert Python objects to JSON strings
I applied via Company Website and was interviewed before Jun 2020. There were 7 interview rounds.
I applied via Campus Placement and was interviewed before Jan 2021. There were 2 interview rounds.
Good
I applied via Campus Placement and was interviewed before Jul 2020. There were 4 interview rounds.
I applied via LinkedIn and was interviewed before Jul 2020. There were 4 interview rounds.
I applied via Company Website and was interviewed before Oct 2020. There were 3 interview rounds.
I applied via Amcat and was interviewed before Jul 2021. There were 2 interview rounds.
Refer R S Agarwal book for apptitude
A C program to perform arithmetic operations on fractional numbers.
Use float or double data type to store fractional numbers.
Use scanf() to take input from the user.
Perform arithmetic operations like addition, subtraction, multiplication, and division.
Use printf() to display the result.
based on 11 reviews
Rating in categories
Associate
73.1k
salaries
| ₹5.1 L/yr - ₹14.5 L/yr |
Programmer Analyst
56.1k
salaries
| ₹2 L/yr - ₹9.2 L/yr |
Senior Associate
54.7k
salaries
| ₹8.4 L/yr - ₹28.6 L/yr |
Senior Processing Executive
29.7k
salaries
| ₹1.4 L/yr - ₹9 L/yr |
Technical Lead
18k
salaries
| ₹6 L/yr - ₹25.5 L/yr |
TCS
Infosys
Wipro
Accenture