Big Data Engineer Interview Questions

Data join question: The higher the Key Performance Indicator (KPI), the better the tower's performance. Provide a solution to determine the best- and worst-performing tower, as well as the average tower performance per market. Datasets: KPI dataset = (TOWERID, DATE, KPI); Market Info = (TOWERID, MARKET).

Big Data Engineer

Interviewed at AT&T

3.3★

Oct 16, 2015

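A minimal sketch of one possible answer in plain Python. The same logic maps to a SQL or Spark join followed by a GROUP BY. The question leaves several points open, so these are assumptions: a tower's performance is its mean KPI across all dates, best and worst are computed within each market, the market average is the mean of its towers' averages, and towers missing from Market Info are dropped (inner join). `tower_performance` is a hypothetical name.

```python
from collections import defaultdict


def tower_performance(kpi_rows, market_rows):
    """kpi_rows: iterable of (tower_id, date, kpi); market_rows: iterable of (tower_id, market)."""
    market_of = dict(market_rows)  # TOWERID -> MARKET (assumes one market per tower)

    # Step 1: average KPI per tower across all dates (assumed definition of performance).
    total = defaultdict(float)
    count = defaultdict(int)
    for tower_id, _date, kpi in kpi_rows:
        total[tower_id] += kpi
        count[tower_id] += 1
    tower_avg = {t: total[t] / count[t] for t in total}

    # Step 2: inner join with Market Info; towers without a market row are dropped.
    towers_by_market = defaultdict(list)
    for tower_id, avg in tower_avg.items():
        if tower_id in market_of:
            towers_by_market[market_of[tower_id]].append((tower_id, avg))

    # Step 3: per market, a higher KPI is better; ties go to the first tower seen.
    result = {}
    for market, towers in towers_by_market.items():
        best = max(towers, key=lambda t: t[1])
        worst = min(towers, key=lambda t: t[1])
        market_avg = sum(a for _, a in towers) / len(towers)
        result[market] = {"best": best[0], "worst": worst[0], "avg": market_avg}
    return result
```

Averaging per tower first keeps a tower with many KPI rows from dominating its market average; averaging all raw KPI rows per market is the other reasonable reading.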

Implement a function (int[][] matrix, int rownum, int colnum) that prints the matrix in a spiral outward from the given index (rownum, colnum), in a multi-threaded fashion.

Big Data Engineer

Interviewed at Dataminr

3.6★

Apr 20, 2017

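A minimal sketch in Python, assuming "multi-threaded fashion" means several worker threads cooperatively printing the spiral sequence in the correct order: thread k prints every N-th element, and a `threading.Condition` plus a shared turn counter enforces ordering. The spiral walks legs of length 1, 1, 2, 2, 3, 3, ... (right, down, left, up) from the start cell and skips cells outside the matrix. `spiral_out_order` and `print_spiral_multithreaded` are hypothetical names, and the matrix is assumed non-empty and rectangular.

```python
import threading


def spiral_out_order(matrix, row, col):
    """Return the matrix values in spiral-out order starting at (row, col)."""
    rows, cols = len(matrix), len(matrix[0])
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("start index out of bounds")
    total = rows * cols
    order = [(row, col)]
    dirs = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
    r, c, step, d = row, col, 1, 0
    while len(order) < total:
        for _ in range(2):  # each leg length is used for two consecutive legs
            dr, dc = dirs[d]
            for _ in range(step):
                r, c = r + dr, c + dc
                if 0 <= r < rows and 0 <= c < cols:  # skip cells outside the matrix
                    order.append((r, c))
            d = (d + 1) % 4
        step += 1
    return [matrix[i][j] for i, j in order]


def print_spiral_multithreaded(matrix, row, col, num_threads=2):
    """Print the spiral with num_threads workers taking turns; returns the printed values."""
    values = spiral_out_order(matrix, row, col)
    printed = []
    cond = threading.Condition()
    turn = 0  # index of the next value that may be printed

    def worker(k):
        nonlocal turn
        # Thread k owns positions k, k + N, k + 2N, ...
        for i in range(k, len(values), num_threads):
            with cond:
                cond.wait_for(lambda: turn == i)  # block until it is this position's turn
                print(values[i])
                printed.append(values[i])
                turn += 1
                cond.notify_all()  # wake the thread that owns the next position

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return printed
```

The threads here demonstrate coordination rather than speed: printing is inherently sequential, so the point of the exercise is enforcing order across workers.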

Serialize and deserialize a binary tree.

Senior Big Data Engineer

Interviewed at Intuit

4.2★

Nov 28, 2018

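One common approach, not necessarily the one the interviewer expected: a pre-order traversal that writes `#` for each missing child, joined with commas. Because every null is recorded, the pre-order sequence alone is enough to rebuild the tree. `TreeNode`, `serialize`, and `deserialize` are illustrative names, and node values are assumed to be integers.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def serialize(root):
    """Pre-order traversal; '#' marks a missing child."""
    out = []

    def dfs(node):
        if node is None:
            out.append("#")
            return
        out.append(str(node.val))
        dfs(node.left)
        dfs(node.right)

    dfs(root)
    return ",".join(out)


def deserialize(data):
    """Rebuild the tree by consuming tokens in the same pre-order sequence."""
    tokens = iter(data.split(","))

    def build():
        tok = next(tokens)
        if tok == "#":
            return None
        node = TreeNode(int(tok))
        node.left = build()
        node.right = build()
        return node

    return build()
```

Recursion depth equals the tree height, so a very deep, skewed tree could exceed Python's default recursion limit; an explicit stack avoids that.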

Big Data Engineer II

Interviewed at Amazon

3.5★

May 18, 2021

1. SQL:

**d_customers**
+-------------+-----------------------+---------------------+
| customer_id | membership_start_date | membership_end_date |
+-------------+-----------------------+---------------------+
| 114         | 2015-01-01            | 2015-02-15          |
| 116         | 2015-02-01            | 2015-03-15          |
| 120         | 2015-02-15            | 2015-04-01          |
| 221         | 2015-03-15            | 2015-10-01          |
| 120         | 2015-05-15            | 2015-07-01          |
+-------------+-----------------------+---------------------+

**d_shipments**
+-------------+------------+-----------------------+----------+
| shipment_id | ship_date  | receiving_customer_id | quantity |
+-------------+------------+-----------------------+----------+
| 1           | 2015-02-13 | 114                   | 2        |
| 2           | 2015-03-01 | 116                   | 4        |
| 2           | 2015-03-01 | 116                   | 1        |
| 3           | 2015-06-01 | 116                   | 1        |
| 4           | 2015-03-01 | 120                   | 6        |
| 5           | 2015-10-01 | 120                   | 3        |
| 6           | 2015-03-01 | 321                   | 10       |
+-------------+------------+-----------------------+----------+

Populate **a_shipments**:
+------------+-------------+-----------+----------+
| ship_date  | customer_id | is_member | quantity |
+------------+-------------+-----------+----------+

The column [is_member] is 'Y' if [ship_date] is between [membership_start_date] and [membership_end_date], else 'N'.

Sample of output:
| 2015-03-01 | 116         | Y         | 5        |
| 2015-06-01 | 116         | N         | 1        |

2. Coding task: check whether a string is a palindrome. I was asked to code both an iterative and a recursive solution.

3. Big data questions:
3.1. Which file formats in Hadoop do you know? What is the difference between the Avro and Parquet formats?
3.2. How is compression used in the Avro and Parquet formats?
3.3. What are the most difficult big data performance challenges you have faced and resolved?
3.4. Spark optimization; Spark's cost-based optimizer.
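A minimal sketch of the SQL task, run against the sample tables in an in-memory SQLite database through Python's standard `sqlite3` module. Two assumptions are read off the sample output: rows are aggregated per (ship_date, customer_id) with quantities summed (customer 116 on 2015-03-01 gives 4 + 1 = 5), and BETWEEN is inclusive at both ends. Membership is tested with EXISTS instead of a plain join on customer_id, because customer 120 has two membership rows and joining on customer_id alone would duplicate that customer's shipments and inflate the sums. Dates are stored as ISO-8601 text, which compares correctly as strings. `build_a_shipments` is a hypothetical helper name.

```python
import sqlite3

# Sample data copied from the question.
CUSTOMERS = [
    (114, "2015-01-01", "2015-02-15"),
    (116, "2015-02-01", "2015-03-15"),
    (120, "2015-02-15", "2015-04-01"),
    (221, "2015-03-15", "2015-10-01"),
    (120, "2015-05-15", "2015-07-01"),
]
SHIPMENTS = [
    (1, "2015-02-13", 114, 2),
    (2, "2015-03-01", 116, 4),
    (2, "2015-03-01", 116, 1),
    (3, "2015-06-01", 116, 1),
    (4, "2015-03-01", 120, 6),
    (5, "2015-10-01", 120, 3),
    (6, "2015-03-01", 321, 10),
]

POPULATE_SQL = """
INSERT INTO a_shipments (ship_date, customer_id, is_member, quantity)
SELECT ship_date, customer_id, is_member, SUM(quantity)
FROM (
    SELECT s.ship_date,
           s.receiving_customer_id AS customer_id,
           s.quantity,
           -- 'Y' if any membership period of this customer covers the ship date
           CASE WHEN EXISTS (
                    SELECT 1 FROM d_customers c
                    WHERE c.customer_id = s.receiving_customer_id
                      AND s.ship_date BETWEEN c.membership_start_date
                                          AND c.membership_end_date)
                THEN 'Y' ELSE 'N' END AS is_member
    FROM d_shipments s
) AS t
GROUP BY ship_date, customer_id, is_member
"""


def build_a_shipments():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE d_customers (customer_id INTEGER, membership_start_date TEXT, membership_end_date TEXT);
        CREATE TABLE d_shipments (shipment_id INTEGER, ship_date TEXT, receiving_customer_id INTEGER, quantity INTEGER);
        CREATE TABLE a_shipments (ship_date TEXT, customer_id INTEGER, is_member TEXT, quantity INTEGER);
    """)
    conn.executemany("INSERT INTO d_customers VALUES (?, ?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO d_shipments VALUES (?, ?, ?, ?)", SHIPMENTS)
    conn.execute(POPULATE_SQL)
    rows = conn.execute("SELECT * FROM a_shipments ORDER BY customer_id, ship_date").fetchall()
    conn.close()
    return rows
```

Customer 321 has no row in d_customers, so the EXISTS check fails and its shipment is marked 'N'.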

A very broad range of questions covering data engineering, data science, distributed computing, and architecture, plus specialties such as record linkage/deduplication, and multiple coding exercises.

Big Data Engineer

Interviewed at Salesforce

4.1★

Jan 8, 2024


How will you manage your work if many tasks are pending at the same time?

Big Data Engineer

Interviewed at Munich Re

4.1★

Sep 18, 2018


Basic questions on Spark and Scala, plus an explanation of my project.

Big Data Engineer

Interviewed at Cognizant

3.6★

Sep 27, 2023


- What is combineByKey?
- SCD1 logic
- Difference between an edge node and a data node
- Where will the code be deployed (on the edge node or in the cluster)?
- YARN architecture
- Which versions of Spark have you worked with?
- Difference between SchemaRDD and DataFrame
- Different ways to create a DataFrame
- What is a bundle in Oozie?
- What is a fork action in Oozie?
- The distcp command
- How do you decide the number of mappers in a Sqoop job?
- What is the optimal number of mappers, provided there is no restriction on establishing connections to the DB?
- How do you pull CLOB/BLOB data types from Oracle into HDFS?
- Semi join and anti join in Scala
- Difference between the logical plan and the physical plan
- Where can we see the logical plan?

Big Data Engineer

Interviewed at Lowe's Home Improvement

3.5★

Jul 19, 2019

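For the combineByKey question, the semantics are easiest to explain by emulating them. This is a plain-Python model, not Spark itself: each inner list stands for one partition, `create_combiner` turns the first value seen for a key in a partition into an accumulator, `merge_value` folds further values for that key into it within the same partition, and `merge_combiners` merges accumulators from different partitions after the shuffle. These are the same three functions PySpark's `rdd.combineByKey(createCombiner, mergeValue, mergeCombiners)` takes; `combine_by_key` itself is a hypothetical helper.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Plain-Python model of Spark's combineByKey over a list of partitions."""
    # Phase 1: map-side combine, independently inside each partition.
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                acc[key] = merge_value(acc[key], value)
            else:
                acc[key] = create_combiner(value)
        partials.append(acc)

    # Phase 2: after the shuffle, merge per-partition accumulators for each key.
    result = {}
    for acc in partials:
        for key, combined in acc.items():
            result[key] = merge_combiners(result[key], combined) if key in result else combined
    return result


# Usage: per-key average, using (sum, count) as the accumulator type.
partitions = [[("a", 1), ("b", 4), ("a", 3)], [("a", 5), ("b", 6)]]
sums = combine_by_key(
    partitions,
    lambda v: (v, 1),                         # createCombiner
    lambda acc, v: (acc[0] + v, acc[1] + 1),  # mergeValue
    lambda x, y: (x[0] + y[0], x[1] + y[1]),  # mergeCombiners
)
averages = {k: s / n for k, (s, n) in sums.items()}
print(averages)  # {'a': 3.0, 'b': 5.0}
```

The accumulator type can differ from the value type, which is what distinguishes combineByKey from reduceByKey, and the map-side combine is why it shuffles less data than groupByKey for aggregations like this.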

The interview questions were mostly about my experience, and they were easy.

Big Data Engineer

Interviewed at Oracle

3.5★

Sep 2, 2019


All basic Hadoop and big data questions

Big Data Engineer

Interviewed at Optum

3.4★

Nov 20, 2019



898 big data engineer interview questions shared by candidates
