Big Data Engineer Interview Questions

Big Data Engineer

Interviewed at Fidelity Investments

4.1★

May 6, 2018

What is the difference between structured and unstructured databases?

Big Data Engineer

Interviewed at EPAM Systems

4★

May 21, 2024

i) Spark architecture
ii) Difference between RDDs, DataFrames and Datasets
iii) What is fault tolerance and how does Spark handle it
iv) Memory management and garbage collection in Spark
v) One SparkSQL-based question
vi) One PySpark-based question
vii) ETL implementation in AWS Glue

Big Data Engineer

Interviewed at Impetus Technologies

3.7★

Jan 5, 2022

Spark question: join two DataFrames and add a new column based on salary. Also replace null values in a row with 'ABC'.
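One way to reason about this task is to write the logic as SQL first. The sketch below is not Spark itself: it runs the query on SQLite through Python's standard `sqlite3` module so that it is self-contained, on the assumption that the interviewer's two DataFrames look roughly like an employee table and a salary table. The table names, column names, sample rows, and the 50000 threshold are all hypothetical. Spark SQL accepts the same `JOIN`, `CASE WHEN`, and `COALESCE` constructs.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER, name TEXT);
    CREATE TABLE salaries  (emp_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 'Asha'), (2, NULL), (3, 'Ravi');
    INSERT INTO salaries  VALUES (1, 90000), (2, 40000), (3, 60000);
""")

query = """
    SELECT e.emp_id,
           COALESCE(e.name, 'ABC') AS name,          -- replace NULL with 'ABC'
           s.salary,
           CASE WHEN s.salary >= 50000 THEN 'high'   -- new column derived from salary
                ELSE 'low' END AS salary_band
    FROM employees e
    JOIN salaries s ON e.emp_id = s.emp_id
    ORDER BY e.emp_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the PySpark DataFrame API, the corresponding building blocks would be `join`, `withColumn` combined with `when`/`otherwise`, and `fillna`.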

Big Data Engineer QA

Interviewed at Impetus Technologies

3.7★

May 14, 2016

What is Hive?

Big Data Engineer

Interviewed at Persistent Systems

4.2★

Oct 28, 2018

L1 - Technical

Introduce yourself.
What is your project?
What are your source data types (CSV/RDBMS)? How do you get the data?
How big was the client cluster?
What was the data size?
Was the load daily, weekly, or monthly?
Why did the client select Hadoop rather than an RDBMS?
Which tool did you use for workflow?
What is staging in Spark?
What is an RDD?
What is the intention behind lazy evaluation?
What is the intention behind keeping RDDs immutable (unable to be updated)?
You have written multiple transformations on your RDD but have not yet fired any action. What will your Spark web UI look like?
Suppose you fire an action on an RDD. What exactly happens internally in Spark? (Here I explained that Spark walks backward through the lineage graph, one step at a time, to find the RDDs it needs; the first RDD is computed and execution then works forward again to the action.)
Which are the transformations in Spark?
I have given you an RDD. How will you convert it to a paired RDD using its first element as the key? Ans: RDD2 = RDD1.map(lambda x: (x[0], x))
What is the difference between Hadoop 2.x and 1.x?
What is the HA concept? What if the NameNode fails? What did you do, and who handled it in your project?
What is the heartbeat concept?
I have a 500 MB file on Hadoop 2.x. How many blocks and replicas will there be?
I have a file:

home_id product meter
h1 p1 20
h1 p2 30
H2 p2 23

I want to create partitions with home_id as the key. How will you do it on the local file system without using Spark, Hive, or MapReduce? Use a simple programming language like Java or Python. Later, how will you do it in Hive and Spark?
I have a sorted 3x3 array:

1 3 5
7 8 9
11 15 18

Write a program so that if the user passes any element from the terminal, it returns its exact position in the array. (I did it as below.)

    a = [(1, 3, 5), (7, 8, 9), (11, 15, 18)]
    x = int(input())  # user input
    for i in range(3):
        for j in range(3):
            if a[i][j] == x:
                print('location of x is %d, %d' % (i, j))

L2 - Technical

There is a file:

Name id
Ajay 1
Ram 2
Ajay 3
Ram 4
Jack 6
Devid 7

ID is unique and Name may be repeated. Write a program so that when the user enters the name 'ajay', the program returns the list of IDs [1, 3]. Input: Ram, output: [2, 4].
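The three coding exercises above can be sketched in plain Python. This is only one possible approach, not what the interviewer expected: the function names, the `home_id=<key>/part.txt` directory layout (borrowed from Hive-style partition paths), and the case-insensitive name match are assumptions. For the sorted array, the sketch swaps the candidate's nested loop for a staircase search, which relies on both rows and columns being sorted, as they are in the sample.

```python
import tempfile
from pathlib import Path


def partition_to_dirs(lines, out_dir, key_index=0):
    """Append each record to <out_dir>/home_id=<key>/part.txt."""
    out_dir = Path(out_dir)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        part_dir = out_dir / f"home_id={fields[key_index]}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "part.txt", "a") as handle:
            handle.write(line.strip() + "\n")


def find_position(matrix, target):
    """Staircase search in a matrix sorted along rows and down columns.

    Starts at the top-right corner and discards one row or column per step.
    Returns a 0-based (row, col) pair, or None if the target is absent.
    """
    row, col = 0, len(matrix[0]) - 1
    while row < len(matrix) and col >= 0:
        value = matrix[row][col]
        if value == target:
            return row, col
        if value > target:
            col -= 1   # everything below in this column is even larger
        else:
            row += 1   # everything to the left in this row is even smaller
    return None


def ids_for_name(records, name):
    """Return every id whose name matches, ignoring case."""
    wanted = name.lower()
    return [rec_id for rec_name, rec_id in records if rec_name.lower() == wanted]


# Data rows from the question (header line left out).
readings = ["h1 p1 20", "h1 p2 30", "H2 p2 23"]
with tempfile.TemporaryDirectory() as tmp:
    partition_to_dirs(readings, tmp)
    print((Path(tmp) / "home_id=h1" / "part.txt").read_text())

grid = [[1, 3, 5], [7, 8, 9], [11, 15, 18]]
print(find_position(grid, 15))        # (2, 1)

people = [("Ajay", 1), ("Ram", 2), ("Ajay", 3), ("Ram", 4), ("Jack", 6), ("Devid", 7)]
print(ids_for_name(people, "ajay"))   # [1, 3]
```

Note that `h1` and `H2` are treated as distinct keys exactly as they appear in the sample file, and positions are reported 0-based.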

Senior Big Data Engineer

Interviewed at dunnhumby

4.8★

Jun 4, 2025

They asked SQL and Python questions. The questions were challenging, but it would have been fair play if we had been given a code playground to test our queries before submitting the final answer.

Big Data Engineer

Interviewed at T-Systems

3.7★

Feb 12, 2019

What is your current CTC, and what percentage hike are you getting?

Big Data Engineer

Interviewed at Impetus Technologies

3.7★

Jun 2, 2021

How do you load a file into a DataFrame?

Big Data Engineer

Interviewed at Impetus Technologies

3.7★

Jun 2, 2021

SQL queries on GROUP BY and HAVING.
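The review does not say which queries were asked, so the following is only an illustrative sketch of the kind of GROUP BY / HAVING question that comes up. It uses Python's standard `sqlite3` module to stay self-contained; the `orders` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('alice', 120), ('alice', 80), ('bob', 40), ('carol', 300), ('bob', 30);
""")

query = """
    SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total
    FROM orders
    WHERE amount >= 40          -- WHERE filters individual rows before grouping
    GROUP BY customer
    HAVING SUM(amount) > 100    -- HAVING filters whole groups after aggregation
    ORDER BY customer
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The point usually being probed is the order of evaluation: `WHERE` cannot reference an aggregate such as `SUM(amount)`, because it runs before the groups exist, whereas `HAVING` can.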

Senior Big Data Engineer

Interviewed at NetApp

3.8★

Jul 10, 2021

Data structures, algorithms, and dynamic programming, along with complexity analysis.

898 big data engineer interview questions shared by candidates
