Difference between structured and unstructured databases.
Big Data Engineer Interview Questions
898 big data engineer interview questions shared by candidates
i) Spark architecture
ii) Difference between RDDs, DataFrames and Datasets
iii) What is fault tolerance and how does Spark handle it
iv) Memory management and garbage collection in Spark
v) One SparkSQL based question
vi) One PySpark based question
vii) ETL implementation in AWS Glue
Spark question to join two DataFrames and add a new column based on salary. Also to replace null values in a row with 'ABC'.
What is Hive?
L1 - Technical
Introduce yourself.
What is your project?
What are your source data types (CSV/RDBMS)? How do you get the data?
How big was the client cluster?
What was the data size?
Was the load daily, weekly or monthly?
Why did the client select Hadoop rather than an RDBMS?
Which tool did you use for workflow?
What is staging in Spark?
What is an RDD?
What is the intention behind lazy evaluation?
What is the intention behind keeping RDDs immutable (unable to be updated)?
You have written multiple transformations on your RDD but have not yet fired any action. What will the Spark web UI look like?
Suppose you fire an action on an RDD: what exactly happens internally in Spark? (Here I explained that Spark walks the lineage graph backward, one step at a time, to find the RDDs it needs; the first RDD is computed and the results then flow forward again to the action.)
Which are the transformations in Spark?
I have given you an RDD. How will you convert it to a paired RDD using its first element as the key? Answer: RDD2 = RDD1.map(lambda x: (x[0], x))
What is the difference between Hadoop 2.x and 1.x?
What is the HA concept?
What if the NameNode fails? What should be done, and who handled it in your project?
What is the heartbeat concept?
I have a 500 MB file on Hadoop 2.x. How many blocks and replicas will there be?
I have a file:
home_id product meter
h1 p1 20
h1 p2 30
H2 p2 23
I want to create partitions with home_id as the key. How will you do it on the local file system without using Spark, Hive or MapReduce? Use a simple programming language like Java/Python. Later, how will you do it in Hive and Spark?
I have a 3x3 array which is sorted:
1 3 5
7 8 9
11 15 18
Write a program so that if the user passes any element from the terminal, it returns its exact position in the array. (I did it as below.)
    a = [(1, 3, 5), (7, 8, 9), (11, 15, 18)]
    x = int(input())  # user input
    # scan every cell; i and j are 0-based row and column indexes
    for i in range(3):
        for j in range(3):
            if x == a[i][j]:
                print('location of x is %d %d' % (i, j))

L2 - Technical
There is a file:
Name id
Ajay 1
Ram 2
Ajay 3
Ram 4
Jack 6
Devid 7
ID is unique and Name might be repeated.
Write a program so that when the user enters the name 'ajay', the program returns the list of IDs [1, 3]. Input Ram: output [2, 4].
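One possible way to approach the name-to-IDs question above is to build a dictionary from name to list of IDs in a single pass. This is only a sketch: the function names build_index and ids_for are made up for illustration, the file contents are inlined as a string instead of being read from disk, and the case-insensitive match is an assumption drawn from the 'ajay' example.

```python
from collections import defaultdict


def build_index(lines):
    """Map each lower-cased name to the list of IDs seen for it, in file order."""
    index = defaultdict(list)
    for line in lines:
        parts = line.split()
        # Skip the header row ("Name id") and any malformed line.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, id_text = parts
        index[name.lower()].append(int(id_text))
    return index


def ids_for(index, name):
    """Return the IDs for a name, ignoring case; empty list if unknown."""
    return index.get(name.strip().lower(), [])


# Sample data copied from the question; a real program would read a file.
SAMPLE = """Name id
Ajay 1
Ram 2
Ajay 3
Ram 4
Jack 6
Devid 7""".splitlines()

index = build_index(SAMPLE)
print(ids_for(index, "ajay"))  # [1, 3]
print(ids_for(index, "Ram"))   # [2, 4]
```

Building the index once costs one pass over the file, after which each lookup is a single dictionary access.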
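For the home_id partitioning question above, a plain-Python sketch might group the rows by key and write one directory per key, loosely imitating the key=value directory layout Hive uses for partitions. The function name partition_by_key, the part-0.txt file name and the temporary output directory are assumptions chosen for illustration, and all rows are held in memory, which would not suit a very large file.

```python
import os
import tempfile
from collections import defaultdict


def partition_by_key(lines, out_dir, key_index=0):
    """Group rows by the column at key_index and write one directory per key."""
    header, rows = lines[0], lines[1:]
    key_name = header.split()[key_index]
    groups = defaultdict(list)
    for row in rows:
        fields = row.split()
        if fields:  # ignore blank lines
            groups[fields[key_index]].append(row)
    for key, members in groups.items():
        # Directory name in key=value form, e.g. home_id=h1
        part_dir = os.path.join(out_dir, f"{key_name}={key}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-0.txt"), "w") as f:
            f.write("\n".join(members) + "\n")
    return sorted(groups)


# Sample rows copied from the question (key case kept exactly as given).
SAMPLE = [
    "home_id product meter",
    "h1 p1 20",
    "h1 p2 30",
    "H2 p2 23",
]

out_dir = tempfile.mkdtemp()
keys = partition_by_key(SAMPLE, out_dir)
print(keys)
```

The keys are kept exactly as they appear in the file, so h1 and H2 land in separate directories; normalizing case would be a separate decision.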
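The 500 MB question above comes down to simple arithmetic once the block size and replication factor are fixed. The sketch below assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3; a cluster configured differently would give different numbers, and hdfs_block_usage is a hypothetical helper name.

```python
import math


def hdfs_block_usage(file_mb, block_mb=128, replication=3):
    """Return (blocks, size of last block in MB, total block replicas).

    Assumes a positive file size and the default Hadoop 2.x settings.
    """
    blocks = math.ceil(file_mb / block_mb)
    last_block_mb = file_mb - (blocks - 1) * block_mb
    return blocks, last_block_mb, blocks * replication


blocks, last_mb, total_replicas = hdfs_block_usage(500)
print(blocks, last_mb, total_replicas)  # 4 116 12
```

Under these assumptions the file splits into three full 128 MB blocks plus one 116 MB block, and each block is stored three times, giving 12 block replicas across the cluster.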
They asked SQL and Python questions. The questions were challenging, but it would have been fair play if we had been given a code playground to test our queries before submitting the final answer.
What is your current CTC? What percentage hike are you getting?
How to load a file into a DataFrame
SQL queries on GROUP BY and HAVING
Data structures, algorithms and dynamic programming, along with complexity analysis
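A GROUP BY with HAVING query of the kind mentioned above can be tried locally with Python's built-in sqlite3 module. The employees table, its columns and its rows are invented for illustration and do not come from the interview; the point is only that HAVING filters groups after aggregation, whereas WHERE filters rows before it.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ajay", "data", 100),
        ("Ram", "data", 120),
        ("Jack", "ops", 90),
        ("Devid", "ops", 70),
        ("Mira", "hr", 80),
    ],
)

# Keep only departments with at least two employees.
rows = conn.execute(
    """
    SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
    HAVING COUNT(*) >= 2
    ORDER BY dept
    """
).fetchall()
conn.close()

print(rows)  # [('data', 2, 110.0), ('ops', 2, 80.0)]
```

The hr department is dropped because its group has a single row, which is a condition that cannot be expressed in a WHERE clause.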