How would you cluster / partition this database?
Sr Data Engineer Interview Questions
2,443 sr data engineer interview questions shared by candidates
“What is data shuffling and how does it impact performance?” “What’s data skew?” “How would you resolve it?” “What is Kubernetes?”
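The skew question above can be illustrated without a cluster. The following is a minimal sketch in plain Python, not Spark code: it assumes hash partitioning, and the function names, the 4-partition setup, and the sample keys are all hypothetical. It shows one "hot" key overloading a single partition, and how appending a random salt to the key spreads that load out.

```python
import random
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    # Use a stable hash; Python's built-in hash() is randomized per process for strings.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    # Count how many rows land in each partition under hash partitioning.
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts, seed=0):
    # Append a random salt in [0, num_salts) so one hot key becomes several keys.
    rng = random.Random(seed)
    return [f"{k}_{rng.randrange(num_salts)}" for k in keys]


# Hypothetical data: one key accounts for 90% of the rows.
keys = ["hot"] * 900 + [f"user{i}" for i in range(100)]

before = partition_sizes(keys, 4)
after = partition_sizes(salt_keys(keys, 8), 4)
print("rows per partition before salting:", before)
print("rows per partition after salting: ", after)
```

When salting is used for a join, the other side of the join has to be replicated once per salt value so that every salted key still finds its match. That extra data is the cost of the more even distribution.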
2nd highest salary, broadcast join, Catalyst optimizer, salting
A SQL problem that tested knowledge of JOINs
About the project
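For the "2nd highest salary" question, one common answer is a subquery that takes the maximum salary below the overall maximum. This is a sketch using Python's built-in `sqlite3` module so it can be run locally; the `employees` table, its columns, and the sample rows are assumptions made for illustration, not part of the original question.

```python
import sqlite3


def second_highest_salary(conn):
    # Largest salary strictly below the overall maximum.
    # Returns None when there are fewer than two distinct salaries.
    row = conn.execute(
        "SELECT MAX(salary) FROM employees "
        "WHERE salary < (SELECT MAX(salary) FROM employees)"
    ).fetchone()
    return row[0]


# Hypothetical sample data; note the tie at the top salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("a", 100), ("b", 200), ("c", 300), ("d", 300)],
)

print(second_highest_salary(conn))  # prints 200
```

Interviewers often follow up by asking for a `DENSE_RANK()` window-function version, which generalizes to the Nth highest salary.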
What is your salary expectation?
They asked SQL and Python questions. The questions were challenging, but it would have been fair if we had been given a code playground to test our queries before submitting the final answer.
A. Core Data Engineering Concepts
SQL (joins, window functions, performance tuning)
Data Modeling (star vs snowflake, normalization)
ETL/ELT pipelines (batch vs streaming, orchestration tools like Airflow)

B. Apache Spark / PySpark
Catalyst Optimizer & Tungsten
Narrow vs Wide transformations
Joins (broadcast, sort-merge), Skew handling
AQE (Adaptive Query Execution)
Partitioning, Predicate Pushdown
Execution Plan (DAG → Stage → Tasks)
Spark UI and Job Debugging
SCD Type 2 Implementation in PySpark

C. AWS
S3, Glue, Athena, Lambda, EMR, Redshift
Event-driven design (S3 → EventBridge → Lambda)
Security: IAM roles, bucket policies, encryption
CI/CD in AWS (CodePipeline, CloudFormation)

D. Python
Writing modular, reusable code
Working with Pandas, Boto3 (for AWS interaction)
Exception handling, logging
Lambda functions and decorators

E. Kafka / Streaming
Kafka topic partitioning, consumer groups
Offset management
Integration with Spark Structured Streaming
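The SCD Type 2 item in the list above asks for a PySpark implementation. The core logic can be sketched in plain Python first. This is an illustrative sketch only, not PySpark: the row layout (`id`, `attrs`, `valid_from`, `valid_to`, `is_current`) and the function name `apply_scd2` are assumptions. Changed records have their current version closed and a new version appended, and unseen keys are inserted as new current rows.

```python
from datetime import date


def new_version(key, attrs, today):
    # Build an open-ended "current" row for the dimension table.
    return {"id": key, "attrs": attrs, "valid_from": today,
            "valid_to": None, "is_current": True}


def apply_scd2(dimension, incoming, today):
    """Apply a snapshot `incoming` (id -> attrs) to `dimension` (list of rows)."""
    result = []
    current_ids = set()
    for row in dimension:
        row = dict(row)  # copy so the caller's rows are not mutated
        if row["is_current"]:
            current_ids.add(row["id"])
            attrs = incoming.get(row["id"])
            if attrs is not None and attrs != row["attrs"]:
                # Attributes changed: close the old version, open a new one.
                row["valid_to"] = today
                row["is_current"] = False
                result.append(row)
                result.append(new_version(row["id"], attrs, today))
                continue
        result.append(row)
    # Keys with no current row in the dimension are brand-new inserts.
    for key, attrs in incoming.items():
        if key not in current_ids:
            result.append(new_version(key, attrs, today))
    return result


# Hypothetical example: customer 1 moves city, customer 2 is new.
dim = [new_version(1, {"city": "Pune"}, date(2024, 1, 1))]
snapshot = {1: {"city": "Delhi"}, 2: {"city": "Goa"}}
updated = apply_scd2(dim, snapshot, date(2025, 1, 1))
for r in updated:
    print(r)
```

In PySpark the same steps are usually expressed as a join between the current dimension rows and the incoming snapshot, followed by a union of the closed, new, and unchanged rows, or as a merge on table formats that support it.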
PySpark memory optimization, different types of keys in SQL
Questions about Glue and Lambda, some Python questions, ChatGPT, tuples vs lists in Python, and DynamoDB
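For the tuple-versus-list question, the usual points are that lists are mutable while tuples are not, and that a tuple of hashable items can serve as a dictionary key. A small sketch follows; the function and variable names are made up for illustration.

```python
def compare_tuple_and_list():
    point_list = [1, 2]
    point_tuple = (1, 2)

    # Lists can be changed in place.
    point_list.append(3)

    # Tuples reject item assignment with a TypeError.
    try:
        point_tuple[0] = 9
        tuple_is_mutable = True
    except TypeError:
        tuple_is_mutable = False

    # Tuples of hashable items can be dict keys; lists cannot be hashed.
    lookup = {point_tuple: "found"}
    try:
        hash(point_list)
        list_is_hashable = True
    except TypeError:
        list_is_hashable = False

    return point_list, tuple_is_mutable, list_is_hashable, lookup[(1, 2)]


print(compare_tuple_and_list())
```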