Optum Data Engineer Interview Questions
By Karthik Kondpak
1) How do you execute your Spark job?
2) Do you know how memory and CPU cores are set for executors? (sketch below)
3) What are partitioning and bucketing, and how do you choose which column to partition on? (sketch below)
4) Why Spark and not Hive?
5) Write Spark code to select the records whose names start with 'a'. (sketch below)
6) What is YARN?
7) What is the difference between the final and val keywords in Scala?
8) What is a case class in Scala? (sketch below)
9) What are the different kinds of joins Spark supports?
10) What is the parent class (superclass) of the Int and Double data types in Scala?
11) What are Hive external tables?
12) Can we apply rank() on multiple columns? (sketch below)
13) Can we apply indexing on a Hive table?
14) What is the difference between rank() and dense_rank()? (sketch below)
15) What happens if we drop a table in Hive?
16) Why not MapReduce, and why Hive?
17) What is the difference between ETL and ELT?
18) What is the difference between a data warehouse and a database?
19) Why Spark when you already have Hive?
20) Were there any situations where your team was not supportive of you?
21) What is the biggest challenge you have faced in your journey?
22) Why not an RDBMS? Why big data?
23) Any Spark optimizations you have worked on?
24) How will you debug a Spark job when it fails?
25) Any challenges you faced while building your data pipelines?
26) What is your cluster size, and how many executors are there on each node?
27) Can you explain your project and your day-to-day activities as a data engineer?
28) What if I want to add a new column to a Hive table? (sketch below)
29) How does a Spark job run behind the scenes?
30) How do you bring data from a staging table to a target table?
31) How do you define a schema explicitly in Spark? (sketch below)
32) Find the second-highest number in a given list. (sketch below)
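Below are a few illustrative sketches for the coding-oriented questions above; the table names, column names, and numbers in them are assumptions for demonstration, not answers from the actual interview.

For question 2, a minimal sketch of where executor memory and cores can be set; the values (4g, 4 cores, 10 executors) are illustrative assumptions, and in practice these are usually passed on the spark-submit command line rather than hard-coded.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sizing only; real values depend on node capacity and workload.
val spark = SparkSession.builder()
  .appName("executor-sizing-demo")
  .config("spark.executor.memory", "4g")     // heap memory per executor
  .config("spark.executor.cores", "4")       // concurrent tasks per executor
  .config("spark.executor.instances", "10")  // number of executors (static allocation)
  .getOrCreate()

// Equivalent spark-submit flags:
//   --executor-memory 4g --executor-cores 4 --num-executors 10
```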
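For question 3, a sketch of partitioning versus bucketing on write. The orders data, column names, and bucket count are assumptions: partition on a low-cardinality column that queries filter on (for example a date), and bucket on a high-cardinality join key.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partition-bucket-demo").getOrCreate()
import spark.implicits._

// Hypothetical orders data; in a real pipeline this would be read from a source table.
val orders = Seq(
  ("2024-01-01", 101L, 49.99),
  ("2024-01-02", 102L, 15.00)
).toDF("order_date", "customer_id", "amount")

orders.write
  .partitionBy("order_date")      // low-cardinality column used heavily in filters
  .bucketBy(8, "customer_id")     // high-cardinality join/aggregation key
  .sortBy("customer_id")
  .format("parquet")
  .saveAsTable("orders_bucketed") // bucketBy requires saveAsTable, not save()
```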
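For question 5, a small sketch that filters a DataFrame for names beginning with 'a'; the sample names are made up, and the lower() call is an assumption that the match should be case-insensitive.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lower

val spark = SparkSession.builder().appName("names-starting-with-a").getOrCreate()
import spark.implicits._

val people = Seq("Anil", "Bharat", "asha", "Kiran").toDF("name") // sample data

// Case-insensitive filter for names beginning with 'a'
val startsWithA = people.filter(lower($"name").startsWith("a"))
startsWithA.show()
// +----+
// |name|
// +----+
// |Anil|
// |asha|
// +----+
```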
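For question 8, a plain Scala sketch of what a case class gives you; Employee and its fields are hypothetical.

```scala
// A case class is an immutable data holder: the compiler generates equals/hashCode/
// toString, a copy method, a companion apply, and pattern-matching support.
case class Employee(id: Int, name: String, salary: Double)

val e1 = Employee(1, "Asha", 75000.0)  // no 'new' needed (companion apply)
val e2 = e1.copy(salary = 80000.0)     // structural copy with one field changed

e1 match {
  case Employee(_, name, _) => println(s"Employee name: $name")
}

println(e1 == Employee(1, "Asha", 75000.0)) // true: structural equality
```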
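For questions 12 and 14, one sketch covers both: the window below uses more than one column (partition by dept, order by salary), and the output contrasts rank() with dense_rank(). The department names and salaries are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank, rank}

val spark = SparkSession.builder().appName("rank-demo").getOrCreate()
import spark.implicits._

// Hypothetical salary data.
val emp = Seq(
  ("IT", "Asha", 9000), ("IT", "Bharat", 9000), ("IT", "Kiran", 7000),
  ("HR", "Meena", 6000), ("HR", "Ravi", 5000)
).toDF("dept", "name", "salary")

// A window can partition and order over multiple columns;
// orderBy also accepts more than one column if needed.
val w = Window.partitionBy("dept").orderBy(col("salary").desc)

emp.withColumn("rnk", rank().over(w))
   .withColumn("dense_rnk", dense_rank().over(w))
   .show()
// In IT (9000, 9000, 7000): rank() gives 1, 1, 3 (skips positions after ties);
// dense_rank() gives 1, 1, 2 (no gaps).
```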
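For question 28, a sketch of adding a column to an existing Hive table through Spark SQL; the table and column names are hypothetical.

```scala
// Assumes a Hive-enabled SparkSession and an existing table; names are hypothetical.
spark.sql(
  "ALTER TABLE sales.orders ADD COLUMNS (discount DOUBLE COMMENT 'promo discount')")
// Existing rows show NULL for the new column; the underlying data files are not rewritten.
```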
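For question 31, a sketch of defining a schema explicitly with StructType instead of relying on inference; the file path and column names are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().appName("explicit-schema-demo").getOrCreate()

// Hypothetical CSV layout.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true),
  StructField("salary", DoubleType, nullable = true)
))

val df = spark.read
  .schema(schema)           // skip schema inference; enforce types up front
  .option("header", "true")
  .csv("/data/employees.csv")

df.printSchema()
```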
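For question 32, a plain Scala sketch; the input list is arbitrary, and the distinct call is an assumption that ties should collapse (so the second highest of 11, 11, 9 is 9).

```scala
// Second-highest number from a list, treating duplicate values as one.
val nums = List(4, 11, 7, 11, 9)
val secondHighest: Option[Int] =
  nums.distinct.sorted(Ordering[Int].reverse).drop(1).headOption
println(secondHighest) // Some(9)
```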
Like, share, and comment.