Welcome to TestSimulate

Pass Your Next Certification Exam Fast!

Everything you need to prepare, learn & pass your certification exam easily.

365 days free updates. First attempt guaranteed success.

Databricks Certified Associate Developer for Apache Spark 3.5 - Python (Associate-Developer-Apache-Spark-3.5) Free Practice Test

Question 1
A data engineer uses a broadcast variable to share a DataFrame containing millions of rows across executors for lookup purposes. What will be the outcome?

Correct Answer: C
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 2
49 of 55.
In the code block below, aggDF contains aggregations on a streaming DataFrame:
aggDF.writeStream \
.format("console") \
.outputMode("???") \
.start()
Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?

Correct Answer: A
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 3
A data engineer noticed improved performance after upgrading from Spark 3.0 to Spark 3.5. The engineer found that Adaptive Query Execution (AQE) was enabled.
Which operation is AQE implementing to improve performance?

Correct Answer: A
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 4
Given:
python
CopyEdit
spark.sparkContext.setLogLevel("<LOG_LEVEL>")
Which set contains the suitable configuration settings for Spark driver LOG_LEVELs?

Correct Answer: D
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 5
An engineer has a large ORC file located at /file/test_data.orc and wants to read only specific columns to reduce memory usage.
Which code fragment will select the columns, i.e., col1, col2, during the reading process?

Correct Answer: C
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 6
A data engineer is reviewing a Spark application that applies several transformations to a DataFrame but notices that the job does not start executing immediately.
Which two characteristics of Apache Spark's execution model explain this behavior?
Choose 2 answers:

Correct Answer: B,E
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 7
48 of 55.
A data engineer needs to join multiple DataFrames and has written the following code:
from pyspark.sql.functions import broadcast
data1 = [(1, "A"), (2, "B")]
data2 = [(1, "X"), (2, "Y")]
data3 = [(1, "M"), (2, "N")]
df1 = spark.createDataFrame(data1, ["id", "val1"])
df2 = spark.createDataFrame(data2, ["id", "val2"])
df3 = spark.createDataFrame(data3, ["id", "val3"])
df_joined = df1.join(broadcast(df2), "id", "inner") \
.join(broadcast(df3), "id", "inner")
What will be the output of this code?

Correct Answer: D
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 8
46 of 55.
A data engineer is implementing a streaming pipeline with watermarking to handle late-arriving records.
The engineer has written the following code:
inputStream \
.withWatermark("event_time", "10 minutes") \
.groupBy(window("event_time", "15 minutes"))
What happens to data that arrives after the watermark threshold?

Correct Answer: A
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 9
A data engineer is building an Apache Sparkā„¢ Structured Streaming application to process a stream of JSON events in real time. The engineer wants the application to be fault-tolerant and resume processing from the last successfully processed record in case of a failure. To achieve this, the data engineer decides to implement checkpoints.
Which code snippet should the data engineer use?

Correct Answer: C
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).