Welcome to TestSimulate

Snowflake SnowPro Advanced: Data Scientist Certification (DSA-C03) Free Practice Test

Question 1
You are a data scientist working with a large dataset of customer transactions stored in Snowflake. You need to identify potential fraud using statistical summaries. Which of the following approaches would be MOST effective in identifying unusual spending patterns, considering the need for scalability and performance within Snowflake?

Correct Answer: A,E
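A per-customer z-score is one simple statistical summary for spotting unusual spending. The sketch below illustrates the idea in plain Python on a small in-memory sample. It assumes transactions arrive as hypothetical `(customer_id, amount)` pairs. It does not reproduce any of the question's answer options, and `flag_unusual_amounts` and its 3.0 threshold are illustrative choices.

```python
from statistics import mean, pstdev


def flag_unusual_amounts(transactions, z_threshold=3.0):
    """Return (customer_id, amount, z) for amounts whose z-score, computed
    against that customer's own mean and population std dev, exceeds the
    threshold in absolute value. Hypothetical helper for illustration."""
    by_customer = {}
    for customer_id, amount in transactions:
        by_customer.setdefault(customer_id, []).append(amount)

    flagged = []
    for customer_id, amounts in by_customer.items():
        mu = mean(amounts)
        sigma = pstdev(amounts)
        if sigma == 0:
            continue  # no spread, so the z-score is undefined; skip this customer
        for amount in amounts:
            z = (amount - mu) / sigma
            if abs(z) > z_threshold:
                flagged.append((customer_id, amount, round(z, 2)))
    return flagged
```

In Snowflake, the same per-customer statistics can be computed in the warehouse with `AVG(amount) OVER (PARTITION BY customer_id)` and `STDDEV_POP(amount) OVER (PARTITION BY customer_id)`, so the data never has to leave Snowflake. With the population standard deviation, no value in a group of n can have |z| greater than sqrt(n - 1) (Samuelson's inequality). As a result, customers with ten or fewer transactions can never cross a threshold of 3.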
Question 2
You've built a machine learning model in scikit-learn and want to deploy it to Snowflake for real-time inference. You have the following options for deploying the model. Select all options that are considered best practices for cost and time optimization:

Correct Answer: A,C
Question 3
A healthcare provider has a Snowflake table 'MEDICAL_RECORDS' containing patient notes stored as unstructured text in a column called 'NOTE_TEXT'. They want to identify different patient groups based on the topics discussed in these notes. They aim to use a combination of unsupervised and supervised learning. Which of the following represents a robust workflow to achieve this goal?

Correct Answer: A
Question 4
You are tasked with identifying fraudulent transactions in a large financial dataset stored in Snowflake using unsupervised learning. The dataset contains features like transaction amount, merchant ID, location, time, and user ID. You decide to use a combination of clustering and anomaly detection techniques. Which of the following steps and techniques would be MOST effective in achieving this goal while leveraging Snowflake's capabilities and minimizing false positives?

Correct Answer: A,B
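Distance to the nearest cluster centroid is one common way to combine clustering with anomaly detection. The following sketch assumes numeric features that have already been scaled and encoded; categorical fields such as merchant ID would need encoding first. It uses a hand-written Lloyd's k-means with caller-supplied initial centroids, then flags points that lie farther than a hypothetical `threshold` from every centroid. The function names are illustrative.

```python
import math


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's k-means on tuples of floats; returns the final centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def distance_anomalies(points, centroids, threshold):
    """Flag points whose distance to the nearest centroid exceeds threshold."""
    flagged = []
    for p in points:
        d = min(math.dist(p, c) for c in centroids)
        if d > threshold:
            flagged.append((p, round(d, 2)))
    return flagged
```

Extreme points are included when the centroids are fit, so they pull their cluster's centroid toward themselves. For that reason, the threshold is often set from a high percentile of the distance distribution rather than as a fixed number.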
Question 5
You are preparing a dataset in Snowflake for a K-means clustering algorithm. The dataset includes features like 'age', 'income' (in USD), and 'number_of_transactions'. 'Income' has significantly larger values than 'age' and 'number_of_transactions'. To ensure that all features contribute equally to the distance calculations in K-means, which of the following scaling approaches should you consider, and why? Select all that apply:

Correct Answer: B,D,E
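Min-max scaling and z-score standardization are the two rescalings most often considered before K-means. The sketch below implements both in plain Python. Once each feature is rescaled this way, a difference of tens of thousands of dollars in income no longer swamps a difference of a few years in age in the Euclidean distance. The function names are illustrative.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

For example, ages `[20, 30, 40]` and incomes `[20000, 60000, 100000]` both become `[0.0, 0.5, 1.0]` under min-max scaling. After that, each feature contributes on the same scale. Inside Snowflake, Snowpark ML's `snowflake.ml.modeling.preprocessing` module provides `MinMaxScaler` and `StandardScaler` for the same transformations.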
Question 6
You're building a fraud detection model and want to determine whether the average transaction amount for fraudulent transactions is significantly higher than the average transaction amount for legitimate transactions. You have two tables in Snowflake: 'FRAUDULENT_TRANSACTIONS' and 'LEGITIMATE_TRANSACTIONS', both with a 'TRANSACTION_AMOUNT' column. You believe that FRAUDULENT_TRANSACTIONS contains fewer than 30 transactions. You don't know the population standard deviations. What are the proper steps to conduct the hypothesis test, and what is the correct hypothesis statement?

Correct Answer: C
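When the population standard deviations are unknown and a sample is small, a common choice is a one-sided two-sample t-test. The sketch below uses Welch's variant, which does not assume equal variances, with H0: μ_fraud ≤ μ_legit and H1: μ_fraud > μ_legit. It computes only the t statistic and the Welch-Satterthwaite degrees of freedom using the standard library. The function name is hypothetical, and this is not necessarily the procedure described in option C.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(fraud, legit):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    H0: mean(fraud) <= mean(legit) vs H1: mean(fraud) > mean(legit)."""
    n1, n2 = len(fraud), len(legit)
    # Sample variances (n - 1 denominator), since population values are unknown.
    se1 = variance(fraud) / n1
    se2 = variance(legit) / n2
    t = (mean(fraud) - mean(legit)) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df
```

A large positive t, compared against the upper tail of a t distribution with `df` degrees of freedom, is evidence for H1. If SciPy is available, `scipy.stats.ttest_ind(fraud, legit, equal_var=False, alternative="greater")` returns the p-value directly. With fewer than 30 fraudulent rows, the test relies on the amounts being roughly normally distributed.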
Question 7
You are building a data science pipeline in Snowflake to predict customer churn. The pipeline involves extracting data, transforming it using Dynamic Tables, training a model using Snowpark ML, and deploying the model for inference. The raw data arrives in a Snowflake stage daily as Parquet files. You want to optimize the pipeline for cost and performance. Which of the following strategies are MOST effective, considering resource utilization and potential data staleness?

Correct Answer: A,E
Question 8
You are building a machine learning model using Snowflake data to predict customer churn. Your dataset includes a 'CUSTOMER_TYPE' column with the following possible values: 'New', 'Returning', and 'VIP'. You need to perform one-hot encoding on this column. Which of the following Snowflake SQL queries correctly implements one-hot encoding for the 'CUSTOMER_TYPE' column, creating separate binary columns for each customer type ('IS_NEW', 'IS_RETURNING', 'IS_VIP')?

Correct Answer: C,D,E
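Any correct query has to reproduce the row-level logic sketched below in Python. Each flag mirrors a SQL expression such as `IFF(CUSTOMER_TYPE = 'New', 1, 0) AS IS_NEW`. Rows are assumed to be dicts keyed by column name, and `one_hot_customer_type` is a hypothetical helper.

```python
CATEGORIES = ("New", "Returning", "VIP")


def one_hot_customer_type(rows):
    """Return copies of the rows with IS_NEW / IS_RETURNING / IS_VIP flags
    added: 1 when CUSTOMER_TYPE equals that category, otherwise 0."""
    encoded = []
    for row in rows:
        out = dict(row)
        for category in CATEGORIES:
            out[f"IS_{category.upper()}"] = 1 if row["CUSTOMER_TYPE"] == category else 0
        encoded.append(out)
    return encoded
```

A NULL or unexpected CUSTOMER_TYPE produces 0 in all three columns. This matches how `IFF` and `CASE WHEN ... THEN 1 ELSE 0 END` handle a NULL comparison, since the condition does not evaluate to true.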
Question 9
You have built an external function to train a PyTorch model using SageMaker. The model training process requires a significant amount of CPU and memory. The training data is passed from Snowflake to the external function in batches. The external function code in AWS Lambda is as follows:

The Snowflake external function is defined as follows:

During testing, you consistently receive a '500 Internal Server Error' from the external function. The Lambda logs contain messages indicating 'PayloadTooLargeError'. What is the most likely cause, and how would you mitigate it within the context of Snowflake and AWS Lambda?

Correct Answer: E
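A 'PayloadTooLargeError' means that a single request body exceeded what the endpoint accepts. AWS documents a 6 MB limit for synchronous Lambda invocation payloads, and Snowflake's `CREATE EXTERNAL FUNCTION` accepts a `MAX_BATCH_ROWS` option that caps the number of rows per request. The sketch below packs rows into request bodies shaped like Snowflake's external function format (`{"data": [[row_number, col1, ...], ...]}`) under a byte budget, which can help estimate a safe `MAX_BATCH_ROWS`. The helper name, and the assumption that row numbers restart at 0 in each batch, are illustrative.

```python
import json

# Assumption: AWS's documented 6 MB limit for synchronous Lambda invocation payloads.
LAMBDA_SYNC_PAYLOAD_LIMIT = 6 * 1024 * 1024


def split_into_batches(rows, max_bytes=LAMBDA_SYNC_PAYLOAD_LIMIT):
    """Greedily pack rows into {"data": [[row_number, ...], ...]} bodies whose
    serialized JSON stays within max_bytes. Row numbers restart at 0 in each
    batch (an assumption of this sketch). A single row larger than max_bytes
    still gets its own batch, because it cannot be split further."""
    envelope = len(json.dumps({"data": []}).encode("utf-8"))
    batches, current, size = [], [], envelope
    for row in rows:
        entry = [len(current), *row]
        entry_size = len(json.dumps(entry).encode("utf-8"))
        separator = 2 if current else 0  # json.dumps joins list items with ", "
        if current and size + separator + entry_size > max_bytes:
            # Close the current batch and start a new one with this row at index 0.
            batches.append({"data": current})
            current, size = [], envelope
            entry = [0, *row]
            entry_size = len(json.dumps(entry).encode("utf-8"))
            separator = 0
        current.append(entry)
        size += separator + entry_size
    if current:
        batches.append({"data": current})
    return batches
```

As a rough example, if the largest serialized row is about 4 KB, a 6 MB budget allows on the order of 1,500 rows per request. Setting `MAX_BATCH_ROWS` well below that figure leaves headroom, because the response is subject to a size limit as well.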
Question 10
A data scientist is developing a model within a Snowpark Python environment to predict customer churn. They have established a Snowflake session and loaded data into a Snowpark DataFrame named 'customer_data'. The feature engineering pipeline requires a custom Python function, 'calculate_engagement_score', to be applied to each row. This function takes several columns as input and returns a single score representing customer engagement. The data scientist wants to apply this function in parallel across the entire DataFrame using Snowpark's UDF capabilities. The following code snippet is used to define and register the UDF:

When the UDF is called, the error shown above is raised. What change is needed to make the UDF work as expected?

Correct Answer: A
Question 11
You are tasked with validating a regression model that predicts customer lifetime value (CLTV). The model uses various customer attributes, including purchase history, demographics, and website activity, stored in a Snowflake table called 'CUSTOMER_DATA'. You want to assess the model's calibration; specifically, whether the predicted CLTV values align with the actual observed CLTV values over time. Which of the following evaluation techniques would be MOST suitable for assessing the calibration of your CLTV regression model in Snowflake?

Correct Answer: C
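A binned calibration table compares the mean predicted value with the mean actual value within groups sorted by prediction. A well-calibrated model shows the two means close together in every bin. The sketch below builds such a table in plain Python. The function name and inputs are hypothetical.

```python
from statistics import mean


def calibration_table(predicted, actual, n_bins=10):
    """Sort by prediction, split into n_bins roughly equal-sized bins, and
    return (mean_predicted, mean_actual) for each non-empty bin."""
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            table.append((mean(p for p, _ in chunk), mean(a for _, a in chunk)))
    return table
```

In Snowflake, `NTILE(10) OVER (ORDER BY predicted_cltv)` (where `predicted_cltv` is a hypothetical column name) followed by a `GROUP BY` on the tile number produces the same kind of table. Repeating it for each cohort or time period addresses the "over time" part of the question.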