Free Sales Ending Soon - Use Real DP-201 PDF Questions [Jan 09, 2022]
Updated Jan-2022 Exam DP-201 Dumps - Pass Your Certification Exam
Microsoft Designing an Azure Data Solution Exam Certification Details:
| Schedule Exam | Pearson VUE |
| Passing Score | 700 / 1000 |
| Duration | 120 mins |
| Sample Questions | Microsoft Designing an Azure Data Solution Sample Questions |
| Exam Price | $165 (USD) |
| Exam Name | Microsoft Certified - Azure Data Engineer Associate |
| Books / Training | Course DP-201T01-A: Designing an Azure Data Solution |
| Number of Questions | 40-60 |
| Exam Code | DP-201 |
Preparation Materials and Resources
The Microsoft DP-201 exam is not easy, so you should take it seriously. If you want to pass the test on your first attempt, you need to devote enough time to preparation. Start by reviewing the exam objectives, and then proceed with the official preparation options available on the Microsoft webpage. The vendor offers candidates two ways to prepare for the DP-201 test:
- Online learning: this training is available free of charge. There are several learning paths tackling various aspects of the Microsoft DP-201 exam.
- Instructor-led training: this is a paid course delivered under the guidance of a Microsoft-authorized trainer. It lasts two days and is intended for data professionals, data architects, and business intelligence professionals who want to enhance their expertise in the data platform technologies available on Microsoft Azure.
You can use these training options separately or in combination. Additionally, you can search for relevant resources on other learning platforms.
Microsoft DP-201 Exam Syllabus Topics:
| Topic | Details |
|---|---|
| Design Azure Data Storage Solutions (40-45%) | |
| Recommend an Azure data storage solution based on requirements | - choose the correct data storage solution to meet the technical and business requirements - choose the partition distribution type |
| Design non-relational cloud data stores | - design data distribution and partitions - design for scale (including multi-region, latency, and throughput) - design a solution that uses Cosmos DB, Data Lake Storage Gen2, or Blob storage - select the appropriate Cosmos DB API - design a disaster recovery strategy - design for high availability |
| Design relational cloud data stores | - design data distribution and partitions - design for scale (including latency and throughput) - design a solution that uses Azure Synapse Analytics - design a disaster recovery strategy - design for high availability |
| Design Data Processing Solutions (25-30%) | |
| Design batch processing solutions | - design batch processing solutions that use Data Factory and Azure Databricks - identify the optimal data ingestion method for a batch processing solution - identify where processing should take place, such as at the source, at the destination, or in transit |
| Design real-time processing solutions | - design for real-time processing by using Stream Analytics and Azure Databricks - design and provision compute resources |
| Design for Data Security and Compliance (25-30%) | |
| Design security for source data access | - plan for secure endpoints (private/public) - choose the appropriate authentication mechanism, such as access keys, shared access signatures (SAS), and Azure Active Directory (Azure AD) |
| Design security for data policies and standards | - design data encryption for data at rest and in transit - design for data auditing and data masking - design for data privacy and data classification - design a data retention policy - plan an archiving strategy - plan to purge data based on business requirements |
NEW QUESTION 60
You have an on-premises data warehouse that includes the following fact tables. Both tables have the following columns: DataKey, ProductKey, RegionKey. There are 120 unique product keys and 65 unique region keys.
Queries that use the data warehouse take a long time to complete.
You plan to migrate the solution to use Azure SQL Data Warehouse. You need to ensure that the Azure-based solution optimizes query performance and minimizes processing skew.
What should you recommend? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Hash-distributed
Box 2: ProductKey
ProductKey is used extensively in joins.
Hash-distributed tables improve query performance on large fact tables.
Box 3: Round-robin
Box 4: RegionKey
Round-robin tables are useful for improving loading speed.
Consider using the round-robin distribution for your table in the following scenarios:
* When getting started as a simple starting point since it is the default
* If there is no obvious joining key
* If there is no good candidate column for hash distributing the table
* If the table does not share a common join key with other tables
* If the join is less significant than other joins in the query
* When the table is a temporary staging table
Note: A distributed table appears as a single table, but the rows are actually stored across 60 distributions. The rows are distributed with a hash or round-robin algorithm.
References:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-distribute
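As a rough illustration of the two distribution methods named above, the following sketch assigns rows to 60 distributions, first by hashing a column value and then by round-robin. The MD5-based hash, the 12,000-row table, and the function names are assumptions made for this example; the hash function that the service actually uses is internal and is not reproduced here.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # number of distributions stated in the note above


def hash_distribution(key: int) -> int:
    """Map a distribution-column value to one of the 60 distributions.

    MD5 is only a stable stand-in (assumption); it is not the service's hash.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def round_robin_distribution(row_number: int) -> int:
    """Assign rows to distributions in turn, ignoring column values."""
    return row_number % DISTRIBUTIONS


# Hypothetical fact table: 12,000 rows cycling through 120 product keys.
rows = [{"ProductKey": i % 120} for i in range(12_000)]

# Rows per distribution under each method.
hashed = Counter(hash_distribution(r["ProductKey"]) for r in rows)
round_robin = Counter(round_robin_distribution(n) for n in range(len(rows)))
```

Comparing `sorted(hashed.values())` with `sorted(round_robin.values())` shows how evenly each method spreads the rows. With only 120 distinct keys, a hash may leave some distributions heavier than others, whereas round-robin stays even but does not co-locate rows that share a join key.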
NEW QUESTION 61
A company stores large datasets in Azure, including sales transactions and customer account information.
You must design a solution to analyze the data. You plan to create the following HDInsight clusters:
You need to ensure that the clusters support the query requirements.
Which cluster types should you recommend? To answer, select the appropriate configuration in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Interactive Query
Choose Interactive Query cluster type to optimize for ad hoc, interactive queries.
Box 2: Hadoop
Choose Apache Hadoop cluster type to optimize for Hive queries used as a batch process.
Note: In Azure HDInsight, there are several cluster types and technologies that can run Apache Hive queries.
When you create your HDInsight cluster, choose the appropriate cluster type to help optimize performance for your workload needs.
For example, choose Interactive Query cluster type to optimize for ad hoc, interactive queries. Choose Apache Hadoop cluster type to optimize for Hive queries used as a batch process. Spark and HBase cluster types can also run Hive queries.
References:
https://docs.microsoft.com/bs-latn-ba/azure/hdinsight/hdinsight-hadoop-optimize-hive-query?toc=%2Fko-kr%2F
NEW QUESTION 62
You need to design a solution that meets the business requirements of Health Insights.
What should you include in the recommendation?
- A. Azure Cosmos DB that uses the SQL API
- B. Azure Cosmos DB that uses the Gremlin API
- C. Azure Databricks
- D. Azure Data Factory
Answer: C
NEW QUESTION 63
You design data engineering solutions for a company that has locations around the world. You plan to deploy a large set of data to Azure Cosmos DB.
The data must be accessible from all company locations.
You need to recommend a strategy for deploying the data that minimizes latency for data read operations and minimizes costs.
What should you recommend?
- A. Use multiple Azure Cosmos DB accounts. Enable multi-region writes.
- B. Use a single Azure Cosmos DB account. Configure data replication.
- C. Use a single Azure Cosmos DB account. Enable geo-redundancy.
- D. Use multiple Azure Cosmos DB accounts. For each account, configure the location to the closest Azure datacenter.
- E. Use a single Azure Cosmos DB account. Enable multi-region writes.
Answer: E
Explanation:
With Azure Cosmos DB, you can add or remove the regions associated with your account at any time.
Multi-region accounts configured with multiple-write regions will be highly available for both writes and reads. Regional failovers are instantaneous and don't require any changes from the application.
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/high-availability
NEW QUESTION 64
You need to design the authentication and authorization methods for sensors.
What should you recommend? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Sensor data must be stored in a Cosmos DB named treydata in a collection named SensorData.
Sensors must have permission only to add items to the SensorData collection.
Box 1: Resource Token
Resource tokens provide access to the application resources within a Cosmos DB database. They enable clients to read, write, and delete resources in the Cosmos DB account according to the permissions they've been granted.
Box 2: Cosmos DB user
You can use a resource token (by creating Cosmos DB users and permissions) when you want to provide access to resources in your Cosmos DB account to a client that cannot be trusted with the master key.
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/secure-access-to-data
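The idea behind a resource token (a trusted service holds the master key and hands clients a short-lived credential scoped to one resource) can be sketched in plain Python. This is a conceptual model only: the token layout, the `write` mode name, and the helper functions are invented for the example and do not reproduce the format or permission modes that Cosmos DB actually uses.

```python
import hashlib
import hmac

MASTER_KEY = b"hypothetical-master-key"  # never leaves the trusted token broker


def _sign(payload: str) -> str:
    return hmac.new(MASTER_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(user: str, resource: str, mode: str, ttl_seconds: int, now: float) -> str:
    """Create a signed, time-limited token scoped to a single resource and mode."""
    expires = int(now) + ttl_seconds
    payload = f"{user}|{resource}|{mode}|{expires}"
    return f"{payload}|{_sign(payload)}"


def is_allowed(token: str, resource: str, mode: str, now: float) -> bool:
    """Accept the request only if the token is authentic, in scope, and unexpired."""
    try:
        user, token_resource, token_mode, expires, signature = token.split("|")
    except ValueError:
        return False
    payload = f"{user}|{token_resource}|{token_mode}|{expires}"
    return (
        hmac.compare_digest(signature.encode("utf-8"), _sign(payload).encode("utf-8"))
        and token_resource == resource
        and token_mode == mode
        and now < int(expires)
    )
```

A sensor that holds such a token can act only on the collection and mode it names, and the master key is never distributed to the device.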
NEW QUESTION 65
You need to ensure that performance requirements for Backtrack reports are met.
What should you recommend? To answer, drag the appropriate technologies to the correct locations. Each technology may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
References:
https://docs.microsoft.com/en-us/azure/cosmos-db/index-policy
https://docs.microsoft.com/en-us/azure/cosmos-db/time-to-live
NEW QUESTION 66
What should you do to improve high availability of the real-time data processing solution?
- A. Deploy identical Azure Stream Analytics jobs to paired regions in Azure.
- B. Deploy an Azure Stream Analytics job and use an Azure Automation runbook to check the status of the job and to start the job if it stops.
- C. Set Data Lake Storage to use geo-redundant storage (GRS).
- D. Deploy a High Concurrency Databricks cluster.
Answer: A
Explanation:
Guarantee Stream Analytics job reliability during service updates
Part of being a fully managed service is the capability to introduce new service functionality and improvements at a rapid pace. As a result, Stream Analytics can have a service update deploy on a weekly (or more frequent) basis. No matter how much testing is done there is still a risk that an existing, running job may break due to the introduction of a bug. If you are running mission critical jobs, these risks need to be avoided. You can reduce this risk by following Azure's paired region model.
Scenario: The application development team will create an Azure event hub to receive real-time sales data, including store number, date, time, product ID, customer loyalty number, price, and discount amount, from the point of sale (POS) system and output the data to data storage in Azure.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-job-reliability
Design for high availability and disaster recovery
Testlet 2
Case study
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.
To start the case study
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.
Overview
You develop data engineering solutions for Graphics Design Institute, a global media company with offices in New York City, Manchester, Singapore, and Melbourne.
The New York office hosts SQL Server databases that store massive amounts of customer data. The company also stores millions of images on a physical server located in the New York office. More than 2 TB of image data is added each day. The images are transferred from customer devices to the server in New York.
Many images have been placed on this server in an unorganized manner, making it difficult for editors to search images. Images should automatically have object and color tags generated. The tags must be stored in a document database, and be queried by SQL.
You are hired to design a solution that can store, transform, and visualize customer data.
Requirements
Business
The company identifies the following business requirements:
* You must transfer all images and customer data to cloud storage and remove on-premises servers.
* You must develop an analytical processing solution for transforming customer data.
* You must develop an image object and color tagging solution.
* Capital expenditures must be minimized.
* Cloud resource costs must be minimized.
Technical
The solution has the following technical requirements:
* Tagging data must be uploaded to the cloud from the New York office location.
* Tagging data must be replicated to regions that are geographically close to company office locations.
* Image data must be stored in a single data store at minimum cost.
* Customer data must be analyzed using managed Spark clusters.
* Power BI must be used to visualize transformed customer data.
* All data must be backed up in case disaster recovery is required.
Security and optimization
All cloud data must be encrypted at rest and in transit. The solution must support:
* parallel processing of customer data
* hyper-scale storage of images
* global region data replication of processed image data
NEW QUESTION 67
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an Azure SQL Database that will use elastic pools. You plan to store data about customers in a table. Each record uses a value for CustomerID.
You need to recommend a strategy to partition data based on values in CustomerID.
Proposed Solution: Separate data into shards by using horizontal partitioning.
Does the solution meet the goal?
- A. No
- B. Yes
Answer: B
Explanation:
Horizontal Partitioning - Sharding: Data is partitioned horizontally to distribute rows across a scaled out data tier. With this approach, the schema is identical on all participating databases. This approach is also called "sharding". Sharding can be performed and managed using (1) the elastic database tools libraries or (2) self-sharding. An elastic query is used to query or compile reports across many shards.
References:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-query-overview
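Horizontal partitioning by CustomerID, as described above, amounts to a lookup from a key value to the database that holds its rows. The following is a minimal sketch of a range-based shard map; the `Shard` and `RangeShardMap` classes, the database names, and the ID ranges are hypothetical and are not the API of the elastic database tools.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    database: str  # hypothetical database name in the elastic pool
    low: int       # inclusive lower bound of CustomerID
    high: int      # exclusive upper bound of CustomerID


class RangeShardMap:
    """Route a CustomerID to the shard whose range contains it."""

    def __init__(self, shards):
        self._shards = sorted(shards, key=lambda s: s.low)
        self._lows = [s.low for s in self._shards]

    def shard_for(self, customer_id: int) -> Shard:
        # Find the last shard whose lower bound is <= customer_id.
        index = bisect.bisect_right(self._lows, customer_id) - 1
        if index < 0 or customer_id >= self._shards[index].high:
            raise KeyError(f"no shard covers CustomerID {customer_id}")
        return self._shards[index]


# Every shard has the same schema; only the CustomerID range differs.
shard_map = RangeShardMap([
    Shard("customers_shard_0", 0, 100_000),
    Shard("customers_shard_1", 100_000, 200_000),
    Shard("customers_shard_2", 200_000, 300_000),
])
```

An application would call `shard_map.shard_for(customer_id)` before opening a connection, so each query touches only the database that owns that customer.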
NEW QUESTION 68
You need to recommend a solution for storing customer data.
What should you recommend?
- A. Azure SQL Database
- B. Azure Synapse Analytics
- C. Azure Databricks
- D. Azure Stream Analytics
Answer: C
Explanation:
From the scenario:
Customer data must be analyzed using managed Spark clusters.
All cloud data must be encrypted at rest and in transit. The solution must support: parallel processing of customer data.
Reference:
https://www.microsoft.com/developerblog/2019/01/18/running-parallel-apache-spark-notebook-workloads-on-azure-databricks/
Design Azure data storage solutions
Testlet 3
Case study
Background
Current environment
The company has the following virtual machines (VMs):
Requirements
Storage and processing
You must be able to use a file system view of data stored in a blob.
You must build an architecture that will allow Contoso to use the DBFS filesystem layer over a blob store. The architecture will need to support data files, libraries, and images. Additionally, it must provide a web-based interface to documents that contain runnable commands, visualizations, and narrative text, such as a notebook.
CONT_SQL3 requires an initial scale of 35000 IOPS.
CONT_SQL1 and CONT_SQL2 must use the vCore model and should include replicas. The solution must support 8000 IOPS.
The storage should be configured to optimize storage for database OLTP workloads.
Migration
* You must be able to independently scale compute and storage resources.
* You must migrate all SQL Server workloads to Azure. You must identify related machines in the on-premises environment and get disk size and data usage information.
* Data from SQL Server must include zone redundant storage.
* You need to ensure that app components can reside on-premises while interacting with components that run in the Azure public cloud.
* SAP data must remain on-premises.
* The Azure Site Recovery (ASR) results should contain per-machine data.
Business requirements
* You must design a regional disaster recovery topology.
* The database backups have regulatory purposes and must be retained for seven years.
* CONT_SQL1 stores customer sales data that requires ETL operations for data analysis. A solution is required that reads data from SQL, performs ETL, and outputs to Power BI. The solution should use managed clusters to minimize costs. To optimize logistics, Contoso needs to analyze customer sales data to see if certain products are tied to specific times in the year.
* The analytics solution for customer sales data must be available during a regional outage.
Security and auditing
* Contoso requires all corporate computers to enable Windows Firewall.
* Azure servers should be able to ping other Contoso Azure servers.
* Employee PII must be encrypted in memory, in motion, and at rest. Any data encrypted by SQL Server must support equality searches, grouping, indexing, and joining on the encrypted data.
* Keys must be secured by using hardware security modules (HSMs).
* CONT_SQL3 must not communicate over the default ports.
Cost
* All solutions must minimize cost and resources.
* The organization does not want any unexpected charges.
* The data engineers must set the SQL Data Warehouse compute resources to consume 300 DWUs.
* CONT_SQL2 is not fully utilized during non-peak hours. You must minimize resource costs during non-peak hours.
NEW QUESTION 69
You need to design a telemetry data solution that supports the analysis of log files in real time.
Which two Azure services should you include in the solution? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
- A. Azure IoT Hub
- B. Azure Databricks
- C. Azure Data Factory
- D. Azure Event Hubs
- E. Azure Data Lake Storage Gen2
Answer: B,D
Explanation:
You connect a data ingestion system with Azure Databricks to stream data into an Apache Spark cluster in near real-time. You set up data ingestion system using Azure Event Hubs and then connect it to Azure Databricks to process the messages coming through.
Note: Azure Event Hubs is a highly scalable data streaming platform and event ingestion service, capable of receiving and processing millions of events per second. Event Hubs can process and store events, data, or telemetry produced by distributed software and devices. Data sent to an event hub can be transformed and stored using any real-time analytics provider or batching/storage adapters.
Reference:
https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-stream-from-eventhubs
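To make the processing step more concrete, the sketch below counts error events per one-minute tumbling window, the kind of aggregation a streaming job might run over log messages after they have been ingested. It is plain Python standing in for a streaming engine; the event fields (`time`, `level`), the window size, and the function names are assumptions for illustration, and nothing here calls Event Hubs or Databricks.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # hypothetical tumbling-window size


def window_start(timestamp: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its window."""
    epoch = int(timestamp.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def count_errors_per_window(events) -> Counter:
    """Count ERROR-level events in each non-overlapping window."""
    counts = Counter()
    for event in events:
        if event["level"] == "ERROR":
            counts[window_start(event["time"])] += 1
    return counts
```

In a real pipeline the `events` iterable would be replaced by the stream read from the ingestion service, and the engine would emit each window's count as the window closes.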
NEW QUESTION 71
You are designing a Spark job that performs batch processing of daily web log traffic.
When you deploy the job in the production environment, it must meet the following requirements:
* Run once a day.
* Display status information on the company intranet as the job runs.
You need to recommend technologies for triggering and monitoring jobs.
Which technologies should you recommend? To answer, drag the appropriate technologies to the correct locations. Each technology may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Livy
You can use Livy to run interactive Spark shells or submit batch jobs to be run on Spark.
Box 2: Beeline
Apache Beeline can be used to run Apache Hive queries on HDInsight. You can use Beeline with Apache Spark.
Note: Beeline is a Hive client that is included on the head nodes of your HDInsight cluster. Beeline uses JDBC to connect to HiveServer2, a service hosted on your HDInsight cluster. You can also use Beeline to access Hive on HDInsight remotely over the internet.
References:
https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-livy-rest-interface
https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-beeline
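Submitting a batch job through Livy comes down to an HTTP POST to the cluster's `/livy/batches` endpoint, and Livy also exposes a `/batches/{id}/state` endpoint that reports a batch's state. The sketch below only builds the requests and does not send them. The cluster name, JAR path, and class name are hypothetical, and authentication with the cluster credentials is left out.

```python
import json
import urllib.request

# Hypothetical cluster name; substitute the real HDInsight cluster.
LIVY_ENDPOINT = "https://mycluster.azurehdinsight.net/livy"


def build_submit_request(jar_path: str, class_name: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Livy to start a batch job."""
    body = json.dumps({"file": jar_path, "className": class_name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{LIVY_ENDPOINT}/batches",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Requested-By": "admin"},
    )


def build_state_request(batch_id: int) -> urllib.request.Request:
    """Build (but do not send) the GET that asks Livy for a batch's state."""
    return urllib.request.Request(
        url=f"{LIVY_ENDPOINT}/batches/{batch_id}/state",
        method="GET",
    )
```

A scheduler could send the first request once a day with `urllib.request.urlopen` after adding credentials, and an intranet page could poll the second to show progress.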
NEW QUESTION 72
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to store delimited text files in an Azure Data Lake Storage account that will be organized into department folders.
You need to configure data access so that users see only the files in their respective department folder.
Solution: From the storage account, you enable a hierarchical namespace, and you use access control lists (ACLs).
Does this meet the goal?
- A. No
- B. Yes
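Answer: B
Explanation:
Azure Data Lake Storage implements an access control model that derives from HDFS, which in turn derives from the POSIX access control model.
Folder-level access control lists (ACLs) are available only when the hierarchical namespace is enabled, so enabling it and assigning ACLs on each department folder restricts users to the files in their own department folder.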
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues
NEW QUESTION 73
You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure SQL Data Warehouse.
You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained.
What should you recommend?
- A. CSV
- B. Avro
- C. JSON
- D. Parquet
Answer: B
Explanation:
The Avro format is great for data and message preservation.
Avro schema with its support for evolution is essential for making the data robust for streaming architectures like Kafka, and with the metadata that schema provides, you can reason on the data. Having a schema provides robustness in providing meta-data about the data stored in Avro records which are self-documenting the data.
References:
http://cloudurable.com/blog/avro/index.html
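The point about retaining data type information can be illustrated without an Avro library. In the sketch below, a record written to CSV comes back as strings, while a schema stored with the data allows the original types to be restored. The schema is written in Avro's JSON style, but the field names and the `apply_schema` helper are hypothetical, and no Avro binary encoding is performed.

```python
import csv
import io

# Avro-style record schema expressed as a plain dict (hypothetical fields).
SCHEMA = {
    "type": "record",
    "name": "Post",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "likes", "type": "int"},
        {"name": "score", "type": "double"},
        {"name": "text", "type": "string"},
    ],
}

# Python constructors for the primitive types used in SCHEMA.
CASTS = {"long": int, "int": int, "double": float, "string": str}


def csv_round_trip(record: dict) -> dict:
    """Write one record as CSV and read it back; every value becomes a string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
    buffer.seek(0)
    return dict(next(csv.DictReader(buffer)))


def apply_schema(raw: dict, schema: dict) -> dict:
    """Restore typed values by casting each field as the schema declares."""
    return {f["name"]: CASTS[f["type"]](raw[f["name"]]) for f in schema["fields"]}
```

A format that carries its schema with the data spares each consumer from guessing types, which is the property the question asks for.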
NEW QUESTION 74
You need to optimize storage for CONT_SQL3.
What should you recommend?
- A. General
- B. Data warehousing
- C. Transactional processing
- D. AlwaysOn
Answer: C
Explanation:
CONT_SQL3 with the SQL Server role, 100 GB database size, Hyper-VM to be migrated to Azure VM.
The storage should be configured to optimize storage for database OLTP workloads.
Azure SQL Database provides three basic in-memory based capabilities (built into the underlying database engine) that can contribute in a meaningful way to performance improvements:
* In-Memory Online Transactional Processing (OLTP)
* Clustered columnstore indexes intended primarily for Online Analytical Processing (OLAP) workloads
* Nonclustered columnstore indexes geared towards Hybrid Transactional/Analytical Processing (HTAP) workloads
References:
https://www.databasejournal.com/features/mssql/overview-of-in-memory-technologies-of-azure-sqldatabase.html
NEW QUESTION 75
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You schedule an Azure Databricks job that executes an R notebook, and then inserts the data into the data warehouse.
Does this meet the goal?
- A. No
- B. Yes
Answer: A
Explanation:
You should use an Azure Data Factory, not an Azure Databricks job.
Reference:
https://docs.microsoft.com/en-US/azure/data-factory/transform-data
NEW QUESTION 76
You have an Azure Data Lake Storage account that has a virtual network service endpoint configured.
You plan to use Azure Data Factory to extract data from the Data Lake Storage account. The data will then be loaded to a data warehouse in Azure Synapse Analytics by using PolyBase.
Which authentication method should you use to access Data Lake Storage?
- A. managed identity authentication
- B. shared access key authentication
- C. service principal authentication
- D. account key authentication
Answer: A
Explanation:
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-data-warehouse#use-polybase-to-load-d
NEW QUESTION 78
You plan to implement an Azure Data Lake Gen2 storage account.
You need to ensure that the data lake will remain available if a data center fails in the primary Azure region.
The solution must minimize costs.
Which type of replication should you use for the storage account?
- A. geo-redundant storage (GRS)
- B. geo-zone-redundant storage (GZRS)
- C. locally-redundant storage (LRS)
- D. zone-redundant storage (ZRS)
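Answer: D
Explanation:
Zone-redundant storage (ZRS) replicates your data synchronously across three Azure availability zones in the primary region, so the data remains available for reads and writes if a single data center fails. Geo-redundant storage (GRS) and geo-zone-redundant storage (GZRS) also copy the data asynchronously to a secondary region, which costs more than ZRS and is not required here. Locally-redundant storage (LRS) keeps all three copies in a single physical location, so it does not protect against a data center failure.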
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy
NEW QUESTION 79
You need to recommend a storage solution for a sales system that will receive thousands of small files per minute. The files will be in JSON, text, and CSV formats. The files will be processed and transformed before they are loaded into an Azure data warehouse. The files must be stored and secured in folders.
Which storage solution should you recommend?
- A. Azure SQL Database
- B. Azure Data Lake Storage Gen2
- C. Azure Blob storage
- D. Azure Cosmos DB
Answer: B
Explanation:
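Azure provides several solutions for working with CSV and JSON files, depending on your needs. The primary landing place for these files is either Azure Storage or Azure Data Lake Store. Azure Data Lake Storage is storage optimized for big data analytics workloads. Data Lake Storage Gen2 also provides a hierarchical namespace, so files can be organized in folders and secured with access control lists (ACLs) at the folder level.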
NEW QUESTION 80
You are planning a solution that combines log data from multiple systems. The log data will be downloaded from an API and stored in a data store.
You plan to keep a copy of the raw data as well as some transformed versions of the data. You expect that there will be at least 2 TB of log files. The data will be used by data scientists and applications.
You need to recommend a solution to store the data in Azure. The solution must minimize costs.
What storage solution should you recommend?
- A. Azure SQL Database
- B. Azure Data Lake Storage Gen2
- C. Azure Synapse Analytics
- D. Azure Cosmos DB
Answer: B
Explanation:
To land the data in Azure storage, you can move it to Azure Blob storage or Azure Data Lake Store Gen2. In either location, the data should be stored in text files. PolyBase and the COPY statement can load from either location.
Incorrect Answers:
C: Azure Synapse Analytics uses a distributed query processing architecture that takes advantage of the scalability and flexibility of compute and storage resources. Use Azure Synapse Analytics to transform and move the data, not to store the raw log files.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-elt-data-loading
NEW QUESTION 81
You are designing an application that will store petabytes of medical imaging data. When the data is first created, it will be accessed frequently during the first week. After one month, the data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the data will be accessed infrequently but must be accessible within five minutes.
You need to select a storage strategy for the data. The solution must minimize costs.
Which storage tier should you use for each time frame? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
First week: Hot
Hot - Optimized for storing data that is accessed frequently.
After one month: Cool
Cool - Optimized for storing data that is infrequently accessed and stored for at least 30 days.
After one year: Cool
Incorrect Answers:
Archive: Optimized for storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements (on the order of hours).
References:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers
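The tier reasoning above can be sketched as a small decision function. This is only an illustration of the logic in the explanation, not an Azure API: the function name, its parameters, and the one-hour threshold standing in for Archive latency "on the order of hours" are all assumptions.

```python
# Hypothetical helper illustrating the tier choice described above.
# Assumption: Archive retrieval takes at least one hour ("on the order of hours").
ARCHIVE_MIN_RETRIEVAL_SECONDS = 3600


def choose_tier(accessed_frequently: bool, max_latency_seconds: float) -> str:
    """Pick the cheapest access tier that still meets the latency requirement."""
    if accessed_frequently:
        # Frequently accessed data belongs in the Hot tier.
        return "Hot"
    if max_latency_seconds >= ARCHIVE_MIN_RETRIEVAL_SECONDS:
        # Archive is only acceptable when the caller can wait hours.
        return "Archive"
    # Infrequent access with a tight latency limit: Cool.
    return "Cool"


# The three time frames from the question.
first_week = choose_tier(accessed_frequently=True, max_latency_seconds=30)
after_one_month = choose_tier(accessed_frequently=False, max_latency_seconds=30)
after_one_year = choose_tier(accessed_frequently=False, max_latency_seconds=5 * 60)
print(first_week, after_one_month, after_one_year)
```

Under these assumptions, a five-minute limit rules out Archive, which is why the data stays in Cool after one year.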
NEW QUESTION 82
You need to recommend the appropriate storage and processing solution.
What should you recommend?
- A. Flush the blob cache using Windows PowerShell.
- B. Configure the reading speed using Azure Data Studio.
- C. Enable auto-shrink on the database.
- D. Enable Apache Spark resilient distributed dataset (RDD) caching.
- E. Enable Databricks IO (DBIO) caching.
Answer: E
Explanation:
Scenario: You must be able to use a file system view of data stored in a blob. You must build an architecture that will allow Contoso to use the DBFS filesystem layer over a blob store.
Databricks File System (DBFS) is a distributed file system installed on Azure Databricks clusters. Files in DBFS persist to Azure Blob storage, so you won't lose data even after you terminate a cluster.
The Databricks Delta cache, previously named Databricks IO (DBIO) caching, accelerates data reads by creating copies of remote files in nodes' local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote location. Successive reads of the same data are then performed locally, which results in significantly improved reading speed.
References:
https://docs.databricks.com/delta/delta-cache.html#delta-cache
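As a rough sketch of how the cache described above is typically switched on, the snippet below sets the Spark configuration key `spark.databricks.io.cache.enabled` on a session. The helper name `enable_databricks_io_cache` is hypothetical, and the code assumes it runs where a Databricks `spark` session object is available (for example, in a notebook); it is not taken from the exam material.

```python
# Hypothetical helper: turns on the Databricks IO (Delta) cache for a session.
# Assumption: `spark` is a Databricks SparkSession, as predefined in notebooks.
CACHE_KEY = "spark.databricks.io.cache.enabled"


def enable_databricks_io_cache(spark) -> None:
    """Enable caching of remote files on the worker nodes' local storage."""
    spark.conf.set(CACHE_KEY, "true")


# In a Databricks notebook, the call would look like this:
# enable_databricks_io_cache(spark)
```

The cache only helps on repeated reads of the same remote files, so the first read of each file is still served from Blob storage.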