Free Professional-Data-Engineer Exam Braindumps

Pass your Google Professional Data Engineer exam with these free Questions and Answers

QUESTION 66

- (Exam Topic 5)
Which methods can be used to reduce the number of rows processed by BigQuery?

  1. A. Splitting tables into multiple tables; putting data in partitions
  2. B. Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause
  3. C. Putting data in partitions; using the LIMIT clause
  4. D. Splitting tables into multiple tables; using the LIMIT clause

Correct Answer: A
If you split a table into multiple tables (such as one table for each day), then you can limit your query to the data in specific tables (such as for particular days). A better method is to use a partitioned table, as long as your data can be separated by day.
If you use the LIMIT clause, BigQuery still processes the entire table, so it does not reduce the number of rows read.
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables
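A minimal sketch of partition pruning with the bq command-line tool, assuming a hypothetical dataset mydataset and table events (all names are placeholders):

```
# Create a table partitioned by day on event_ts (hypothetical schema).
bq query --use_legacy_sql=false \
  'CREATE TABLE mydataset.events (event_ts TIMESTAMP, user_id STRING)
   PARTITION BY DATE(event_ts)'

# Filtering on the partitioning column scans only the matching partition;
# a LIMIT clause alone would still scan the whole table.
bq query --use_legacy_sql=false \
  'SELECT user_id
   FROM mydataset.events
   WHERE DATE(event_ts) = "2024-01-15"'
```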

QUESTION 67

- (Exam Topic 6)
You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC). All ORC files have been successfully copied to a Cloud Storage bucket. You need to replicate some data to the cluster’s local Hadoop Distributed File System (HDFS) to maximize performance. What are two ways to start using Hive in Cloud Dataproc? (Choose two.)

  1. A. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to HDFS. Mount the Hive tables locally.
  2. B. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to any node of the Dataproc cluster. Mount the Hive tables locally.
  3. C. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to the master node of the Dataproc cluster. Then run the Hadoop utility to copy them to HDFS. Mount the Hive tables from HDFS.
  4. D. Leverage the Cloud Storage connector for Hadoop to mount the ORC files as external Hive tables. Replicate external Hive tables to the native ones.
  5. E. Load the ORC files into BigQuery. Leverage the BigQuery connector for Hadoop to mount the BigQuery tables as external Hive tables. Replicate external Hive tables to the native ones.

Correct Answer: BC
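For the copy-based approach in option C, a minimal sketch of the commands, run on the Dataproc master node, assuming hypothetical bucket, path, and table names (gs://my-bucket/orc, /warehouse/orc, orders):

```
# Copy the ORC files from Cloud Storage to the master node's local disk.
mkdir -p /tmp/orc
gsutil -m cp -r 'gs://my-bucket/orc/*' /tmp/orc/

# Load them from local disk into HDFS.
hdfs dfs -mkdir -p /warehouse/orc
hdfs dfs -put /tmp/orc/* /warehouse/orc/

# Point an external Hive table at the HDFS location (hypothetical schema).
hive -e "CREATE EXTERNAL TABLE orders (id BIGINT, total DOUBLE)
         STORED AS ORC
         LOCATION '/warehouse/orc'"
```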

QUESTION 68

- (Exam Topic 5)
Which of the following statements about Legacy SQL and Standard SQL is not true?

  1. A. Standard SQL is the preferred query language for BigQuery.
  2. B. If you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.
  3. C. One difference between the two query languages is how you specify fully-qualified table names (i.e. table names that include their associated project name).
  4. D. You need to set a query language for each dataset and the default is Standard SQL.

Correct Answer: D
You do not set a query language for each dataset. It is set each time you run a query and the default query language is Legacy SQL.
Standard SQL has been the preferred query language since BigQuery 2.0 was released.
In legacy SQL, to query a table with a project-qualified name, you use a colon, :, as a separator. In standard SQL, you use a period, ., instead.
Due to the differences in syntax between the two query languages (such as with project-qualified table names), if you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.
Reference:
https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql
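A small illustration of the table-name difference using the bq command-line tool and the public samples.shakespeare table:

```
# Legacy SQL: the project is separated from the dataset with a colon,
# and the full name is wrapped in square brackets.
bq query --use_legacy_sql=true \
  'SELECT word FROM [bigquery-public-data:samples.shakespeare] LIMIT 5'

# Standard SQL: a period separates project, dataset, and table,
# and the full name is wrapped in backticks.
bq query --use_legacy_sql=false \
  'SELECT word FROM `bigquery-public-data.samples.shakespeare` LIMIT 5'
```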

QUESTION 69

- (Exam Topic 5)
Which of the following IAM roles does your Compute Engine service account require to be able to run Dataflow pipeline jobs?

  1. A. dataflow.worker
  2. B. dataflow.compute
  3. C. dataflow.developer
  4. D. dataflow.viewer

Correct Answer: A
The dataflow.worker role provides the permissions necessary for a Compute Engine service account to execute work units for a Dataflow pipeline.
Reference: https://cloud.google.com/dataflow/access-control
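A minimal sketch of granting the role with gcloud, assuming the default Compute Engine service account is used by the pipeline (PROJECT_ID and PROJECT_NUMBER are placeholders):

```
# Grant roles/dataflow.worker to the Compute Engine default service account.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/dataflow.worker"
```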

QUESTION 70

- (Exam Topic 1)
Your company’s customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?

  1. A. Add a node to the MySQL cluster and build an OLAP cube there.
  2. B. Use an ETL tool to load the data from MySQL into Google BigQuery.
  3. C. Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.
  4. D. Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.

Correct Answer: C
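If option C were implemented, the MySQL-to-Hadoop ETL step is commonly done with a tool such as Apache Sqoop; a minimal, hypothetical sketch (host, database, table, and credentials are placeholders):

```
# Import the orders table from MySQL into HDFS for offline analytics.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/sales \
  --username analytics \
  --password-file /user/etl/mysql.pw \
  --table orders \
  --target-dir /warehouse/orders \
  --num-mappers 4
```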

