
Professional-Data-Engineer Free Study Guide! with New Update 373 Exam Questions
Get up-to-date Real Exam Questions for Professional-Data-Engineer UPDATED [2024]
NEW QUESTION # 184
You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption keys that you can create, rotate, and destroy as needed. What should you do?
- A. Create a dedicated service account, and use encryption at rest to reference your data stored in your Compute Engine cluster instances as part of your API service calls.
- B. Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
- C. Create encryption keys in Cloud Key Management Service. Reference those keys in your API service calls when accessing the data in your Compute Engine cluster instances.
- D. Create encryption keys locally. Upload your encryption keys to Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
Answer: D
NEW QUESTION # 185
You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change its data type to the TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?
- A. Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on. In the future, new data is loaded into the table NEW_CLICK_STREAM.
- B. Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type. Reload the data.
- C. Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.
- D. Add two columns to the table CLICK_STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type. Reload all data in append mode. For each appended row, set the value of IS_NEW to true. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW must be true.
- E. Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the numeric values from the column DT for each row. Reference the column TS instead of the column DT from now on.
Answer: A
NEW QUESTION # 186
You're training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.
What should you do?
- A. Provide latitude and longitude as input vectors to your neural net.
- B. Create a feature cross of latitude and longitude, bucketize at the minute level and use L1 regularization during optimization.
- C. Create a numeric column from a feature cross of latitude and longitude.
- D. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization.
Answer: B
Explanation:
Reference https://cloud.google.com/bigquery/docs/gis-dataa
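The bucketize-then-cross idea in the options can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the constants, and the way the two bucket indices are combined into one ID are hypothetical, not taken from any particular library.

```python
import math

BUCKETS_PER_DEGREE = 60                       # one bucket per arc-minute
N_LON_BUCKETS = 360 * BUCKETS_PER_DEGREE + 1  # longitude spans [-180, 180]


def bucketize(degrees: float, offset: float) -> int:
    """Shift a coordinate to be non-negative and return its arc-minute bucket index."""
    return math.floor((degrees + offset) * BUCKETS_PER_DEGREE)


def lat_lon_cross(lat: float, lon: float) -> int:
    """Combine the latitude and longitude buckets into one categorical cell ID."""
    lat_bucket = bucketize(lat, 90.0)    # latitude spans [-90, 90]
    lon_bucket = bucketize(lon, 180.0)
    # Each (lat_bucket, lon_bucket) pair maps to a distinct integer.
    return lat_bucket * N_LON_BUCKETS + lon_bucket
```

Each cell ID would then be fed to the model as a sparse categorical feature. Most cells contain no properties, which is why a sparsity-inducing penalty such as L1 is a natural fit for driving their weights to zero.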
NEW QUESTION # 187
You are building a streaming Dataflow pipeline that ingests noise level data from hundreds of sensors placed near construction sites across a city. The sensors measure noise level every ten seconds, and send that data to the pipeline when levels reach above 70 dBA.
You need to detect the average noise level from a sensor when data is received for a duration of more than 30 minutes, but the window ends when no data has been received for 15 minutes. What should you do?
- A. Use tumbling windows with a 15-minute window and a fifteen-minute .withAllowedLateness operator.
- B. Use session windows with a 30-minute gap duration.
- C. Use hopping windows with a 15-minute window, and a thirty-minute period.
- D. Use session windows with a 15-minute gap duration.
Answer: D
Explanation:
Session windows are dynamic windows that group elements based on the periods of activity. They are useful for streaming data that is irregularly distributed with respect to time. In this case, the noise level data from the sensors is only sent when it exceeds a certain threshold, and the duration of the noise events may vary. Therefore, session windows can capture the average noise level for each sensor during the periods of high noise, and end the window when there is no data for a specified gap duration. The gap duration should be 15 minutes, as the requirement is to end the window when no data has been received for 15 minutes. A 30-minute gap duration would be too long and may miss some noise events that are shorter than 30 minutes. Tumbling windows and hopping windows are fixed windows that group elements based on a fixed time interval. They are not suitable for this use case, as they may split or overlap the noise events from the sensors, and do not account for the periods of inactivity. Reference:
Windowing concepts
Session windows
Windowing in Dataflow
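The session-window behavior described in the explanation can be sketched without any streaming framework. The following is a minimal plain-Python illustration for a single sensor, not Dataflow or Apache Beam code. The function name, the constants, and the input format are assumptions made for the example.

```python
from statistics import mean

GAP_SECONDS = 15 * 60           # a session closes after 15 minutes without data
MIN_DURATION_SECONDS = 30 * 60  # only sessions longer than 30 minutes are reported


def session_averages(readings):
    """Group one sensor's (timestamp_seconds, dba) readings into sessions.

    Returns a list of (start, end, average_dba) for sessions that lasted
    more than 30 minutes.
    """
    sessions, current = [], []
    for ts, level in sorted(readings):
        # A gap of 15 minutes or more since the last reading starts a new session.
        if current and ts - current[-1][0] >= GAP_SECONDS:
            sessions.append(current)
            current = []
        current.append((ts, level))
    if current:
        sessions.append(current)

    results = []
    for session in sessions:
        start, end = session[0][0], session[-1][0]
        if end - start > MIN_DURATION_SECONDS:
            results.append((start, end, mean(level for _, level in session)))
    return results
```

Note that the session length depends entirely on the data: a fixed (tumbling or hopping) window would cut a 40-minute noise event into pieces, while the gap-based grouping keeps it whole.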
NEW QUESTION # 188
Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)
- A. A good use for the wide and deep model is a recommender system.
- B. A good use for the wide and deep model is a small-scale linear regression problem.
- C. The wide model is used for memorization, while the deep model is used for generalization.
- D. The wide model is used for generalization, while the deep model is used for memorization.
Answer: A,C
Explanation:
Can we teach computers to learn like humans do, by combining the power of memorization and generalization? It's not an easy question to answer, but by jointly training a wide linear model (for memorization) alongside a deep neural network (for generalization), one can combine the strengths of both to bring us one step closer. At Google, we call it Wide & Deep Learning. It's useful for generic large-scale regression and classification problems with sparse inputs (categorical features with a large number of possible feature values), such as recommender systems, search, and ranking problems.
Reference: https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html
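To make the memorization/generalization split concrete, here is a toy forward pass in plain Python. It is a sketch under stated assumptions: all feature names, weights, and layer sizes are invented for illustration, and it shows only prediction, not the joint training the explanation refers to.

```python
import math


def wide_logit(active_crosses, wide_weights):
    """Linear model over sparse cross-product features (memorization)."""
    return sum(wide_weights.get(feature, 0.0) for feature in active_crosses)


def deep_logit(category_ids, embeddings, hidden_w, out_w):
    """Tiny feed-forward net over dense embeddings (generalization)."""
    x = [v for cid in category_ids for v in embeddings[cid]]  # concatenate embeddings
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in hidden_w]  # ReLU
    return sum(w * h for w, h in zip(out_w, hidden))


def predict(active_crosses, category_ids, params):
    """Sum the wide and deep logits, then squash to a probability."""
    logit = (wide_logit(active_crosses, params["wide"])
             + deep_logit(category_ids, params["emb"], params["hidden"], params["out"])
             + params["bias"])
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical parameters; a real model would learn these jointly.
PARAMS = {
    "wide": {"installed=app_a AND impression=app_b": 1.5},
    "emb": {"app_a": [0.2, -0.1], "app_b": [0.4, 0.3]},
    "hidden": [[0.5, -0.5, 0.1, 0.2], [-0.3, 0.8, 0.0, 0.4]],
    "out": [0.7, -0.2],
    "bias": -0.5,
}
```

Calling `predict(["installed=app_a AND impression=app_b"], ["app_a", "app_b"], PARAMS)` gives a higher score than the same call with an empty cross-feature list, because the wide part has memorized that specific co-occurrence while the deep part contributes the same amount in both cases.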
NEW QUESTION # 189
Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and have asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?
- A. Increase the size of the Google Persistent Disk on your server.
- B. Increase the CPU size on your server.
- C. Increase your network bandwidth from Compute Engine to Cloud Storage.
- D. Increase your network bandwidth from your datacenter to GCP.
Answer: D
NEW QUESTION # 190
You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. Initially, design the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data.
How can you adjust your application design?
- A. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.
- B. Re-write the application to load accumulated data every 2 minutes.
- C. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.
- D. Convert the streaming insert code to batch load for individual messages.
Answer: A
NEW QUESTION # 191
You've migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data are parquet files (on average 200-400 MB in size each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. You need to keep in mind that your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload.
What should you do?
- A. Switch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.
- B. Increase the size of your parquet files to ensure them to be 1 GB minimum.
- C. Switch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.
- D. Switch to TFRecords formats (appr. 200MB per file) instead of parquet files.
Answer: A
NEW QUESTION # 192
You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy for this data that minimizes cost. How should you configure the BigQuery table?
- A. Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.
- B. Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.
- C. Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data.
- D. Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data.
Answer: D
Explanation:
highly available = multi-regional:
https://cloud.google.com/bigquery/docs/locations
recovery strategy of this data that minimizes cost = point-in-time snapshot:
https://cloud.google.com/solutions/bigquery-data-warehouse#backup-and-recovery
NEW QUESTION # 193
An online brokerage company requires a high volume trade processing architecture. You need to create a secure queuing system that triggers jobs. The jobs will run in Google Cloud and call the company's Python API to execute trades. You need to efficiently implement a solution. What should you do?
- A. Write an application hosted on a Compute Engine instance that makes a push subscription to the Pub/Sub topic.
- B. Use Cloud Composer to subscribe to a Pub/Sub topic and call the Python API.
- C. Write an application that makes a queue in a NoSQL database.
- D. Use a Pub/Sub push subscription to trigger a Cloud Function to pass the data to the Python API.
Answer: D
NEW QUESTION # 194
Case Study: 2 - MJTelco
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network, allowing them to account for the impact of dynamic regional politics on location availability and cost. Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments (development/test, staging, and production) to meet the needs of running experiments, deploying new features, and serving production customers.
Business Requirements
Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
Provide reliable and timely access to data for analysis from distributed research workers.
Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
Ensure secure and efficient transport and storage of telemetry data.
Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100m records/day.
Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis.
Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.
MJTelco needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. Each record that comes in is sent every 15 minutes, and contains a unique identifier of the device and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?
- A. Rowkey: data_point; Column data: device_id, date
- B. Rowkey: device_id; Column data: date, data_point
- C. Rowkey: date#device_id; Column data: data_point
- D. Rowkey: date; Column data: device_id, data_point
- E. Rowkey: date#data_point; Column data: device_id
Answer: C
NEW QUESTION # 195
You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?
- A. Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.
- B. Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
- C. Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.
- D. Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.
Answer: B
Explanation:
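Dataflow autoscales worker instances by default, and Stackdriver can monitor the job's system lag, so the pipeline adapts to varying input volume with minimal manual intervention.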
NEW QUESTION # 196
You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?
- A. Refresh your browser tab showing the visualizations.
- B. Clear your browser history for the past hour then reload the tab showing the visualizations.
- C. Disable caching in BigQuery by editing table details.
- D. Disable caching by editing the report settings.
Answer: D
Explanation:
Reference https://support.google.com/datastudio/answer/7020039?hl=en
NEW QUESTION # 197
If you're running a performance test that depends upon Cloud Bigtable, all the choices except one below are recommended steps. Which is NOT a recommended step to follow?
- A. Run your test for at least 10 minutes.
- B. Do not use a production instance.
- C. Use at least 300 GB of data.
- D. Before you test, run a heavy pre-test for several minutes.
Answer: B
Explanation:
If you're running a performance test that depends upon Cloud Bigtable, be sure to follow these steps as you plan and execute your test:
Use a production instance. A development instance will not give you an accurate sense of how a production instance performs under load.
Use at least 300 GB of data. Cloud Bigtable performs best with 1 TB or more of data. However, 300 GB of data is enough to provide reasonable results in a performance test on a 3-node cluster. On larger clusters, use 100 GB of data per node.
Before you test, run a heavy pre-test for several minutes. This step gives Cloud Bigtable a chance to balance data across your nodes based on the access patterns it observes.
Run your test for at least 10 minutes. This step lets Cloud Bigtable further optimize your data, and it helps ensure that you will test reads from disk as well as cached reads from memory.
Reference: https://cloud.google.com/bigtable/docs/performance
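The data-volume guidance above reduces to a small calculation. The helper below is a hypothetical sketch (the function name is made up) that simply encodes the two numbers quoted in the explanation: at least 300 GB overall, and 100 GB per node on larger clusters.

```python
def min_test_data_gb(nodes: int) -> int:
    """Minimum data volume, in GB, for a Bigtable performance test.

    Encodes the guidance quoted above: 300 GB is enough for a 3-node
    cluster, and larger clusters should use 100 GB per node.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return max(300, 100 * nodes)
```

For example, `min_test_data_gb(3)` gives 300 and `min_test_data_gb(10)` gives 1000.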
NEW QUESTION # 198
You are loading CSV files from Cloud Storage to BigQuery. The files have known data quality issues, including mismatched data types, such as STRINGS and INT64s in the same column, and inconsistent formatting of values such as phone numbers or addresses. You need to create the data pipeline to maintain data quality and perform the required cleansing and transformation. What should you do?
- A. Use Data Fusion to convert the CSV files to a self-describing data format, such as AVRO, before loading the data to BigQuery.
- B. Use Data Fusion to transform the data before loading it into BigQuery.
- C. Create a table with the desired schema, load the CSV files into the table, and perform the transformations in place using SQL.
- D. Load the CSV files into a staging table with the desired schema, perform the transformations with SQL, and then write the results to the final destination table.
Answer: B
Explanation:
Data Fusion's advantages:
Visual interface: Offers a user-friendly interface for designing data pipelines without extensive coding, making it accessible to a wider range of users.
Built-in transformations: Includes a wide range of pre-built transformations to handle common data quality issues, such as:
Data type conversions
Data cleansing (e.g., removing invalid characters, correcting formatting)
Data validation (e.g., checking for missing values, enforcing constraints)
Data enrichment (e.g., adding derived fields, joining with other datasets)
Custom transformations: Allows for custom transformations using SQL or Java code for more complex cleaning tasks.
Scalability: Can handle large datasets efficiently, making it suitable for processing CSV files with potential data quality issues.
Integration with BigQuery: Integrates seamlessly with BigQuery, allowing for direct loading of transformed data.
NEW QUESTION # 199
Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use?
(Choose three.)
- A. Clustering to divide the transactions into N categories based on feature similarity.
- B. Supervised learning to predict the location of a transaction.
- C. Reinforcement learning to predict the location of a transaction.
- D. Unsupervised learning to predict the location of a transaction.
- E. Supervised learning to determine which transactions are most likely to be fraudulent.
- F. Unsupervised learning to determine which transactions are most likely to be fraudulent.
Answer: A,B,F
NEW QUESTION # 200
What is the HBase Shell for Cloud Bigtable?
- A. The HBase shell is a GUI based interface that performs administrative tasks, such as creating and deleting tables.
- B. The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables.
- C. The HBase shell is a hypervisor based shell that performs administrative tasks, such as creating and deleting new virtualized instances.
- D. The HBase shell is a command-line tool that performs only user account management functions to grant access to Cloud Bigtable instances.
Answer: B
Explanation:
The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables. The Cloud Bigtable HBase client for Java makes it possible to use the HBase shell to connect to Cloud Bigtable.
NEW QUESTION # 201
Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable.
The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data.
They want to improve this performance while minimizing cost. What should they do?
- A. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.
- B. Redefine the schema by evenly distributing reads and writes across the row space of the table.
- C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.
- D. The performance issue should be resolved over time as the size of the Bigtable cluster is increased.
Answer: B
NEW QUESTION # 202
A web server sends click events to a Pub/Sub topic as messages. The web server includes an eventTimestamp attribute in the messages, which is the time when the click occurred. You have a Dataflow streaming job that reads from this Pub/Sub topic through a subscription, applies some transformations, and writes the result to another Pub/Sub topic for use by the advertising department. The advertising department needs to receive each message within 30 seconds of the corresponding click occurrence, but they report receiving the messages late. Your Dataflow job's system lag is about 5 seconds, and the data freshness is about 40 seconds. Inspecting a few messages shows no more than 1 second lag between their eventTimestamp and publishTime. What is the problem and what should you do?
- A. Messages in your Dataflow job are taking more than 30 seconds to process. Optimize your job or increase the number of workers to fix this.
- B. Messages in your Dataflow job are processed in less than 30 seconds, but your job cannot keep up with the backlog in the Pub/Sub subscription. Optimize your job or increase the number of workers to fix this.
- C. The advertising department is causing delays when consuming the messages. Work with the advertising department to fix this.
- D. The web server is not pushing messages fast enough to Pub/Sub. Work with the web server team to fix this.
Answer: B
Explanation:
To ensure that the advertising department receives messages within 30 seconds of the click occurrence, and given the current system lag and data freshness metrics, the issue likely lies in the processing capacity of the Dataflow job. Here's why option B is the best choice:
System Lag and Data Freshness:
The system lag of 5 seconds indicates that Dataflow itself is processing messages relatively quickly.
However, the data freshness of 40 seconds suggests a significant delay before processing begins, indicating a backlog.
Backlog in Pub/Sub Subscription:
A backlog occurs when the rate of incoming messages exceeds the rate at which the Dataflow job can process them, causing delays.
Optimizing the Dataflow Job:
To handle the incoming message rate, the Dataflow job needs to be optimized or scaled up by increasing the number of workers, ensuring it can keep up with the message inflow.
Steps to Implement:
Analyze the Dataflow Job:
Inspect the Dataflow job metrics to identify bottlenecks and inefficiencies.
Optimize Processing Logic:
Optimize the transformations and operations within the Dataflow pipeline to improve processing efficiency.
Increase Number of Workers:
Scale the Dataflow job by increasing the number of workers to handle the higher load, reducing the backlog.
Reference:
Dataflow Monitoring
Scaling Dataflow Jobs
NEW QUESTION # 203
You have thousands of Apache Spark jobs running in your on-premises Apache Hadoop cluster. You want to migrate the jobs to Google Cloud. You want to use managed services to run your jobs instead of maintaining a long-lived Hadoop cluster yourself. You have a tight timeline and want to keep code changes to a minimum. What should you do?
- A. Move your data to Cloud Storage. Run your jobs on Dataproc.
- B. Rewrite your jobs in Apache Beam. Run your jobs in Dataflow.
- C. Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances.
- D. Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach.
Answer: A
Explanation:
Dataproc's Compatibility with Apache Spark: Dataproc is a managed service for running Hadoop and Spark clusters on Google Cloud. This means it is designed to seamlessly run Apache Spark jobs with minimal code changes. Your existing Spark jobs should run on Dataproc with little to no modification.
Cloud Storage as a Scalable Data Lake: Cloud Storage provides a highly scalable and durable storage solution for your data. It's designed to handle large volumes of data that Spark jobs typically process.
Minimizing Operational Overhead: By using Dataproc, you eliminate the need to manage and maintain a Hadoop cluster yourself. Google Cloud handles the infrastructure, allowing you to focus on your data processing tasks.
Tight Timeline and Minimal Code Changes: This option directly addresses the requirements of the question. It offers a quick and easy way to migrate your Spark jobs to Google Cloud with minimal disruption to your existing codebase.
Why other options are not suitable:
B. Rewrite your jobs in Apache Beam. Run your jobs in Dataflow: Rewriting jobs in Apache Beam would be a significant undertaking and not suitable for a quick migration with minimal code changes.
C. Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances: This option requires you to manage the underlying infrastructure yourself, which contradicts the requirement of using managed services.
D. Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach: While BigQuery is a powerful data warehouse, converting Spark scripts to SQL would require substantial code changes and might not be feasible within a tight timeline.
NEW QUESTION # 204
You are integrating one of your internal IT applications and Google BigQuery, so users can query BigQuery from the application's interface. You do not want individual users to authenticate to BigQuery and you do not want to give them access to the dataset. You need to securely access BigQuery from your IT application.
What should you do?
- A. Integrate with a single sign-on (SSO) platform, and pass each user's credentials along with the query request
- B. Create groups for your users and give those groups access to the dataset
- C. Create a service account and grant dataset access to that account. Use the service account's private key to access the dataset
- D. Create a dummy user and grant dataset access to that user. Store the username and password for that user in a file on the files system, and use those credentials to access the BigQuery dataset
Answer: C
NEW QUESTION # 205
You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes.
The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required.
You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)
- A. Redis
- B. Cassandra
- C. HDFS with Hive
- D. HBase
- E. MySQL
- F. MongoDB
Answer: B,D,F
NEW QUESTION # 206
Which of these statements about exporting data from BigQuery is false?
- A. The only supported export destination is Google Cloud Storage.
- B. To export more than 1 GB of data, you need to put a wildcard in the destination filename.
- C. The only compression option available is GZIP.
- D. Data can only be exported in JSON or Avro format.
Answer: D
Explanation:
Data can be exported in CSV, JSON, or Avro format. If you are exporting nested or repeated data, then CSV format is not supported.
Reference: https://cloud.google.com/bigquery/docs/exporting-data
NEW QUESTION # 207
Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?
- A. Use a row key of the form <timestamp>#<sensorid>.
- B. Use a row key of the form <sensorid>.
- C. Use a row key of the form <sensorid>#<timestamp>.
- D. Use a row key of the form <timestamp>.
Answer: C
Explanation:
Bigtable best practices state that a row key should not consist only of a timestamp or begin with one.
It is better to combine the sensor ID and the timestamp in the row key, with the sensor ID first.
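The effect of putting the sensor ID ahead of the timestamp can be illustrated with sorted byte strings, which is how Bigtable orders row keys. This is a plain-Python sketch that does not use the Bigtable client library. The helper names and the zero-padded timestamp format are assumptions made for the example.

```python
import bisect


def row_key(sensor_id: str, epoch_seconds: int) -> bytes:
    """Build a <sensorid>#<timestamp> key; zero-padding keeps timestamps in lexicographic order."""
    return f"{sensor_id}#{epoch_seconds:010d}".encode()


def scan_sensor(sorted_keys, sensor_id, start, end):
    """Return one sensor's keys with start <= timestamp < end from a sorted key list."""
    lo = bisect.bisect_left(sorted_keys, row_key(sensor_id, start))
    hi = bisect.bisect_left(sorted_keys, row_key(sensor_id, end))
    return sorted_keys[lo:hi]
```

With this layout, readings from different sensors taken at the same moment land in different parts of the key space, so writes are not all directed at the same end of the table, while a dashboard query for one sensor over a time range is still a single contiguous scan.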
NEW QUESTION # 208
You need to create a new transaction table in Cloud Spanner that stores product sales data. You are deciding what to use as a primary key. From a performance perspective, which strategy should you choose?
- A. The original order identification number from the sales system, which is a monotonically increasing integer
- B. The current epoch time
- C. A random universally unique identifier number (version 4 UUID)
- D. A concatenation of the product name and the current epoch time
Answer: C
Explanation:
https://cloud.google.com/spanner/docs/schema-and-data-model#choosing_a_primary_key
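Monotonically increasing keys (order numbers, epoch times) send every new insert to the end of the key space, whereas random keys spread inserts out. A minimal sketch of generating a version 4 UUID key with Python's standard library follows. The function name is hypothetical, and how the value is written to Spanner is left out.

```python
import uuid


def new_transaction_key() -> str:
    """Return a random version 4 UUID as a string, for use as a primary key value."""
    return str(uuid.uuid4())
```

Each call returns a 36-character string such as `xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx`, with no ordering relationship between consecutive values.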
Google Professional-Data-Engineer certification exam is one of the most sought-after certifications in the field of data engineering. Google Certified Professional Data Engineer Exam certification is ideal for individuals who are looking to take their career to the next level and want to demonstrate their expertise in cloud-based data engineering. By earning this certification, individuals can prove to employers that they have the necessary skills and knowledge to design and implement data processing systems using Google Cloud Platform technologies.
