Michael E. Byczek, Technical Consultant

Google Cloud-based Analytics

Google Cloud Platform's big data and analytics services include BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Datalab, and Cloud Pub/Sub.

BigQuery
  • Fully managed data warehouse for data analysis
  • Petabyte-scale database
  • Load data from other Google Cloud services or external data stores
  • Protect data with ACLs and replicated storage
  • Streaming ingestion of up to 100,000 rows per second for real-time analysis
  • Processing power to run SQL queries over terabytes of data in seconds
  • Read and write access from Hadoop and Spark
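BigQuery is queried with SQL. The sketch below does not touch BigQuery at all: purely so the example runs locally, Python's built-in sqlite3 module stands in for a BigQuery table. The table name `page_views`, its columns, and the helper `top_pages` are hypothetical. It illustrates the aggregate-and-rank style of query typical of warehouse analysis.

```python
import sqlite3

# Local stand-in only: an in-memory SQLite table plays the role of a
# BigQuery table. Table, column, and function names are hypothetical.
def top_pages(rows, limit=2):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        # Aggregate per page, then rank by total views (ties broken by name).
        cur = conn.execute(
            "SELECT page, SUM(views) AS total "
            "FROM page_views GROUP BY page "
            "ORDER BY total DESC, page LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    sample = [("/home", 120), ("/docs", 45), ("/home", 30), ("/pricing", 60)]
    print(top_pages(sample))
```

Against BigQuery itself, a query like this would be submitted through the web console, the `bq` command-line tool, or a client library, and BigQuery's SQL dialect differs from SQLite's in places.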
Cloud Dataflow
  • Develop and execute a wide range of data processing patterns
  • Batch and streaming big data processing
  • Unified programming model
  • Similar to Hadoop MapReduce
  • Custom extensions through a Java-based SDK; pipelines can also run on alternate runtimes such as Spark
  • Dynamic provisioning of resources
Cloud Dataproc
  • Process big datasets on managed clusters with hundreds of nodes
  • Managed Spark, Hadoop MapReduce, Pig, and Hive
  • Resizable clusters with a choice of virtual machine types, disk sizes, and node counts
  • Image versioning to switch between versions of Hadoop and Spark
Cloud Datalab
  • Interactive tool for large-scale data exploration, analysis, and visualization
  • Built on Jupyter notebooks for machine learning and statistics
  • Analyze data with Python, SQL, and JavaScript
  • Git-based source control with GitHub interoperability
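Several entries above refer to the MapReduce model. As a single-machine sketch of that model (plain Python, not the Dataflow SDK or the Hadoop API; all function names are hypothetical), a word count runs in three phases: map emits key/value pairs, shuffle groups them by key, and reduce combines each group into one result.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Conceptual sketch of map -> shuffle -> reduce on one machine.
# Not the Dataflow SDK or Hadoop API; function names are hypothetical.

def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as a MapReduce framework does between stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine each key's values into a single result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))

if __name__ == "__main__":
    docs = ["Batch and streaming", "streaming data at scale"]
    print(word_count(docs))
```

A managed service distributes each phase across many workers; the sketch keeps the phases as separate functions so that the structure stays visible.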
Cloud Pub/Sub
  • Real-time messaging and streaming data
  • Scales to more than one million messages per second
  • Built on the same technology that Google Ads and Gmail use
  • Replicated storage and message encryption
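The publish/subscribe pattern behind Cloud Pub/Sub can be sketched in memory: publishers send messages to a topic, and every subscription attached to that topic receives its own copy. This is not the Cloud Pub/Sub client library; the `Broker` class and its methods are hypothetical, and for simplicity a pulled message is removed immediately rather than going through an acknowledgment step.

```python
from collections import deque
from typing import Deque, Dict, Optional

# In-memory sketch of publish/subscribe fan-out. NOT the Cloud Pub/Sub
# client library; Broker and its methods are hypothetical. Pulling removes
# a message at once, so there is no acknowledgment or redelivery here.
class Broker:
    def __init__(self) -> None:
        # topic name -> {subscription name -> queue of pending messages}
        self._topics: Dict[str, Dict[str, Deque[str]]] = {}

    def create_topic(self, topic: str) -> None:
        self._topics.setdefault(topic, {})

    def subscribe(self, topic: str, subscription: str) -> None:
        self._topics[topic].setdefault(subscription, deque())

    def publish(self, topic: str, message: str) -> int:
        """Copy the message to every subscription; return how many received it."""
        subs = self._topics[topic]
        for queue in subs.values():
            queue.append(message)
        return len(subs)

    def pull(self, topic: str, subscription: str) -> Optional[str]:
        """Remove and return the oldest pending message, or None if empty."""
        queue = self._topics[topic][subscription]
        return queue.popleft() if queue else None

if __name__ == "__main__":
    broker = Broker()
    broker.create_topic("events")
    broker.subscribe("events", "analytics")
    broker.subscribe("events", "archive")
    broker.publish("events", "user signed up")
    print(broker.pull("events", "analytics"))
    print(broker.pull("events", "archive"))
```

Giving each subscription its own queue is what lets independent consumers, such as an analytics job and an archiver, read the same stream at their own pace.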

Copyright © 2016. Michael E. Byczek. All Rights Reserved.