Michael E. Byczek, Technical Consultant

Microsoft Cloud-based Analytics

Microsoft Azure has dedicated services for cloud-based analytics: Data Lake, HDInsight, Machine Learning, Stream Analytics, Data Factory, SQL Data Warehouse, Data Catalog, and Power BI.
  • Fully managed big data management and advanced analytics suite
  • One monthly subscription for all analytics
  • Integrates with Cortana as a personal assistant
  • The Cortana Analytics Process: (1) ingest data; (2) explore and pre-process data; (3) create features; (4) create model; and (5) deploy the model.
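The five steps of the process can be sketched in plain Python. This is a minimal illustration, not Azure code: the inline CSV data, the function names, and the least-squares model are all assumptions chosen to keep the example self-contained.

```python
import csv
import io
import statistics

# Hypothetical sample data; one row has a missing temperature.
RAW_CSV = """temperature_c,units_sold
20,110
25,135
,90
30,160
35,185
"""

def ingest(text):
    """Step 1: ingest raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def preprocess(rows):
    """Step 2: drop rows with missing values and convert fields to numbers."""
    return [
        {k: float(v) for k, v in row.items()}
        for row in rows
        if all(v not in ("", None) for v in row.values())
    ]

def create_features(rows):
    """Step 3: derive the model input (here, temperature in Fahrenheit)."""
    return [(r["temperature_c"] * 9 / 5 + 32, r["units_sold"]) for r in rows]

def create_model(samples):
    """Step 4: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def deploy(model):
    """Step 5: expose the model as a callable (a stand-in for a web service)."""
    slope, intercept = model
    return lambda temperature_f: slope * temperature_f + intercept

predict = deploy(create_model(create_features(preprocess(ingest(RAW_CSV)))))
```

Calling `predict(86)` returns the estimated units sold at 86 °F. In a hosted service each step would likely be a separate managed component rather than a local function.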
Machine Learning
  • Use Python or R with built-in packages and support for custom code
  • Drag-and-drop interface to deploy algorithms in seconds
  • Embed predictive analytics into applications
  • Deploy model into production as a web service within minutes
  • Large algorithm library
  • Visual studio environment (Machine Learning Studio) to build and test models without programming
  • Ready-to-consume options for recommendations, text analytics, and anomaly detection
  • Datasets up to 10 GB of dense numerical data
  • Supports over 400 CRAN Packages in R
  • Jupyter Notebooks or standard Python modules
  • Cheat sheet to pick the right machine learning algorithm from the library
  • Embed Python scripts into the experiment
Data Lake, Data Lake Store, and Data Lake Analytics
  • Repository for big data analytics workload
  • Store data of any size, shape, or speed
  • Supports data from the terabyte to the exabyte range
  • Designed for massive throughput involving petabytes of data; built on YARN
  • Store relational and non-relational data in native format without schema definition
  • Run Hadoop filesystem on Windows or Linux with data analysis using MapReduce or Hive
  • Batch, streaming, and interactive analytics with U-SQL, Spark, Hive, HBase, and Storm
  • Integrate with Active Directory for access, rules and user management
  • Design/tune big data queries with Visual Studio, U-SQL, SQL, Hadoop, Hive, Storm, Spark, .NET, Power BI, Tableau, and Qlik
  • Dynamically provision resources
  • 100% Hadoop-based cloud service with integration for Excel and on-site Hadoop clusters
  • Create, configure, submit, and monitor Hadoop jobs in Java, .NET, and C++
  • Process unstructured and semi-structured data, such as click streams, social media, and server logs
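Storing data "in native format without schema definition" is often called schema-on-read: records are kept as they arrive and a schema is applied only when a query runs. The sketch below illustrates the idea in plain Python; the sample click-stream records, `read_with_schema`, and `CLICK_SCHEMA` are hypothetical names, not part of any Azure API.

```python
import json

# Raw records kept exactly as they arrived; no schema was declared at write time.
RAW_STORE = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart"}',
    '{"user": "a", "page": "/home", "ms": "95", "referrer": "ad"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: pick fields, coerce types, fill defaults."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }

# The schema lives with the query, not with the stored data.
CLICK_SCHEMA = {"page": (str, ""), "ms": (int, 0)}

rows = list(read_with_schema(RAW_STORE, CLICK_SCHEMA))
total_ms_by_page = {}
for row in rows:
    total_ms_by_page[row["page"]] = total_ms_by_page.get(row["page"], 0) + row["ms"]
```

A different query could read the same raw records with a different schema (for example, one that keeps `referrer`), which is the main appeal of deferring the schema.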
Stream Analytics
  • Develop/deploy solutions for real-time insights from devices, sensors, websites, social media, and IoT
  • SQL-based language
  • Millions of events per second (1 GB per second)
  • Detect anomalies and trigger alerts for specific conditions
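As a rough illustration of detecting anomalies over a stream, the sketch below flags readings that fall far from the mean of a sliding window. It is plain Python, not the Stream Analytics query language; the window size, threshold, and sensor readings are assumptions for the example.

```python
from collections import deque
import statistics

def detect_anomalies(events, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent window's mean."""
    recent = deque(maxlen=window)
    for index, value in enumerate(events):
        # Only judge a reading once a full window of history is available.
        if len(recent) == window:
            mean = statistics.fmean(recent)
            stdev = statistics.pstdev(recent)
            if stdev > 0 and abs(value - mean) > threshold * stdev:
                yield index, value  # this is where an alert would be triggered
        recent.append(value)

# Hypothetical temperature readings with one spike.
readings = [20.1, 19.9, 20.0, 20.2, 19.8, 20.0, 35.0, 20.1, 19.9]
alerts = list(detect_anomalies(readings))
```

Because the function is a generator, it processes one event at a time and keeps only the last `window` readings in memory, which matches how a streaming job would see its input.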
Data Factory
  • Compose, schedule, and orchestrate data pipelines
  • Cloud-based data movement services
  • Transform raw data into finished, ready-to-use information for BI tools
  • Understand when data arrived, where it came from, and when it is ready for processing
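Orchestrating a pipeline largely means running activities in dependency order and passing results along. A minimal sketch using Python's standard `graphlib` module follows; the activity names (`copy_raw`, `transform`, `publish`) and the `run_pipeline` helper are hypothetical and do not reflect the Data Factory API.

```python
from graphlib import TopologicalSorter

def run_pipeline(activities, dependencies):
    """Run each activity after the activities it depends on; return the run order."""
    order = list(TopologicalSorter(dependencies).static_order())
    context = {}  # results of finished activities, keyed by activity name
    for name in order:
        context[name] = activities[name](context)
    return order, context

# Each activity receives the results produced so far.
activities = {
    "copy_raw": lambda ctx: ["3", "1", "2"],
    "transform": lambda ctx: sorted(int(v) for v in ctx["copy_raw"]),
    "publish": lambda ctx: {"rows": len(ctx["transform"]), "max": ctx["transform"][-1]},
}

# Maps each activity to the set of activities that must finish first.
dependencies = {"transform": {"copy_raw"}, "publish": {"transform"}}

order, context = run_pipeline(activities, dependencies)
```

A managed service would add scheduling, retries, and monitoring on top of this ordering logic.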
SQL Data Warehouse
  • Enterprise-class SQL server
  • Dynamically deploy, grow, shrink, and pause computation
  • Massively parallel processing architecture
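The idea behind a massively parallel processing architecture can be sketched as follows: rows are distributed across nodes by hashing a key, each node aggregates its own share, and the partial results are combined. This is a single-process simulation under assumed names (`distribute`, `partial_sum`, `parallel_sum`), not how SQL Data Warehouse is implemented.

```python
import zlib
from collections import Counter

def distribute(rows, key, node_count):
    """Assign each row to a node by hashing its distribution key."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[zlib.crc32(str(row[key]).encode()) % node_count].append(row)
    return nodes

def partial_sum(rows, group_key, value_key):
    """Work done independently on one node."""
    totals = Counter()
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return totals

def parallel_sum(rows, group_key, value_key, node_count=4):
    """Combine per-node partial results into the final answer."""
    result = Counter()
    for node_rows in distribute(rows, group_key, node_count):
        # Nodes are simulated sequentially here; a real system runs them concurrently.
        result.update(partial_sum(node_rows, group_key, value_key))
    return dict(result)

# Hypothetical sales rows.
sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 20},
    {"region": "west", "amount": 7},
]
totals = parallel_sum(sales, "region", "amount")
```

Because the rows are distributed on the grouping key, every row for a given region lands on the same node, so the combine step is a simple merge.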
Data Catalog
  • Enterprise-wide metadata catalog for efficient datasource discovery
  • Spend more time analyzing data than searching for it
  • All employees can access the data and contribute to the catalog

Copyright © 2016. Michael E. Byczek. All Rights Reserved.