Michael E. Byczek, Software Engineer

Big Data and Apache Hadoop Framework

Apache Hadoop is an open-source framework for distributed processing of large data sets, both structured and unstructured. Files are split into blocks and distributed across the nodes of a cluster that can scale from one to thousands of computers.
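As a rough illustration of that storage model (a simulation, not Hadoop's actual implementation), the following Python sketch splits a byte stream into fixed-size blocks and spreads them round-robin across hypothetical nodes. Real HDFS uses much larger blocks (128 MB by default) and also replicates each block, typically three times, for fault tolerance.

```python
def split_into_blocks(data: bytes, block_size: int):
    # Split a byte stream into fixed-size blocks.
    # (HDFS defaults to 128 MB; a tiny size is used here for illustration.)
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def assign_round_robin(blocks, nodes):
    # Distribute blocks across cluster nodes round-robin.
    # (A simplification: HDFS also replicates each block across nodes.)
    placement = {node: [] for node in nodes}
    for i, block in enumerate(blocks):
        placement[nodes[i % len(nodes)]].append(block)
    return placement

blocks = split_into_blocks(b"0123456789abcdef", 4)
placement = assign_round_robin(blocks, ["node1", "node2"])
```

Here 16 bytes become four 4-byte blocks, with two blocks placed on each of the two nodes.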

Most of an average company's data is unstructured (e.g. emails and social media posts) and does not fit neatly into rows and columns. Hadoop can handle data in any file type or format.

The core framework consists of Hadoop Common, HDFS (distributed storage), YARN (resource management), and MapReduce (parallel processing). Beyond the core, Hadoop supports a larger ecosystem of services and tools, described below.
MapReduce
Example: multiple files contain high temperatures for the same cities on different days. A "map" job computes the highest temperature per city in each file, with a separate job for each file. The outputs of those jobs are fed into the "reduce" phase, which combines the per-file results into a single highest temperature for each city.
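The temperature example above can be sketched in plain Python (a simulation of the map/reduce idea, not the Hadoop Java API; the file contents here are invented for illustration):

```python
from collections import defaultdict

# Hypothetical per-file readings: each "file" maps city -> list of temperatures.
files = [
    {"Chicago": [30, 35], "Miami": [80, 85]},
    {"Chicago": [40], "Miami": [78, 90]},
]

def map_phase(file_data):
    # "Map" job: emit the highest temperature per city within one file.
    return {city: max(temps) for city, temps in file_data.items()}

def reduce_phase(mapped_outputs):
    # "Reduce" phase: combine per-file maxima into one maximum per city.
    combined = defaultdict(list)
    for out in mapped_outputs:
        for city, high in out.items():
            combined[city].append(high)
    return {city: max(highs) for city, highs in combined.items()}

result = reduce_phase([map_phase(f) for f in files])
# result: {"Chicago": 40, "Miami": 90}
```

In a real cluster, each map job runs on the node that already stores its block of data, and Hadoop shuffles the intermediate results to the reducers.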

Spark: an in-memory processing engine, often much faster than MapReduce for iterative workloads
Pig: a high-level scripting language (Pig Latin) for expressing data-flow jobs
Hive: a data warehouse layer that provides SQL-like queries (HiveQL) over data in HDFS
Mahout: a library of scalable machine learning algorithms

Michael E. Byczek
Chicago, Illinois
312.434.9409
Email: michael@byczek.pro


Copyright © 2017. Michael E. Byczek. All Rights Reserved.