Cloudera Apache Hadoop Developer (CCA)
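Course name: Cloudera Apache Hadoop Developer (CCA)
Class type: weekend and full-time classes
Recommendation rating: 5 stars
Course duration: 4 days / 24 hours
Delivery: in person, remote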
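Course description:
This four-day course covers the fundamentals of Apache Spark and how it integrates with the overall Hadoop ecosystem.
The course reviews HDFS fundamentals, then covers ingesting data with Sqoop and Flume, processing distributed data with Spark, modeling data in Impala and Hive, and best practices for data storage.
Target audience:
Enterprise managers, CIOs, CTOs, government IT department officials, project (development) managers, and consultants; IT managers, IT consultants, and IT support specialists; systems engineers, data center administrators, cloud administrators, and anyone who wants to move into cloud computing.
Prerequisites:
Basic Linux system administration experience; no prior knowledge of Hadoop is required.
Certification:
Passing the exam earns the Cloudera Certified Administrator for Apache Hadoop (CCAH) credential.
Course objectives: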
How data is distributed, stored, and processed in a Hadoop cluster
How to use Sqoop and Flume to ingest data
How to process distributed data with Apache Spark
How to model structured data as tables in Impala and Hive
How to choose the best data storage format for different data usage patterns
Best practices for data storage
Course content:
Introduction to Hadoop and the Hadoop Ecosystem
Problems with Traditional Large-scale Systems
Hadoop!
The Hadoop Ecosystem
Hadoop Architecture and HDFS
Distributed Processing on a Cluster
Storage: HDFS Architecture
Storage: Using HDFS
Resource Management: YARN Architecture
Resource Management: Working with YARN
Importing Relational Data with Apache Sqoop
Sqoop Overview
Basic Imports and Exports
Limiting Results
Improving Sqoop’s Performance
Sqoop 2
Introduction to Impala and Hive
Introduction to Impala and Hive
Why Use Impala and Hive?
Comparing Hive to Traditional Databases
Hive Use Cases
Modeling and Managing Data with Impala and Hive
Data Storage Overview
Creating Databases and Tables
Loading Data into Tables
HCatalog
Impala Metadata Caching
Data Formats
Selecting a File Format
Hadoop Tool Support for File Formats
Avro Schemas
Using Avro with Hive and Sqoop
Avro Schema Evolution
Compression
Data Partitioning
Partitioning Overview
Partitioning in Impala and Hive
Capturing Data with Apache Flume
What is Apache Flume?
Basic Flume Architecture
Flume Sources
Flume Sinks
Flume Channels
Flume Configuration
Spark Basics
What is Apache Spark?
Using the Spark Shell
RDDs (Resilient Distributed Datasets)
Functional Programming in Spark
Working with RDDs in Spark
A Closer Look at RDDs
Key-Value Pair RDDs
MapReduce
Other Pair RDD Operations
Writing and Deploying Spark Applications
Spark Applications vs. Spark Shell
Creating the SparkContext
Building a Spark Application (Scala and Java)
Running a Spark Application
The Spark Application Web UI
Configuring Spark Properties
Logging
Parallel Programming with Spark
Review: Spark on a Cluster
RDD Partitions
Partitioning of File-based RDDs
HDFS and Data Locality
Executing Parallel Operations
Stages and Tasks
Spark Caching and Persistence
RDD Lineage
Caching Overview
Distributed Persistence
Common Patterns in Spark Data Processing
Common Spark Use Cases
Iterative Algorithms in Spark
Graph Processing and Analysis
Machine Learning
Example: k-means
Preview: Spark SQL
Spark SQL and the SQL Context
Creating DataFrames
Transforming and Querying DataFrames
Saving DataFrames
Comparing Spark SQL with Impala