This is a project-driven course designed to provide techniques for acquiring, managing, and analyzing massive unstructured data. Consideration will be given to both batch-mode processing and real-time analytics. Specific topics include the MapReduce parallel computing paradigm; distributed file systems; the Hadoop ecosystem and its components such as Pig, Hive, HBase, Oozie, YARN, and Mahout; NoSQL databases; cloud computing; techniques for clustering and visualizing big data; Web analytics; machine learning in a big data setting; and data security issues. Applications in business, engineering, health care, and social networks will also be covered.
Prerequisite: CPSC 405 (3 credits)
This course and its outcomes support the Computing Learning Outcomes of Problem Solving and Critical Thinking (PS&CT), and Ethical and Professional Responsibilities (E&PR). These Computing Learning Outcomes are tied directly to the University Wide Outcomes of Critical Thinking and Digital Citizenship.
| Learning Outcomes | Course Objectives |
|---|---|
| PS&CT a. Formulate project requirements and alternative solutions appropriate to the computing problems | 1. Demonstrate an understanding of unstructured data, distributed file systems, methods of big data acquisition and database management, and how to create scalable problems. |
| | 2. Employ best practices for project design, indexing, and job scheduling. |
| PS&CT d. Implement computing solutions that consist of system and application software written in various programming languages | 3. Write effective MapReduce programs, use Hadoop's distributed file system, recognize design patterns, and effectively implement big data clustering and visualization techniques. |
| | 4. Use cloud computing for massive storage capability. |
| | 5. Perform analyses using specialized big data analytics tools including Pig, Hive, HBase, Oozie, YARN, and Mahout. |
| E&PR d. Plan for and ensure the security, privacy, and integrity of data | 6. Demonstrate techniques for maintaining data security, including encryption, authentication, authorization, and single sign-on. |
Additional Course Objectives include:
The student will be able to: