CPSC 485 - Big Data Analytics

Catalog Description

This is a project-driven course designed to provide techniques for acquiring, managing, and analyzing massive unstructured data. Consideration will be given to both batch-mode processing and real-time analytics. Specific topics include the MapReduce parallel computing paradigm; distributed file systems; the Hadoop ecosystem and its components such as Pig, Hive, HBase, Oozie, Yarn, and Mahout; NoSQL databases; cloud computing; techniques for clustering and visualizing big data; Web analytics; machine learning in a big data setting; and data security issues. Applications in business, engineering, health care, and social networks will also be covered.

Prerequisite: CPSC 405 (3 credits)

Course Outcomes

This course and its outcomes support the Computing Learning Outcomes of Problem Solving and Critical Thinking (PS&CT), and Ethical and Professional Responsibilities (E&PR). These Computing Learning Outcomes are tied directly to the University Wide Outcomes of Critical Thinking and Digital Citizenship.

Learning Outcomes and Course Objectives

PS & CT a. Formulate project requirements and alternative solutions appropriate to the computing problems.

  1. Demonstrate an understanding of unstructured data, distributed file systems, methods of big data acquisition and database management, and how to create scalable problems.
  2. Employ best practices for project design, indexing, and job scheduling.

PS & CT d. Implement computing solutions that consist of system and application software written in various programming languages.

  3. Write effective MapReduce programs, use Hadoop’s distributed file system, recognize design patterns, and effectively implement big data clustering and visualization techniques.
  4. Perform cloud computing for massive storage capability.
  5. Perform analyses using specialized big data analytics tools including Pig, Hive, HBase, Oozie, Yarn, and Mahout.

E & PR d. Plan for and ensure the security, privacy, and integrity of data.

  6. Demonstrate techniques for maintaining data security, including encryption, authentication, authorization, and single sign-on.

Additional Course Objectives include:

The student will be able to:

  1. Perform real-time analytics.
  2. Analyze Web data including Twitter feeds, Facebook, and Web mining.
  3. Implement and leverage machine learning tools for big data.
  4. Demonstrate a fundamental understanding of the advantages of NoSQL databases and how to use them for data analytics.