Big Data Summer Program Curriculum
The curriculum for the Big Data Summer Program consists of four courses. Students complete the required sequence and gain in-depth knowledge and skills in four areas: Data Systems, Data Querying and Preparation, Data Science and Analytics, and Deriving Knowledge from Data at Scale. Tuition is free for qualifying students. Courses must be taken in the following sequence:
Data Systems (3 cr.)
The first half of the course reviews traditional database technologies and architectures, including both open source and commercial offerings. The theoretical basis of persistent data stores will be covered along with the architectural patterns that make up production implementations within the enterprise (e.g., OLTP, OLAP, DW, Object-Relational). Each pattern will be reviewed and critiqued for its strengths and weaknesses. The second half of the course covers new paradigms of data systems, including those that fall within the realm of "Big Data," which grew out of work at Google, LinkedIn, and Facebook. The student will be exposed to the original source papers that have driven the Big Data movement as well as emerging architectures such as real-time and complex event processing toolsets.
Data Querying and Preparation (3 cr.)
The goal of this course is to develop the student's skills in Extract, Transform, and Load (ETL) and Extract, Load, and Transform (ELT) methods as a means of sourcing data sets from myriad data silos and producing a unified data repository for analytic processing. The first half of the course reviews the data query techniques used in practice by data professionals as well as traditional database technologies and architectures. The second half of the course applies those skills to a student project that begins with a set of business requirements and numerous sources of data. The student will be presented with various open source and commercial products, choose a toolset, and implement a data analytics sandbox. In the final phase of the course project, the student will deliver an oral executive briefing to the class explaining how the solution met the business requirements and defending the technologies selected.
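To make the difference between the two methods concrete, the following is a minimal sketch, not course material. It assumes a tiny made-up data set (RAW_ROWS) and uses Python's built-in sqlite3 module as a stand-in for the target repository; the function and table names are hypothetical. In ETL the rows are cleaned in application code before loading, while in ELT the raw rows are loaded into a staging table first and cleaned with SQL inside the data store.

```python
import sqlite3

# Hypothetical raw records pulled from source silos (illustrative data only).
RAW_ROWS = [("  alice ", "42.50"), ("Bob", "17.00")]


def clean(name, amount):
    """Transform step: trim and upper-case the name, parse the amount."""
    return name.strip().upper(), float(amount)


def etl(rows):
    """ETL: transform in application code, then load the cleaned rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    db.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [clean(name, amount) for name, amount in rows],
    )
    return db.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()


def elt(rows):
    """ELT: load the raw rows first, then transform inside the store."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE staging (customer TEXT, amount TEXT)")
    db.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    db.execute(
        "CREATE TABLE sales AS "
        "SELECT UPPER(TRIM(customer)) AS customer, "
        "CAST(amount AS REAL) AS amount FROM staging"
    )
    return db.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()
```

Both functions produce the same cleaned table for this input; what differs is where the transformation runs, which is typically the deciding factor when choosing between the two approaches for large data volumes.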
Data Science and Analytics (3 cr.)
The goal of this class is to review mathematical and statistical frameworks and methods and apply them to common business analytics requirements. The class begins with an overview of the prerequisite mathematical material to ensure that all students start from a common baseline. The class will then progress rapidly to methods for classifying and decomposing a business requirement into a testable and quantifiable problem. From there, each student will be assigned to a team, and the team will apply various techniques (regression, clustering, decision trees, hypergraphs, and various machine learning algorithms) to an applied analytics case study. The final project and its results will be presented to the class for final sign-off.
Deriving Knowledge from Data at Scale (3 cr.)
The goal of this course is to explore emerging techniques of large-scale distributed data processing for deriving knowledge from massive data sets. The course begins with a high-level overview of the distributed processing paradigm, its various complexities, and the rapidly evolving technology landscape that has emerged to corral this complexity. The course will survey a number of use cases specifically applicable to massive-scale processing and take a deep dive into the technologies available in the enterprise. Topics covered will include the MapReduce programming model, the Hadoop processing and storage framework, and NoSQL and columnar data stores. The class will be project driven and will conclude with a final presentation to the class.
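As an illustration of the programming model named above, the following is a minimal single-process sketch of a word count written in the map, shuffle, and reduce style. It is not taken from the course and does not use Hadoop; the function names are hypothetical, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, word_count(["big data", "Big deal"]) counts "big" twice and "data" and "deal" once each. Because each map call depends only on its own document and each reduce call only on its own key, both phases can run in parallel, which is the property that frameworks such as Hadoop exploit.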