CSCI 4180

East Carolina University
Department of Computer Science

CSCI 4180
Big Data Analytics
Standard Syllabus

3 credits

Prepared by Nasseh Tabrizi, May 2018

Catalog entry

P: CSCI 3700. Hands-on introduction to very big data and the practical issues surrounding how the data is stored, processed, analyzed, and visualized. Work with cloud-based high performance computing systems, large data collections, and high velocity data streams.

Course summary

Introduces students to technologies, algorithms, and architecture for analyzing very large data sets.

Course topics

Introduction to the terminology of data analytics
Technologies used for handling big data
Natural Language Processing
Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
Stream data analytics and algorithms for dealing with data in real time.
The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
Algorithms for clustering very large, high-dimensional datasets.
Two key problems for Web applications: managing advertising and recommendation systems.
Machine-learning algorithms that can be applied to very large data, such as perceptrons, support-vector machines, and gradient descent.

Student learning outcomes

Explain the fundamentals of Big Data standards for creating parallel algorithms.
Demonstrate effective use of MapReduce and NoSQL systems.
Explain the fundamental techniques and tools used to design and analyze large volumes of data.
Implement the most important and popular algorithms used in data mining and machine learning.
Acquire a deep understanding of what the algorithms are supposed to do.
Demonstrate the effectiveness of the predictive data analytics project lifecycle.
Demonstrate effective use of machine learning-based predictive tools used in Big Data.

Textbook

Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman

Grading

The course grade combines a group project (20%), homework (10%), weekly quizzes (10%), a midterm exam (30%), and a final exam (30%). Letter grades are as follows: 94 or higher is an A; 90 or higher is an A-; 87 or higher is a B+; 83 or higher is a B; 80 or higher is a B-; 77 or higher is a C+; 73 or higher is a C; 70 or higher is a C-; 67 or higher is a D+; 63 or higher is a D; 60 or higher is a D-; and lower than 60 is an F.
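As an informal illustration of the arithmetic above, the following Python sketch computes a weighted course score and maps it to a letter grade. It assumes each component is scored on a 0 to 100 scale and that no rounding is applied before the cutoffs are checked; the syllabus specifies neither, and the names WEIGHTS, CUTOFFS, weighted_score, and letter_grade are hypothetical.

```python
# Component weights in percent, taken from the grading paragraph (they sum to 100).
WEIGHTS = {
    "project": 20,
    "homework": 10,
    "quizzes": 10,
    "midterm": 30,
    "final": 30,
}

# Letter-grade cutoffs from the syllabus, listed from highest to lowest.
CUTOFFS = [
    (94, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def weighted_score(scores):
    """Return the weighted course score on a 0-100 scale.

    Assumption: every entry in `scores` is already on a 0-100 scale.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a 0-100 score to a letter grade; no rounding is applied (assumption)."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical component scores, for illustration only.
example = {"project": 90, "homework": 80, "quizzes": 100, "midterm": 85, "final": 88}
total = weighted_score(example)
print(total, letter_grade(total))
```

With these made-up scores the weighted total is 18 + 8 + 10 + 25.5 + 26.4 = 87.9, which falls in the B+ range.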

Grade meanings

Grade	Meaning
A	Achievement substantially exceeds basic course expectations
A−
B+
B	Achievement exceeds basic course expectations
B−
C+
C	Achievement adequately meets basic course expectations
C−
D+
D	Achievement falls below basic course expectations
D−
F	Failure – achievement does not justify credit for course
