East Carolina University
Department of Computer Science
CSCI 4180
Big Data Analytics
Standard Syllabus
3 credits
Prepared by Nasseh Tabrizi, May 2018
Catalog entry
P: CSCI 3700. Hands-on introduction to very big data and the
practical issues surrounding how the data is stored, processed,
analyzed, and visualized. Work with cloud-based high performance
computing systems, large data collections, and high velocity data
streams.
Course summary
Introduces students to technologies, algorithms, and architecture for analyzing very large data sets.
Course topics
- Introduction to the terminology of data analytics
- Technologies used for handling big data
- Natural language processing
- Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data
- Similarity search, including the key techniques of minhashing and locality-sensitive hashing
- Stream data analytics and algorithms for dealing with data in real time
- The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach
- Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements
- Algorithms for clustering very large, high-dimensional datasets
- Two key problems for Web applications: managing advertising and recommendation systems
- Machine-learning algorithms that can be applied to very large data, such as perceptrons, support-vector machines, and gradient descent
Student learning outcomes
- Explain the fundamentals of Big Data standards for creating parallel algorithms.
- Demonstrate effective use of MapReduce and NoSQL systems.
- Explain the fundamental techniques and tools used to design and analyze large volumes of data.
- Implement the most important and popular algorithms used in data mining and machine learning.
- Acquire a deep understanding of what the algorithms are supposed to do.
- Demonstrate the effectiveness of the predictive data analytics project lifecycle.
- Demonstrate effective use of machine learning-based predictive tools used in Big Data.
Textbook
Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman
Grading
The course is graded using a combination of a group project
(20%), homework (10%), weekly quizzes (10%), a midterm exam
(30%), and a final exam (30%). Letter grades are as follows: 94 or
higher is an A; 90 or higher is an A-; 87 or higher is a B+; 83 or
higher is a B; 80 or higher is a B-; 77 or higher is a C+; 73 or
higher is a C; 70 or higher is a C-; 67 or higher is a D+; 63 or
higher is a D; 60 or higher is a D-; and lower than 60 is an F.
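As an illustration of how the weights and cutoffs above combine, the following Python sketch computes a course score and letter grade. It assumes each component is scored on a 0-100 scale and that no rounding is applied before comparing against the cutoffs; the syllabus does not specify either. The function and variable names are illustrative only.

```python
# Component weights from the Grading section (percent of the course score).
WEIGHTS = {
    "project": 20,
    "homework": 10,
    "quizzes": 10,
    "midterm": 30,
    "final": 30,
}

# Letter-grade cutoffs from the Grading section, highest first.
CUTOFFS = [
    (94, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def weighted_score(scores):
    """Combine per-component scores (assumed 0-100 each) into a 0-100 course score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Return the first letter whose cutoff the score meets; below 60 is an F."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical student scores, for illustration only.
    example = {"project": 90, "homework": 100, "quizzes": 80, "midterm": 85, "final": 88}
    total = weighted_score(example)
    print(total, letter_grade(total))
```

For the hypothetical scores shown, the weighted total is 0.20(90) + 0.10(100) + 0.10(80) + 0.30(85) + 0.30(88) = 87.9, which falls in the B+ range.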
Grade meanings
Grade | Meaning
A | Achievement substantially exceeds basic course expectations
A− |
B+ |
B | Achievement exceeds basic course expectations
B− |
C+ |
C | Achievement adequately meets basic course expectations
C− |
D+ |
D | Achievement falls below basic course expectations
D− |
F | Failure – achievement does not justify credit for course