East Carolina University
Department of Computer Science
CSCI 4130
Information Retrieval
Standard Syllabus
3 credits
Prepared by Venkat Gudivada, May 2018
Catalog entry
P: CSCI 2540; MATH 2228 or MATH 2283. Theory and algorithms for
modeling and retrieving text. Text representation, IR models, query
operations, retrieval evaluation, information extraction, text
classification and clustering, enterprise and Web search, and
recommender systems.
Course summary
Have you ever wondered how web search engines scour the
web and find documents relevant to a query in a fraction of a
second? Information Retrieval (IR) is the foundation of modern
search engines.
Search is not limited to web documents. You can use IR
technologies for searching digital libraries, enterprise document
collections, and documents on your desktop computer. To see the
power of IR in desktop search, try Spotlight Search on
Mac computers.
In this course you will learn the core topics underlying
modern search technologies: algorithms, data structures,
indexing, query execution, and ranking search results. You will
also learn about evaluating IR systems and developing search
applications using open-source IR libraries and frameworks.
Course topics
- Document preprocessing
- Dictionary and postings lists
- Tolerant retrieval
- Index construction and compression
- Term weighting
- Set-theoretic models: Boolean, Fuzzy Set, and Extended Boolean
- Algebraic models: Vector Space, Latent Semantic Indexing, Generalized Vector Space, Neural Network
- Probabilistic models: Classical, Statistical Language Modeling,
Divergence from Randomness (DFR), and Probabilistic Inference
- Axiomatic thinking for information retrieval
- Information retrieval systems evaluation
- Relevance feedback and query expansion
- Document classification and clustering
- Question-answer systems
Student learning outcomes
- Design and implement solutions to information retrieval tasks and evaluate them.
- Develop search solutions using open-source information retrieval libraries and search engines.
- Analyze performance of information retrieval systems using standard test collections.
- Make practical recommendations about deploying information retrieval systems in different search domains.
Textbook
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze.
Introduction to Information Retrieval.
New York, NY: Cambridge University Press, 2008. ISBN: 978-0521865715.
Other required material
- Martin Kleppmann. Designing Data-Intensive Applications:
The Big Ideas Behind Reliable, Scalable, and Maintainable
Systems. Sebastopol, California: O’Reilly Media, 2017. ISBN:
978-1449373320.
- Radu Gheorghe, Matthew Lee Hinman, and Roy
Russo. Elasticsearch in Action. Shelter Island, New York:
Manning Publications, 2015. ISBN: 978-1617291623.
- Clinton Gormley and Zachary Tong. Elasticsearch: The
Definitive Guide. A Distributed Real-Time Search and Analytics
Engine. Sebastopol, California: O’Reilly Media, 2015. ISBN:
978-1449358549.
- Grant S. Ingersoll, Thomas S. Morton, and Andrew
L. Farris. Taming Text: How to Find, Organize, and Manipulate
It. Shelter Island, NY: Manning, 2013. ISBN:
978-1933988382.
- Dan McCreary and Ann Kelly. Making Sense of NoSQL: A Guide
for Managers and the Rest of Us. Shelter Island, NY: Manning
Publications, 2013. ISBN: 978-1617291074.
- Stefan Büttcher, Charles L. A. Clarke, and Gordon
V. Cormack. Information Retrieval: Implementing and Evaluating
Search Engines. Cambridge, Massachusetts: The MIT Press,
2010. ISBN: 978-0262026512.
- W. Bruce Croft, Donald Metzler, and Trevor
Strohman. Search Engines: Information Retrieval in
Practice. Boston, Massachusetts: Addison-Wesley, 2010. ISBN:
978-0136072249.
- Michael McCandless, Erik Hatcher, and Otis
Gospodnetic. Lucene in Action. Second edition. Shelter Island, New
York: Manning, 2010. ISBN: 978-1933988177.
- Richard K. Belew. Finding Out About: A Cognitive
Perspective on Search Engine Technology and the WWW. New York,
NY: Cambridge University Press, 2001. ISBN:
978-0521630283.
Grading
Course grade is based on four components:
Activity | Weight (%)
Assignments (paper-and-pencil) | 20
Assignments (programming) | 30
Midterm exam | 20
Final exam | 30
Grade meanings
Grade | Meaning
A | Achievement substantially exceeds basic course expectations
A− |
B+ |
B | Achievement exceeds basic course expectations
B− |
C+ |
C | Achievement adequately meets basic course expectations
C− |
D+ |
D | Achievement falls below basic course expectations
D− |
F | Failure – achievement does not justify credit for course