Advanced Data Mining
Semester: B
ECTS: 7.5

Yannis Manolopoulos
(Course Coordinator)
Syllabus – IMC26
Week 1: Introduction and main concepts
Week 2: Advanced hashing part 1, the basics
Week 3: Advanced hashing part 2, minhash
Week 4: Advanced hashing part 3, simhash
Week 5: Data streams part 1, the basics
Week 6: Data streams part 2, sampling-based algorithms
Week 7: Data streams part 3, sketching-based algorithms
Week 8: Graph mining part 1, random walks
Week 9: Graph mining part 2, dense subgraphs
Week 10: Graph mining part 3, triangles
Week 11: Graph mining part 4, network representation learning
Week 12: Dimensionality reduction techniques
Week 13: Algorithms for recommendation systems
Suggested Bibliography
- “Mining of Massive Datasets”, by Jure Leskovec, Anand Rajaraman, Jeff Ullman, Cambridge University Press, 2020.
- G. Cormode and S. Muthukrishnan. “An improved data stream summary: The count-min sketch and its applications”. Journal of Algorithms, 55(1):58–75, 2004.
- P. Flajolet and G. N. Martin. “Probabilistic counting algorithms for database applications”. Journal of Computer and System Sciences, 31:182–209, 1985.
- M. Garofalakis. “Querying and mining data streams: you only get one look”. Tutorial at VLDB 2002.
- G. S. Manku and R. Motwani. “Approximate frequency counts over data streams”, VLDB 2002.
- Jeffrey S. Vitter. “Random sampling with a reservoir”. ACM Transactions on Mathematical Software, 11(1), 1985.
- Moses S. Charikar, “Similarity Estimation Techniques from Rounding Algorithms”, STOC 2002.
- Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma, “Detecting Near-Duplicates for Web Crawling”, WWW 2007.
- A. Z. Broder, “On the resemblance and containment of documents”, Proc. Compression and Complexity of Sequences, pp. 21–29, Positano, Italy, 1997.
- A. Andoni, P. Indyk, “Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions”, Communications of the ACM 51:1, pp. 117–122, 2008.