Technologies for Big Data Analytics

Semester: A
ECTS: 7.5

Apostolos Papadopoulos (Course Coordinator)

Syllabus

Week 1: Introduction and main concepts
Week 2: Apache Hadoop part 1, architecture
Week 3: Apache Hadoop part 2, the MapReduce programming model
Week 4: Apache Spark, part 1, the basics
Week 5: Apache Spark, part 2, details
Week 6: Apache Spark, part 3, DataFrames
Week 7: Apache Spark, part 4, data mining with Spark
Week 8: NoSQL part 1, introduction
Week 9: NoSQL part 2, MongoDB
Week 10: NoSQL part 3, HBase
Week 11: NoSQL part 4, Neo4j
Week 12: Data streams
Week 13: Data lakes

Suggested Bibliography

  • “Hadoop: The Definitive Guide” 3rd Edition, by Tom White, O’Reilly Media, 2012.
  • “Spark: The Definitive Guide” by Bill Chambers and Matei Zaharia, O’Reilly Media, 2018.
  • “High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark” by Holden Karau and Rachel Warren, O’Reilly Media, 2017.
  • “Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data” by Byron Ellis and Justin Langseth, Wiley, 2014.
  • “Storm Applied: Strategies for real-time event processing” by Sean T. Allen, Matthew Jankowski, and Peter Pathirana, Manning, 2015.
  • “Mastering Apache Storm” by Ankit Jain and Ashish Sarin, Packt, 2017.
  • “Stream Processing with Apache Flink” by Fabian Hueske and Vasia Kalavri, O’Reilly Media, 2019.
  • “Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing” by Tyler Akidau, Slava Chernyak, and Reuven Lax, O’Reilly Media, 2018.
  • “Graph Databases” 2nd Edition, by Ian Robinson, Jim Webber, and Emil Eifrém, O’Reilly Media, 2015.
  • “MongoDB: The Definitive Guide: Powerful and Scalable Data Storage” 2nd Edition, by Kristina Chodorow, O’Reilly Media, 2013.
  • “HBase: The Definitive Guide”, by Lars George, O’Reilly Media, 2011.