News | General Information | Syllabus | Class Schedules | Previous Years
- NOTICE: We have received some requests for access to the Google Form for exam registration from personal Gmail accounts. Please remember to use your institutional credentials (@studenti.uniroma1.it) when accessing the form. Requests made with personal accounts will NOT be granted.
If you experience any technical issues with your institutional account, please reach out to the university's IT support team for assistance.
Thank you for your cooperation. Best regards and Happy New Year! 🥳
- ANNOUNCEMENT: You can book for the February exam using this Google Form
- ANNOUNCEMENT: You can book for the January exam using this Google Form
- ANNOUNCEMENT: Those who did not complete the OPIS questionnaires in class are strongly encouraged to do so as soon as possible, using the following code: 3XIR60BN.
- ANNOUNCEMENT: There will be no class on Thursday, December 12, due to the IT Meeting event, which you are all welcome to join.
- ANNOUNCEMENT: On Thursday, November 21, 2024, from 1:00 to 1:30 p.m. in Room L1, Prof. Marco Schaerf, Dean of the Faculty, will provide an update on the critical logistics issues affecting classroom spaces this semester. All students are welcome to attend. Please note that our class will tentatively begin 30 minutes later than usual to accommodate this meeting.
- ANNOUNCEMENT: The second part of the course will start on Wednesday, November 13, 2024.
- NOTICE: The first part of the course was completed today, 30.10.2024. Keep an eye on GitHub for updates on when the second part of the course will start.
- NOTICE: The Thursday, October 24th lecture will finish at 4 p.m. as usual (the earlier notice saying it would finish at 3 p.m. has been withdrawn).
- NOTICE: We will not have class on Wednesday, October 23rd.
- NOTICE: Starting from Wednesday October 16th, lectures on Wednesday will regularly take place in Aula Magna - RM111, Viale Regina Elena 295.
- NOTICE: The 09.10.2024 lecture will take place in Aula 301, Viale Regina Elena, Edificio D.
- NOTICE: The 02.10.2024 lecture will take place in Aula 101, Viale Regina Elena, Edificio D.
- NOTICE: The very first class, scheduled for Wednesday, 25th September 2024, will exceptionally take place in Room V "Guido Castelnuovo". For additional information, please take a look at the first-week class schedule available here.
- All students who wish to attend this class must subscribe ASAP to the course's Moodle web page, as indicated below.
- Classes will start on Wednesday, September 25, 2024. Students are kindly asked to refer to the class schedule at the following link.
Welcome to the Big Data Computing class!
This is a first-semester course of the MSc in Computer Science at the Sapienza University of Rome.
This repository contains class material along with any useful information for the 2024-2025 academic year.
The Big Data Computing course is divided into two distinct modules, each one carrying 3 CFUs (credits).
Prof. Daniele De Sensi will lead the first module, while the second module will be taught by Prof. Gabriele Tolomei.
Importantly, these modules will not run concurrently; once the first module concludes, the second will begin.
- Wednesday from 10:00 a.m. to 12:00 p.m. (Aula Magna - "Building C" - Viale Regina Elena 295 [map])
- Thursday from 1:00 p.m. to 4:00 p.m. (Room 1L - Via Del Castro Laurenziano, 7a [map])
Students must subscribe to the Moodle web page at the following link, using the same credentials (username/password) they use to access the Wi-Fi network and Infostud services: https://elearning.uniroma1.it/course/view.php?id=18525. All information will be provided through GitHub, whereas Moodle will be used as a repository for the course material.
Prof. Daniele De Sensi
- Email: [email protected]
- Website: https://danieledesensi.github.io
- Bacheca Sapienza: https://corsidilaurea.uniroma1.it/it/users/danieledesensiuniroma1it
Prof. Gabriele Tolomei
- Email: [email protected]
- Website: https://www.di.uniroma1.it/~tolomei
- Bacheca Sapienza: https://corsidilaurea.uniroma1.it/it/users/gabrieletolomeiuniroma1it
Prof. Daniele De Sensi
Please drop me a message at [email protected] if you would like to schedule a meeting, either online (via Google Meet or Zoom) or in person (in Room 306, on the 3rd floor of Building E, Viale Regina Elena 295).
Prof. Gabriele Tolomei
Please drop me a message at [email protected] if you would like to schedule a meeting, either online (via Google Meet or Zoom) or in person (in Room 106, on the 1st floor of Building E, Viale Regina Elena 295).
The amount, variety, and rate at which data is being generated nowadays, both by humans and machines, are unprecedented. This poses a number of challenges in dealing with such data, as traditional computing paradigms were not designed to operate at this scale.
"Big Data" is the umbrella term that has rapidly become popular to describe methodologies and tools specifically designed for collecting, storing, and processing very large or complex data sets. In addition to addressing foundational computer science problems, such as searching and sorting, big data computing mainly focuses on extracting knowledge (and thereby value) from large-scale data sets using advanced data analysis techniques, such as machine learning.
This course is intended to provide graduate-level students with a deep understanding of programming models and computer architectures that are suitable for the large-scale analysis of data. More specifically, the course will give students the ability to understand challenges and solutions in developing big data/machine learning workloads, and to tackle real-world problems faced by the so-called "Big Five" tech companies (i.e., Apple, Amazon, Google, Microsoft, and Facebook): text/graph analysis, classification/regression, and recommendation, just to name a few.
The course assumes that students are familiar with the basics of data analysis and machine learning, properly supported by a strong knowledge of foundational concepts of calculus, linear algebra, probability, statistics, and computer architectures.
The exam will consist of an oral examination.
No textbooks are mandatory to successfully follow this course. However, there are many references worth mentioning, especially for those who want to dig deeper into specific topics. Among those, I would suggest the following readings:
- Mining of Massive Datasets [Leskovec, Rajaraman, Ullman] available online.
- Big Data Analysis with Python [Marin, Shukla, VK]
- Large Scale Machine Learning with Python [Sjardin, Massaron, Boschetti]
- Spark: The Definitive Guide [Chambers, Zaharia]
- Learning Spark: Lightning-Fast Big Data Analysis [Karau, Konwinski, Wendell, Zaharia]
- Hadoop: The Definitive Guide [White]
- Python for Data Analysis [McKinney]
Lecture # | Date | Topic | Material |
---|---|---|---|
Lecture 1 | 25/09/2024 | Introduction to Big Data: Motivations and Challenges | [slides: PPT, PDF, recording: Recording] |
Lecture 2 | 26/09/2024 | Distributed Deep Learning | [slides: PPT, PDF, recording: Recording] |
Lecture 3 | 02/10/2024 | Introduction to Hardware Architectures for Big Data Processing | [slides: PPT, PDF, recording: Recording] |
Lecture 4 | 03/10/2024 | Network Topology Design | [slides: PPT, PDF, recording: Recording] |
Lecture 5 | 09/10/2024 | TCP/IP Limitations and Intro to RDMA | [slides: PPT, PDF, recording: Recording] |
Lecture 6 | 10/10/2024 | RDMA and SmartNICs | [slides: PPT, PDF, recording: Recording] |
Lecture 7 | 16/10/2024 | Congestion Control | [slides: PPT, PDF, recording: Recording] |
Lecture 8 | 17/10/2024 | Load Balancing and In-Network Compute | [slides: PPT, PDF, recording: Recording] |
Lecture 10 | 30/10/2024 | Recap & Outlook | [slides: PPT, PDF, recording: Recording] |
Guest Seminar | 06/12/2024 | Prof. Marco Canini Guest Seminar | [slides: Slides] |
Lecture 11 | 13/11/2024 | Introduction to Big Data (Part II) | [slides: PDF] |
Lecture 12 | 14/11/2024 | The Curse of Dimensionality | [slides: PDF, notebook: ipynb] |
Lecture 13 | 20/11/2024 | Clustering: A General Framework | [slides: PDF] |
Lectures 14-15 | 21/11/2024-27/11/2024 | Clustering: K-means | [slides: PDF] |
Lecture 16 | 28/11/2024 | Dimensionality Reduction: Principal Component Analysis (Part I) | [slides: PDF, notes: PDF] |
Lecture 17 | 04/12/2024 | Dimensionality Reduction: Principal Component Analysis (Part II) | [slides: PDF] |
Lecture 18 | 05/12/2024 | Recommender Systems (Parts I & II) | [slides: PDF(I), PDF(II)] |
Lecture 19 | 11/12/2024 | Recommender Systems (Part III: Matrix Factorization) | [slides: PDF(III)] |
Lectures 20-21 | 18/12/2024-19/12/2024 | PageRank | [slides: PDF, notes: PDF] |
In the following, you can quickly navigate through Big Data Computing class information and material from previous years.
NOTE: There is a single folder containing the class material, and it is subject to changes and updates; as such, the content displayed on this website may differ from what was shown in class in past years.