CentraleSupélecDépartement informatique
Plateau de Moulon
3 rue Joliot-Curie
F-91192 Gif-sur-Yvette cedex
1CC1000 - Information Systems and Programming - Lab: ETL processes with Python

Activities of this lab:

  • Learn how to manipulate semi-structured data with Pandas.
  • Learn what ETL processes are.
  • PistusResa ETL module: clean and import data into a relational database.
  • PistusResa Student module: read data from a relational database using Python code.

Introduction to Pandas

The data used in PistusResa is stored in two CSV files. We'll use Pandas to read the data from the CSV files, clean them and load them into the Pistus database.

The goal of this section is to learn how to use Pandas.


Download this notebook and import it into Visual Studio Code. Read the notebook and do the exercises before proceeding to the next section.

⏰ Expected duration of this part: at most 1.5 hours.


The PistusResa ETL module

The goal of this section is to extract the data from the input CSV files, transform and load them into the Pistus database.


Download this notebook and import it into Visual Studio Code. Read the notebook and do the exercises before proceeding to the next section.

⏰ Expected duration of this part: at most 30 minutes.


Implementation of PistusResa

You're now ready to work on the implementation of PistusResa.


  • Download the skeleton of the application PistusResa and import it into Visual Studio Code.
  • Copy the database file pistus.db to the folder data.

⏰ Time to spend on this part: at most 1 hour.


The Student module

The Student module contains the functions necessary to add and edit student information in the database.


The code of the Student module is in file mstudent.py.


In file mstudent.py, the following functions are defined but not implemented.

  • get_student, returns a student's data from the database.
  • get_associations, returns all associations stored in the database.
  • get_roles, returns the student roles in an association.
  • get_memberships, returns the associations of a specific student.
  • add_email_address, adds an email address to a student.
  • add_student, adds a student to the database.
  • add_membership, adds a student to an association in the database.
  • delete_email_address, removes the email address of a student.
  • delete_membership, deletes a student's membership in an association.
  • update_first_name, updates student's first name.
  • update_last_name, updates a student's last name.
  • update_gender, updates a student's gender.
  • update_email_address, updates the email address of a student.
  • update_membership, updates an existing membership of a student.

In file mstudent_test.py we defined and implemented a test function for each of the functions listed above. These test functions are called in the main section of file mstudent.py: see the code after the line 487.


Here's what you have to do:


  • Open file mstudent.py.
  • Implement the functions in the order in which they are defined: the order is important as the tests depend on that.
  • Each function is commented in file mstudent.py. Read carefully all the comments to understand how to implement each function.
  • You'll have to use SQL queries in each function. Refer to the SQL cheat sheet if you have doubts about the SQL syntax.
  • Whenever you finish the implementation of a function, run file mstudent.py as a Python script. This will trigger the execution of the tests.
  • If a test fails, you'll need to understand the error messages and fix the problem.

👉 Remember to remove the instruction raise NotImplementedError when you implement a function. Otherwise, your function would always be considered as not implemented.

👉 The Registration module (file mregistration.py) is already implemented. Look at the code to understand how the Student module should be implemented.

Next lab

In the next lab we will:

  • code a login window for PistusResa.
  • code the deadline and authentication modules of PistusResa (if time allows it).

👉 If you reached the end of this lab, and you still have time, you can start the next lab.