Activities of this lab:
- Learn how to manipulate semi-structured data with
Pandas
. - Learn what ETL processes are.
PistusResa
ETL module: clean and import data into a relational database.PistusResa
Student
module: read data from a relational database using Python code.
Introduction to Pandas
The data used in PistusResa
is stored in two CSV files.
We'll use Pandas
to read the data from the CSV files, clean them and load them into the Pistus database.
The goal of this section is to learn how to use Pandas
.
Download this notebook and import it into Visual Studio Code. Read the notebook and do the exercises before proceeding to the next section.
⏰ Expected duration of this part: at most 1.5 hours.
The PistusResa ETL module
The goal of this section is to extract the data from the input CSV files, transform and load them into the Pistus database.
Download this notebook and import it into Visual Studio Code. Read the notebook and do the exercises before proceeding to the next section.
⏰ Expected duration of this part: at most 30 minutes.
ANSWER ELEMENTS
You can download here the notebook with the solutions.
Implementation of PistusResa
You're now ready to work on the implementation of PistusResa.
- Download the skeleton of the application
PistusResa
and import it into Visual Studio Code. - Copy the database file
pistus.db
to the folderdata
.
⏰ Time to spend on this part: at most 1 hour.
The Student module
The Student
module contains the functions necessary to add and edit student information in the database.
The code of the Student
module is in file mstudent.py
.
In file mstudent.py
, the following functions are defined but not implemented.
get_student
, returns a student's data from the database.get_associations
, returns all associations stored in the database.get_roles
, returns the student roles in an association.get_memberships
, returns the associations of a specific student.add_email_address
, adds an email address to a student.add_student
, adds a student to the database.add_membership
, adds a student to an association in the database.delete_email_address
, removes the email address of a student.delete_membership
, deletes a student's membership in an association.update_first_name
, updates student's first name.update_last_name
, updates a student's last name.update_gender
, updates a student's gender.update_email_address
, updates the email address of a student.update_membership
, updates an existing membership of a student.
In file mstudent_test.py
we defined and implemented a test function for each of the functions listed above.
These test functions are called in the main section of file mstudent.py
: see the code after the line 487.
Here's what you have to do:
- Open file
mstudent.py
. - Implement the functions in the order in which they are defined: the order is important as the tests depend on that.
- Each function is commented in file
mstudent.py
. Read carefully all the comments to understand how to implement each function. - You'll have to use SQL queries in each function. Refer to the SQL cheat sheet if you have doubts about the SQL syntax.
- Whenever you finish the implementation of a function, run file
mstudent.py
as a Python script. This will trigger the execution of the tests. - If a test fails, you'll need to understand the error messages and fix the problem.
👉 Remember to remove the instruction raise NotImplementedError
when you implement a function. Otherwise, your function would always be considered as not implemented.
👉 The Registration
module (file mregistration.py
) is already implemented. Look at the code to understand how the Student
module should be implemented.
ANSWER ELEMENTS
You can download the skeleton with the solutions here.
def get_student(stud_number, cursor): stud_data = None # Gets the data from the table Student try: cursor.execute('SELECT * FROM student WHERE stud_number=?', \ (stud_number,)) stud_data = cursor.fetchone() except sqlite3.Error as error: print(error) return None # If no student can be found, return an empty tuple if stud_data is None: return () email_addresses = [] try: # Gets the email addresses of the student. cursor.execute("SELECT email FROM emailAddress WHERE stud_number=?", (stud_number,)) result = cursor.fetchall() for row in result: email_addresses.append(row[0]) except sqlite3.Error as error: print(error) return None return (stud_data[0], stud_data[1], stud_data[2], stud_data[3], email_addresses) ####### def get_associations(cursor): associations = [] try: # Gets the data from the table Association. cursor.execute('SELECT * FROM association') result = cursor.fetchall() for row in result: associations.append((row[0], row[1])) except sqlite3.Error as error: print(error) return None return associations ####### def get_roles(cursor): roles = [] try: # Get the student roles in the associations. cursor.execute('SELECT DISTINCT stud_role FROM membership') result = cursor.fetchall() for row in result: roles.append(row[0]) except sqlite3.Error as error: print(error) return None return roles ####### def get_memberships(stud_number, cursor): memberships = [] try: cursor.execute('SELECT asso_name, stud_role FROM membership WHERE stud_number=?', \ (stud_number,)) result = cursor.fetchall() for row in result: memberships.append((row[0], row[1])) except sqlite3.Error as error: print(error) return None return memberships ####### def add_email_address(stud_number, email_address, cursor): try: cursor.execute('INSERT INTO emailAddress VALUES (?, ?)',\ (email_address, stud_number)) except sqlite3.IntegrityError as error: return (False, DUPLICATE_EMAIL_ADDRESS, email_address) except sqlite3.Error as error: return (False, UNEXPECTED_ERROR, error) return (True, None, None) ####### def add_student(stud_number, first_name, last_name, gender, email_addresses, cursor): try: # Insert the values in the table Student cursor.execute('INSERT INTO student VALUES (?, ?, ?, ?)', \ (stud_number, first_name, last_name, gender)) except sqlite3.IntegrityError as error: return (False, DUPLICATE_STUD_NUMBER, None) except sqlite3.Error as error: return (False, UNEXPECTED_ERROR, error) # Add the email addresses of the student for email in email_addresses: res = add_email_address(stud_number, email, cursor) if res[0] == False: return res return (True, None, None) ####### def add_membership(stud_number, membership, cursor): try: cursor.execute('INSERT INTO membership VALUES (?, ?, ?)', \ (stud_number, membership[0], membership[1])) except sqlite3.IntegrityError as error: return (False, DUPLICATE_MEMBERSHIP, membership[0]) except sqlite3.Error as error: return (False, UNEXPECTED_ERROR, error) return (True, None, None) ####### def delete_email_address(stud_number, email_address, cursor): try: cursor.execute('DELETE FROM emailAddress WHERE stud_number=? AND email=?',\ (stud_number, email_address)) except sqlite3.Error as error: return (False, UNEXPECTED_ERROR, error) return (True, None, None) ####### def delete_membership(stud_number, asso_name, cursor): try: cursor.execute('DELETE FROM membership WHERE stud_number=? and asso_name=?', \ (stud_number, asso_name)) except sqlite3.Error as error: return (False, UNEXPECTED_ERROR, error) return (True, None, None) ####### def update_first_name(stud_number, first_name, cursor): try: cursor.execute('UPDATE student SET first_name=? WHERE stud_number=?',\ (first_name, stud_number)) except sqlite3.Error as error: print(error) return (False, UNEXPECTED_ERROR, error) return (True, None, None) ####### def update_last_name(stud_number, last_name, cursor): try: cursor.execute('UPDATE student SET last_name=? WHERE stud_number=?',\ (last_name, stud_number)) except sqlite3.Error as error: print(error) return (False, UNEXPECTED_ERROR, error) return (True, None, None) ####### def update_gender(stud_number, gender, cursor): try: cursor.execute('UPDATE student SET gender=? WHERE stud_number=?',\ (gender, stud_number)) except sqlite3.Error as error: print(error) return (False, UNEXPECTED_ERROR, error) return (True, None, None) ####### def update_email_address(stud_number, old_email_address, new_email_address, cursor): try: cursor.execute('UPDATE emailAddress SET email=? WHERE stud_number=? AND email=?',\ (new_email_address, stud_number, old_email_address)) except sqlite3.IntegrityError as error: return (False, DUPLICATE_EMAIL_ADDRESS, new_email_address) except sqlite3.Error as error: return (False, UNEXPECTED_ERROR, error) return (True, None, None) ####### def update_membership(stud_number, old_association, new_association, role, cursor): try: cursor.execute('UPDATE membership SET asso_name=?, stud_role=? WHERE stud_number=?\ AND asso_name=?', (new_association, role, stud_number, old_association)) except sqlite3.IntegrityError as error: return (False, DUPLICATE_MEMBERSHIP, new_association) except sqlite3.Error as error: return (False, UNEXPECTED_ERROR, error) return (True, None, None)
Next lab
In the next lab we will:
- code a login window for
PistusResa
. - code the deadline and authentication modules of
PistusResa
(if time allows it).
👉 If you reached the end of this lab, and you still have time, you can start the next lab.