CentraleSupélec
Département informatique
Plateau de Moulon
3 rue Joliot-Curie
F-91192 Gif-sur-Yvette cedex
Tutorial 13: Data pipeline and analysis with Pandas

Mandatory activity

Objective. This tutorial shows how to use Pandas to build a data pipeline and perform basic data analysis and visualization.

Preparation. Follow these instructions to prepare the working environment:

  1. Fork this project in GitLab.
  2. Clone the forked project using the git clone command.
  3. Open the project with Visual Studio Code.
  4. Open a terminal and create a Python virtual environment: python3 -m venv .venv
  5. Activate the virtual environment: source .venv/bin/activate
  6. Upgrade pip: pip install --upgrade pip
  7. Install the required libraries in the virtual environment: pip install -r requirements.txt

Work to do. Open notebook notebooks/td13-pandas-en.ipynb and follow the instructions.


ANSWER ELEMENTS

Solutions available on GitLab: git clone git@gitlab-research.centralesupelec.fr:sip/students/solutions/td13-pandas-solutions.git


Optional: interactive maps with Folium

For interactive maps, you can use the Folium library (user guide).

For this activity, it is better to write your code in a Python script instead of a notebook, because the Visual Studio Code notebook extension does not support Folium maps very well.

Creating a map in Folium is relatively simple:

import folium
m = folium.Map()
m.show_in_browser()

A map will be displayed in your web browser. Press Ctrl+C to exit the program.

Position and zoom

With the previous code, you display a full world map. You can, however, control both the portion of the world you want to visualize and the zoom level.

  • Using the documentation, display a map centered on France with an appropriate zoom level.

ANSWER ELEMENTS

m = folium.Map([47, 2], zoom_start=6)
m.show_in_browser()

Bounding boxes and markers

Bounding boxes and markers are useful to highlight or pinpoint precise locations on a map.

  • Using the documentation, add a red bounding box to the map that encloses metropolitan France (including Corsica). You can refer to this source to get the coordinates of the bounding box.
  • Add a marker at the location of Paris.

ANSWER ELEMENTS

import folium

m = folium.Map([47, 2], zoom_start=6)

folium.PolyLine(
    [
        [41.0, -5.5], 
        [52.0, -5.5], 
        [52.0, 10.0], 
        [41.0, 10.0], 
        [41.0, -5.5]
    ], 
    color='red'
).add_to(m)

folium.Marker(
    [48.8575, 2.3514], 
    popup = "Paris, our beautiful capital!"
).add_to(m)

m.show_in_browser()

If you followed the examples in the documentation, you may have added a simple popup to the Paris marker using the popup argument of the Marker constructor (if not, try it). The popup text wraps over several lines because the default popup size is small.

  • Use the same documentation page to find a way to adjust the popup size for the Paris marker.

ANSWER ELEMENTS

import folium

m = folium.Map([47, 2], zoom_start=6)

folium.PolyLine(
    [
        [41.0, -5.5], 
        [52.0, -5.5], 
        [52.0, 10.0], 
        [41.0, 10.0], 
        [41.0, -5.5]
    ], 
    color='red'
).add_to(m)

paris_marker = folium.Marker([48.8575, 2.3514])
paris_popup = folium.Popup("Paris, our beautiful capital!", max_width=300)
paris_marker.add_child(paris_popup)

paris_marker.add_to(m)

m.show_in_browser()

In the previous exercises, we used a PolyLine to draw a rectangular bounding box. Folium also provides more advanced vector layers, which let you overlay geometric shapes on a map.

  • Overlay a lightly shaded hexagon over France.

ANSWER ELEMENTS

import folium

m = folium.Map([47, 2], zoom_start=6)

locations = [
    [48.4, -4.8],   # Brest
    [51.1, 2.4],    # Dunkirk  
    [48.7, 7.8],    # Strasbourg
    [43.7, 7.5],    # Nice
    [42.5, 2.9],    # Perpignan
    [43.4, -1.8]    # Biarritz
]

folium.Polygon(
    locations = locations,
    color = 'red',
    weight = 6,
    fill_color = 'red',
    fill_opacity = 0.2,
    fill = True
).add_to(m)

m.show_in_browser()
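
The hexagon vertices can also be used to derive a bounding box instead of typing its corners by hand. The following is a minimal sketch in plain Python; the helper names bounding_box and rectangle_path are hypothetical and not part of Folium. Note that a box computed from these six cities does not include Corsica.

```python
# Hexagon vertices from the example above, as [latitude, longitude] pairs.
locations = [
    [48.4, -4.8],   # Brest
    [51.1, 2.4],    # Dunkirk
    [48.7, 7.8],    # Strasbourg
    [43.7, 7.5],    # Nice
    [42.5, 2.9],    # Perpignan
    [43.4, -1.8],   # Biarritz
]


def bounding_box(points):
    """Return [[south, west], [north, east]] for a list of [lat, lon] pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


def rectangle_path(bbox):
    """Return the closed list of corners of a bounding box."""
    (south, west), (north, east) = bbox
    return [
        [south, west],
        [north, west],
        [north, east],
        [south, east],
        [south, west],  # Repeat the first corner to close the outline.
    ]


bbox = bounding_box(locations)
path = rectangle_path(bbox)
print(bbox)  # [[42.5, -4.8], [51.1, 7.8]]
```

The resulting path has the same shape as the list of corners passed to PolyLine in the earlier answers.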

Choropleth maps

Now we want to draw an interactive choropleth map displaying the median house price in France. For this, you will need the Folium Choropleth class. Look at the examples and the complete class reference in the Folium documentation.

Here are a few important points:

  1. Internally, the Choropleth class uses GeoJSON, an open standard for representing spatial features.
  2. The geo_data argument of the Choropleth constructor accepts a GeoPandas GeoDataFrame; internally, this is converted to GeoJSON. Therefore, any column in the GeoDataFrame is accessible through feature.properties.column_name. This is important when you specify the GeoDataFrame attribute used to merge the GeoDataFrame with the DataFrame containing the statistics to visualize (price per square meter).
  3. The Choropleth class applies a GeoJSON overlay to the map. Information about this overlay is kept in the geojson attribute of the Choropleth object. The constructor arguments allow you to modify the underlying GeoJSON without using it explicitly.
  4. For some tasks (for example, adding a popup), you will need to use the geojson object directly.

  • Using the examples and the Choropleth class specification, create a choropleth map of France showing the median house price per square meter, by department or by region.
  • Complement the map with clickable popups displaying the department or region name. You can use GeoJsonPopup (documented here) and add it to the choropleth map's geojson object.
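
To make the feature.properties.column_name notation more concrete, here is a small sketch in plain Python. It uses a hand-written GeoJSON FeatureCollection and follows a dotted path through each feature. The department codes, names, coordinates, and prices are made-up placeholders, and the resolve helper is a simplified illustration, not Folium's actual implementation.

```python
# A hand-written GeoJSON FeatureCollection with two features.
# All codes, names, and coordinates are illustrative placeholders.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"dept_code": "01", "dept_name": "Dept A"},
            "geometry": {"type": "Point", "coordinates": [2.0, 47.0]},
        },
        {
            "type": "Feature",
            "properties": {"dept_code": "02", "dept_name": "Dept B"},
            "geometry": {"type": "Point", "coordinates": [3.0, 48.0]},
        },
    ],
}


def resolve(feature, key_on):
    """Follow a dotted path such as 'feature.properties.dept_code'."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on must start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value


# Placeholder statistics keyed by department code.
prices = {"01": 100.0, "02": 200.0}

# Match each feature with its statistic through the resolved key.
for feature in geojson["features"]:
    code = resolve(feature, "feature.properties.dept_code")
    name = resolve(feature, "feature.properties.dept_name")
    print(code, name, prices[code])
```

Each column of the GeoDataFrame becomes an entry under properties, which is why the merge key is written as a path starting with feature.properties.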

ANSWER ELEMENTS

import folium
import geopandas
import pandas as pd

# Read the DataFrames that we obtained from the first part of the tutorial.
cities_gdf = geopandas.read_file('./data/transformed/geo-communes-2026.zip')
transactions_df = pd.read_parquet('./data/transformed/valeursfoncieres-2025.parquet')

# Dissolve the geometry to obtain polygons for the departments
cities_gdf = cities_gdf.to_crs(2154).dissolve(by='dept_code').reset_index()
cities_gdf['geometry'] = cities_gdf['geometry'].buffer(0.0001)

# Select only the department code and the m2_price
transactions_df = transactions_df[['dept_code', 'm2_price']]

# Compute the median m2 price per department
transactions_df = (
    transactions_df
    .groupby('dept_code', as_index=False)
    .median()
)

# Initialize the map.
m = folium.Map([46, 2], zoom_start=6)

# Create the choropleth map.
choropleth = folium.Choropleth(
    geo_data=cities_gdf,                    # The GeoDataFrame containing spatial data.
    data=transactions_df,                   # The DataFrame containing the statistics to display.
    columns=["dept_code", "m2_price"],      # Must contain exactly two columns: the key used to merge data with geo_data, then the value.
    key_on="feature.properties.dept_code",  # This refers to the dept_code in the GeoJson representation of the GeoDataFrame.
    fill_opacity=0.9,                       # Opacity of the polygons.
    legend_name="Price per m2"              # Legend attached to the map.
)

# Add the choropleth to the map
choropleth.add_to(m)

# Create a popup, where the text is taken from the column dept_name
popup = folium.GeoJsonPopup(fields=["dept_name"])

# Add the popup to the geojson overlay.
popup.add_to(choropleth.geojson)

# Show the map
m.show_in_browser()

Now you are ready to explore more Folium features on your own. For example, you may want to create a timeline showing the evolution of house prices in France across months within a year, or across years, since you have access to five years of data.
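
As a possible starting point for such a timeline, the sketch below computes the median price per department and per month with Pandas. It runs on a tiny made-up DataFrame: the date column name and all values are assumptions for illustration, so adapt them to the columns actually present in your transformed data.

```python
import pandas as pd

# Toy transactions; the 'date' column name and all values are placeholders.
transactions_df = pd.DataFrame({
    "dept_code": ["01", "01", "01", "02", "02"],
    "date": pd.to_datetime([
        "2025-01-05", "2025-01-20", "2025-02-10", "2025-01-15", "2025-02-03",
    ]),
    "m2_price": [100.0, 300.0, 250.0, 400.0, 500.0],
})

# Derive a year-month label such as '2025-01' from each transaction date.
transactions_df["month"] = transactions_df["date"].dt.to_period("M").astype(str)

# Median price per square meter for every (department, month) pair.
monthly_median = (
    transactions_df
    .groupby(["dept_code", "month"], as_index=False)["m2_price"]
    .median()
)

print(monthly_median)
```

Each monthly slice of the result has the same two-column shape (key and value) as the DataFrame passed to the data argument of Choropleth above.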