CentraleSupélecDépartement informatique
Plateau de Moulon
3 rue Joliot-Curie
F-91192 Gif-sur-Yvette cedex
Physical model of a relational database

A physical model of a database is derived from a logical model by adding:

  • a type to each column.
  • any integrity constraint that a database management system (DBMS) must enforce on the data.

Unlike the logical model, the physical model is tied to a chosen DBMS; all column types and integrity constraints must be recognized by the chosen DBMS.


What is a Database management system (DBMS)?

A database management system (DBMS) is a software system that is used to create, maintain, and provide controlled access to user databases (J.Hoffer, V. Ramesh, H. Topi, Modern database management). A relational database management system (RDBMS) is a DBMS managing relational databases.

Most DBMSs are programs that are executed on a computer and listen to a port for incoming connections from other applications. In this configuration, the DBMS acts as a server, waiting for requests from clients (the other applications). For this reason, they are called client-server DBMSs. Note that:

  • The data is stored on the same computer where the DBMS runs.
  • The client applications do not usually run on the same computer as the DBMS.

Examples of client-server RDBMS include: MySQL, Oracle DB, Microsoft SQL Server, PostgreSQL.

Instead of running as server applications, other DBMSs are implemented as software libraries. They are collections of functions that an application can call and execute. The application directly manages the database through the software library. In other words, the database management system is tightly integrated (or, embedded) into the application. This is why such DBMSs are called embedded. Note that the data is stored on the same computer where the application runs.

Examples of embedded RDBMS include: SQLite, Apache Derby, HSQLDB.


👉 In our course, we use SQLite as a relational DBMS.

Column types

SQLite recognizes the following column types:

  • INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
  • REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
  • TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
  • BLOB. The value is a binary sequence.

Booleans

SQLite does not have a specific type for Boolean values. Boolean values are stored as integers: 0 means "false", 1 means "true".

Dates and times

SQLite does not have a specific type to represent dates and times. Instead, a date/time value can be stored as either a TEXT, REAL or INTEGER:

  • TEXT as an ISO8601 string ("YYYY-MM-DD HH:MM:SS.SSS").
  • REAL as a Julian day number, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
  • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

👉 In this course, we'll use the first option (TEXT).

👉 SQLite provides functions to manipulate dates and times. However, we won't use them in our course. Instead, we'll read a date/time value from a database as is, and manipulate it with Python functions.

Integrity constraints

When we create a new table, we can specify one or more constraints for each column. The following are commonly used constraints:

  • PRIMARY KEY. Uniquely identifies each row in a table. Columns that form a primary key cannot contain NULL values.
  • NOT NULL. A column cannot have a NULL value.
  • UNIQUE. All values in a column are different.
  • FOREIGN KEY. Preserves the referential integrity between tables.
  • CHECK. The values in a column must satisfy a specific condition.
  • DEFAULT. Sets a default value for a column if no value is specified.

An example

Let's consider the logical model that we obtained here.


segment(seg_id, seg_color, x1, y1, x2, y2, label_id)
point(x, y, p_color, label_id)
label(label_id, label_text, font_size, font_color)
polygon(pol_id, label_id)
side(seg_id, pol_id, index)

Foreign keys:

  • segment(x1, y1) references point(x, y)
  • segment(x2, y2) references point(x, y)
  • segment(label_id) references label(label_id)
  • point(label_id) references label(label_id)
  • polygon(label_id) references label(label_id)
  • side(seg_id) references segment(seg_id)
  • side(pol_id) references polygon(pol_id)

Table label

Here is the definition of table label.

label (
  label_id REAL PRIMARY KEY, 
  label_text REAL NOT NULL, 
  font_size INTEGER CHECK(font_size > 0) DEFAULT 11, 
  font_color TEXT DEFAULT "#000000" CHECK (SUBSTR(font_color, 1, 1)='#')
)

Note that:

  • We can specify more than one integrity constraint for a column (see columns font_size and font_color).
  • The CHECK constraint takes a predicate as an argument.
  • We can use any function supported by the DBMS in the definition of a CHECK constraint (e.g., the function SUBSTR). You can find the list of supported functions in SQLite in the documentation.
  • For more complex CHECK constraints, we should define our own functions. However, SQLite does not support user defined functions by default. SQL user defined functions are out of the scope of this course.

Table point

Here is the definition of table point.

point (
  x REAL, 
  y REAL, 
  p_color TEXT DEFAULT "#000000" CHECK (SUBSTR(p_color, 1, 1)='#'),
  label_id INTEGER,
  PRIMARY KEY(x, y),
  FOREIGN KEY(label_id) REFERENCES label(label_id)
)

👉 When the primary key is composed of more than one column, we must specify it at the end of the table definition.

👉 Column label_id in table point is linked to column label_id in table label through a foreign key constraint.

  • Every value that occurs in column label_id of table point must occur in the column label_id of table label. This is called a referential integrity constraint. In other words, a point cannot have a label that does not exist in the table label.
  • On the other hand, column label_id in table label can contain values that do not necessarily occur in column label_id in table point. In other words, a label is not necessarily associated to a point.

Table segment

Here is the definition of table segment.

segment (
   seg_id INTEGER PRIMARY KEY, 
   seg_color TEXT DEFAULT "#000000" CHECK (SUBSTR(seg_color, 1, 1)='#'), 
   x1 REAL NOT NULL, 
   y1 REAL NOT NULL, 
   x2 REAL NOT NULL, 
   y2 REAL NOT NULL, 
   label_id INTEGER,
   FOREIGN KEY label REFERENCES label(label_id),
   FOREIGN KEY (x1, y1) REFERENCES point(x, y) ON DELETE CASCADE ON UPDATE CASCADE,
   FOREIGN KEY (x2, y2) REFERENCES point(x, y) ON DELETE CASCADE ON UPDATE CASCADE
)

  • In this table, two foreign key constraints link pairs of columns. These constraints ensure that a segment cannot have as an endpoint a point that does not exist.
  • The options ON DELETE and ON UPDATE dictate how the DBMS must behave when any value in column x or y in table point is deleted or updated.
  • ON DELETE CASCADE means that if a point is removed from the table point, all segments referencing to that point are removed too.
  • ON UPDATE CASCADE means that if the x-coordinate (respectively, y-coordinate) of a point is updated, the values in columns x1 and x2 (respectively, y1 and y2) referencing to that point are updated accordingly.
  • If we don't specify the options ON DELETE and ON UPDATE, the DBMS won't allow a point to be removed or updated.

Table polygon

Here is the definition of table polygon.

polygon (
  pol_id INTEGER PRIMARY KEY, 
  label_id INTEGER,
  FOREIGN KEY(label_id) REFERENCES label(label_id) 
)

Table side

Here is the definition of table side.

side (
  seg_id INTEGER, 
  pol_id INTEGER,
  index INTEGER NOT NULL, 
  PRIMARY KEY(seg_id, pol_id),
  FOREIGN KEY(seg_id) REFERENCES segment(seg_id),
  FOREIGN KEY(pol_id) REFERENCES polygon(pol_id),
)