A physical model of a database is derived from a logical model by adding:
- a type to each column.
- any integrity constraint that a database management system (DBMS) must enforce on the data.
Unlike the logical model, the physical model is tied to a chosen DBMS; all column types and integrity constraints must be recognized by the chosen DBMS.
What is a Database management system (DBMS)?
A database management system (DBMS) is a software system that is used to create, maintain, and provide controlled access to user databases (J.Hoffer, V. Ramesh, H. Topi, Modern database management). A relational database management system (RDBMS) is a DBMS managing relational databases.
Most DBMSs are programs that are executed on a computer and listen to a port for incoming connections from other applications. In this configuration, the DBMS acts as a server, waiting for requests from clients (the other applications). For this reason, they are called client-server DBMSs. Note that:
- The data is stored on the same computer where the DBMS runs.
- The client applications do not usually run on the same computer as the DBMS.
Examples of client-server RDBMS include: MySQL, Oracle DB, Microsoft SQL Server, PostgreSQL.
Instead of running as server applications, other DBMSs are implemented as software libraries. They are collections of functions that an application can call and execute. The application directly manages the database through the software library. In other words, the database management system is tightly integrated (or, embedded) into the application. This is why such DBMSs are called embedded. Note that the data is stored on the same computer where the application runs.
Examples of embedded RDBMS include: SQLite, Apache Derby, HSQLDB.
👉 In our course, we use SQLite as a relational DBMS.
Column types
SQLite recognizes the following column types:
- INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
- REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
- TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
- BLOB. The value is a binary sequence.
Booleans
SQLite does not have a specific type for Boolean values. Boolean values are stored as integers: 0 means "false", 1 means "true".
Dates and times
SQLite does not have a specific type to represent dates and times. Instead, a date/time value can be stored as either a TEXT, REAL or INTEGER:
- TEXT as an ISO8601 string ("YYYY-MM-DD HH:MM:SS.SSS").
- REAL as a Julian day number, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
- INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.
👉 In this course, we'll use the first option (TEXT).
👉 SQLite provides functions to manipulate dates and times. However, we won't use them in our course. Instead, we'll read a date/time value from a database as is, and manipulate it with Python functions.
Integrity constraints
When we create a new table, we can specify one or more constraints for each column. The following are commonly used constraints:
- PRIMARY KEY. Uniquely identifies each row in a table. Columns that form a primary key cannot contain NULL values.
- NOT NULL. A column cannot have a NULL value.
- UNIQUE. All values in a column are different.
- FOREIGN KEY. Preserves the referential integrity between tables.
- CHECK. The values in a column must satisfy a specific condition.
- DEFAULT. Sets a default value for a column if no value is specified.
An example
Let's consider the logical model that we obtained here.
segment | (seg_id, seg_color, x1, y1, x2, y2, label_id) |
point | (x, y, p_color, label_id) |
label | (label_id, label_text, font_size, font_color) |
polygon | (pol_id, label_id) |
side | (seg_id, pol_id, index) |
Foreign keys:
segment(x1, y1)
referencespoint(x, y)
segment(x2, y2)
referencespoint(x, y)
segment(label_id)
referenceslabel(label_id)
point(label_id)
referenceslabel(label_id)
polygon(label_id)
referenceslabel(label_id)
side(seg_id)
referencessegment(seg_id)
side(pol_id)
referencespolygon(pol_id)
Table label
Here is the definition of table label
.
label ( label_id REAL PRIMARY KEY, label_text REAL NOT NULL, font_size INTEGER CHECK(font_size > 0) DEFAULT 11, font_color TEXT DEFAULT "#000000" CHECK (SUBSTR(font_color, 1, 1)='#') )
Note that:
- We can specify more than one integrity constraint for a column (see columns
font_size
andfont_color
). - The CHECK constraint takes a predicate as an argument.
- We can use any function supported by the DBMS in the definition of a CHECK constraint (e.g., the function SUBSTR). You can find the list of supported functions in SQLite in the documentation.
- For more complex CHECK constraints, we should define our own functions. However, SQLite does not support user defined functions by default. SQL user defined functions are out of the scope of this course.
Table point
Here is the definition of table point
.
point ( x REAL, y REAL, p_color TEXT DEFAULT "#000000" CHECK (SUBSTR(p_color, 1, 1)='#'), label_id INTEGER, PRIMARY KEY(x, y), FOREIGN KEY(label_id) REFERENCES label(label_id) )
👉 When the primary key is composed of more than one column, we must specify it at the end of the table definition.
👉 Column label_id
in table point
is linked to column label_id
in table label
through a foreign key constraint.
- Every value that occurs in column
label_id
of tablepoint
must occur in the columnlabel_id
of tablelabel
. This is called a referential integrity constraint. In other words, a point cannot have a label that does not exist in the tablelabel
. - On the other hand, column
label_id
in tablelabel
can contain values that do not necessarily occur in columnlabel_id
in tablepoint
. In other words, a label is not necessarily associated to a point.
Table segment
Here is the definition of table segment
.
segment ( seg_id INTEGER PRIMARY KEY, seg_color TEXT DEFAULT "#000000" CHECK (SUBSTR(seg_color, 1, 1)='#'), x1 REAL NOT NULL, y1 REAL NOT NULL, x2 REAL NOT NULL, y2 REAL NOT NULL, label_id INTEGER, FOREIGN KEY label REFERENCES label(label_id), FOREIGN KEY (x1, y1) REFERENCES point(x, y) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (x2, y2) REFERENCES point(x, y) ON DELETE CASCADE ON UPDATE CASCADE )
- In this table, two foreign key constraints link pairs of columns. These constraints ensure that a segment cannot have as an endpoint a point that does not exist.
- The options ON DELETE and ON UPDATE dictate how the DBMS must behave when any value in column
x
ory
in tablepoint
is deleted or updated. - ON DELETE CASCADE means that if a point is removed from the table
point
, all segments referencing to that point are removed too. - ON UPDATE CASCADE means that if the x-coordinate (respectively, y-coordinate) of a point is updated, the values in columns
x1
andx2
(respectively,y1
andy2
) referencing to that point are updated accordingly. - If we don't specify the options ON DELETE and ON UPDATE, the DBMS won't allow a point to be removed or updated.
Table polygon
Here is the definition of table polygon
.
polygon ( pol_id INTEGER PRIMARY KEY, label_id INTEGER, FOREIGN KEY(label_id) REFERENCES label(label_id) )
Table side
Here is the definition of table side
.
side ( seg_id INTEGER, pol_id INTEGER, index INTEGER NOT NULL, PRIMARY KEY(seg_id, pol_id), FOREIGN KEY(seg_id) REFERENCES segment(seg_id), FOREIGN KEY(pol_id) REFERENCES polygon(pol_id), )