Unveiling the Flavors of Database Design - Understanding Relational Models, Keys, and Relationships

July 1, 2023 · 12 min read

OLake Maintainer

Unveiling the Flavors of Database Design - Understanding Relational Models, Keys, and Relationships cover image

Check out OLake, our latest open source offering for replicating databases to lakehouses.

Imagine your kitchen as a bustling database, with recipe books serving as the tables and ingredients neatly organized in rows and columns. Just as a well-structured recipe book empowers you to create delicious meals, relational database models and design allow organizations to store, retrieve, and manipulate data efficiently.

In this blog, we'll embark on a flavorful journey to understand these concepts, using relatable daily life examples to make you see databases in a new light.

Relational Model

The relational model is a fundamental framework for organizing and managing data in a database. It involves several key components that work together to provide a structured and efficient way of storing and retrieving information.

Let's explore these components and tie them up with a music streaming platform example:

Tables: Structured containers that hold related data in a database.
Entities: Real-world objects or concepts we want to represent in the database.
Relationships: Connections established between entities in different tables, defining associations and dependencies.
Keys: Unique identifiers used to identify and distinguish rows within a table or establish relationships between tables.
Rows: Individual instances or records within a table containing specific data related to an entity.
Columns: Attributes or properties of an entity representing specific data types within a table.

Now, let's tie it all together with an example. Imagine our music streaming platform has a "Playlists" table. Each row in the "Playlists" table represents a specific playlist created by a user. Columns in this table could include attributes like Playlist ID, Name, and Creator.

Using relationships and keys, we can connect the "Playlists" table with the "Songs" table using a foreign key such as Song ID. This allows us to associate specific songs with a playlist, enabling users to create personalized collections of their favorite tracks.

Database Table Relationships

In a relational database, table relationships are established through the use of keys that define how tables are connected to each other. There are different types of table relationships, including:

One-to-One (1:1) Relationship:

In a one-to-one relationship, each record in one table is associated with exactly one record in another table.
This relationship is not commonly used, as the information could often be combined into a single table.
An example could be a "Person" table and a "Passport" table, where each person has only one passport.

One-to-Many (1:N) Relationship:

In a one-to-many relationship, a record in one table can be associated with multiple records in another table, but each record in the second table is associated with only one record in the first table.
This is the most common type of relationship.
An example could be a "Customer" table and an "Orders" table, where each customer can have multiple orders, but each order belongs to only one customer.

Many-to-Many (N:N) Relationship:

In a many-to-many relationship, multiple records in one table can be associated with multiple records in another table.
To implement this relationship, a third table called a junction table or an associative table is used, which contains foreign keys from both tables.
An example could be a "Students" table and a "Courses" table, where each student can enroll in multiple courses, and each course can have multiple students.

These relationships are established by using primary keys and foreign keys. A primary key uniquely identifies each record in a table, while a foreign key is a field in one table that refers to the primary key in another table.

By defining and utilizing these relationships, data can be structured and organized in a way that allows for efficient querying, data integrity, and data consistency within a relational database system.

Keys

In a relational database, keys play a crucial role in identifying and establishing relationships between tables. A few of the most common keys are:

Primary Key:

A primary key is a unique identifier for each record in a table.
It ensures that each record can be uniquely identified and accessed within the table.
Typically, a primary key is defined on one or more columns in a table.
It enforces the uniqueness and integrity of the data in the table.
Examples of primary keys include a user ID, product code, or order number.

Foreign Key:

A foreign key is a field or set of fields in one table that refers to the primary key in another table.
It establishes relationships between tables, enabling the creation of links or associations between related data.
The foreign key in a table references the primary key in another table.
It helps enforce referential integrity and maintains consistency between related tables.
For example, in a "Customers" table, the "OrderID" column could be a foreign key referencing the "Orders" table's primary key.

Unique Key:

A unique key ensures that the values in one or more columns are unique within a table.
Unlike a primary key, a unique key does not necessarily serve as the primary means of identification.
It can have null values, but if not null, it enforces uniqueness.
A table can have multiple unique keys.
An example could be a "username" field in a "Users" table, where each username must be unique.

Composite Key:

A composite key is a key that consists of two or more columns to uniquely identify a record in a table.
It is used when a single column cannot uniquely identify a record.
The combination of the columns in a composite key must be unique.
For instance, in a "Sales" table, a composite key may consist of "OrderID" and "ProductID" to uniquely identify each sale.

These keys help ensure data integrity, establish relationships between tables, and facilitate efficient querying and retrieval of data in a relational database system.

Index

Indexes in a database are data structures that improve the speed and efficiency of data retrieval operations. They work by creating an organized representation of the data in a specific column or set of columns, allowing the database management system (DBMS) to quickly locate and access the desired data.

Primary Index:

A primary index is an index that is based on the primary key of a table.
It defines the physical order of data within the table.
Typically, a primary index is implemented as a clustered index, where the physical order of the rows in the table matches the order of the primary key.
There can only be one primary index in a table.

Secondary Index:

A secondary index is an index that is based on a non-primary key column(s) of a table.
It provides an alternative way to access data in a table without relying on the primary key.
Unlike a primary index, the order of data in the table is not determined by the secondary index.
A table can have multiple secondary indexes, allowing for different columns to be indexed for efficient querying.

Multi-level Index:

A multi-level index, also known as a hierarchical index, is an index structure that consists of multiple levels or layers.
It is designed to improve the efficiency of accessing data in large databases with many records.
The index hierarchy allows for faster searching and reduces the number of disk I/O operations required.
The top level of the index points to lower-level indexes or directly to the actual data.
Each subsequent level provides a more detailed index to narrow down the search space.

It's important to note that the exact implementation and terminology can vary depending on the database management system (DBMS) being used. Different databases may have their own variations and additional types of indexes.

Normalization

Normalization is a process in database design that aims to eliminate redundancy and improve data integrity by organizing data into logical and efficient structures.

It involves breaking down larger tables into smaller, well-structured tables while maintaining the relationships between them.

The normalization process follows a set of rules or normal forms, with each normal form representing a level of data organization.

Here are the commonly recognized normal forms:

First Normal Form (1NF)

Eliminates duplicate rows and ensures atomicity of data.
Each column in a table must hold atomic values, meaning it cannot be further divided.
There should be a unique identifier (primary key) for each row.
Example: Ensuring that each cell in a table contains only a single value, not a list or multiple values.

Second Normal Form (2NF)

Builds upon 1NF and ensures that non-key attributes are functionally dependent on the entire primary key.
In other words, all non-key attributes should depend on the entire primary key, not just part of it.
Example: If a table has a composite primary key, each non-key attribute should be dependent on the entire composite key, not just a subset of it.

Third Normal Form (3NF)

Builds upon 2NF and ensures that non-key attributes are not transitively dependent on the primary key.
It aims to eliminate transitive dependencies by moving them to separate tables.
Example: If a non-key attribute depends on another non-key attribute, it should be moved to a separate table.

There are higher normal forms beyond 3NF, such as Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), and Fifth Normal Form (5NF), which address more complex dependencies and further, reduce redundancy. However, they are less commonly used in typical database design.

ACID Properties

ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. These properties are a set of principles that ensure reliable and robust transaction processing in a database system. Here's a brief summary of the ACID properties:

Atomicity: Ensures that a transaction is treated as a single, indivisible unit of work. It either succeeds completely or fails completely, preventing partial or inconsistent changes.
Consistency: Ensures that a transaction brings the database from one consistent state to another consistent state, satisfying predefined rules or constraints. Data remains accurate and valid.
Isolation: Ensures that concurrent transactions do not interfere with each other. Each transaction is executed as if it were the only transaction running, preventing issues like dirty reads or non-repeatable reads.
Durability: Ensures that once a transaction is committed, its changes are permanent and will survive any subsequent system failures or crashes. Committed data is stored in non-volatile memory for long-term reliability.

These properties collectively provide reliability, consistency, isolation, and durability in transaction processing, ensuring data integrity and maintaining the reliability of database systems.

Codd's Rule

Codd's rules, also known as Edgar F. Codd's 12 rules, were proposed by Edgar F. Codd, a pioneer in the field of relational databases.

These rules define a set of principles that a database management system (DBMS) must adhere to to be considered a truly relational DBMS. Here is a summary of Codd's 12 rules:

Rule 0: The Foundation Rule:

A DBMS must be based on a solid theoretical foundation, which is the relational model.

Rule 1: The Information Rule:

All information in a relational database is to be represented explicitly as values in tables.

Rule 2: Guaranteed Access Rule:

Each data value in a relational database is guaranteed to be accessible by using a combination of a table name, primary key value, and column name.

Rule 3: Systematic Treatment of Null Values:

A relational DBMS must support a systematic way to represent and handle missing or unknown information using NULL values.

Rule 4: Active Online Catalog:

The structure and definition of the database, including table definitions, must be stored in an online, active catalog that can be accessed by authorized users.

Rule 5: Comprehensive Data Sublanguage Rule:

A relational DBMS must support a comprehensive data sublanguage that can be used for defining, manipulating, and querying the database.

Rule 6: View Updating Rule:

Views (derived relations) that are theoretically updatable must also be updatable by the system.

Rule 7: High-Level Insert, Update, and Delete:

A relational DBMS must support high-level insert, update, and delete operations, allowing users to specify changes to the database at a conceptual level, without needing to specify low-level details.

Rule 8: Physical Data Independence:

Changes to the physical storage or access methods of the database should not require any changes to the logical or conceptual aspects of the database.

Rule 9: Logical Data Independence:

Changes to the logical structure or organization of the database should not require any changes to the application programs using the database.

Rule 10: Integrity Independence:

Integrity constraints should be specified and stored in the database and should not be embedded in application programs.

Rule 11: Distribution Independence:

The distribution of the database across different locations and systems should be invisible to users and should not affect the conceptual or logical aspects of the database.

These rules were originally formulated by Codd in 1985 as guidelines to define the characteristics of a true relational DBMS, providing a foundation for the development and evaluation of relational database technologies.

Conclusion

In a world of databases and tables, relationships and keys hold the keys to success. Indexes come to the rescue, speeding up data access with finesse. Normalization saves the day, fighting redundancy with might.

ACID properties ensure transactions shine bright. And Codd's rules, the guiding light, lead us to the land of relational delights. So, embrace these concepts, and in the realm of databases, you'll always do right!

Relational Model​

Database Table Relationships​

One-to-One (1:1) Relationship:​

One-to-Many (1:N) Relationship:​

Many-to-Many (N:N) Relationship:​

Keys​

Primary Key:​

Foreign Key:​

Unique Key:​

Composite Key:​

Index​

Primary Index:​

Secondary Index:​

Multi-level Index:​

Normalization​

First Normal Form (1NF)​

Second Normal Form (2NF)​

Third Normal Form (3NF)​

ACID Properties​

Codd's Rule​

Rule 0: The Foundation Rule:​

Rule 1: The Information Rule:​

Rule 2: Guaranteed Access Rule:​

Rule 3: Systematic Treatment of Null Values:​

Rule 4: Active Online Catalog:​

Rule 5: Comprehensive Data Sublanguage Rule:​

Rule 6: View Updating Rule:​

Rule 7: High-Level Insert, Update, and Delete:​

Rule 8: Physical Data Independence:​

Rule 9: Logical Data Independence:​

Rule 10: Integrity Independence:​

Rule 11: Distribution Independence:​

Conclusion​