Relational vs. non-relational database
Contents
%%html
<!-- The customized css for the slides -->
<link rel="stylesheet" type="text/css" href="../../assets/styles/basic.css"/>
<link rel="stylesheet" type="text/css" href="../../assets/styles/python-programming-basic.css"/>
43.5. Relational vs. non-relational database#
43.5.1. 1. Relational databases#
A relational database is a collection of information that organizes data in predefined relationships where data is stored in one or more tables (or “relations”) of columns and rows, making it easy to see and understand how different data structures relate to each other.[1]
1. What is a relational database (Rdbms)? (n.d.). Google Cloud. Retrieved 7 February 2023, from https://cloud.google.com/learn/what-is-a-relational-database
43.5.1.1. It all starts with tables#
Let’s begin our exploration by starting a table to store information about cities. We might start with their name and country. You could store this in a table as follows:
City |
Country |
---|---|
Tokyo |
Japan |
Atlanta |
United States |
Auckland |
New Zealand |
Notice the column names of city
, country
and population
describe the data being stored, and each row has information about one city.
43.5.1.2. The shortcomings of a single table approach#
Let’s start to add some additional data to our burgeoning database - annual rainfall (in millimeters). We’ll focus on the years 2018, 2019 and 2020. If we were to add it for Tokyo, it might look something like this:
City |
Country |
Year |
Amount |
---|---|---|---|
Tokyo |
Japan |
2020 |
1690 |
Tokyo |
Japan |
2019 |
1874 |
Tokyo |
Japan |
2018 |
1445 |
You might notice we’re duplicating the name and country of the city over and over.
Let’s add new columns for each year:
City |
Country |
2018 |
2019 |
2020 |
---|---|---|---|---|
Tokyo |
Japan |
1445 |
1874 |
1690 |
Atlanta |
United States |
1779 |
1111 |
1683 |
Auckland |
New Zealand |
1386 |
942 |
1176 |
While this avoids the row duplication, it adds a couple of other challenges.
We would need to modify the structure of our table each time there’s a new year.
Additionally, as our data grows having our years as columns will make it trickier to retrieve and calculate values.
This is why we need multiple tables and relationships.
43.5.1.3. The concepts of relationships#
Let’s return to our data and determine how we want to split things up.
City |
Country |
---|---|
Tokyo |
Japan |
Atlanta |
United States |
Auckland |
New Zealand |
We need to figure out how to reference each city. We need some form of an identifier, ID or (in technical database terms) a primary key.
43.5.1.3.1. cities#
city_id |
City |
Country |
---|---|---|
1 |
Tokyo |
Japan |
2 |
Atlanta |
United States |
3 |
Auckland |
New Zealand |
With our cities table created, let’s store the rainfall.
43.5.1.3.2. rainfall#
rainfall_id |
city_id |
Year |
Amount |
---|---|---|---|
1 |
1 |
2018 |
1445 |
2 |
1 |
2019 |
1874 |
3 |
1 |
2020 |
1690 |
4 |
2 |
2018 |
1779 |
5 |
2 |
2019 |
1111 |
6 |
2 |
2020 |
1683 |
7 |
3 |
2018 |
1386 |
8 |
3 |
2019 |
942 |
9 |
3 |
2020 |
1176 |
43.5.1.4. Retrieving the data#
If we are using a relational database such as MySQL, SQL Server or Oracle, we can use a language called Structured Query Language or SQL.
To retrieve data you use the command SELECT
.
SELECT city
FROM cities;
-- Output:
-- Tokyo
-- Atlanta
-- Auckland
SELECT
is where you list the columns, and FROM
is where you list the tables.
Let’s imagine we only wanted to display cities in New Zealand. We need some form of a filter. The SQL keyword for this is WHERE
.
SELECT city
FROM cities
WHERE country = 'New Zealand';
-- Output:
-- Auckland
43.5.1.5. Joining data#
Until now we’ve retrieved data from a single table. Now we want to bring the data together from both cities
and rainfall
. This is done by joining them together.
Let’s retrieve the rainfall for 2019 for all our cities.
SELECT cities.city
rainfall.amount
FROM cities
INNER JOIN rainfall ON cities.city_id = rainfall.city_id
Now we can add the WHERE
statement to filter out the only year 2019.
SELECT cities.city
rainfall.amount
FROM cities
INNER JOIN rainfall ON cities.city_id = rainfall.city_id
WHERE rainfall.year = 2019
-- Output
-- city | amount
-- -------- | ------
-- Tokyo | 1874
-- Atlanta | 1111
-- Auckland | 942
43.5.2. 2. Non-relational data#
A non-relational database stores data in a non-tabular form, and tends to be more flexible than the traditional, SQL-based, relational database structures. It does not follow the relational model provided by traditional relational database management systems.[1]
1. What is a non-relational database? (n.d.). MongoDB. Retrieved 9 February 2023, from https://www.mongodb.com/databases/non-relational
1. Duca, A. L. (2021, October 5). Relational vs non relational databases. Medium. https://towardsdatascience.com/relational-vs-non-relational-databases-f2ac792482e3
43.5.2.1. NoSQL#
NoSQL is an umbrella term for the different ways to store non-relational data and can be interpreted as “non-SQL”, “non-relational” or “not only SQL”.
And these types of database systems can be categorized into 4 types.
Key-value databases pair unique keys, which are unique identifiers associated with a value. These pairs are stored using a hash table with an appropriate hashing function.
1. admin. (2018, March 18). Azure Cosmos DB - key-value database in the cloud. Michał Białecki Blog. https://www.michalbialecki.com/en/2018/03/18/azure-cosmos-db-key-value-database-cloud/
Graph databases describe relationships in data and are represented as a collection of nodes and edges. A node represents an entity, something that exists in the real world such as a student or bank statement. Edges represent the relationship between two entities Each node and edge have properties that provide additional information about each node and edge.