%%html
<!-- The customized css for the slides -->
<link rel="stylesheet" type="text/css" href="../../assets/styles/basic.css"/>
<link rel="stylesheet" type="text/css" href="../../assets/styles/python-programming-basic.css"/>

43.5. Relational vs. non-relational database#

43.5.1. 1. Relational databases#

A relational database is a collection of information that organizes data in predefined relationships where data is stored in one or more tables (or “relations”) of columns and rows, making it easy to see and understand how different data structures relate to each other.[1]

1. What is a relational database (Rdbms)? (n.d.). Google Cloud. Retrieved 7 February 2023, from https://cloud.google.com/learn/what-is-a-relational-database

43.5.1.1. It all starts with tables#

Let’s begin our exploration by starting a table to store information about cities. We might start with their name and country. You could store this in a table as follows:

City

Country

Tokyo

Japan

Atlanta

United States

Auckland

New Zealand

Notice the column names of city, country and population describe the data being stored, and each row has information about one city.

43.5.1.2. The shortcomings of a single table approach#

Let’s start to add some additional data to our burgeoning database - annual rainfall (in millimeters). We’ll focus on the years 2018, 2019 and 2020. If we were to add it for Tokyo, it might look something like this:

City

Country

Year

Amount

Tokyo

Japan

2020

1690

Tokyo

Japan

2019

1874

Tokyo

Japan

2018

1445

You might notice we’re duplicating the name and country of the city over and over.

Let’s add new columns for each year:

City

Country

2018

2019

2020

Tokyo

Japan

1445

1874

1690

Atlanta

United States

1779

1111

1683

Auckland

New Zealand

1386

942

1176

While this avoids the row duplication, it adds a couple of other challenges.

  • We would need to modify the structure of our table each time there’s a new year.

  • Additionally, as our data grows having our years as columns will make it trickier to retrieve and calculate values.

This is why we need multiple tables and relationships.

43.5.1.3. The concepts of relationships#

Let’s return to our data and determine how we want to split things up.

City

Country

Tokyo

Japan

Atlanta

United States

Auckland

New Zealand

We need to figure out how to reference each city. We need some form of an identifier, ID or (in technical database terms) a primary key.

43.5.1.3.1. cities#

city_id

City

Country

1

Tokyo

Japan

2

Atlanta

United States

3

Auckland

New Zealand

With our cities table created, let’s store the rainfall.

43.5.1.3.2. rainfall#

rainfall_id

city_id

Year

Amount

1

1

2018

1445

2

1

2019

1874

3

1

2020

1690

4

2

2018

1779

5

2

2019

1111

6

2

2020

1683

7

3

2018

1386

8

3

2019

942

9

3

2020

1176

43.5.1.4. Retrieving the data#

If we are using a relational database such as MySQL, SQL Server or Oracle, we can use a language called Structured Query Language or SQL.

To retrieve data you use the command SELECT.

SELECT city
FROM cities;

-- Output:
-- Tokyo
-- Atlanta
-- Auckland

SELECT is where you list the columns, and FROM is where you list the tables.

Let’s imagine we only wanted to display cities in New Zealand. We need some form of a filter. The SQL keyword for this is WHERE.

SELECT city
FROM cities
WHERE country = 'New Zealand';

-- Output:
-- Auckland

43.5.1.5. Joining data#

Until now we’ve retrieved data from a single table. Now we want to bring the data together from both cities and rainfall. This is done by joining them together.

Let’s retrieve the rainfall for 2019 for all our cities.

SELECT cities.city
    rainfall.amount
FROM cities
    INNER JOIN rainfall ON cities.city_id = rainfall.city_id

Now we can add the WHERE statement to filter out the only year 2019.

SELECT cities.city
    rainfall.amount
FROM cities
    INNER JOIN rainfall ON cities.city_id = rainfall.city_id
WHERE rainfall.year = 2019

-- Output

-- city     | amount
-- -------- | ------
-- Tokyo    | 1874
-- Atlanta  | 1111
-- Auckland |  942

43.5.2. 2. Non-relational data#

A non-relational database stores data in a non-tabular form, and tends to be more flexible than the traditional, SQL-based, relational database structures. It does not follow the relational model provided by traditional relational database management systems.[1]

1. What is a non-relational database? (n.d.). MongoDB. Retrieved 9 February 2023, from https://www.mongodb.com/databases/non-relational

Relational VS Non Relational Databases[1]

1. Duca, A. L. (2021, October 5). Relational vs non relational databases. Medium. https://towardsdatascience.com/relational-vs-non-relational-databases-f2ac792482e3

43.5.2.1. NoSQL#

NoSQL is an umbrella term for the different ways to store non-relational data and can be interpreted as “non-SQL”, “non-relational” or “not only SQL”.

And these types of database systems can be categorized into 4 types.

Key-value databases pair unique keys, which are unique identifiers associated with a value. These pairs are stored using a hash table with an appropriate hashing function.

Graphical representation of a key-value data store showing 4 unique numerical keys that are associated with 4 various values[1]