%%html
<!-- The customized css for the slides -->
<link rel="stylesheet" type="text/css" href="../../assets/styles/basic.css"/>
<link rel="stylesheet" type="text/css" href="../../assets/styles/python-programming-basic.css"/>

43.5. Relational vs. non-relational database#

43.5.1. 1. Relational databases#

A relational database is a collection of information that organizes data in predefined relationships where data is stored in one or more tables (or “relations”) of columns and rows, making it easy to see and understand how different data structures relate to each other.^[1]

1. What is a relational database (Rdbms)? (n.d.). Google Cloud. Retrieved 7 February 2023, from https://cloud.google.com/learn/what-is-a-relational-database

43.5.1.1. It all starts with tables#

Let’s begin our exploration by starting a table to store information about cities. We might start with their name and country. You could store this in a table as follows:

City	Country
Tokyo	Japan
Atlanta	United States
Auckland	New Zealand

Notice the column names of city, country and population describe the data being stored, and each row has information about one city.

43.5.1.2. The shortcomings of a single table approach#

Let’s start to add some additional data to our burgeoning database - annual rainfall (in millimeters). We’ll focus on the years 2018, 2019 and 2020. If we were to add it for Tokyo, it might look something like this:

City	Country	Year	Amount
Tokyo	Japan	2020	1690
Tokyo	Japan	2019	1874
Tokyo	Japan	2018	1445

You might notice we’re duplicating the name and country of the city over and over.

Let’s add new columns for each year:

City	Country	2018	2019	2020
Tokyo	Japan	1445	1874	1690
Atlanta	United States	1779	1111	1683
Auckland	New Zealand	1386	942	1176

While this avoids the row duplication, it adds a couple of other challenges.

We would need to modify the structure of our table each time there’s a new year.
Additionally, as our data grows having our years as columns will make it trickier to retrieve and calculate values.

This is why we need multiple tables and relationships.

43.5.1.3. The concepts of relationships#

Let’s return to our data and determine how we want to split things up.

City	Country
Tokyo	Japan
Atlanta	United States
Auckland	New Zealand

We need to figure out how to reference each city. We need some form of an identifier, ID or (in technical database terms) a primary key.

43.5.1.3.1. cities#

city_id	City	Country
1	Tokyo	Japan
2	Atlanta	United States
3	Auckland	New Zealand

With our cities table created, let’s store the rainfall.

43.5.1.3.2. rainfall#

rainfall_id	city_id	Year	Amount
1	1	2018	1445
2	1	2019	1874
3	1	2020	1690
4	2	2018	1779
5	2	2019	1111
6	2	2020	1683
7	3	2018	1386
8	3	2019	942
9	3	2020	1176

43.5.1.4. Retrieving the data#

If we are using a relational database such as MySQL, SQL Server or Oracle, we can use a language called Structured Query Language or SQL.

To retrieve data you use the command SELECT.

SELECT city
FROM cities;

-- Output:
-- Tokyo
-- Atlanta
-- Auckland

SELECT is where you list the columns, and FROM is where you list the tables.

Let’s imagine we only wanted to display cities in New Zealand. We need some form of a filter. The SQL keyword for this is WHERE.

SELECT city
FROM cities
WHERE country = 'New Zealand';

-- Output:
-- Auckland

43.5.1.5. Joining data#

Until now we’ve retrieved data from a single table. Now we want to bring the data together from both cities and rainfall. This is done by joining them together.

Let’s retrieve the rainfall for 2019 for all our cities.

SELECT cities.city
    rainfall.amount
FROM cities
    INNER JOIN rainfall ON cities.city_id = rainfall.city_id

Now we can add the WHERE statement to filter out the only year 2019.

SELECT cities.city
    rainfall.amount
FROM cities
    INNER JOIN rainfall ON cities.city_id = rainfall.city_id
WHERE rainfall.year = 2019

-- Output

-- city     | amount
-- -------- | ------
-- Tokyo    | 1874
-- Atlanta  | 1111
-- Auckland |  942

43.5.2. 2. Non-relational data#

A non-relational database stores data in a non-tabular form, and tends to be more flexible than the traditional, SQL-based, relational database structures. It does not follow the relational model provided by traditional relational database management systems.^[1]

1. What is a non-relational database? (n.d.). MongoDB. Retrieved 9 February 2023, from https://www.mongodb.com/databases/non-relational

Relational VS Non Relational Databases^[1]

1. Duca, A. L. (2021, October 5). Relational vs non relational databases. Medium. https://towardsdatascience.com/relational-vs-non-relational-databases-f2ac792482e3

43.5.2.1. NoSQL#

NoSQL is an umbrella term for the different ways to store non-relational data and can be interpreted as “non-SQL”, “non-relational” or “not only SQL”.

And these types of database systems can be categorized into 4 types.

Key-value databases pair unique keys, which are unique identifiers associated with a value. These pairs are stored using a hash table with an appropriate hashing function.

Graphical representation of a key-value data store showing 4 unique numerical keys that are associated with 4 various values^[1]

Type	Notable examples of this type
Key–value cache	Apache Ignite, Couchbase, Coherence, eXtreme Scale, Hazelcast, Infinispan, Memcached, Redis, Velocity
Key–value store	Azure Cosmos DB, ArangoDB, Amazon DynamoDB, Aerospike, Couchbase, ScyllaDB
Key–value store (eventually consistent)	Azure Cosmos DB, Oracle NoSQL Database, Riak, Voldemort
Key–value store (ordered)	FoundationDB, InfinityDB, LMDB, MemcacheDB
Tuple store	Apache River, GigaSpaces, Tarantool, TIBCO ActiveSpaces, OpenLink Virtuoso
Triplestore	AllegroGraph, MarkLogic, Ontotext-OWLIM, Oracle NoSQL database, Profium Sense, Virtuoso Universal Server
Object database	Objectivity/DB, Perst, ZopeDB, db4o, GemStone/S, InterSystems Caché, JADE, ObjectDatabase++, ObjectDB, ObjectStore, ODABA, Realm, OpenLink Virtuoso, Versant Object Database, ZODB
Document store	Azure Cosmos DB, ArangoDB, BaseX, Clusterpoint, Couchbase, CouchDB, DocumentDB, eXist-db, IBM Domino, MarkLogic, MongoDB, RavenDB, Qizx, RethinkDB, Elasticsearch, OrientDB
Wide Column Store	Azure Cosmos DB, Amazon DynamoDB, Bigtable, Cassandra, Google Cloud Datastore, HBase, Hypertable, ScyllaDB
Native multi-model database	ArangoDB, Azure Cosmos DB, OrientDB, MarkLogic, Apache Ignite,[22][23] Couchbase, FoundationDB, Oracle Database
Graph database	Azure Cosmos DB, AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso
Multivalue database	D3 Pick database, Extensible Storage Engine (ESE/NT), InfinityDB, InterSystems Caché, jBASE Pick database, mvBase Rocket Software, mvEnterprise Rocket Software, Northgate Information Solutions Reality (the original Pick/MV Database), OpenQM, Revelation Software’s OpenInsight (Windows) and Advanced Revelation (DOS), UniData Rocket U2, UniVerse Rocket U2

Ocademy Open Machine Learning Book

Relational vs. non-relational database

Contents

43.5. Relational vs. non-relational database#

43.5.1. 1. Relational databases#

43.5.1.1. It all starts with tables#

43.5.1.2. The shortcomings of a single table approach#

43.5.1.3. The concepts of relationships#

43.5.1.3.1. cities#

43.5.1.3.2. rainfall#

43.5.1.4. Retrieving the data#

43.5.1.5. Joining data#

43.5.2. 2. Non-relational data#

43.5.2.1. NoSQL#

43.5.2.1.1. NoSQL databaseTypes and examples^{1^#}

43.5.3. 4. Your turn! 🚀#

43.5.4. 5. References#

Ocademy Open Machine Learning Book

Relational vs. non-relational database

Contents

43.5. Relational vs. non-relational database#

43.5.1. 1. Relational databases#

43.5.1.1. It all starts with tables#

43.5.1.2. The shortcomings of a single table approach#

43.5.1.3. The concepts of relationships#

43.5.1.3.1. cities#

43.5.1.3.2. rainfall#

43.5.1.4. Retrieving the data#

43.5.1.5. Joining data#

43.5.2. 2. Non-relational data#

43.5.2.1. NoSQL#

43.5.2.1.1. NoSQL databaseTypes and examples1#

43.5.3. 4. Your turn! 🚀#

43.5.4. 5. References#

43.5.2.1.1. NoSQL databaseTypes and examples^{1^#}