Introduction to Databases
Introducing Database Usage with SQLite
Today we are going to be dipping our toes into the world of databases. Up until this point we have just been utilizing Excel to store and manage our data. This is.... less than ideal to say the least. I could wax poetic about how Excel is not the ideal medium to host our data in, but instead I will point you to a fairly good article going over the differences between excel and databases and when to use each.
https://towardsdatascience.com/excel-vs-sql-a-conceptual-comparison-dcfbee640c83
Now, the database system we are going to utilize is SQLite because it is built into the native python distribution and is pretty easy to pick up and use. There are a few basics we are going to go over today, but as a quick background to databases, you will generally have a central table known as a 'fact table' that contains all of the relational information to connect those records to a 'dimension table'. So for instance, you could have a fact table of all of the players, and their respective keys to relate them to your stat tables, player lists from DraftKings and Fanduel, etc.
We'll go over the ins and outs more in depth when we actually start building things out, but for now just getting familiar with some of the terminology will go a long way.
Let's get started with the actual code here, first off let's go ahead and imoprt our sql module, and no pip install is needed here since it is included in the native python distribution
import sqlite3
Next up, we are going to simply define a file path to create our database file with.
dbFile = 'testDB.db'
Now there are a few terminology things to go over while we create our database and get ready to start working in it.
First up, we need to define what's called a connection, this is simply connecting our python code to the database itself, and if there is no database existing at the file path specified, it will create a database at that location.
conn = sqlite3.connect(dbFile)
Now that we have connected our python shell to the database we need to establish what's called a cursor. This is basically the tool that will allow us to do work within our database. The way we can work here is by defining a cursor object, and then passing SQL statements through the cursor to perform actions within our database.
If you are unfamiliar with SQL as a language, don't panic. It's different than Python, but it's probably one of the easiest to read and understand as it's pretty close to natural english as far as reading and comprehending.
cur = conn.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS players(
playerid INT PRIMARY KEY,
First_Name TEXT,
Last_Name TEXT,
Team TEXT);
""")
conn.commit()
Okay, so what we did above here is creating a table within our database called 'players', then defining a few column names and the data type that can be accepted into it.
Then we're going to be doing the same thing creating a teams table using the same method.
cur.execute("""CREATE TABLE IF NOT EXISTS teams(
teamid INT PRIMARY KEY,
city TEXT,
state TEXT,
team TEXT,
abbreviation TEXT);
""")
conn.commit()
Now that we've created a couple of tables, we're going to insert a row into our table with the relevant player information. To do this, the format we use is an insert statement that defines the table, then a tuple with the columns we're adding data to, then another tuple carrying the actual values we are populating it with.
cur.execute("""INSERT INTO players(playerid, First_Name, Last_Name, Team)
VALUES(00001, 'Marcus', 'Smart', 'Celtics');""")
conn.commit()
Next, we're going to go over the basic syntax for selecting records from a table.
cur.execute("SELECT * FROM players;")
cur.fetchall()
[(1, 'Marcus', 'Smart', 'Celtics')]
newPlayers = [('0002', 'Jayson', 'Tatum', 'Celtics'),
('0003', 'Lebron', 'James', 'Lakers')]
cur.executemany("INSERT INTO players VALUES(?, ?, ?, ?);",
newPlayers)
conn.commit()
cur.execute("SELECT * FROM players;")
cur.fetchall()
[(1, 'Marcus', 'Smart', 'Celtics'),
(2, 'Jayson', 'Tatum', 'Celtics'),
(3, 'Lebron', 'James', 'Lakers')]
newTeams = [('01', 'Boston', 'Massachusetts', 'Celtics', 'BOS'),
('02', 'Los Angeles', 'California', 'Lakers', 'LAL')]
cur.executemany("INSERT INTO teams VALUES(?, ?, ?, ?, ?);",
newTeams)
conn.commit()
cur.execute("SELECT * FROM teams;")
cur.fetchall()
[(1, 'Boston', 'Massachusetts', 'Celtics', 'BOS'),
(2, 'Los Angeles', 'California', 'Lakers', 'LAL')]
cur.execute("""SELECT teams.team, players.First_Name, players.Last_Name FROM
teams LEFT JOIN players ON players.Team=teams.team;""")
cur.fetchall()
[('Celtics', 'Jayson', 'Tatum'),
('Celtics', 'Marcus', 'Smart'),
('Lakers', 'Lebron', 'James')]
cur.execute("""SELECT teams.team, players.First_Name, players.Last_Name FROM teams
LEFT JOIN players ON players.Team=teams.team
WHERE teams.team = 'Celtics';""")
print(cur.fetchall())
[('Celtics', 'Jayson', 'Tatum'), ('Celtics', 'Marcus', 'Smart')]
cur.execute("""SELECT teams.team, players.First_Name, players.Last_Name FROM teams
LEFT JOIN players ON players.Team=teams.team
WHERE teams.team = 'Lakers';""")
print(cur.fetchall())
[('Lakers', 'Lebron', 'James')]
cur.execute("""SELECT teams.team, players.First_Name, players.Last_Name FROM teams
LEFT JOIN players ON players.Team=teams.team
WHERE teams.state = 'California';""")
print(cur.fetchall())
[('Lakers', 'Lebron', 'James')]
cur.execute(""" SELECT * FROM players WHERE team='Celtics'""")
cur.fetchall()
[(1, 'Marcus', 'Smart', 'Celtics'), (2, 'Jayson', 'Tatum', 'Celtics')]
conn.close()