In Part 1 of this series about database keys, we looked at the difference between natural and surrogate keys, using a table with data about friends. Here’s that table again:
SELECT * FROM friends; first_name | last_name | hair_color | id ------------+-----------+------------+---- Monica | Geller | | 1 Ross | Geller | | 2 Chandler | Bing | | 3 Joey | Tribbiani | | 4 Rachel | Green | | 5 Phoebe | Buffet | blonde | 6 Monica | Lewinsky | | 7 Al | Green | | 8 Joey | Kangaroo | orange | 9 (9 rows)
The table contains columns for data about first name, last name, and hair color, plus an id
column that was created to uniquely identify each row, regardless of whether or not the values in the other columns were unique.
Primary Keys
The id
column of the friends
table is an example of a primary key. A primary key is used to uniquely identify a row in the same table. Columns can be explicitly specified as primary keys in PostgreSQL using PRIMARY KEY
, as was done when the column was created in the previous article:
ALTER TABLE friends ADD COLUMN id serial PRIMARY KEY;
A primary key is usually, though not always, an automatically-incrementing integer. But no matter the data type, a primary key must be unique, and so surrogate keys make great primary keys.
Foreign Keys
While a primary key identifies a row in its own table, a foreign key identifies a row in another table in the database.
Let’s say we want to store data about the city where each of our friends lives. Since many of our friends live in the same city, we’ll just store cities in a separate table. This way, we only have to enter each city once, preventing duplication. Each city will have its own id, which is its primary key. Let’s create the table and add some data:
CREATE TABLE cities( id serial PRIMARY KEY, name text ); INSERT INTO cities (name) VALUES ('New York'), ('Memphis'), ('Sydney'), ('Washington, D.C.');
So the cities
table looks like this:
SELECT * FROM cities; id | name ----+------------------ 1 | New York 2 | Memphis 3 | Sydney 4 | Washington, D.C. (4 rows)
Now, to link the city data to the friends data, we can add another column in the friends
table to store each the id of the appropriate city. We’ll name this column city_id
. The city id, which was the primary key for the cities
table, becomes a foreign key in the friends
table.
ALTER TABLE friends ADD COLUMN city_id integer REFERENCES cities(id);
When creating this new column, the REFERENCES
keyword specifies that it is a foreign key column, and cities(id)
specifies which primary key the column is referencing.
Now we can update the friends
table with the city data:
UPDATE friends SET city_id = 1 WHERE id BETWEEN 1 AND 6; UPDATE friends SET city_id = 2 WHERE id = 8; UPDATE friends SET city_id = 3 WHERE id = 9; UPDATE friends SET city_id = 4 WHERE id = 7;
Here is our friends
table with both primary and foreign key columns populated:
SELECT * FROM friends; first_name | last_name | hair_color | id | city_id ------------+-----------+------------+----+--------- Monica | Geller | | 1 | 1 Ross | Geller | | 2 | 1 Chandler | Bing | | 3 | 1 Joey | Tribbiani | | 4 | 1 Rachel | Green | | 5 | 1 Phoebe | Buffet | blonde | 6 | 1 Al | Green | | 8 | 2 Joey | Kangaroo | orange | 9 | 3 Monica | Lewinsky | | 7 | 4 (9 rows)
In this table, while the primary key must be unique, the foreign key does not have this restriction. After all, it makes sense – six of our friends live in New York, and therefore share the same city id.
So, to summarize:
- Primary keys uniquely identify data within a table.
- Foreign keys identify data in a separate table.
So hopefully these two posts have been useful in clearing up any confusion about the types of keys in a database, whether natural or surrogate, primary or foreign.