API Core
We'll start by creating all the tables that will hold the data and place them in a separate schema called data. There is no particular requirement to keep the tables in that schema, nor do they all have to live in a single schema, but for the purposes of this tutorial let's keep them there; it will make it easier to see how your data model is decoupled from the api you expose.
Create one file per table (notice we already have a users.sql file)
-- sql/data/clients.sql
create table data.clients (
id serial primary key,
name text not null,
address text,
user_id int not null references data.users(id),
created_on timestamptz not null default now(),
updated_on timestamptz
);
create index clients_user_id_index on data.clients(user_id);
-- sql/data/projects.sql
create table data.projects (
id serial primary key,
name text not null,
client_id int not null references data.clients(id),
user_id int not null references data.users(id),
created_on timestamptz not null default now(),
updated_on timestamptz
);
create index projects_user_id_index on data.projects(user_id);
create index projects_client_id_index on data.projects(client_id);
-- sql/data/tasks.sql
create table data.tasks (
id serial primary key,
name text not null,
completed bool not null default false,
project_id int not null references data.projects(id),
user_id int not null references data.users(id),
created_on timestamptz not null default now(),
updated_on timestamptz
);
create index tasks_user_id_index on data.tasks(user_id);
create index tasks_project_id_index on data.tasks(project_id);
-- sql/data/project_comments.sql
create table data.project_comments (
id serial primary key,
body text not null,
project_id int not null references data.projects(id),
user_id int not null references data.users(id),
created_on timestamptz not null default now(),
updated_on timestamptz
);
create index project_comments_user_id_index on data.project_comments(user_id);
create index project_comments_project_id_index on data.project_comments(project_id);
-- sql/data/task_comments.sql
create table data.task_comments (
id serial primary key,
body text not null,
task_id int not null references data.tasks(id),
user_id int not null references data.users(id),
created_on timestamptz not null default now(),
updated_on timestamptz
);
create index task_comments_user_id_index on data.task_comments(user_id);
create index task_comments_task_id_index on data.task_comments(task_id);
We added a user_id column to each table because it will help us later to enforce access rights for each row with a simple rule, instead of doing complicated joins (see the sketch below).
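To see the difference, here is roughly what a policy on data.task_comments would have to look like without its own user_id column. This is illustrative only; util.current_role() and util.app_user_id() are helpers we'll meet in the row-level security section below.
-- illustrative only: the join-heavy policy we avoid by denormalizing user_id
create policy access_own_rows on data.task_comments
using (
  exists (
    select 1
    from data.tasks t
    join data.projects p on p.id = t.project_id
    where t.id = task_comments.task_id
      and p.user_id = util.app_user_id()
  )
);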
Having two separate tables for comments (task_comments, project_comments) is not the best schema design and it's a bit forced, but it will be useful in this tutorial to showcase how the api can be decoupled from the underlying tables.
Edit sql/data/schema.sql to look like this
\echo # Loading data schema
create schema data;
\ir users.sql;
\ir clients.sql;
\ir projects.sql;
\ir tasks.sql;
\ir project_comments.sql;
\ir task_comments.sql;
\ir sample_data/data.sql;
This schema (api) is the one we will expose over REST and GraphQL. It will contain only views and stored procedures.
Create one file per view (ignore the other files that are there for the moment)
-- sql/api/clients.sql
create or replace view api.clients as
select id, name, address, created_on, updated_on from data.clients;
-- sql/api/projects.sql
create or replace view api.projects as
select id, name, client_id, created_on, updated_on from data.projects;
-- sql/api/tasks.sql
create or replace view api.tasks as
select id, name, completed, project_id, created_on, updated_on from data.tasks;
-- sql/api/comments.sql
create or replace view api.comments as
select
id, body, 'project'::text as parent_type, project_id as parent_id,
project_id, null as task_id, created_on, updated_on
from data.project_comments
union
select id, body, 'task'::text as parent_type, task_id as parent_id,
null as project_id, task_id, created_on, updated_on
from data.task_comments;
Edit sql/api/schema.sql to look like this.
\echo # Loading api schema
create schema api;
grant usage on schema api to anonymous;
grant usage on schema api to webuser;
\ir me.sql;
\ir login.sql;
\ir logout.sql;
\ir refresh_token.sql;
\ir clients.sql;
\ir projects.sql;
\ir tasks.sql;
\ir comments.sql;
Notice how we were able to expose only the fields and only the tables we wanted. We could also have renamed the columns; for example, maybe your front-end devs turn their faces in disgust when they see something like first_name. No worries, you can make them happy and rename that column in the view to firstName, and everything will work just fine.
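For instance, renaming at the api boundary is a one-line change in the view (a hypothetical example; this users view and these column names are not part of the tutorial's files):
-- hypothetical: rename columns in the view, the underlying table stays untouched
create or replace view api.users as
select id, first_name as "firstName", last_name as "lastName"
from data.users;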
Our data schema has two separate tables holding the comments, but in the api we were able to expose them as a single entity and hide the underlying tables (we'll get back to making insert/update/delete work for it a bit later; a preview follows). This is how you can elegantly decouple your implementation details (the underlying source tables) from the api you expose, using views or, in more complex cases, stored procedures.
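As a preview of how writes to that unified view can work, here is a minimal sketch of an INSTEAD OF trigger that routes each insert to the right underlying table (it assumes user_id gets a default value, something we only set up in the row-level security section):
-- sketch: route inserts on api.comments to the underlying tables
create or replace function api.insert_comment() returns trigger as $$
begin
  if new.parent_type = 'project' then
    insert into data.project_comments (body, project_id)
    values (new.body, new.parent_id);
  else
    insert into data.task_comments (body, task_id)
    values (new.body, new.parent_id);
  end if;
  return new;
end
$$ language plpgsql;
create trigger comments_insert
instead of insert on api.comments
for each row execute procedure api.insert_comment();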
It's no fun testing with empty tables, so let's add some sample data; instead of writing long, boring sql statements we'll use datafiller to generate it.
If we ran datafiller directly on our current schema we would get the same number of rows in each table, but usually we want a different distribution, so let's tell it what we want by adding a few small comments.
Find the respective files in the sql/data/ directory and add the comment to each one.
Make sure each file ends with a newline (we'll need that a bit later when we use the cat command).
create table data.users ( -- df: mult=1.0
...
updated_on timestamptz -- df: isnull
create table data.clients ( -- df: mult=3.0
...
updated_on timestamptz -- df: isnull
create table data.projects ( -- df: mult=5.0
...
updated_on timestamptz -- df: isnull
create table data.tasks ( -- df: mult=20.0
...
updated_on timestamptz -- df: isnull
create table data.project_comments ( -- df: mult=60.0
...
updated_on timestamptz -- df: isnull
create table data.task_comments ( -- df: mult=60.0
...
updated_on timestamptz -- df: isnull
Let's generate some data
# get a shell in the devtools container
sub0
# go to the dir holding our tables
cd /project/sql/data
# generate data
cat users.sql clients.sql projects.sql tasks.sql project_comments.sql task_comments.sql | datafiller --size=1 > sample_data/data.sql
# exit the container
exit
If you want to generate more data, just change the size parameter. This cool program (datafiller) has a lot more nice features that will let you generate even nicer data, for example real person names instead of random text. Check the documentation here
Now that we have some data in the tables, let's see how far we've gotten.
curl -H "Authorization: Bearer $JWT_TOKEN" http://localhost:8080/rest/clients
{"hint":null,"details":null,"code":"42501","message":"permission denied for relation clients"}
Believe it or not, this is a good thing :). We added a bunch of tables and views, but that does not mean api users can just access them; we need to explicitly state who can access which resource.
Let's start by giving authenticated API users the ability to access the clients/projects/tasks endpoints. This means granting them rights on the views in the api schema.
Edit the respective files and add the grant lines
-- sql/api/clients.sql
...
grant select, insert, update, delete on api.clients to webuser;
-- sql/api/projects.sql
...
grant select, insert, update, delete on api.projects to webuser;
-- sql/api/tasks.sql
...
grant select, insert, update, delete on api.tasks to webuser;
-- sql/api/comments.sql
...
grant select, insert, update, delete on api.comments to webuser;
Let's try that request again
curl -H "Authorization: Bearer $JWT_TOKEN" http://localhost:8080/rest/clients
[{"id":1,"name":"name_5_5","address":"address_5_5_5_5_","created_on":"2017-02-13T09:53:33+00:00","updated_on":null},{"id":2,"name":"name_5_5","address":"address_4_","created_on":"2017-02-13T09:57:33+00:00","updated_on":null},{"id":3,"name":"name_4_4_4_4_4","address":"address_6_6_6","created_on":"2017-02-13T09:57:33+00:00","updated_on":null},{"id":4,"name":"name_5_5","address":"address_2_2_2_2_","created_on":"2017-02-13T09:54:33+00:00","updated_on":null},{"id":5,"name":"name_2_2","address":"address_6_6_6","created_on":"2017-02-13T09:57:33+00:00","updated_on":null},{"id":6,"name":"name_5_5","address":"address_3_3_3","created_on":"2017-02-13T09:57:33+00:00","updated_on":null}]
Look at that :)!
So we have our api, nice and decoupled, but as things stand now, anyone who can log in is able to see and modify all the existing data. I would not want to use a system where everyone can see all my clients and projects. We will fix that, but before we get to it, an interlude.
You are probably wondering: "What's up with that? Where does that role come from, and why does this hardcoded $JWT_TOKEN just work on my system? It does not seem very secure to me."
Well, there is nothing magic about it. The webuser role is defined in sql/global/roles.sql and for this tutorial it suits us just fine; we only need to differentiate between logged-in users (webuser) and the rest (anonymous). Although you can use specific roles to mean specific users (alice, bob), you'll probably use database roles to mean groups of users (admin, employee, customer). Notice how the users table has a user_type column whose value is webuser.
Now, about the mystery JWT token. I was only able to generate a valid token (one that does not expire) because I knew the secret with which Sub0 is started (which is secret, defined in .env). Head over to the jwt.io site and paste in the token
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyX2lkIjoxLCJyb2xlIjoid2VidXNlciJ9.vAN3uJSleb2Yj8RVPRsb1UBkokqmKlfl6lJ2bg3JfFg
You will see its payload decoded
{
"user_id": 1,
"role": "webuser"
}
We use the information in the jwt payload to see who is making the request.
It's important to understand that anyone can look at the JWT payload; it's just a base64 encoded string (so don't put sensitive data in it). What they cannot do is alter the payload without knowing the secret.
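You can check this yourself; the payload segment of the token is plain base64-encoded JSON (decoded here in psql, but any base64 tool works the same):
select convert_from(decode('eyJ1c2VyX2lkIjoxLCJyb2xlIjoid2VidXNlciJ9', 'base64'), 'utf8');
-- {"user_id":1,"role":"webuser"}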
JWTs can easily be generated, even inside the database. Check out the login function in sql/api/login.sql and look at how simple and clear that code is.
We left things in a state where every logged-in user has access to all the rows in the main tables (through the views).
So let's fix that. For this we'll use a feature introduced in PostgreSQL 9.5 called "Row Level Security", RLS for short.
Before we start defining policies for each of the tables: because users access the data through views in the api schema, we need to explicitly set the owner of each view.
This is needed because when someone accesses a table with policies through a view, the current_user switches from the (database) logged-in role to the role of the view owner.
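To make this concrete, here is an illustrative psql session (assuming the roles and views from this tutorial are in place):
set role webuser;
select current_user; -- webuser
-- selecting through the view reaches data.clients with the privileges
-- (and policies) of the view's owner, not those of webuser
select * from api.clients;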
Edit the api views and add the respective lines
-- sql/api/clients.sql
...
alter view api.clients owner to api_users;
-- sql/api/projects.sql
...
alter view api.projects owner to api_users;
-- sql/api/tasks.sql
...
alter view api.tasks owner to api_users;
-- sql/api/comments.sql
...
alter view api.comments owner to api_users;
Just like webuser, api_users is a simple role defined in sql/global/roles.sql
Now we get to the interesting part: defining the policies for each table. We'll use the information contained in the JWT to restrict the rows a user can access
-- sql/data/clients.sql
...
alter table data.clients enable row level security;
grant select, insert, update, delete on data.clients to api_users;
create policy access_own_rows on data.clients to api_users, webuser
using ( util.current_role() = 'webuser' and util.app_user_id() = user_id )
with check ( util.current_role() = 'webuser' and util.app_user_id() = user_id);
-- sql/data/projects.sql
...
alter table data.projects enable row level security;
grant select, insert, update, delete on data.projects to api_users;
create policy access_own_rows on data.projects to api_users, webuser
using ( util.current_role() = 'webuser' and util.app_user_id() = user_id )
with check ( util.current_role() = 'webuser' and util.app_user_id() = user_id);
-- sql/data/tasks.sql
...
alter table data.tasks enable row level security;
grant select, insert, update, delete on data.tasks to api_users;
create policy access_own_rows on data.tasks to api_users, webuser
using ( util.current_role() = 'webuser' and util.app_user_id() = user_id )
with check ( util.current_role() = 'webuser' and util.app_user_id() = user_id);
-- sql/data/project_comments.sql
...
alter table data.project_comments enable row level security;
grant select, insert, update, delete on data.project_comments to api_users;
create policy access_own_rows on data.project_comments to api_users, webuser
using ( util.current_role() = 'webuser' and util.app_user_id() = user_id )
with check ( util.current_role() = 'webuser' and util.app_user_id() = user_id);
-- sql/data/task_comments.sql
...
alter table data.task_comments enable row level security;
grant select, insert, update, delete on data.task_comments to api_users;
create policy access_own_rows on data.task_comments to api_users, webuser
using ( util.current_role() = 'webuser' and util.app_user_id() = user_id )
with check ( util.current_role() = 'webuser' and util.app_user_id() = user_id);
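In each policy, the using clause filters the rows a user can read, while the with check clause validates rows being inserted or updated. Both lean on two small helpers in the util schema. The real implementations ship with the project, but a plausible sketch looks like this (assuming PostgreSQL 9.6+ for the two-argument current_setting, and a PostgREST-style server that exposes JWT claims as request.jwt.claim.* settings):
-- plausible sketch of the util helpers used in the policies above
create or replace function util.current_role() returns text as $$
  select nullif(current_setting('request.jwt.claim.role', true), '');
$$ stable language sql;
create or replace function util.app_user_id() returns int as $$
  select nullif(current_setting('request.jwt.claim.user_id', true), '')::int;
$$ stable language sql;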
When new data is inserted, we need to provide a default value for the user_id column in each table.
Make this column's definition in every table in the data schema look like this
...
user_id int not null references data.users(id) default util.app_user_id(),
...
Ok, so now everything is set up: everyone can get to their own data and change it. But how do they get their foot in the door? How do they sign up, how do they log in?
This is where stored procedures come into play, and by that I mean any stored procedure defined in the api schema.
Whenever you have to implement something that cannot be expressed in a single query, i.e. a sequential process, use a stored procedure like this. There is always a temptation to design your whole api as a set of stored procedures; it seems to give you greater, more targeted control, but you will be painting yourself into a corner and will have to write a new function for every new feature. Try sticking to views whenever possible, make them as general as possible, and let the clients select from them with additional filters (see the example below).
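For example, compare a hypothetical per-feature function with simply filtering the general view from the client (the function below is ours for illustration; the filter syntax in the comment is PostgREST-style):
-- the per-feature function you would have to write, grant and maintain:
create or replace function api.my_completed_tasks() returns setof api.tasks as $$
  select * from api.tasks where completed = true;
$$ stable language sql;
-- versus just filtering the general view from the client:
--   GET /rest/tasks?completed=eq.true&select=id,name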
If you can, write those functions in PL/pgSQL (though you can also use other languages). It does not have a shiny syntax and it can be weird in places, but because of its close relation to the database and SQL it can express some problems neatly and concisely; even if you've never looked at the language before, you'll grasp its meaning instantly. Here, I'll show you
Check out sql/api/login.sql
...
create or replace function api.login(email text, password text) returns api.me as $$
declare
usr data.users;
begin
select * into usr
from data.users as u
where u.email = $1 and u.password = crypt($2, u.password);
if not found then
raise exception 'invalid email/password';
else
return (usr.id, usr.firstname, usr.lastname, usr.email, usr.created_on, usr.updated_on,
pgjwt.sign(
json_build_object(
'role', usr.user_type,
'user_id', usr.id,
'exp', extract(epoch from now())::integer + 3600 -- token expires in 1 hour
),
util.env_var('postgrest.jwt_secret')
));
end if;
end
$$ stable security definer language plpgsql;
...
Let's try and call it
curl -i \
-d '{"email":"[email protected]","password":"pass"}' \
http://localhost:8080/rest/rpc/login?select=firstname,lastname,email
HTTP/1.1 200 OK
Server: openresty
Date: Wed, 15 Feb 2017 08:43:32 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: SESSIONID=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIiA6ICJ3ZWJ1c2VyIiwgInVzZXJfaWQiIDogMSwgImV4cCIgOiAxNDg3MTUxODEzfQ.7lUPxyfee02Ur5GCgwt-XXnrVlYEgXbhyYFJkqaqrlg; Expires=Wed, 15 Feb 2017 09:43:33 GMT; Max-Age=3601; Path=/; HttpOnly
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Engine: "nginx"
Method: POST
Request-Time: 0.020
{"lastname":"eyeaeesvnr","email":"[email protected]","firstname":"ueecansuae"}
Back in the old days, the DB was the gatekeeper: it decided who had access to what and, most importantly, it did not let any bad data get in (assuming the DBA knew what he was doing). It did not matter how you accessed the database or from what language; you could not write asdf in a field that was supposed to hold a phone number. Well, we are going to bring that back; we are going to make databases great again :) and we are going to do it using constraints.
No more validators scattered all over the place checking user input (the ones you always forget to implement in your shiny new client that accesses the database).
By placing a constraint on a field, no one can bypass it, even if they really want to.
You can read more about constraints here, but for our case I'll just paste the new table definitions; notice how nice the syntax is. In a constraint definition you can use any valid sql expression that returns a bool, even custom functions defined by you (an example follows the definitions below).
Here are a few examples of constraints one might choose to set up
create table users (
...
check (length(firstname)>2),
check (length(lastname)>2),
check (updated_on is null or updated_on > created_on),
check (email ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$')
);
create table clients (
...
check( length(name)>2 and length(name)<100 ),
check (updated_on is null or updated_on > created_on)
);
create table projects (
...
check(length(name)>2),
check (updated_on is null or updated_on > created_on)
);
create table tasks (
...
check(length(name)>2),
check (updated_on is null or updated_on > created_on)
);
create table task_comments (
...
check(length(body)>2),
check (updated_on is null or updated_on > created_on)
);
create table project_comments (
...
check(length(body)>2),
check (updated_on is null or updated_on > created_on)
);
Let's check if the rules are applied
curl \
-H "Authorization: Bearer $JWT_TOKEN" \
-H 'Prefer: return=representation' \
-d '{"name":"A"}' \
http://localhost:8080/rest/clients
{"hint":null,"details":null,"code":"42501","message":"permission denied for sequence clients_id_seq"}
Whoops :). Again, a good thing: the database is very specific and strict about access rights.
Let's fix that
-- sql/data/clients.sql
...
grant usage on sequence data.clients_id_seq to webuser;
-- sql/data/projects.sql
...
grant usage on sequence data.projects_id_seq to webuser;
-- sql/data/tasks.sql
...
grant usage on sequence data.tasks_id_seq to webuser;
-- sql/data/task_comments.sql
...
grant usage on sequence data.task_comments_id_seq to webuser;
-- sql/data/project_comments.sql
...
grant usage on sequence data.project_comments_id_seq to webuser;
Now the request again
curl \
-H "Authorization: Bearer $JWT_TOKEN" \
-H 'Prefer: return=representation' \
-d '{"name":"A"}' \
http://localhost:8080/rest/clients
{"hint":null,"details":null,"code":"23514","message":"new row for relation \"clients\" violates check constraint \"clients_name_check\""}
curl \
-H "Authorization: Bearer $JWT_TOKEN" \
-H 'Prefer: return=representation' \
-d '{"name":"AAA"}' \
http://localhost:8080/rest/clients
{"id":8,"name":"AAA","address":null,"created_on":"2017-02-15T08:32:55.165262+00:00","updated_on":null}
Let's stop for a second and see how much we've accomplished so far.
curl -s -G \
-H "Authorization: Bearer $JWT_TOKEN" \
http://localhost:8080/rest/clients \
--data-urlencode select="id,name,projects{id,name,comments{body},tasks{id,name,comments{body}}}" | \
python -mjson.tool
What you see above is a query that returns all the clients with their projects, and for each project also its tasks and comments. All that in a single request. At the same time, access to those entities is checked and only the permitted rows are returned.
Let's see how much code we had to write for that.
$ cloc --exclude-dir=sample_data --include-lang=SQL sql/api/ sql/data/
16 text files.
16 unique files.
1 file ignored.
http://cloc.sourceforge.net v 1.64 T=0.08 s (190.1 files/s, 3397.5 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
SQL 16 34 30 222
-------------------------------------------------------------------------------
SUM: 16 34 30 222
-------------------------------------------------------------------------------
In just about 200 LOC, roughly half of which are table definitions, we implemented a REST and GraphQL API (it's there, we'll look at it a bit later). We have an authorization mechanism and we can make sure each user sees only what he is supposed to see (his own data). We've decoupled our api from our underlying data model by using a separate schema that contains only views and stored procedures. We validate user input using constraints, and for any process that is more complex and takes a few steps to complete (login/signup) we can use stored procedures in any of the languages supported by PostgreSQL. To do all this we did not have to write a single line of imperative logic; we only defined our data and the rules controlling access to it.
And that is all: we have the core of our API ready, the part that deals strictly with the data inside our database. It's time to take a break from all that coding, click around a bit, and check out that GraphQL I've been promising :)