URLs and REST

… It turns out the things that hold the web together make a dandy API.

URLs and REST

URLs are obviously an important part of web sites and apps.

Unfortunately, many developers don't give them enough attention.

URL Basics

Some basic ideas of URLs that you might have missed…

URL Basics

URLs are opaque.

Opaque in the sense that they are just a string to pass server ↔ client. They have no semantic meaning in HTTP.

Is http://example.com/products/wrench.html about products? Are there other products within the hierarchy? Is it HTML? Static? Maybe, maybe not.

URL Basics

The point for now: HTTP doesn't care.

A request comes in for the URL path /products/wrench.html and server sends an HTTP response. The rest is still to be decided.

URL Basics

URLs identify a resource.

An HTTP resource is a piece of content: description of a product, picture from your vacation, style information for a page.

All of those things that are on the web are resources.

URL Basics

The resource's URL identifies that piece of content: gives it a unique thing that can be used to distinguish it from all other things on the web.

The URL http://example.com/products/wrench refers to (we imagine) the description of one product at one company.

URL Basics

Identification vs location? URL = Uniform Resource Locator. URI = Uniform Resource Identifier.

Everything we see is both: web URLs identify content and give the browser a way to locate them. (Contact this server by HTTP and make a request like…)

There are non-web URIs that give no information about locating the resource. e.g. urn:isbn:1480560367
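The difference is visible when the two are parsed. A small sketch with Python's standard `urllib.parse`, using the example URL from these slides:

```python
from urllib.parse import urlsplit

web = urlsplit("http://example.com/products/wrench")
# A web URL says how to reach the resource: protocol, server, path.
print(web.scheme, web.netloc, web.path)

urn = urlsplit("urn:isbn:1480560367")
# A URN only names the resource: there is no server to contact.
print(urn.scheme, repr(urn.netloc), urn.path)
```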

URL Basics

File formats represent a resource.

There are (at least conceptually) many ways to represent a particular piece of content. The various file formats we know give us a way to express that content that the user (agent) will be able to handle.

Product description could be HTML, PDF, or MS Word. The vacation picture could be JPEG, WebP, or a panorama viewing app. Web page style information could be CSS, XSLT, or some future stylesheet tech.

URL Basics

Summary…

HTTP thinks URLs are just strings to identify a resource.

Those resources are one conceptual piece of content that has to be represented using some file format.

URLs Identify

If URLs are going to identify a resource, then there should be a one-to-one correspondence between resources and URLs.

That can go wrong in two ways…

URLs Identify

Multiple URLs for one resource can't be collapsed by any automated tool, even if users might imagine they are really identical.

e.g. having home page at http://www.example.com/, http://example.com/, and http://example.com/index.html.

URLs Identify

Results:

Can confuse users. Are they really identical?
Must be separately crawled by search engines.
Splits the content's PageRank n ways.
Must be cached separately.

URLs Identify

Be consistent in the URLs you use: pick one (schema) and stick with it.

If you end up with multiple URLs, redirect instead of returning the content in many places.

Specifying a <link rel="canonical" href="…" /> fixes the ambiguity for search engines, but not the other problems.
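One way to picture the "pick one and redirect" rule: a function that maps every variant to the single URL you chose. This is only a sketch; the preference for the bare domain and the function names are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "example.com"  # assumption: the site prefers the bare domain

def canonical_url(url):
    """Return the one URL we want to serve this content at."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == "www." + CANONICAL_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))

def redirect_target(url):
    """Return the URL to redirect to, or None if url is already canonical."""
    target = canonical_url(url)
    return None if target == url else target
```

A server could call `redirect_target` on each request and send a 301 whenever it returns something.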

URLs Identify

Multiple resources at one URL prevent URLs from locating content.

e.g. tabbed content revealed by JS, like those created by jQueryUI tabs: there are many versions of the “content” at that URL.

e.g. unlinkable Flash blobs

URLs Identify

Results:

Google can't link to the content.
Can't be bookmarked.
Can't send my friend a link to the content. Instead: "Click the link, then the third tab, then scroll down."

That's not what you want from your site.

URLs Identify

Pages that are built by JS and frontend frameworks generally suffer from this: the page is the way it is because of many things after the URL was loaded.

The JavaScript browser history API and JS framework routers can compensate if you are careful about it.

URLs Identify

Another implication of URLs identifying content: they should be persistent. If a piece of content is available at that URL, then it should continue to have that identifier until the end of time.

Never put transient info (like session identifiers) in URLs.

If you have to move things, leave a redirect in their place.

URL Usability

We know that URLs are an opaque string within HTTP. The client/server/protocol doesn't care as long as they are unique identifiers.

So the shape of your URLs doesn't matter? Are these equally good URLs for a resource?

http://example.com/products/wrench
http://example.com/wp-content/324
http://example.com/40e7511c-a326-405f

URL Usability

URLs are an important part of the user interface of your site/app.

Supporting evidence: destination URLs are displayed in Google search results, Bing search results, DuckDuckGo search results. These are very carefully designed pages. They wouldn't display URLs if users weren't looking at them.

URL Usability

The technology doesn't care about URLs but users do. Conclusion: they are part of your site's user interface.

When designing a site/app, treat them like they are.

URL Usability

URLs should be helpful to the user. They should be human-readable, informative, and relatively short.

Ways to do this…

URL Usability

Most important: actually think about your users when creating URL layout.

How will they see information related? What will they care about in the URL?

URL Usability

Avoid URL shorteners.

They create a short tweetable URL, but lose any meaning for the user. Keep the short, meaningless URLs in tweets where they belong.

http://bit.ly/2fC3Pb5
http://bit.ly/AboutSFUCS
http://www.sfu.ca/computing/about.html

URL Usability

… but keep URLs short.

If users want to email/message/tweet/whatever about your site, that's a good thing. The URL had better fit in a single line/message.

URL Usability

Avoid escaped characters. Limited characters can be used in a URL; the rest must be URL encoded. These are impossible for a user to read.

e.g. space turns into %20.

URL Usability

Browsers will display these unescaped in the address bar, but they're still there if copying-and-pasting, or displaying in a message.

e.g. the URL that sometimes looks like

http://example.com/欢迎

is really

http://example.com/%E6%AC%A2%E8%BF%8E
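The escaping can be reproduced with Python's standard `urllib.parse`; the file name in the first line is made up for illustration.

```python
from urllib.parse import quote, unquote

print(quote("/my file.html"))          # /my%20file.html
print(quote("/欢迎"))                   # /%E6%AC%A2%E8%BF%8E
print(unquote("/%E6%AC%A2%E8%BF%8E"))  # /欢迎
```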

URL Usability

Mixed case is also hard to read. Generally stick to the characters that are safe in a URL, easily pronounced, etc. That leaves us:

lowercase (English) letters,
(Arabic) digits,
- (dash) and maybe _ (underscore),
/ for hierarchy…

Sorry, non-English sites.

URL Hierarchy

One piece of a URL that has a little semantics: /

The slash-separated parts of the URL implicitly form a hierarchy. A link <a href="../foo"> takes one slash-separated component off the URL.

So, the URL http://example.com/products/wrench refers to wrench that is in some way “inside” products.
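Relative links are resolved against this hierarchy. A quick check with the standard library's `urljoin`, which follows the same resolution rules a browser uses:

```python
from urllib.parse import urljoin

base = "http://example.com/products/wrench"
print(urljoin(base, "hammer"))   # http://example.com/products/hammer
print(urljoin(base, "../foo"))   # http://example.com/foo
```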

URL Hierarchy

Use / to build a hierarchy for the resources in your site.

How will (should) the users perceive the relationship between resources? Build that into the hierarchy of your site.

URL Hierarchy

The URL hierarchy should be informative and guessable.

e.g. given the example http://example.com/products/wrench, I predict these URLs might exist:

…/products/: product list/search.
…/products/hammer: another product.
…/products/wrench/specs: product technical specs.

URL Hierarchy

Any defaults provided by your framework might not be the best choice. The defaults can't know about the relationships between your models. Compare:

…/products/123/show: product info.
…/options/456/show: one colour it comes in.

…/products/hammer/: product info.
…/products/hammer/option/red: one colour it comes in.

URL Hierarchy

Design the hierarchy for users, not the implementation.

Older tech often encouraged bad URLs. What does the user care about here?

http://example.com/pim-mod/show_person.py?id=237

Don't care about code modules, language being used, database primary keys. The GET request implies show.

Do care: example.com (presumably) and person.

URL Hierarchy

A more human-friendly URL for that resource would be like:

http://example.com/people/ggbaker

Hierarchy implies that we're looking at one person in the collection of people. We can guess who that person is.

URL Hierarchy

We can continue to expand the hierarchy in reasonable ways, depending on the information we want to present:

…/people/: list of people (we're allowed to access).
…/people/jsmith: another person.
…/people/ggbaker/phone: person's phone numbers.
…/people/ggbaker/phone/office: person's office phone number.

URL Hierarchy

URL hierarchies don't have to be deep. Having too many path components can make URLs unreadably long.

e.g. Wikipedia has two: language (en) and topic (Vancouver). https://en.wikipedia.org/wiki/Vancouver

Maybe a third: disambiguation. https://en.wikipedia.org/wiki/Vancouver_(steamboat)

Implementing Good URLs

With any technology, it should be possible to design good URLs.

Implementing Good URLs

Web servers will call your logic (with WSGI, Rack, etc) as configured and give the rest of the URL path as an argument (historically, an environment variable PATH_INFO).

e.g. http://example.com/blog/2021/01/something might be code at /blog/ that would get PATH_INFO value /2021/01/something. After that, it's just programming.
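A minimal sketch of "just programming" on PATH_INFO, written as a bare WSGI application. It assumes the app is mounted at /blog/ and that posts live at year/month/slug; the response text is made up.

```python
def blog_app(environ, start_response):
    # PATH_INFO holds the part of the path after the app's mount point.
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
        year, month, slug = parts
        status = "200 OK"
        body = f"post {slug!r} from {year}-{month}"
    else:
        status, body = "404 Not Found", "no such post"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]
```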

Implementing Good URLs

It's possible to do URL rewriting in the frontend server. e.g. mod_rewrite

That's okay in simple cases, but it's usually a mess. Avoid if possible.

Implementing Good URLs

Modern frameworks allow arbitrary URL construction with the router/dispatcher/controller.

Incoming URLs are matched against a series of rules (regular expressions or other patterns). Each rule corresponds to a controller function that can respond to requests for URLs in that pattern.
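Stripped of framework details, a router is not much more than this. A toy sketch, with hypothetical route patterns and controller names:

```python
import re

def view_post(year, post):
    return f"post {post} from {year}"

def list_posts():
    return "all posts"

# (pattern, controller) pairs, tried in order; names here are hypothetical.
ROUTES = [
    (re.compile(r"^/blog/$"), list_posts),
    (re.compile(r"^/blog/(?P<year>\d{4})/(?P<post>[a-z0-9-]+)$"), view_post),
]

def dispatch(path):
    """Call the first controller whose pattern matches the path."""
    for pattern, controller in ROUTES:
        match = pattern.match(path)
        if match:
            # Named groups in the pattern become keyword arguments.
            return controller(**match.groupdict())
    return None  # a real framework would return a 404 response
```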

Implementing Good URLs

In Express:

app.get('/blog/:year/:post', function (req, res) { … })

In Rails routes.rb:

get '/blog/:year/:post' => 'blog#get_post'

In Django urls.py:

path('blog/<int:year>/<slug:post>', views.view_post),

Implementing Good URLs

Give some thought to your URLs as you're constructing them, and especially as you're starting a new site/module.

Be cautious of defaults/conventions which may not make sense for your site.

Implementing Good URLs

Leave your URLs flexible. Don't hard-code URLs in templates.

It makes it hard to change them during development (as the shape of the site coalesces). That smells bad.

Implementing Good URLs

Any template will allow something like this, but it's fragile:

<a href="/blog/{{ post.year }}/{{ post.slug }}">
    {{ post.title }}</a>

… but what if you decide you want to change the URLs to be like /posts/2021/post-title instead? How many places do you have to change?

Implementing Good URLs

Most (full-stack?) frameworks have a way to refer to the URL of a particular controller.

You should use these to refer logically to a piece of content in your code, without assuming the shape of the URL.

Implementing Good URLs

e.g. in Django templates:

<a href="{% url 'blog:view' year=post.year slug=post.slug %}">
    {{ post.title }}</a>
<a href="{{ post.get_absolute_url }}">{{ post.title }}</a>

e.g. in Rails ERB templates:

<%= link_to @post.title, controller: 'blog', action: 'view',
    year: @post.year, slug: @post.slug %>
<%= link_to @post.title, @post %>

e.g. in Django Python code:

url = reverse('blog:view', kwargs={'year': post.year,
        'slug': post.slug})
return redirect(url)

Implementing Good URLs

These make it much easier to design your URL hierarchy during development.

It's easy to work with a temporary URL, and change once you see the site together.
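The idea behind these helpers, reduced to a toy: one table knows the shape of each URL, and everything else asks for URLs by name. This is a stand-in for illustration, not how any particular framework implements it.

```python
# Route names mapped to URL templates: the single place that knows URL shapes.
URL_TEMPLATES = {
    "blog:view": "/blog/{year}/{slug}",
}

def reverse(name, **kwargs):
    """Build the URL for a named route (a toy version of a framework's reverse)."""
    return URL_TEMPLATES[name].format(**kwargs)

print(reverse("blog:view", year=2021, slug="man-bites-dog"))
```

Changing the template to "/posts/{year}/{slug}" would change every generated link, with no other code touched.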

Implementing Good URLs

Once you have put your site in production and URLs become public, they can't disappear.

But you can leave a redirect in their place if you have no other choice. In Express:

app.get('/blog/:year/:post', function (req, res) {
  res.redirect(301, …);
})

In Django urls.py:

re_path(r'…', RedirectView.as_view(pattern_name='blog:view',
        permanent=True)),

Slugs

Whatever we have in the URL must uniquely identify a piece of content. HTTP calls it a resource. In code, we more likely think of it as a row/record.

What we put in the URL must make it possible to (quickly) look up the model object(s) that we need.

Slugs

The most obvious candidate is the record's primary key, but that leads to URLs like:

http://example.com/posts/2348

… which aren't very human-readable.

Slugs

A slug is a string that:

uniquely identifies particular content,
is safe to use in a URL,
is at least a little meaningful to people.

Slugs

Maybe you'll get lucky and have something with these properties already, but not usually.

e.g. username for a person at SFU (except newly-hired faculty, or newly-arriving students, or temporary research visitors, or…).

Slugs

For a blog post, we might start with a title like "Man Bites Dog!" and build the slug "man-bites-dog".

If we store these in the database and index that field, we can quickly look up the content for a URL like /posts/man-bites-dog.

posts = Posts.find_all({'slug': post_slug})
if not posts: return NotFound404Response()
post = posts[0]
⋮

Slugs

A slug and good URL config transform our “bad” URL,

http://example.com/pim-mod/show_person.py?id=237

into:

http://example.com/person/greg-baker

Slugs

Another story with the same title might get "man-bites-dog-2", which isn't as beautiful, but better than an integer key.

Depending on content, it might be worth adding to your URL hierarchy, so slugs need to be unique within a smaller set of records.

/posts/man-bites-dog-9
/posts/2021/man-bites-dog-2
/posts/2021/03/man-bites-dog

Slugs

Building the slug should be done automatically: the user shouldn't have to care about unique keys.

Find a field that's meaningful and usually-unique (like title or firstname+lastname). Turn it into a slug, something like: remove everything but letters/digits/space, lowercase, space to dash.

Then make sure it's unique so you can look it up.
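Those steps could look something like this. It's a sketch only: `existing` stands in for a database query, and a real system would still want a unique index, since two requests could pick the same slug at the same moment.

```python
import re

def slugify(text):
    """Keep letters/digits/spaces, lowercase, and turn spaces into dashes."""
    text = re.sub(r"[^a-zA-Z0-9 ]", "", text)
    return re.sub(r" +", "-", text.strip().lower())

def unique_slug(text, existing):
    """Add -2, -3, ... until the slug is not in the set of existing slugs."""
    base = slugify(text)
    slug, n = base, 2
    while slug in existing:
        slug = f"{base}-{n}"
        n += 1
    return slug
```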

Slugs

Maybe it's best for someone else to do that.

e.g. django-autoslug lets you define model fields like this, and it takes care of the uniqueness.

slug = AutoSlugField(populate_from='title', unique=True)
slug = AutoSlugField(populate_from='title',
        unique_with=['pub_date__year'])
slug = AutoSlugField(populate_from='title',
        unique_with=['author', 'pub_date__year'])

Slugs

And all of that's great if you're starting with English text. The remove everything but letters/digits/spaces step gets you in trouble since we were probably thinking English letters.

django.utils.text.slugify('欢迎') == ''

So blog posts with Chinese titles will end up something like:

/posts/2021/-1
/posts/2021/-2
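The same effect can be seen with only the standard library. This sketch (not Django's actual code) decomposes accented letters before discarding non-ASCII: accented Latin text survives, but Chinese has nothing to fall back to.

```python
import re
import unicodedata

def ascii_slug(text):
    # Decompose accented letters, then drop anything that isn't ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-zA-Z0-9 ]", "", text)
    return re.sub(r" +", "-", text.strip().lower())

print(ascii_slug("Crème Brûlée"))  # creme-brulee
print(repr(ascii_slug("欢迎")))     # ''
```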

Slugs

The least-worst thing is probably to somehow translate to ASCII first. This can never be perfect, but it could be okay.

Unidecode is a library that converts arbitrary Unicode characters to ASCII in a hopefully-phonetic way.

Slugs

For example:

django.utils.text.slugify(unidecode('欢迎'))
    == 'huan-ying'
django.utils.text.slugify(unidecode('приветствовать'))
    == 'privetstvovat'

This seems like a reasonable URL for a post titled 欢迎.

/posts/2021/huan-ying

REST

REST is short for REpresentational State Transfer.

Not a specific technology or standard, but a style/schema/convention for interaction with a web system.

REST

The goal is to create an API: a way for another system to interact with your site and its data.

It will happen over HTTP, using technology we already have at-hand. We just think in a slightly more API-way.

REST

Since there is no standard “REST”, there are no fixed rules about how all of this is done. All of this is guidelines, best-practices, and just what-people-do.

REST

Remember:

URLs identify resources. (≈ nouns, e.g. Greg's contacts)
File formats represent resources.
… along with Internet media types that specify format (e.g. text/html, text/vcard, application/json).

REST

Representations are exchanged (transferred) by HTTP methods: GET, POST, PUT, DELETE (≈ verbs)

The server keeps track of the state of the data (probably with a database).

REST

Representations of states are transferred by HTTP: representational state transfer, REST.

REST

We can use URLs to represent both collections of data, and individual records. e.g.

/contacts/: collection of all contacts.
/contacts/greg-baker: contact info for one person.

REST Methods

For the API, we'll focus on the CRUD operations: Create, Retrieve, Update, Delete. We can map these operations onto HTTP methods as…

REST Methods

Op	Method	Single item	Collection
R	`GET`	retrieve representation	retrieve list of members
U	`PUT`	update item with info from provided representation	replace collection with new members
C	`POST`	create new sub-info	new entry in collection
D	`DELETE`	delete item	delete collection

REST Methods

Some examples of how these might actually be implemented on an item like /contacts/greg-baker:

PUT request with Content-type: text/vcard: replace the contact information with that from the request body.
POST request with Content-type: text/vcard: add additional contact information.

REST Methods

For read operations, we can use the Accept header to guide what representation the client wants.

GET request with Accept: text/vcard: return contact info as a vCard.
GET request with Accept: text/html: return contact info as an HTML page.
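Choosing a representation from the Accept header might be sketched like this. It is simplified: it honours q-values and `*/*` but ignores partial wildcards like `text/*`, and the function names are made up.

```python
def parse_accept(header):
    """Return media types from an Accept header, most preferred first."""
    choices = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        choices.append((q, parts[0]))
    # sorted() is stable, so equal q values keep the order the client sent.
    return [media for q, media in sorted(choices, key=lambda c: -c[0]) if q > 0]

def choose_representation(header, available):
    """Pick the client's most preferred type that the server can produce."""
    for media in parse_accept(header):
        if media in available:
            return media
        if media == "*/*":
            return available[0]
    return None  # caller would send 406 Not Acceptable
```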

REST Methods

For a collection like /contacts/:

PUT request with Content-type: text/vcard: replace the whole contact list with this one.
POST request: add a new contact to the contact list.
GET request: return the whole contact list with a representation from the Accept header.
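As a toy model of those collection operations, with a dict standing in for the database and request bodies assumed to be already parsed (both assumptions for illustration):

```python
contacts = {"greg-baker": "Greg Baker"}  # slug -> data; stands in for a database

def handle_collection(method, body=None):
    """Handle a request to /contacts/ and return (status, result)."""
    if method == "GET":
        return 200, sorted(contacts)
    elif method == "POST":
        slug, data = body          # assumption: body already parsed to (slug, data)
        contacts[slug] = data
        return 201, "/contacts/" + slug
    elif method == "PUT":
        contacts.clear()
        contacts.update(body)      # assumption: body is a full slug -> data dict
        return 204, None
    elif method == "DELETE":
        contacts.clear()
        return 204, None
    return 405, None               # method not allowed on this resource
```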

REST Methods

It's also fairly common to use the HTTP PATCH method to indicate a partial update: change the fields provided, but don't totally replace the item.
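The difference between the two, with plain dicts standing in for stored items (an illustration only):

```python
def put(item, representation):
    """PUT: the stored item becomes exactly what the client sent."""
    return dict(representation)

def patch(item, changes):
    """PATCH: only the fields the client sent are changed."""
    return {**item, **changes}

item = {"name": "Greg", "phone": "555-1234"}
print(put(item, {"phone": "555-0000"}))    # name is gone
print(patch(item, {"phone": "555-0000"}))  # name is kept
```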

REST Representations

We have a choice of representation for both read (GET) and write (PUT/POST) operations.

If there's a standard for your data, it's probably a good choice: vCard for contact info, GPX for GPS data, CSV for tabular data, …

Otherwise, JSON. Or maybe JSON anyway.

For a GET by the browser, we can honour the Accept header and return HTML for the user.

REST Representations

What just happened:

We created an API for a web site without inventing anything new, just using HTTP as it has always existed.

We don't even need new URLs: the Accept header can tell us whether we're talking to an API consumer or user. (But in practice, separate URLs are often easier.)

REST Representations

It's not clear how to use REST to represent non-CRUD operations.

e.g. transferring money between bank accounts. Are you creating a new transaction in the source account? Destination account? How do you express the amount?

The REST API Design Rulebook suggests a verb as the last component of the URL.

POST /accounts/1234/transfer

REST Status Codes

We can also make richer use of HTTP status codes to give information about success/failure.

POST to create a new item → 201 Created with Location header of item's URL.
PUT to update an item (successfully) → 204 No Content.
POST to do a big asynchronous insert → 202 Accepted with some info in the body about the job.

These start to look a lot like return values.

REST Status Codes

And these look a lot like exceptions:

PUT with XML data where server only understands JSON → 415 Unsupported Media Type.
PUT to update something immutable → 403 Forbidden or 405 Method Not Allowed.
POST with badly-formed JSON → 400 Bad Request; well-formed JSON that doesn't make sense for the resource → 422 Unprocessable Entity.
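In code, the status codes really do play the role of return values and exceptions. A sketch for a hypothetical contacts collection: it uses 400 for a body that cannot be parsed at all and keeps 422 for JSON that parses but isn't a usable contact, which is one common reading of those codes, not the only one.

```python
import json

def create_contact(content_type, raw_body, store):
    """Return (status, headers) for a POST to a hypothetical /contacts/ collection."""
    if content_type != "application/json":
        return 415, {}                      # Unsupported Media Type
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, {}                      # body is not valid JSON
    if not isinstance(data, dict) or "slug" not in data:
        return 422, {}                      # valid JSON, but not a usable contact
    store[data["slug"]] = data
    return 201, {"Location": "/contacts/" + data["slug"]}
```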

REST Authentication

We need authentication too. We can't ask for a username+password in a non-interactive context.

Actually we can, with HTTP basic authentication, but it's probably not the right thing to do.
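Part of the reason: basic authentication just puts the username and password, base64-encoded, in a header on every request. A sketch with the standard library (function names are made up):

```python
import base64

def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def decode_basic_auth(header):
    # base64 is an encoding, not encryption: anyone who sees the header can do this.
    scheme, token = header.split(" ", 1)
    username, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return username, password
```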

REST Authentication

Best practice for non-interactive authentication is OAuth.

It's complicated, but for a reason. They have done all the hard security work so you don't have to.

REST Libraries

Of course, you could create all of the functionality for REST by hand.

def contact(request, contact_slug):
    if request.method == 'GET':
        contact = get_object_or_404(Contact, slug=contact_slug)
        if is_acceptable_type(request, 'application/json'):
            ⋮ # return JSON representation
        elif is_acceptable_type(request, 'text/html'):
            ⋮ # return HTML-page representation
        else:
            return NotAcceptable406Response()
    elif request.method == 'POST':
        ⋮ # create object from POST data
    else:
        return MethodNotAllowed405Response()

REST Libraries

But again, your framework (or an extension) can probably make your life easier.

Rails does many REST things out-of-the-box. The Django REST Framework adds very flexible REST functionality.

REST Libraries

With even the best framework in the world, you're still going to have to worry about things like:

How do you serialize/deserialize ORM objects? The default may not be exactly what you want.
How do you express authorization checks so consumers can modify exactly the right objects?
Are there some fields that shouldn't be modified by an API consumer? How do you prevent that?

Demo

Demo of working with Django REST Framework: exercise 8 code with basic DRF setup. Notable additions:

Enforce visibility rules in queryset.
Create dynamic field generated by a function.
Force an object created by a POST request to have .owner set to the logged-in user.

All of this is toward the goal of demonstrating how working with a modern heavyweight framework is wonderful and horrible, and applies to server-side, client-side, REST frameworks, etc.

REST vs GraphQL

The other somewhat-standard option for creating an API: GraphQL.

GraphQL isn't as HTTP-focussed: one URL for all API calls, often all calls are a POST request.

REST vs GraphQL

But, the way queries are made is much more standardized. GraphQL requests indicate:

What type (≈ model class, table) you're asking about.
What field(s) you want about them.
Which record(s) you want (specific id, search term, etc).
What related value(s) you also need.
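A request covering those four points might be built like this. The schema (a `contact` type with `name` and related `phones`) is hypothetical, invented to match the contacts example.

```python
import json

# Hypothetical schema: a `contact` type with `name` and related `phones`.
query = """
query {
  contact(slug: "greg-baker") {
    name
    phones { label number }
  }
}
"""
# GraphQL over HTTP is typically one POST with a JSON body like this.
request_body = json.dumps({"query": query})
print(request_body)
```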

REST vs GraphQL

If you need a variety of different queries on the data, GraphQL might be easier. REST APIs are generally better understood and can be queried by anything that understands HTTP.

Conclusion? I don't know.

Creating an API

Do web sites really need an API?

My answer: Yes.

Creating an API

An API can provide data to/from…

your JavaScript framework in the browser.
your mobile app.
the mobile app that somebody else writes that's awesome but you never thought of.
another web site that does awesome things with your users' data.