… It turns out the things that hold the web together make a dandy API.
URLs are obviously an important part of web sites and apps.
Unfortunately, many developers don't give them enough attention.
Some basic ideas about URLs that you might have missed…
URLs are opaque.
Opaque in the sense that they are just strings passed server ↔ client. They have no semantic meaning in HTTP.
Is http://example.com/products/wrench.html about products? Are there other products within the hierarchy? Is it HTML? Static? Maybe, maybe not.
The point for now: HTTP doesn't care.
A request comes in for the URL path /products/wrench.html and the server sends an HTTP response. The rest is still to be decided.
URLs identify a resource.
An HTTP resource is a piece of content: description of a product, picture from your vacation, style information for a page.
All of those things that are on the web are resources.
The resource's URL identifies that piece of content: it gives it a unique name that distinguishes it from all other things on the web.
The URL http://example.com/products/wrench refers to (we imagine) the description of one product at one company.
Identification vs location? URL = Uniform Resource Locator. URI = Uniform Resource Identifier.
Every web URL we see is both: it identifies content and gives the browser a way to locate it. (Contact this server by HTTP and make a request like…)
There are non-web URIs that give no information about locating the resource. e.g. urn:isbn:1480560367
File formats represent a resource.
There are (at least conceptually) many ways to represent a particular piece of content. The various file formats we know give us ways to express that content that the user (agent) will be able to handle.
Product description could be HTML, PDF, or MS Word. The vacation picture could be JPEG, WebP, or a panorama viewing app. Web page style information could be CSS, XSLT, or some future stylesheet tech.
Summary…
HTTP thinks URLs are just strings to identify a resource.
Those resources are one conceptual piece of content that has to be represented using some file format.
If URLs are going to identify a resource, then there should be a one-to-one correspondence between resources and URLs.
That can go wrong in two ways…
Multiple URLs for one resource can't be collapsed by any automated tool, even if users might imagine they are really identical.
e.g. having the home page at http://www.example.com/, http://example.com/, and http://example.com/index.html.
Results:
Be consistent in the URLs you use: pick one (schema) and stick with it.
If you end up with multiple URLs, redirect instead of returning the content in many places.
Specifying a <link rel="canonical" href="…" /> fixes the ambiguity for search engines, but not the other problems.
Multiple resources at one URL prevent URLs from locating content.
e.g. tabbed content revealed by JS, like those created by jQueryUI tabs: there are many versions of the “content” at that URL.
Results:
Click the link, then the third tab, then scroll down.
That's not what you want from your site.
Pages that are built by JS and frontend frameworks generally suffer from this: the page is the way it is because of many things that happened after the URL was loaded.
The JavaScript browser history API and JS framework routers can compensate if you are careful about it.
Another implication of “URLs identify content”: they should be persistent. If a piece of content is available at a URL, it should keep that identifier until the end of time.
Never put transient info (like session identifiers) in URLs.
If you have to move things, leave a redirect in their place.
We know that URLs are an opaque string within HTTP. The client/server/protocol doesn't care as long as they are unique identifiers.
So the shape of your URLs doesn't matter? Are these equally good URLs for a resource?
http://example.com/products/wrench
http://example.com/wp-content/324
http://example.com/40e7511c-a326-405f
URLs are an important part of the user interface of your site/app.
Supporting evidence: destination URLs are displayed in Google search results, Bing search results, DuckDuckGo search results. These are very carefully designed pages. They wouldn't display URLs if users weren't looking at them.
The technology doesn't care about URLs but users do. Conclusion: they are part of your site's user interface.
When designing a site/app, treat them like they are.
URLs should be helpful to the user. They should be human-readable, informative, and relatively short.
Ways to do this…
Most important: actually think about your users when creating URL layout.
How will they see information related? What will they care about in the URL?
Avoid URL shorteners.
They create a short tweetable URL, but lose any meaning for the user. Keep the short, meaningless URLs in tweets where they belong.
http://bit.ly/2fC3Pb5
http://bit.ly/AboutSFUCS
http://www.sfu.ca/computing/about.html
… but keep URLs short.
If users want to email/message/tweet/whatever about your site, that's a good thing. The URL had better fit in a single line/message.
Avoid escaped characters. Only a limited set of characters can be used in a URL; the rest must be URL-encoded (percent-encoded). These are impossible for a user to read.
e.g. space turns into %20.
Browsers will display these unescaped in the address bar, but they're still there if copying-and-pasting, or displaying in a message.
e.g. the URL that sometimes looks like
http://example.com/欢迎
is really
http://example.com/%E6%AC%A2%E8%BF%8E
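Python's standard urllib.parse module shows this encoding directly. A small illustration using the path above (the "/my file" path is just a made-up example):

```python
from urllib.parse import quote, unquote

# quote() percent-encodes the UTF-8 bytes of each character; '/' is left alone by default.
readable = "/欢迎"
encoded = quote(readable)
print(encoded)            # /%E6%AC%A2%E8%BF%8E

# A space becomes %20.
print(quote("/my file"))  # /my%20file

# unquote() reverses it: roughly what the address bar shows you.
print(unquote(encoded) == readable)  # True
```

Copy the link out of the address bar and you get the encoded form, which is what ends up in the email or chat message.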
Mixed case is also hard to read. Generally stick to the characters that are safe in a URL, easily pronounced, etc. That leaves us: lowercase letters, digits, - (dash) and maybe _ (underscore), / for hierarchy… Sorry, non-English sites.
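If you want to enforce that character set (say, in a test over your route table), a regular expression is enough. This sketch assumes the policy is lowercase ASCII letters, digits, dash, underscore and slash; `is_friendly_path` is a hypothetical helper name:

```python
import re

# Assumed policy: lowercase ASCII letters, digits, '-', '_' and '/' for hierarchy.
FRIENDLY_PATH = re.compile(r"[a-z0-9/_-]*")


def is_friendly_path(path: str) -> bool:
    """Return True if every character of the path is in the 'friendly' set."""
    return FRIENDLY_PATH.fullmatch(path) is not None


print(is_friendly_path("/products/wrench"))  # True
print(is_friendly_path("/Products/Wrench"))  # False: mixed case
print(is_friendly_path("/my%20file"))        # False: escaped character
```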
One piece of a URL that has a little semantics: /
The slash-separated parts of the URL implicitly form a hierarchy. A link <a href="../foo"> takes one slash-separated component off the URL's “directory” (everything up to the last /).
So, the URL http://example.com/products/wrench refers to wrench that is in some way “inside” products.
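You can watch the browser's relative-link rules at work with Python's `urllib.parse.urljoin`, which resolves a relative reference against a base URL the same way (RFC 3986). A quick sketch using the wrench URL:

```python
from urllib.parse import urljoin

base = "http://example.com/products/wrench"

# A plain relative link resolves to a sibling inside /products/.
print(urljoin(base, "hammer"))       # http://example.com/products/hammer

# '..' climbs out of /products/ entirely.
print(urljoin(base, "../foo"))       # http://example.com/foo

# With a trailing slash, 'wrench' itself acts as the directory.
print(urljoin(base + "/", "specs"))  # http://example.com/products/wrench/specs
```

Note the trailing-slash difference: whether a URL ends in / changes what its relative links mean, another reason to pick one form and stick with it.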
Use / to build a hierarchy for the resources in your site.
How will (should) the users perceive the relationship between resources? Build that into the hierarchy of your site.
The URL hierarchy should be informative and guessable.
e.g. given the example http://example.com/products/wrench, I predict these URLs might exist:
…/products/: product list/search.
…/products/hammer: another product.
…/products/wrench/specs: product technical specs.
Any defaults provided by your framework might not be the best choice. The defaults can't know about the relationships between your models. Compare:
…/products/123/show: product info.
…/options/456/show: one colour it comes in.
vs
…/products/hammer/: product info.
…/products/hammer/option/red: one colour it comes in.
Design the hierarchy for users, not the implementation.
Older tech often encouraged bad URLs. What does the user care about here?
http://example.com/pim-mod/show_person.py?id=237
Don't care about: code modules (pim-mod), language being used (.py), database primary keys (id=237). The GET request implies “show”.
Do care: example.com and person (presumably).
A more human-friendly URL for that resource would be like:
http://example.com/people/ggbaker
Hierarchy implies that we're looking at one person in the collection of people. We can guess who that person is.
We can continue to expand the hierarchy in reasonable ways, depending on the information we want to present:
…/people/: list of people (we're allowed to access).
…/people/jsmith: another person.
…/people/ggbaker/phone: person's phone numbers.
…/people/ggbaker/phone/office: person's office phone number.
URL hierarchies don't have to be deep. Having too many path components can make URLs unreadably long.
e.g. Wikipedia has two: language (en) and topic (Vancouver). https://en.wikipedia.org/wiki/Vancouver
Maybe a third: disambiguation. https://en.wikipedia.org/wiki/Vancouver_(steamboat)
With any technology, it should be possible to design good URLs.
Web servers will call your logic (with WSGI, Rack, etc.) as configured and give the rest of the URL path as an argument (historically, an environment variable PATH_INFO).
e.g. http://example.com/blog/2021/01/something might be code at /blog/ that would get PATH_INFO value /2021/01/something. After that, it's just programming.
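To make “after that, it's just programming” concrete, here is a minimal WSGI application (PEP 3333) that reads PATH_INFO itself. It assumes the app is mounted at /blog and that post URLs look like /year/month/slug; `blog_app` is a made-up name, and the server is simulated with the standard library's `wsgiref.util.setup_testing_defaults`:

```python
from wsgiref.util import setup_testing_defaults


def blog_app(environ, start_response):
    """Minimal WSGI app, assumed to be mounted at /blog.

    The server puts the mount point in SCRIPT_NAME and the rest of the
    path in PATH_INFO, e.g. '/2021/01/something'.
    """
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    if len(parts) == 3:
        year, month, slug = parts
        status = "200 OK"
        body = f"post {slug!r} from {year}-{month}".encode("utf-8")
    else:
        status = "404 Not Found"
        body = b"not found"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


# Simulate what a server would pass for http://example.com/blog/2021/01/something
environ = {}
setup_testing_defaults(environ)
environ["SCRIPT_NAME"] = "/blog"
environ["PATH_INFO"] = "/2021/01/something"

captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"] = status


result = blog_app(environ, start_response)
print(captured["status"], b"".join(result))
```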
It's possible to do URL rewriting in the frontend server, e.g. Apache's mod_rewrite. That's okay in simple cases, but it's usually a mess. Avoid it if possible.
Modern frameworks allow arbitrary URL construction with the router/dispatcher/controller.
Incoming URLs are matched against a series of rules (regular expressions or other patterns). Each rule corresponds to a controller function that can respond to requests for URLs in that pattern.
In Express:
app.get('/blog/:year/:post', function (req, res) { … })
In Rails routes.rb:
get '/blog/:year/:post' => 'blog#get_post'
In Django urls.py:
path('blog/<int:year>/<slug:post>', views.view_post),
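Under the hood, these routers do roughly the same thing: try each rule in order and call the first matching controller. A toy version (not how any particular framework implements it; `route`, `dispatch` and the two controllers are hypothetical names) where regex named groups become keyword arguments, like :year or <int:year> above:

```python
import re

# Ordered list of (compiled pattern, controller function) rules.
ROUTES = []


def route(pattern):
    """Decorator that registers a controller for a regex pattern."""
    def register(func):
        ROUTES.append((re.compile(pattern), func))
        return func
    return register


@route(r"blog/(?P<year>[0-9]{4})/(?P<post>[-a-z0-9]+)")
def view_post(year, post):
    return f"post {post} from {year}"


@route(r"blog/")
def blog_index():
    return "blog index"


def dispatch(path):
    """Try rules in order; the first full match wins. None means 404."""
    path = path.lstrip("/")
    for pattern, controller in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return controller(**match.groupdict())
    return None


print(dispatch("/blog/2021/man-bites-dog"))  # post man-bites-dog from 2021
```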
Give some thought to your URLs as you're constructing them, and especially as you're starting a new site/module.
Be cautious of defaults/conventions which may not make sense for your site.
Leave your URLs flexible. Don't hard-code URLs in templates: it makes them hard to change during development (as the shape of the site coalesces). That smells bad.
Any template will allow something like this, but it's fragile:
<a href="/blog/{{ post.year }}/{{ post.slug }}"> {{ post.title }}</a>
… but what if you decide you want to change the URLs to be like /posts/2021/post-title instead? How many places do you have to change?
Most (full-stack?) frameworks have a way to refer to the URL of a particular controller.
You should use these to refer logically to a piece of content in your code, without assuming the shape of the URL.
e.g. in Django templates:
<a href="{% url 'blog:view' year=post.year slug=post.slug %}">{{ post.title }}</a>
<a href="{{ post.get_absolute_url }}">{{ post.title }}</a>
e.g. in Rails ERB templates:
<%= link_to @post.title, controller: 'blog', action: 'view', year: @post.year, slug: @post.slug %>
<%= link_to @post.title, @post %>
e.g. in Django Python code:
url = reverse('blog:view', kwargs={'year': post.year, 'slug': post.slug})
return redirect(url)
These make it much easier to design your URL hierarchy during development.
It's easy to work with a temporary URL, and change once you see the site together.
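The idea behind reverse() and {% url %} fits in a few lines: keep the URL shape in one table and look it up by name everywhere else. This is a toy stand-in, not Django's implementation (the real one also handles converters, namespaces, and errors); `URL_PATTERNS` and this `reverse` are hypothetical:

```python
from urllib.parse import quote

# The one place that knows what each named URL looks like.
URL_PATTERNS = {
    "blog:view": "/blog/{year}/{slug}",
}


def reverse(name, **kwargs):
    """Build a URL from a route name, percent-encoding each value."""
    values = {k: quote(str(v), safe="") for k, v in kwargs.items()}
    return URL_PATTERNS[name].format(**values)


before = reverse("blog:view", year=2021, slug="man-bites-dog")
print(before)  # /blog/2021/man-bites-dog

# Changing the URL shape is a one-line edit; every caller follows along.
URL_PATTERNS["blog:view"] = "/posts/{year}/{slug}"
after = reverse("blog:view", year=2021, slug="man-bites-dog")
print(after)   # /posts/2021/man-bites-dog
```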
Once you have put your site in production and URLs become public, they can't disappear.
But you can leave a redirect in their place if you have no other choice. In Express:
app.get('/blog/:year/:post', function (req, res) { res.redirect(301, …); })
In Django urls.py:
re_path(r'…', RedirectView.as_view(pattern_name='blog:view', permanent=True)),
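Without a framework, the same permanent redirect can live in a small piece of WSGI middleware. This sketch assumes the site moved from /blog/<year>/<slug> to /posts/<year>/<slug>; `redirect_old_urls` and the inner app are made-up names:

```python
import re

OLD_BLOG_URL = re.compile(r"/blog/(?P<year>[0-9]{4})/(?P<slug>[-a-z0-9]+)")


def redirect_old_urls(app):
    """Wrap a WSGI app: answer old /blog/... URLs with a 301 to /posts/..."""
    def middleware(environ, start_response):
        match = OLD_BLOG_URL.fullmatch(environ.get("PATH_INFO", ""))
        if match:
            new_url = "/posts/{year}/{slug}".format(**match.groupdict())
            start_response("301 Moved Permanently",
                           [("Location", new_url), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)  # everything else passes through
    return middleware


def current_site(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"current content"]


app = redirect_old_urls(current_site)
seen = {}


def start_response(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, dict(headers)


app({"PATH_INFO": "/blog/2021/man-bites-dog"}, start_response)
print(seen["status"], seen["headers"]["Location"])
```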
Whatever we have in the URL must uniquely identify a piece of content. HTTP calls it a resource. In code, we more likely think of it as a row/record.
What we put in the URL must make it possible to (quickly) look up the model object(s) that we need.
The most obvious candidate is the record's primary key, but that leads to URLs like:
http://example.com/posts/2348
… which aren't very human-readable.
A slug is a string that:
Maybe you'll get lucky and have something with these properties already, but not usually.
e.g. username for a person at SFU (except newly-hired faculty, or newly-arriving students, or temporary research visitors, or…).
For a blog post, we might start with a title like "Man Bites Dog!" and build the slug "man-bites-dog".
If we store these in the database and index that field, we can quickly look up the content for a URL like /posts/man-bites-dog.
posts = Posts.find_all({'slug': post_slug})
if not posts:
    return NotFound404Response()
post = posts[0]
⋮
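The same lookup with a real database, here the standard library's sqlite3 with an in-memory database standing in for yours (the posts table schema is an assumption). A UNIQUE index on slug gives fast lookups and also rejects duplicate slugs:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT NOT NULL, title TEXT)")
# The index is what makes lookup-by-slug fast; UNIQUE also enforces one post per slug.
db.execute("CREATE UNIQUE INDEX posts_slug ON posts (slug)")
db.execute("INSERT INTO posts (slug, title) VALUES (?, ?)",
           ("man-bites-dog", "Man Bites Dog!"))


def find_post(post_slug):
    """Return (id, title) for the slug, or None (which a view would turn into a 404)."""
    return db.execute("SELECT id, title FROM posts WHERE slug = ?",
                      (post_slug,)).fetchone()


print(find_post("man-bites-dog"))  # (1, 'Man Bites Dog!')
print(find_post("dog-bites-man"))  # None
```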
A slug and good URL config transform our “bad” URL,
http://example.com/pim-mod/show_person.py?id=237
into:
http://example.com/person/greg-baker
Another story with the same title might get "man-bites-dog-2", which isn't as beautiful, but better than an integer key.
Depending on the content, it might be worth adding levels to your URL hierarchy, so slugs only need to be unique within a smaller set of records.
/posts/man-bites-dog-9
/posts/2021/man-bites-dog-2
/posts/2021/03/man-bites-dog
Building the slug should be done automatically: the user shouldn't have to care about unique keys.
Find a field that's meaningful and usually-unique (like title or firstname+lastname). Turn it into a slug, something like: remove everything but letters/digits/space, lowercase, space to dash.
Then make sure it's unique so you can look it up.
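The recipe above, sketched in Python. The “append -2, -3, …” rule for uniqueness matches the "man-bites-dog-2" example but is an assumption, and `existing` stands in for a query against your table:

```python
import re


def slugify(text):
    """Keep only a-z, 0-9 and spaces, lowercase, then turn spaces into dashes."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9 ]", "", text)  # drop everything else (ASCII letters only!)
    return "-".join(text.split())          # runs of spaces become single dashes


def unique_slug(text, existing):
    """Append -2, -3, ... until the slug isn't already taken."""
    base = slugify(text)
    slug, n = base, 2
    while slug in existing:
        slug = f"{base}-{n}"
        n += 1
    return slug


taken = {"man-bites-dog"}
print(slugify("Man Bites Dog!"))             # man-bites-dog
print(unique_slug("Man Bites Dog!", taken))  # man-bites-dog-2
```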
Maybe it's best for someone else to do that.
e.g. django-autoslug lets you define model fields like this, and it takes care of the uniqueness.
slug = AutoSlugField(populate_from='title', unique=True)
slug = AutoSlugField(populate_from='title', unique_with=['pub_date__year'])
slug = AutoSlugField(populate_from='title', unique_with=['author', 'pub_date__year'])
And all of that's great if you're starting with English text. The “remove everything but letters/digits/spaces” step gets you in trouble, since we were probably thinking of English letters.
django.utils.text.slugify('欢迎') == ''
So blog posts with Chinese titles will end up something like:
/posts/2021/-1
/posts/2021/-2
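Why does Chinese vanish entirely while accented Latin letters survive? By default (without allow_unicode=True), Django's slugify() decomposes text with Unicode NFKD normalization and then drops anything non-ASCII. The sketch below reproduces just that step; `ascii_fold` is a hypothetical name:

```python
import unicodedata


def ascii_fold(text):
    """Decompose characters (NFKD), then throw away anything that isn't ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# 'é' decomposes to 'e' + a combining accent, and only the accent is dropped.
print(ascii_fold("Café Crème"))  # Cafe Creme

# CJK characters have no ASCII decomposition, so nothing is left.
print(repr(ascii_fold("欢迎")))   # ''
```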
The least-worst thing is probably to somehow translate to ASCII first. This can never be perfect, but it could be okay.
Unidecode is a library that converts arbitrary Unicode characters to ASCII in a hopefully-phonetic way.
For example:
django.utils.text.slugify(unidecode('欢迎')) == 'huan-ying'
django.utils.text.slugify(unidecode('приветствовать')) == 'privetstvovat'
This seems like a reasonable URL for a post titled 欢迎 (“welcome”).
/posts/2021/huan-ying
REST is short for REpresentational State Transfer.
Not a specific technology or standard, but a style/schema/convention for interaction with a web system.
The goal is to create an API: a way for another system to interact with your site and its data.
It will happen over HTTP, using technology we already have at hand. We just think in a slightly more API-way.
Since there is no standard “REST”, there are no fixed rules about how all of this is done. All of this is guidelines, best-practices, and just what-people-do.
Remember:
Resources have representations in various formats, identified by a content type (e.g. text/html, text/vcard, application/json).
Representations are exchanged (transferred) by HTTP methods: GET, POST, PUT, DELETE (≈ verbs).
The server keeps track of the state of the data (probably with a database).
Representations of states are transferred by HTTP: representational state transfer, REST.
We can use URLs to represent both collections of data and individual records. e.g.
/contacts/: collection of all contacts.
/contacts/greg-baker: contact info for one person.
For the API, we'll focus on the CRUD operations: Create, Retrieve, Update, Delete. We can map these operations onto HTTP methods as…
Op | Method | Single item | Collection |
---|---|---|---|
R | GET | retrieve representation | retrieve list of members |
U | PUT | update item with info from provided representation | replace collection with new members |
C | POST | create new sub-info | new entry in collection |
D | DELETE | delete item | delete collection |
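One way the table above might map onto code: a minimal sketch with an in-memory dict standing in for the database. Everything here is hypothetical (`contacts`, `handle`, contacts as dicts with a "slug" key), it returns conventional status codes (201 Created, 204 No Content, 404, 405), and it simply answers 405 for POST to a single item rather than inventing “sub-info”:

```python
# Hypothetical in-memory store standing in for the database: slug -> contact dict.
contacts = {}


def handle(method, path, body=None):
    """Map an HTTP method + URL onto a CRUD operation; return (status, payload)."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "contacts" or len(parts) > 2:
        return 404, None

    if len(parts) == 1:                    # the collection: /contacts/
        if method == "GET":                # retrieve list of members
            return 200, list(contacts.values())
        if method == "PUT":                # replace collection with new members
            contacts.clear()
            contacts.update({c["slug"]: c for c in body})
            return 204, None
        if method == "POST":               # new entry in collection
            contacts[body["slug"]] = body
            return 201, f"/contacts/{body['slug']}"  # would go in the Location header
        if method == "DELETE":             # delete collection
            contacts.clear()
            return 204, None
        return 405, None

    slug = parts[1]                        # a single item: /contacts/<slug>
    if method == "GET":                    # retrieve representation
        if slug not in contacts:
            return 404, None
        return 200, contacts[slug]
    if method == "PUT":                    # update item from the provided representation
        contacts[slug] = dict(body, slug=slug)
        return 204, None
    if method == "DELETE":                 # delete item
        if slug not in contacts:
            return 404, None
        del contacts[slug]
        return 204, None
    return 405, None                       # e.g. POST to an item: not handled in this sketch


print(handle("POST", "/contacts/", {"slug": "greg-baker", "name": "Greg Baker"}))
print(handle("GET", "/contacts/greg-baker"))
```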
Some examples of how these might actually be implemented on an item like /contacts/greg-baker:
PUT request with Content-type: text/vcard: replace the contact information with that from the provided representation.
POST request with Content-type: text/vcard: add additional contact information.
For read operations, we can use the Accept header to guide what representation the client wants.
GET request with Accept: text/vcard: return contact info as a vCard.
GET request with Accept: text/html: return contact info as an HTML page.
For a collection like /contacts/:
PUT request with Content-type: text/vcard: replace the whole contact list with this one.
POST request: add a new contact to the contact list.
GET request: return the whole contact list with a representation from the Accept header.
It's also fairly common to use the HTTP PATCH method to indicate “update but don't totally replace”.
We have a choice of representation for both read (GET) and write (PUT/POST) operations.
If there's a standard for your data, it's probably a good choice: vCard for contact info, GPX for GPS data, CSV for tabular data, …
Otherwise, JSON. Or maybe JSON anyway.
For GET requests by the browser, we can honour the Accept header and return HTML for the user.
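Honouring Accept means picking, from the representations you can produce, the one the client prefers. A simplified sketch: it orders media ranges by their q value and skips q=0, but ignores finer points of RFC 9110 (specificity rules, a q=0 entry excluding a type that a wildcard would otherwise match). `parse_accept` and `choose_representation` are hypothetical names; None means you'd answer 406 Not Acceptable:

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for item in header.split(","):
        media, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media:
            ranges.append((media.lower(), q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def choose_representation(header, available):
    """Return the first available type the client accepts, or None (-> 406)."""
    for media, q in parse_accept(header or "*/*"):
        if q <= 0:
            continue
        for offered in available:
            main_type = offered.split("/")[0]
            if media in (offered, f"{main_type}/*", "*/*"):
                return offered
    return None


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(choose_representation(browser, ["text/vcard", "text/html"]))             # text/html
print(choose_representation("application/json", ["text/html", "application/json"]))  # application/json
```

Note that a browser's */* would also match text/vcard or JSON, so the order of preference in the client's header (and of `available`) matters.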
What just happened:
We created an API for a web site without inventing anything new, just using HTTP as it has always existed.
We don't even need new URLs: the Accept header can tell us whether we're talking to an API consumer or user. (But in practice, separate URLs are often easier.)
It's not clear how to use REST to represent non-CRUD operations.
e.g. transferring money between bank accounts. Are you creating a new transaction in the source account? Destination account? How do you express the amount?
The REST API Design Rulebook suggests a verb as the last component of the URL.
POST /accounts/1234/transfer
We can also make richer use of HTTP status codes to give information about success/failure.
POST to create a new item → 201 Created with a Location header of the item's URL.
PUT to update an item (successfully) → 204 No Content.
POST to do a big asynchronous insert → 202 Accepted with some info in the body about the job.
These start to look a lot like return values.
And these look a lot like exceptions:
PUT with XML data where the server only understands JSON → 415 Unsupported Media Type.
PUT to update something immutable → 403 Forbidden or 405 Method Not Allowed.
POST with JSON that parses but fails validation → 422 Unprocessable Entity (JSON that doesn't parse at all is more usually 400 Bad Request).
We need authentication too. We can't ask for a username+password in a non-interactive context.
Actually we can, with HTTP basic authentication, but it's probably not the right thing to do.
Best practice for non-interactive authentication is OAuth.
It's complicated, but for a reason. They have done all the hard security work so you don't have to.
Of course, you could create all of the functionality for REST by hand.
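from django.http import HttpResponse, HttpResponseNotAllowed
from django.shortcuts import get_object_or_404

def contact(request, contact_slug):
    if request.method == 'GET':
        contact = get_object_or_404(Contact, slug=contact_slug)  # Contact: your model
        # HTML checked first: browsers also send */*, which accepts() treats as matching JSON.
        if request.accepts('text/html'):  # HttpRequest.accepts() is Django 3.1+
            ⋮  # return HTML-page representation
        elif request.accepts('application/json'):
            ⋮  # return JSON representation
        else:
            return HttpResponse(status=406)  # Not Acceptable
    elif request.method == 'POST':
        ⋮  # create object from POST data
    else:
        return HttpResponseNotAllowed(['GET', 'POST'])  # 405, with an Allow header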
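The “status codes look like exceptions” idea can be made literal in hand-written code: raise an exception that carries its status, and convert it to a response in one place. A framework-free sketch; all class and function names are hypothetical, and it uses 400 Bad Request for JSON that doesn't parse, reserving 422 for well-formed JSON that fails validation (how RFC 9110 describes 422):

```python
import json


class HTTPError(Exception):
    """An error that knows which HTTP status code it maps to."""
    status = 500


class BadRequest(HTTPError):
    status = 400


class MethodNotAllowed(HTTPError):
    status = 405


class UnsupportedMediaType(HTTPError):
    status = 415


class UnprocessableEntity(HTTPError):
    status = 422


def parse_contact(content_type, body):
    """Turn a request body into a contact dict, raising HTTPError on problems."""
    if content_type.split(";")[0].strip().lower() != "application/json":
        raise UnsupportedMediaType(content_type)
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError
        raise BadRequest("malformed JSON") from None
    if not isinstance(data, dict) or "name" not in data:
        raise UnprocessableEntity("a contact needs a name")
    return data


def contact_collection(method, content_type="", body=b""):
    """Handle /contacts/ and return (status, payload); exceptions become status codes."""
    try:
        if method == "POST":
            contact = parse_contact(content_type, body)
            return 201, contact  # a real server would also send a Location header
        raise MethodNotAllowed(method)
    except HTTPError as err:
        return err.status, str(err)


print(contact_collection("POST", "application/json", b'{"name": "Greg"}'))
print(contact_collection("POST", "application/xml", b"<contact/>"))
```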
But again, your framework (or an extension) can probably make your life easier.
Rails does many REST things out-of-the-box. The Django REST Framework adds very flexible REST functionality.
With even the best framework in the world, you're still going to have to worry about things like:
Demo of working with Django REST Framework: exercise 8 code with basic DRF setup. Notable additions:
Objects created by a POST request have .owner set to the logged-in user.
All of this is toward the goal of demonstrating how working with a modern heavyweight framework is wonderful and horrible, which applies equally to server-side, client-side, and REST frameworks.
The other somewhat-standard option for creating an API: GraphQL.
GraphQL isn't as HTTP-focussed: one URL for all API calls, and often all calls are POST requests.
But, the way queries are made is much more standardized. GraphQL requests indicate:
If you need a variety of different queries on the data, GraphQL might be easier. REST APIs are generally better understood and can be queried by anything that understands HTTP.
Conclusion? I don't know.
Do web sites really need an API?
My answer: Yes.
An API can provide data to/from…