When doing web development, you can never ignore security. Your code is open to attack by anyone, good or bad.
“99% secure” is insecure.
The basic message: only allow users to do exactly what you want to allow. Do your best to mitigate if their account gets hacked, etc.
Remember that anything that destroys data is affecting you(r database), not the user's computer.
There are a few things specific to worry about for web development. Let's talk about some…
A web site can't trust any of its input: anything might be malicious. All output should be produced very carefully.
Maybe this is the robustness principle again.
It's easy to assume you know something about your input.
<select name="foo">
  <option value="1">Option One</option>
  <option value="2">Option Two</option>
  <option value="3">Option Three</option>
</select>
You should get a form submission with only these choices, right?
You should know by now that requests can be generated manually with foo=5
or foo=six
or any other value.
The value must be verified server-side before storing or processing it.
You don't know that your input looks right until you validate that it does.
Some things you might want to check about your input: correct type, sensible range, expected length, allowed characters, one of the choices you actually offered.
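A minimal sketch of that server-side check for the `<select name="foo">` example above (the function name and allowed set are illustrative, not from any framework):

```python
# Never trust that the browser only sent one of the listed options:
# the request can contain foo=5, foo=six, or anything else.
ALLOWED_FOO = {"1", "2", "3"}

def validate_foo(submitted):
    """Return the validated value, or None if the input is not
    one of the choices we actually offered."""
    if submitted in ALLOWED_FOO:
        return submitted
    return None

print(validate_foo("2"))    # a legitimate choice
print(validate_foo("six"))  # rejected
```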
Remember where output is being sent and escape accordingly. e.g. in HTML, a literal < must be produced by sending &lt;.
Look for a library function that knows the rules for HTML, SQL, URLs, ….
General cause: building SQL queries from strings (like user input). A special case of code injection.
Generally caused by your code doing something like this:
id = request['id']
query = 'UPDATE posts VALUES … WHERE id=' + id
… but the user made a request with id like:
0; DROP DATABASE foo
0 or posts.author='admin'
Or any other variant that lets an outside user do arbitrary DB queries.
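To see why, here is what the naive concatenation above actually produces when the attacker sends the first of those values (mirroring the pseudocode from the notes):

```python
# Hostile "id" value from the request:
id = "0; DROP DATABASE foo"

# The same string concatenation as the vulnerable code above.
query = 'UPDATE posts VALUES … WHERE id=' + id
print(query)
# The attacker's SQL is now part of the query the database will run.
```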
All input included in queries must be properly escaped.
Best solution: use an ORM and let it do it. (But watch for raw-SQL functions where you're in charge.)
Or look for a function that builds a query safely.
build_query(
    template='SELECT … WHERE userid=? AND password=?',
    args=[user, pw])
If you must, look for a function that escapes for SQL, but you must remember to use it every time. You'll probably remember 99% of the time, which isn't enough.
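A concrete sketch of the parameterized-query approach, using Python's sqlite3 driver (the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

# Hostile input that would break a string-concatenated query:
user = "alice'; DROP TABLE users; --"

# The ? placeholders pass values out-of-band: the driver handles
# escaping, so the injection attempt is just a harmless string value.
rows = conn.execute(
    "SELECT * FROM users WHERE userid=? AND password=?",
    [user, "x"]).fetchall()
print(rows)  # no match, and the users table still exists
```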
(Or XSS.)
You get some content from the user and then put it into your HTML. Injection can happen in a similar way to SQL: they give some HTML tags where you just expected “text” that you put un-escaped into a template.
XSS happens when they inject JavaScript logic. e.g. a blog comment:
My comment is <script>alert('foo')</script>
Literal output: code runs for anyone who visits the page.
You wanted to output:
My comment is &lt;script&gt;alert('foo')&lt;/script&gt;
If an attacker can get JS code into visitors' pages, then their browsers can be made to do anything JavaScript code can do: modify the page (to change links to malicious destinations), make requests to the server (for additional info the attacker wants), ….
Solution #1: escape all of your output.
Most template systems do this by default, which is good. (If your templates don't, choose a different template system.)
Find the function to escape special characters to entities if you need it.
Safe, but doesn't allow the user to do any formatting. Even line breaks are lost unless you do something about it:
Welcome to my profile. It's <em>nice</em> here.
Will display as:
Welcome to my profile. It's <em>nice</em> here.
Could easily make that:
Welcome to my profile.<br/><br/>It's &lt;em&gt;nice&lt;/em&gt; here.
Solution 2: let the user do some formatting (e.g. in comments), but with something less powerful than HTML.
Let them enter something like Markdown or Wikitext. Those can't express anything dangerous, so no problem. They also won't produce invalid HTML content.
Make sure the library escapes entered HTML code: not all do by default.
Solution 3: let the user enter HTML, but make sure it's safe.
This is hard. If you do it yourself, it will be incomplete. Look for a library like Bleach or HTML Purifier.
If you want your users to use a WYSIWYG editor like TinyMCE, this might be necessary.
There are many places JS code can hide in a page, and these solutions can only protect the text they process.
Another hiding place: you let the user enter a URL and don't validate it.
<a href="javascript:alert('foo')">…</a>
It can be hard to know them all.
The Content-Security-Policy header can be used to prevent loading resources that originate in places you didn't expect. That can mitigate any XSS injections that do happen.
e.g. Can express that <script src="…"> must come from your site or cdnjs.com. Others won't load.
e.g. Can express that <form action="…"> must point to your domain, so sensitive info won't be sent elsewhere.
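Putting those two examples together, a policy expressing them might look like this response header (the directive values are illustrative; real headers are sent on one line):

```
Content-Security-Policy: default-src 'self';
    script-src 'self' https://cdnjs.com;
    form-action 'self'
```

Here `script-src` restricts where <script src="…"> may load from, and `form-action` restricts where <form action="…"> may submit to.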
It's easy to forget all the ways requests can arrive at a URL. Many developers implicitly assume that URLs are hard to find, and that users will arrive through your menu structure.
e.g. If a user requests https://example.com/orders/12345, then they must have seen the menu where you showed them:
<a href="/orders/12345">View order #12345</a>
But if I can see my (private) info at …/orders/12346, then it doesn't take much insight to modify the URL to …/orders/12345.
Even a hard-to-guess URL can be found in log files, or in the browser history at an Internet café.
<a href="/orders/efb5e6a7-ed84-4174-8b10-861d984ce6b0">…</a>
The result: you must check each request to make sure the requesting user is actually allowed to see/modify/whatever the data you're about to send. Don't assume that because they have the link they are allowed.
An extremely common problem in student code.
It's very easy to write the equivalent of:
def view_order(request, order_id):
    if not request.is_logged_in():
        return forbidden_403()
    order = Orders.find(id=order_id)
    if not order:
        return not_found_404()
    ⋮  # display order
Doesn't check who owns the order. Should be like:
def view_order(request, order_id):
    if not request.is_logged_in():
        return forbidden_403()
    order = Orders.find(id=order_id, owner=request.user)
    if not order:
        return not_found_404()
    ⋮  # display order
Don't confuse authentication (who is this person?) with authorization (is this person allowed to see/do this?).
Just because they have logged in doesn't mean they can perform a particular action.
It's easy to check your code for this: every controller/view should start with an authorization check.
Project suggestion: do a quick audit of others' code for a check in each controller.
Don't forget to do this for the “other” content: the things you don't expect to be accessed directly. e.g. images, AJAX requests, popup “more info”, “hidden” admin interfaces, …
These must be checked as carefully as any other request.
Also watch for smaller information leaks, where information should be displayed but only to some users who view a page. Remember to think about parts of a page, not just whole URLs.
e.g. student can see marks; instructor can see marks, who marked it, and when.
e.g. /cmpt470/abc12/marks returns a 403 Forbidden vs. a 404 Not Found: the difference reveals whether abc12 is registered in the course.
HTTP is mostly stateless: each request is independent of the next. There's no obvious state to track a user.
But we need that to track a user who has logged in as they navigate the site. There must be a session token that determines who the user is.
Usually the token is an HTTP cookie that's stored in the database and sent to the browser when a user logs in:
Set-Cookie: sessionid=034deda7bde4a1fa; Domain=example.com
… and sent back by the browser with subsequent requests.
Cookie: sessionid=034deda7bde4a1fa
The security implication: anyone who sends a request with that session token is that user. What if it's not actually that person?
Problem 1: session prediction where the token is easy to guess. e.g. this is a disastrously bad token:
Set-Cookie: username=ggbaker
This is only a little better:
Set-Cookie: sessionid=82
… because I could guess that sessionid=81 is also a thing.
The token should be randomly generated (by a good random number generator), and values should be sparsely used.
irb> require 'securerandom'
irb> sessionid = SecureRandom.base64(24)
irb> puts sessionid
pbgiqzsVWNv3LPDul4N4qG2SA3uWGW5v
In your database, translate session token → user.
All frameworks will do this for you: let them.
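A minimal sketch of what the framework does for you (an in-memory dict stands in for the database table; the function names are made up):

```python
import secrets

sessions = {}  # in production, this mapping lives in the database

def create_session(username):
    # Random, sparse token from a cryptographically strong generator.
    token = secrets.token_hex(16)  # 32 hex characters
    sessions[token] = username
    return token

def user_for(token):
    # Unknown or guessed tokens map to no user.
    return sessions.get(token)

tok = create_session("ggbaker")
print(user_for(tok))   # the logged-in user
print(user_for("82"))  # None: guessing finds nothing
```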
Problem 2: session hijacking where an attacker finds the session token and can use it. Could be snooped on the network; in a Referer header; found in an Internet café's browser history.
Hard to prevent perfectly, but can decrease the risk significantly…
[Frameworks' session management will have settings that correspond to these.]
Don't put session tokens in URLs: they can end up in a Referer header, or logged.
Expire sessions after a reasonable time: Set-Cookie: …; Expires=Wed, 1 Sep 2021 00:00:00 GMT
Only send the cookie over HTTPS, so it can't be snooped: Set-Cookie: …; Secure
This one isn't well understood by many developers. Here's the scenario…
User logs in to example.com with their password. 😃
User then visits badguy.com, which contains: 😇
<form action="http://example.com/password_change" method="post">
  <input type="hidden" name="pw" value="newpass" />
  <input type="text" />
  <input type="submit" value="Search" />
</form>
User clicks the “Search” button. 😈
The browser makes a POST request to example.com with the user's correctly authenticated session. 😲
The user's password has been changed on example.com. 😭
It's even easier to trigger a GET request:
<img src="http://example.com/password_change?pw=newpass" />
That could even be embedded in a discussion forum.
Obviously, replace “change password” with any other action the user could take: disable account, add a friend, transfer credits, …
Prevention for GET requests: don't have any side-effects. That way, if a forged request is made, there's no harm.
For POST requests, it's trickier. You must ensure the user came from your form before taking any action.
Standard defense: generate a random token for each user's session and include it in any POST forms/requests:
<form action="/password_change" method="post">
  <input type="hidden" name="csrf" value="6c3d615b94fe1444" />
  <input type="password" name="pw" />
  <input type="submit" value="Change Password" />
</form>
Ensure that the token is correct before doing anything.
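A sketch of that check in plain Python (the function names are made up; real frameworks wire this into form rendering and request handling for you):

```python
import hmac
import secrets

def new_csrf_token():
    # Generated once per session, stored server-side, and embedded
    # as the hidden "csrf" field in every POST form.
    return secrets.token_hex(8)

def csrf_ok(session_token, submitted_token):
    # Reject requests with a missing token, and compare in
    # constant time to avoid leaking information via timing.
    if not submitted_token:
        return False
    return hmac.compare_digest(session_token, submitted_token)

tok = new_csrf_token()
print(csrf_ok(tok, tok))                 # the form came from our page
print(csrf_ok(tok, "6c3d615b94fe1444"))  # forged or stale token: reject
```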
Frameworks will help: they automatically generate the token, and catch any POST that doesn't have it before you ever see it.
May be off by default, since it breaks all POSTs if the form templates don't include the token.
Common pattern: let the user upload a file. Store it on disk (since it's a file). Serve it as static content (since it's faster).
But then there are no authorization checks. Profile picture: probably okay. Private document: bad.
Best solution: create a controller/view with logic to check authorization properly, and then send the appropriate media type and file contents.
Causes overhead: your framework probably can't send a file as fast as Nginx. Must be sent from server doing dynamic processing, not a content distribution network.
If that overhead isn't acceptable, you can generate a random hard-to-guess URL and serve the file as static content at that URL.
https://cdn.example.com/images/MRQ/Cy9/Oh1Jk0KqFnE2YX.jpg
Anyone can still access, but has to know the URL. Most big sites do this for images: Facebook, YouTube, LinkedIn, ….
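One way to generate such a URL, assuming Python (the path layout and function name are illustrative):

```python
import secrets

def upload_path(extension):
    # ~112 bits of randomness: infeasible to guess. But anyone who
    # obtains the URL can fetch the file, so use this only for
    # low-sensitivity content like images.
    name = secrets.token_urlsafe(14)
    return "/images/%s.%s" % (name, extension)

print(upload_path("jpg"))  # e.g. /images/MRQCy9Oh1Jk0KqFnE2Y.jpg
```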
How is your data stored? Who has access? An ultra-secure web app doesn't help if your database is accessible to the whole Internet with the default password.
Check who can access the database, log in to server, physically access server, access backups,….
Not all data is created equal.
If you expose your users' real names, that's bad. If you expose their credit card numbers, that's much worse.
Don't store sensitive data (e.g. credit card numbers) if you can avoid it. Consider different storage strategies (e.g. more secure tier of database) for the most sensitive data.
One important case: passwords.
People often use the same password across multiple sites. A data breach on one can affect others.
Store the password? No. Never.
Store a hash of the password? Better, but most cryptographic hash functions are too fast to compute: can pre-compute a dictionary at millions to billions per second.
Salt and hash password? Can still guess millions to billions per second.
The real answer? You are dangerously bad at cryptography. Don't do it. Use the password handling of your framework and hope they did it right.
Possibly-correct answer: salt and use a deliberately-slow cryptographic hash function like bcrypt.
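bcrypt needs a third-party library, but Python's standard library PBKDF2 illustrates the same idea, a per-user random salt plus a deliberately expensive hash (a sketch, not a substitute for your framework's password handling):

```python
import hashlib
import os

def hash_password(password, salt=None):
    if salt is None:
        salt = os.urandom(16)  # unique random salt per user
    # Many iterations make each guess expensive for an attacker.
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, 600_000)
    return salt, digest

def check_password(password, salt, expected):
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, 600_000) == expected

salt, stored = hash_password("hunter2")
print(check_password("hunter2", salt, stored))  # correct password
print(check_password("guess", salt, stored))    # wrong password
```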
Even better: implement two-factor authentication.
Even with all of the best development practices, there is the possibility of something going wrong. There could be a problem with your code, a user's password being guessed, etc.
If there has been a security breach, how can you handle it and recover?
If data is leaked, it's not coming back. You need to prevent/minimize ahead of time.
Give users access to the minimum set of data they need. That way, a hacked account has the smallest impact possible. You might trust your users, but that doesn't mean you should trust their account.
e.g. staff might not need to see all of the users in the system, only their clients, users in their region, etc.
What happens if a user's account is hacked and the attacker starts deleting/modifying data?
Will you even notice? Can you recover and restore the data to the pre-hacked state?
Some ways recovery might be possible:
Regular database backups.
Keeping a history of changes instead of really deleting.
“undo” functionality in your app.