When doing web development, you can never ignore security. Your code is open to attack by anyone, good or bad.
“99% secure” is insecure.
The basic message: only allow users to do exactly what you want to allow. Do your best to mitigate if their account gets hacked, etc.
Remember that anything that destroys data is affecting you(r database), not the user's computer.
There are a few things specific to worry about for web development. Let's talk about some…
A web site can't trust any of its input: anything might be malicious. All output should be produced very carefully.
Maybe this is the robustness principle again.
It's easy to assume you know something about your input.
<select name="foo">
  <option value="1">Option One</option>
  <option value="2">Option Two</option>
  <option value="3">Option Three</option>
</select>
You should get a form submission with only these choices, right?
You should know by now that requests can be generated manually with foo=5
or foo=six
or any other value.
The value must be verified server-side before storing or processing it.
You don't know that your input looks right until you validate that it does.
Some things you might want to check about your input: correct type, sensible range, expected length, allowed characters, one of the choices you actually offered.
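A minimal sketch of that server-side check for the `<select name="foo">` example above (the function name and allowed set are illustrative, not from any framework):

```python
# Never trust that the browser only sent one of the listed options:
# the request can contain foo=5, foo=six, or anything else.
ALLOWED_FOO = {"1", "2", "3"}

def validate_foo(submitted):
    """Return the validated value, or None if the input is not
    one of the choices we actually offered."""
    if submitted in ALLOWED_FOO:
        return submitted
    return None

print(validate_foo("2"))    # a legitimate choice
print(validate_foo("six"))  # rejected
```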
Remember where output is being sent and escape accordingly. e.g. in HTML, a literal < must be produced by sending &lt;.
Look for a library function that knows the rules for HTML, SQL, URLs, ….
General cause: building SQL queries from strings (like user input). A special case of code injection.
Generally caused by your code doing something like this:
id = request['id']
query = 'UPDATE posts VALUES … WHERE id=' + id
… but the user made a request with id like:
0; DROP DATABASE foo
0 or posts.author='admin'
Or any other variant that lets an outside user do arbitrary DB queries.
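To see why, here is what the naive concatenation above actually produces when the attacker sends the first of those values (mirroring the pseudocode from the notes):

```python
# Hostile "id" value from the request:
id = "0; DROP DATABASE foo"

# The same string concatenation as the vulnerable code above.
query = 'UPDATE posts VALUES … WHERE id=' + id
print(query)
# The attacker's SQL is now part of the query the database will run.
```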
All input included in queries must be properly escaped.
Best solution: use an ORM and let it do it. (But watch for raw-SQL functions where you're in charge.)
Or look for a function that builds a query safely.
build_query(
    template='SELECT … WHERE userid=? AND password=?',
    args=[user, pw])
If you must, look for a function that escapes for SQL, but you must remember to use it every time. You'll probably remember 99% of the time, which isn't enough.
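A concrete sketch of the parameterized-query approach, using Python's sqlite3 driver (the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

# Hostile input that would break a string-concatenated query:
user = "alice'; DROP TABLE users; --"

# The ? placeholders pass values out-of-band: the driver handles
# escaping, so the injection attempt is just a harmless string value.
rows = conn.execute(
    "SELECT * FROM users WHERE userid=? AND password=?",
    [user, "x"]).fetchall()
print(rows)  # no match, and the users table still exists
```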
(Or XSS.)
You get some content from the user and then put it into your HTML. Injection can happen in a similar way to SQL: they give some HTML tags where you just expected “text” that you put un-escaped into a template.
XSS happens when they inject JavaScript logic. e.g. a blog comment:
My comment is <script>alert('foo')</script>
Literal output: code runs for anyone who visits the page.
You wanted to output:
My comment is &lt;script&gt;alert('foo')&lt;/script&gt;
If an attacker can get JS code into visitors' pages, then their browsers can be made to do anything JavaScript code can do: modify the page (to change links to malicious destinations), make requests to the server (for additional info the attacker wants), ….
Solution #1: escape all of your output.
Most template systems do this by default, which is good. (If your templates don't, choose a different template system.)
Find the function to escape special characters to entities if you need it.
Safe, but doesn't allow the user to do any formatting. Even line breaks are lost unless you do something about it:
Welcome to my profile. It's <em>nice</em> here.
Will display as:
Welcome to my profile. It's <em>nice</em> here.
Could easily make that:
Welcome to my profile.<br/><br/>It's &lt;em&gt;nice&lt;/em&gt; here.
Solution 2: let the user do some formatting (e.g. in comments), but with something less powerful than HTML.
Let them enter something like Markdown or Wikitext. Those can't express anything dangerous, so no problem. They also won't produce invalid HTML content.
Make sure the library escapes entered HTML code: not all do by default.
Solution 3: let the user enter HTML, but make sure it's safe.
This is hard. If you do it yourself, it will be incomplete. Look for a library like Bleach or HTML Purifier.
If you want your users to use a WYSIWYG editor like TinyMCE, this might be necessary.
There are many places JS code can hide in a page, and these solutions can only protect the text they process.
Another hiding place: you let the user enter a URL and don't validate it.
<a href="javascript:alert('foo')">…</a>
It can be hard to know them all.
The Content-Security-Policy header can be used to prevent loading resources that originate in places you didn't expect. That can mitigate any XSS injections that do happen.
e.g. Can express that <script src="…"> must come from your site or cdnjs.com. Others won't load.
e.g. Can express that <form action="…"> must point to your domain, so sensitive info won't be sent elsewhere.
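Putting those two examples together, a policy expressing them might look like this response header (the directive values are illustrative; real headers are sent on one line):

```
Content-Security-Policy: default-src 'self';
    script-src 'self' https://cdnjs.com;
    form-action 'self'
```

Here `script-src` restricts where <script src="…"> may load from, and `form-action` restricts where <form action="…"> may submit to.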
It's easy to forget all the ways requests can arrive at a URL. Many developers implicitly assume that URLs are hard to find, and that users will arrive through your menu structure.
e.g. If a user requests https://example.com/orders/12345, then they must have seen the menu where you showed them:
<a href="/orders/12345">View order #12345</a>
But if I can see my (private) info at …/orders/12346, then it doesn't take much insight to modify the URL to …/orders/12345.
Even a hard-to-guess URL can be found in log files, or in the browser history at an Internet café.
<a href="/orders/efb5e6a7-ed84-4174-8b10-861d984ce6b0">…</a>
The result: you must check each request to make sure the requesting user is actually allowed to see/modify/whatever the data you're about to send. Don't assume that because they have the link they are allowed.
An extremely common problem in student code.
It's very easy to write the equivalent of:
def view_order(request, order_id):
    if not request.is_logged_in():
        return forbidden_403()
    order = Orders.find(id=order_id)
    if not order:
        return not_found_404()
    ⋮  # display order
Doesn't check who owns the order. Should be like:
def view_order(request, order_id):
    if not request.is_logged_in():
        return forbidden_403()
    order = Orders.find(id=order_id, owner=request.user)
    if not order:
        return not_found_404()
    ⋮  # display order
Don't confuse authentication (who is this person?) with authorization (is this person allowed to see/do this?).
Just because they have logged in doesn't mean they can perform a particular action.
It's easy to check your code for this: every controller/view should start with an authorization check.
Project suggestion: do a quick audit of others' code for a check in each controller.
Don't forget to do this for the “other” content: the things you don't expect to be accessed directly. e.g. images, AJAX requests, popup “more info”, “hidden” admin interfaces, …
These must be checked as carefully as any other request.
Also watch for smaller information leaks, where information should be displayed but only to some users who view a page. Remember to think about parts of a page, not just whole URLs.
e.g. student can see marks; instructor can see marks, who marked it, and when.
e.g. /cmpt470/abc12/marks returns a 403 Forbidden vs. a 404 Not Found: the difference reveals whether abc12 is registered in the course.
HTTP is mostly stateless: each request is independent of the next. There's no obvious state to track a user.
But we need that to track a user who has logged in as they navigate the site. There must be a session token that determines who the user is.
Usually the token is an HTTP cookie that's stored in the database and sent to the browser when a user logs in:
Set-Cookie: sessionid=034deda7bde4a1fa; Domain=example.com
… and sent back by the browser with subsequent requests.
Cookie: sessionid=034deda7bde4a1fa
The security implication: anyone who sends a request with that session token is that user. What if it's not actually that person?
Problem 1: session prediction where the token is easy to guess. e.g. this is a disastrously bad token:
Set-Cookie: username=ggbaker
This is only a little better:
Set-Cookie: sessionid=82
… because I could guess that sessionid=81 is also a thing.
The token should be randomly generated (by a good random number generator), and values should be sparsely used.
irb> require 'securerandom'
irb> sessionid = SecureRandom.base64(24)
irb> puts sessionid
pbgiqzsVWNv3LPDul4N4qG2SA3uWGW5v
In your database, translate session token → user.
All frameworks will do this for you: let them.
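A minimal sketch of what the framework does for you (an in-memory dict stands in for the database table; the function names are made up):

```python
import secrets

sessions = {}  # in production, this mapping lives in the database

def create_session(username):
    # Random, sparse token from a cryptographically strong generator.
    token = secrets.token_hex(16)  # 32 hex characters
    sessions[token] = username
    return token

def user_for(token):
    # Unknown or guessed tokens map to no user.
    return sessions.get(token)

tok = create_session("ggbaker")
print(user_for(tok))   # the logged-in user
print(user_for("82"))  # None: guessing finds nothing
```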
Problem 2: session hijacking where an attacker finds the session token and can use it. Could be snooped on the network; in a Referer header; found in an Internet café's browser history.
Hard to prevent perfectly, but can decrease the risk significantly…
[Frameworks' session management will have settings that correspond to these.]
Don't put session tokens in URLs: they can end up in a Referer header, or logged.
Expire sessions after a reasonable time: Set-Cookie: …; Expires=Wed, 1 Sep 2021 00:00:00 GMT
Only send the cookie over HTTPS, so it can't be snooped: Set-Cookie: …; Secure
This one isn't well understood by many developers. Here's the scenario…
User logs in to example.com with their password. 😃
User then visits badguy.com, which contains: 😇
<form action="http://example.com/password_change" method="post">
  <input type="hidden" name="pw" value="newpass" />
  <input type="text" />
  <input type="submit" value="Search" />
</form>
User clicks the “Search” button. 😈
The browser makes a POST request to example.com with the user's correctly authenticated session. 😲
The user's password has been changed on example.com. 😭
It's even easier to trigger a GET request:
<img src="http://example.com/password_change?pw=newpass" />
That could even be embedded in a discussion forum.
Obviously, replace “change password” with any other action the user could take: disable account, add a friend, transfer credits, …
Prevention for GET requests: don't have any side-effects. That way, if a forged request is made, there's no harm.
For POST requests, it's trickier. You must ensure the user came from your form before taking any action.
Standard defense: generate a random token for each user's session and include it in any POST forms/requests:
<form action="/password_change" method="post">
  <input type="hidden" name="csrf" value="6c3d615b94fe1444" />
  <input type="password" name="pw" />
  <input type="submit" value="Change Password" />
</form>
Ensure that the token is correct before doing anything.
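A sketch of that check in plain Python (the function names are made up; real frameworks wire this into form rendering and request handling for you):

```python
import hmac
import secrets

def new_csrf_token():
    # Generated once per session, stored server-side, and embedded
    # as the hidden "csrf" field in every POST form.
    return secrets.token_hex(8)

def csrf_ok(session_token, submitted_token):
    # Reject requests with a missing token, and compare in
    # constant time to avoid leaking information via timing.
    if not submitted_token:
        return False
    return hmac.compare_digest(session_token, submitted_token)

tok = new_csrf_token()
print(csrf_ok(tok, tok))                 # the form came from our page
print(csrf_ok(tok, "6c3d615b94fe1444"))  # forged or stale token: reject
```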
Frameworks will help: they automatically generate the token, and catch any POST that doesn't have it before you ever see it.
May be off by default, since it breaks all POSTs if the form templates don't include the token.
Common pattern: let the user upload a file. Store it on disk (since it's a file). Serve it as static content (since it's faster).
But then there are no authorization checks. Profile picture: probably okay. Private document: bad.
Best solution: create a controller/view with logic to check authorization properly, and then send the appropriate media type and file contents.
Causes overhead: your framework probably can't send a file as fast as Nginx. Must be sent from server doing dynamic processing, not a content distribution network.
If that overhead isn't acceptable, you can generate a random hard-to-guess URL and serve the file as static content at that URL.
https://cdn.example.com/images/MRQ/Cy9/Oh1Jk0KqFnE2YX.jpg
Anyone can still access, but has to know the URL. Most big sites do this for images: Facebook, YouTube, LinkedIn, ….
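One way to generate such a URL, assuming Python (the path layout and function name are illustrative):

```python
import secrets

def upload_path(extension):
    # ~112 bits of randomness: infeasible to guess. But anyone who
    # obtains the URL can fetch the file, so use this only for
    # low-sensitivity content like images.
    name = secrets.token_urlsafe(14)
    return "/images/%s.%s" % (name, extension)

print(upload_path("jpg"))  # e.g. /images/MRQCy9Oh1Jk0KqFnE2Y.jpg
```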
How is your data stored? Who has access? An ultra-secure web app doesn't help if your database is accessible to the whole Internet with the default password.
Check who can access the database, log in to server, physically access server, access backups,….
Not all data is created equal.
If you expose your users' real names, that's bad. If you expose their credit card numbers, that's much worse.
Don't store sensitive data (e.g. credit card numbers) if you can avoid it. Consider different storage strategies (e.g. more secure tier of database) for the most sensitive data.
One important case: passwords.
People often use the same password across multiple sites. A data breach on one can affect others.
Store the password? No. Never.
Store a hash of the password? Better, but most cryptographic hash functions are too fast to compute: can pre-compute a dictionary at millions to billions per second.
Salt and hash password? Can still guess millions to billions per second.
The real answer? You are dangerously bad at cryptography. Don't do it. Use the password handling of your framework and hope they did it right.
Possibly-correct answer: salt and use a deliberately-slow cryptographic hash function like bcrypt.
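bcrypt needs a third-party library, but Python's standard library PBKDF2 illustrates the same idea, a per-user random salt plus a deliberately expensive hash (a sketch, not a substitute for your framework's password handling):

```python
import hashlib
import os

def hash_password(password, salt=None):
    if salt is None:
        salt = os.urandom(16)  # unique random salt per user
    # Many iterations make each guess expensive for an attacker.
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, 600_000)
    return salt, digest

def check_password(password, salt, expected):
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, 600_000) == expected

salt, stored = hash_password("hunter2")
print(check_password("hunter2", salt, stored))  # correct password
print(check_password("guess", salt, stored))    # wrong password
```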
Even better: implement two-factor authentication.
Even with all of the best development practices, there is the possibility of something going wrong. There could be a problem with your code, a user's password being guessed, etc.
If there has been a security breach, how can you handle it and recover?
If data is leaked, it's not coming back. You need to prevent/minimize ahead of time.
Give users access to the minimum set of data they need. That way, a hacked account has the smallest impact possible. You might trust your users, but that doesn't mean you should trust their account.
e.g. staff might not need to see all of the users in the system, only their clients, users in their region, etc.
What happens if a user's account is hacked and the attacker starts deleting/modifying data?
Will you even notice? Can you recover and restore the data to the pre-hacked state?
Some ways recovery might be possible:
Regular database backups.
Keeping a history of changes instead of really deleting.
“undo” functionality in your app.