The Internet is fundamentally just a collection of connected computers.
They are connected so they can exchange information with each other. In order to be part of the Internet, a computer must be connected to one (or more) others.
Computers exchange information by passing it along between themselves.
Your computer is probably connected to only one other: its gateway. All of your Internet traffic flows in and out through the gateway.
Other computers may be connected to several others.
These must know which “direction” to send information to get it to its destination: this is called routing and these computers are called routers.
You likely have a router at home: it connects to several of your computers (with ethernet or wi-fi), and to a gateway outside that's provided by your ISP. [It may be integrated with the cable/DSL modem, but it's still doing the job of a router.]
The router is basically a little special-purpose computer that passes along info for you.
Routers also come in industrial sizes, connecting many computers at very fast speeds, but the ideas are all the same.
Internet-connected computers can also be given names. For example, there's a computer (or more than one) somewhere called www.sfu.ca.
These are a human-friendly way to refer to some computer out there that's connected to the Internet.
In order to get information between you and another computer (and back), all of the routers in between have to pass along the information.
Each connection might be using a different connection technology. There might be multiple paths: this lets the Internet be fault tolerant. If one connection goes out, data can go a different way.
As long as there's a path (at least one) and all of the routers know what they're doing, data should get through.
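The fault-tolerance idea above can be sketched as a search over a small map of connections. This is only an illustration: the network and its names (home-router, isp, router-a, router-b) are made up, and real routers each decide the next hop from their own routing tables rather than searching a full map like this.

```python
from collections import deque


def find_path(links, start, goal):
    """Return one shortest path from start to goal as a list, or None."""
    # Build a map of neighbours; every link works in both directions.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search: try short paths before longer ones.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no path at all: the data cannot get through


# Hypothetical network with two routes between the ISP and the server.
links = [
    ("you", "home-router"),
    ("home-router", "isp"),
    ("isp", "router-a"),
    ("isp", "router-b"),
    ("router-a", "server"),
    ("router-b", "server"),
]

print(find_path(links, "you", "server"))

# One connection goes out: the data can go a different way.
remaining = [link for link in links if link != ("router-a", "server")]
print(find_path(remaining, "you", "server"))
```

With all links up, the path goes through router-a; once that link is removed, the search finds the route through router-b instead.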
There are many ways computers can be connected to pass information between themselves. There are various technologies for different situations (or historical reasons).
Connection speeds are measured in bits per second (or kilobits/​megabits/​gigabits per second).
A bit is a single 0 or 1 (or on/off or however you want to think of it).
One byte is 8 bits. These are usually abbreviated as bits = b, bytes = B (but people aren't 100% consistent).
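The bits/bytes difference matters when estimating transfer times, since file sizes are usually given in bytes and connection speeds in bits per second. A rough sketch, assuming decimal prefixes (1 MB = 1,000,000 bytes) and ignoring any protocol overhead; the function name is just for illustration:

```python
def download_seconds(size_megabytes, speed_megabits_per_second):
    """Estimate how long a transfer takes, ignoring overhead."""
    bits_to_send = size_megabytes * 1_000_000 * 8  # 8 bits per byte
    bits_per_second = speed_megabits_per_second * 1_000_000
    return bits_to_send / bits_per_second


# A 100 MB file over a 100 Mb/s connection takes 8 seconds, not 1.
print(download_seconds(100, 100))  # 8.0
```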
Some rough size guesses:
There are a few common connections for personal computers (i.e. the kind of computer you actually use).
Ethernet: The typical wired connection. Speeds usually 100 Mb/s or 1 Gb/s. (Can be 10 Gb/s but probably not to a personal computer.) Cable length <100 m.
Wi-Fi: The usual wireless connection. Speeds 50 Mb/s to 500 Mb/s. Distance depends on what's in the way but usually also <100 m. Speed depends on the technology and also what's between you and the router.
There are a few other connection methods too, like powerline networks. Basically, any way you can get data around your house will do, as long as your computer and router can both speak it.
Phone connections (modern ones, at least) carry Internet data: 3G (HSPA), 4G (LTE), 5G. Phones can also use Wi-Fi when it's available.
At home, these are used to connect your computer to your router (or DSL/​cable modem). On campus, they connect you to a large router in SFU's network.
On the larger scale, the ideas are the same but everything is higher-capacity.
Fibre optic cables use strands of glass to transmit light pulses. These can transmit very high rates of data over long distances: Tb/s over many kilometres.
Once computers have a way to pass data back and forth, we need some way to make sense of it.
A protocol is a way that we agree to have computers exchange information so that both ends can understand the contents. As long as everybody is following the same rules about how to encode the information, they should be able to communicate.
An analogy: we use English words (“down”, “table”, “synthesis”) to mean certain things. We all know what each word means, so we can use it to communicate an idea.
Computers can do the same: a certain pattern of bits means “I want to send an email to my friend”, and as long as both sides agree on that, it should work.
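To make “both sides agree on the pattern of bits” concrete, here is a toy protocol. It is entirely made up for illustration (it is not a real Internet protocol): one byte says what kind of message this is, two bytes give the length of the text, and the text follows as UTF-8.

```python
import struct

# Hypothetical message types that both sides agreed on in advance.
MSG_TYPES = {1: "send email", 2: "fetch page"}


def encode(msg_type, text):
    """Turn a message into bytes following the agreed rules."""
    body = text.encode("utf-8")
    # "!BH": network byte order, 1-byte type, 2-byte body length.
    return struct.pack("!BH", msg_type, len(body)) + body


def decode(data):
    """Turn bytes back into (meaning, text) using the same rules."""
    msg_type, length = struct.unpack("!BH", data[:3])
    body = data[3:3 + length].decode("utf-8")
    return MSG_TYPES[msg_type], body


message = encode(1, "hi")
print(message)          # b'\x01\x00\x02hi'
print(decode(message))  # ('send email', 'hi')
```

The receiver can only recover the meaning because it uses the same table and the same layout as the sender. If either side used different rules, the bytes would be meaningless.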
Different kinds of information we want to pass around the Internet have different protocols. Some examples…
HTTP is an important protocol for this course.
Basically, “content is on the web” is synonymous with “can be accessed using HTTP”. HTTP allows transmission of ideas like “give me this web page…” and “here is the page you requested…” or “I can't find that page”.
When asking for a web page, the web browser on your computer is acting as an internet client: it initiates the connection to the server.
Once it connects, it makes a request for some content. HTTP defines how the request is encoded (in bits) so the server can understand it.
The server refers to both another computer (like www.sfu.ca) and the program running on it that responds to requests.
The server will find (or generate) the requested content and send it back to the client. This is the response, also an HTTP message.
The response could also be a “not found” or “redirect” or other status/​error messages.
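The request and response described above are, in HTTP/1.1, just lines of text with agreed meanings. The sketch below builds a minimal request and reads the status from a response. Nothing is sent over the network here: the response is a canned example written for illustration, not real output from www.sfu.ca, and the helper names are hypothetical.

```python
def build_request(host, path):
    """Encode a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # what the client wants
        f"Host: {host}",         # which server name it is asking
        "Connection: close",
        "",                      # a blank line ends the request
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response_bytes):
    """Pull the status code and reason out of an HTTP response."""
    status_line = response_bytes.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


print(build_request("www.sfu.ca", "/"))

# A made-up "not found" response, as a server might send it.
canned = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>No such page</p>"
)
print(parse_status(canned))  # (404, 'Not Found')
```

A web browser does this encoding and decoding for you on every page load; the status code (200, 404, and so on) is how the server says whether the response is the page, an error, or a redirect.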
There are other things that can act as web clients (sometimes called user agents).
e.g. the Google page indexer, the HTML validator (later), other automated tools.
Mobile apps also often talk to a central server with HTTP, and are thus acting as web clients as well.
HTTP requests can also be made with the HTTPS protocol. (S for “secure”.)
HTTPS is the same as HTTP but content is encrypted, so none of the computers between client and server can understand the conversation.
The computers between the client and server can see the entire conversation go by: if it's not encrypted, they can also understand it.
A URL (Uniform Resource Locator) is the full address of a piece of content on the web.
Like http://www.w3.org/html/.
Uniform: a URL works from anywhere.
Resource: a piece of content that is accessible at a URL. Could be an HTML page, or image, or any other content.
Locator: it's used to actually find (and request) the resource.
The basic parts of a URL:
The scheme indicates the protocol that will be used: http: and https: URLs refer to web content.
You might have seen a few other URL schemes, possibly mailto: for indicating an email address, sftp: for a secure file transfer, smb: for a Windows file share.
Each implies a different protocol, but is a way to refer to (and locate) a specific piece of content.
The server (or host name) is the name of the computer acting as the server: the one that will be contacted with the request.
The path indicates which page on the server we're referring to.
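The three parts described above can be picked out of a URL with Python's standard library. A small sketch using the example URL from earlier:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.w3.org/html/")

print(parts.scheme)    # http        -> the protocol to use
print(parts.hostname)  # www.w3.org  -> the server to contact
print(parts.path)      # /html/      -> which page on that server
```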
We will be seeing three major pieces of technology in this course. They are also the three technologies that make web pages work.
HTML: HyperText Markup Language.
HTML is how the content of web pages is specified. It contains headings, paragraphs, lists, etc.
… but nothing about what those pieces of content look like.
CSS: Cascading Style Sheets.
CSS is used to indicate the appearance of HTML content on the screen.
e.g. it can express things like “headings are twice as big as other text, bold, and centred”.
JavaScript is a programming language used to indicate behaviour of parts of the page.
e.g. when the user clicks a button, something else on the page changes.
Why do we separate content, appearance, and behaviour? And why do we need three different languages for these things?
Hopefully it will become more clear as the course goes on, but…
These tasks need different languages because the jobs are so different.
It would be awkward to express “this is a paragraph” and “paragraphs are green” with the same language. Each of the three languages we'll see is good at its job.
We will often use the same CSS and JavaScript on multiple HTML pages.
If we write them in separate files, it's easier to reuse them. For example, use the same appearance information (CSS) for all of the pages (HTML) on our site, and make sure they have a unified look.
Using the same CSS and JavaScript across our web sites can be faster: users only have to download the appearance/​behaviour information once, even if they go to 10 different pages.
It will be easier to maintain and update: there's only one place to fix a problem.
Maybe we want separate people to handle each part: an author for the HTML, a designer for the CSS, and a programmer for the JavaScript.