The Internet and the World Wide Web

CMPT 165, Fall 2019

The Internet

The Internet is fundamentally just a collection of connected computers.

They are connected so they can exchange information with each other. In order to be part of the Internet, a computer must be connected to one (or more) others.

The Internet

Computers exchange information by passing it along between themselves.

Your computer is probably connected to only one other: its gateway. All of your Internet traffic flows in and out through the gateway.

The Internet

Other computers may be connected to several others.

These must know which “direction” to send information to get it to its destination: this is called routing and these computers are called routers.

The Internet

You likely have a router at home: it connects to several of your computers (with ethernet or wi-fi), and to a gateway outside that's provided by your ISP. [It may be integrated with the cable/DSL modem, but it's still doing the job of a router.]

The Internet

The router is basically a little special-purpose computer that passes along info for you.

Routers also come in industrial sizes, connecting many computers at very fast speeds, but the ideas are all the same.

The Internet

Internet-connected computers can also be given names. For example, there's a computer (or more than one) somewhere called www.sfu.ca.

These are a human-friendly way to refer to some computer out there that's connected to the Internet.

The Internet

In order to get information between you and another computer (and back), all of the routers in between have to pass along the information.

Each connection might be using a different connection technology. There might be multiple paths: this lets the Internet be fault tolerant. If one connection goes out, data can go a different way.

The Internet

As long as there's a path (at least one) and all of the routers know what they're doing, data should get through.
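
A minimal sketch of the "find a path, or another one if a connection goes out" idea. The network map `links`, the machine names, and the function `find_path` are all made up for illustration. Real routers rely on routing tables and routing protocols rather than a full map like this, so treat it as a picture of the idea only.

```python
from collections import deque

# Hypothetical network: each key is a machine, each value lists its direct connections.
links = {
    "you": ["gateway"],
    "gateway": ["you", "router_a", "router_b"],
    "router_a": ["gateway", "server"],
    "router_b": ["gateway", "router_c"],
    "router_c": ["router_b", "server"],
    "server": ["router_a", "router_c"],
}

def find_path(links, start, goal, broken=frozenset()):
    """Return one shortest path from start to goal, or None if there is none.

    `broken` is a set of frozenset({a, b}) pairs marking failed connections.
    """
    queue = deque([[start]])   # paths still to be extended, shortest first
    seen = {start}
    while queue:
        path = queue.popleft()
        here = path[-1]
        if here == goal:
            return path
        for neighbour in links[here]:
            if neighbour in seen or frozenset((here, neighbour)) in broken:
                continue
            seen.add(neighbour)
            queue.append(path + [neighbour])
    return None

# Normal case: the short route through router_a.
print(find_path(links, "you", "server"))

# The router_a/server connection is down: data goes through router_b and router_c instead.
print(find_path(links, "you", "server", broken={frozenset(("router_a", "server"))}))
```

If the only connection between "you" and "gateway" is marked as broken, `find_path` returns `None`: with no path at all, nothing gets through.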

How Computers Connect

There are many ways computers can be connected to pass information between themselves. There are various technologies for different situations (or historical reasons).

Connection speeds are measured in bits per second (or kilobits/megabits/gigabits per second).

How Computers Connect

A bit is a single 0 or 1 (or on/off or however you want to think of it).

One byte is 8 bits. These are usually abbreviated as bits = b, bytes = B (but people aren't 100% consistent).

How Computers Connect

Some rough size guesses:

An email: 5 kilobytes = 40 kilobits (no attachments).
A camera image: 3 megabytes = 24 megabits.
A web page: a few kilobytes to a few megabytes (depending on images, etc.)
An MP3 (or streamed) song: 5 MB = 40 Mb.
A movie: 1 GB = 8 Gb.
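
To see what these sizes mean in practice, here is a rough calculation of best-case transfer times. It assumes decimal prefixes (1 MB = 1,000,000 bytes) and a 100 Mb/s connection; both are assumptions for illustration, and real transfers are slower because of overhead and shared links.

```python
BITS_PER_BYTE = 8

def transfer_seconds(size_bytes, speed_bits_per_second):
    """Best-case time to move size_bytes over a link of the given speed."""
    return size_bytes * BITS_PER_BYTE / speed_bits_per_second

KB, MB, GB = 10**3, 10**6, 10**9   # sizes in bytes (decimal prefixes assumed)
Mbps = 10**6                       # speed in bits per second

# The rough size guesses from the list above.
sizes = {"email": 5 * KB, "camera image": 3 * MB, "song": 5 * MB, "movie": 1 * GB}

for name, size in sizes.items():
    megabits = size * BITS_PER_BYTE / 1e6
    seconds = transfer_seconds(size, 100 * Mbps)
    print(f"{name}: {megabits:g} Mb, about {seconds:g} s at 100 Mb/s")
```

The key step is multiplying by 8 before dividing: sizes are usually quoted in bytes, speeds in bits per second.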

How Computers Connect

There are a few common connections for personal computers (i.e. the kind of computer you actually use).

Ethernet: The typical wired connection. Speeds usually 100 Mb/s or 1 Gb/s. (Can be 10 Gb/s but probably not to a personal computer.) Cable length <100 m.

How Computers Connect

Wi-Fi: The usual wireless connection. Speeds 50 Mb/s to 500 Mb/s. Distance depends on what's in the way but usually also <100 m. Speed depends on the technology and also what's between you and the router.

How Computers Connect

There are a few other connection methods too, like powerline networks. Basically, any way you can get data around your house will do, as long as your computer and router can both speak it.

How Computers Connect

Phone connections (modern ones, at least) carry Internet data over cellular networks: HSPA (3G), LTE (4G), 5G. Phones can also use Wi-Fi when it's available.

How Computers Connect

At home, these are used to connect your computer to your router (or DSL/cable modem). On campus, they connect you to a large router in SFU's network.

How Computers Connect

On the larger scale, the ideas are the same but everything is higher-capacity.

Fibre optic cables use strands of glass to transmit light pulses. These can transmit very high rates of data over long distances: Tb/s over many kilometres.

Protocols

Once computers have a way to pass data back and forth, we need some way to make sense of it.

A protocol is a way that we agree to have computers exchange information so that both ends can understand the contents. As long as everybody is following the same rules about how to encode the information, they should be able to communicate.

Protocols

An analogy: we use English words (down, table, synthesis) to mean certain things. We all know what each word means, so we can use it to communicate an idea.

Computers can do the same: a certain pattern of bits means “I want to send an email to my friend”, and as long as both sides agree on that, it should work.
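
A toy illustration of "both sides agree on what the bits mean". This is not a real protocol: the message types, the 3-byte header layout, and the functions `encode` and `decode` are invented for this sketch.

```python
import struct

# Hypothetical message types that both sides have agreed on in advance.
MSG_HELLO = 1
MSG_SEND_EMAIL = 2

# Agreed layout: 1-byte message type, then 2-byte payload length, in network byte order.
HEADER = struct.Struct("!BH")

def encode(msg_type, text):
    """Turn a message into the bytes that go over the connection."""
    payload = text.encode("utf-8")
    return HEADER.pack(msg_type, len(payload)) + payload

def decode(data):
    """Recover the message type and text from received bytes."""
    msg_type, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    return msg_type, payload.decode("utf-8")

wire = encode(MSG_SEND_EMAIL, "hi from me")
print(wire)          # the raw bytes: meaningless without the agreed rules
print(decode(wire))  # the receiver applies the same rules and gets the message back
```

If the receiver assumed a different header layout, it would read the same bytes and come up with nonsense: that is why both ends must follow the same rules.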

Protocols

Different kinds of information we want to pass around the Internet have different protocols. Some examples…

SMTP: (Simple Mail Transfer Protocol) The way email is transferred from the sender to the recipient's inbox.
IMAP: (Internet Message Access Protocol) One way to get email from your inbox so you can read it. Others: Exchange, the Connect web interface.

Protocols

NTP: (Network Time Protocol) Used to synchronize computers' clocks with some authoritative source.
Games: any networked/multiplayer game has some protocol that is used to communicate what's happening in the game.
HTTP: HyperText Transfer Protocol. The protocol that runs the World Wide Web.

HTTP

HTTP is an important protocol for this course.

Basically, “content is on the web” is synonymous with “can be accessed using HTTP”. HTTP allows transmission of ideas like “give me this web page…” and “here is the page you requested…” or “I can't find that page”.

HTTP

When asking for a web page, the web browser on your computer is acting as an internet client: it initiates the connection to the server.

Once it connects, it makes a request for some content. HTTP defines how the request is encoded (in bits) so the server can understand it.
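
A sketch of what an encoded request can look like. The helper `build_get_request` is made up for illustration, and the request is deliberately minimal: a real browser sends many more headers than this.

```python
def build_get_request(host, path):
    """Encode a minimal HTTP/1.1 GET request as the bytes sent to the server."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # which site we want (required in HTTP/1.1)
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("www.sfu.ca", "/")
print(request.decode("ascii"))
```

Nothing is sent over the network here: the code only builds the bytes that a client would send once connected.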

HTTP

The term “server” refers to both another computer (like www.sfu.ca) and the program running on it that responds to requests.

The server will find (or generate) the requested content and send it back to the client. This is the response, also an HTTP message.

The response could also be a “not found” or “redirect” or other status/error messages.
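
A sketch of how a client might pick apart a response. The text in `sample_response` is a hand-written example, not captured from a real server, and `parse_response` is a hypothetical helper that ignores many details of real HTTP.

```python
# A hand-written example of a "not found" response.
sample_response = (
    "HTTP/1.1 404 Not Found\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<p>No such page.</p>"
)

def parse_response(text):
    """Split a simple HTTP response into status code, reason, headers, and body."""
    head, _, body = text.partition("\r\n\r\n")        # a blank line separates headers from body
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)  # e.g. "HTTP/1.1", "404", "Not Found"
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), reason, headers, body

code, reason, headers, body = parse_response(sample_response)
print(code, reason)
print(headers)
print(body)
```

The status code tells the client what kind of response it got: 200 for success, 404 for “not found”, 301 or 302 for a redirect.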

HTTP

There are other things that can act as web clients (sometimes called user agents).

e.g. the Google page indexer, the HTML validator (later), other automated tools.

Mobile apps also often talk to a central server with HTTP, and are thus acting as web clients as well.

HTTP

HTTP requests can also be made with the HTTPS protocol. (S for secure)

HTTPS is the same as HTTP but content is encrypted, so none of the computers between client and server can understand the conversation.

HTTP

The computers between the client and server can see the entire conversation go by: if it's not encrypted, they can also understand it.

URLs

A URL (Uniform Resource Locator) is the full address of a piece of content on the web.

Like http://www.w3.org/html/.

URLs

Uniform: a URL works the same way from anywhere.
Resource: a piece of content that is accessible at a URL. Could be an HTML page, or image, or any other content.
Locator: it's used to actually find (and request) the resource.

URLs

The basic parts of a URL: the scheme (http:), the server or host name (www.w3.org), and the path (/html/).

URLs

The scheme indicates the protocol that will be used: http: and https: URLs refer to web content.

You might have seen a few other URL schemes, possibly mailto: for indicating an email address, sftp: for a secure file transfer, smb: for a Windows file share.

Each implies a different protocol, but is a way to refer to (and locate) a specific piece of content.

URLs

The server (or host name) is the name of the computer acting as the server: the one that will be contacted with the request.

The path indicates which page on the server we're referring to.
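
The same breakdown can be done with Python's standard library. This is only an illustration using the example URL from earlier; `urlsplit` also reports parts of a URL that aren't covered here.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.w3.org/html/")

print(parts.scheme)    # http
print(parts.hostname)  # www.w3.org
print(parts.path)      # /html/
```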

HTML, CSS, JavaScript

We will be seeing three major pieces of technology in this course. They are also the three technologies that make web pages work.

HTML, CSS, JavaScript

HTML: HyperText Markup Language.

HTML is how the content of web pages is specified. It contains headings, paragraphs, lists, etc.

… but nothing about what those pieces of content look like.

HTML, CSS, JavaScript

CSS: Cascading Style Sheets.

CSS is used to indicate the appearance of HTML content on the screen.

e.g. it can express things like “headings are twice as big as other text, bold, and centred”.

HTML, CSS, JavaScript

JavaScript is a programming language used to indicate the behaviour of parts of the page.

e.g. when the user clicks a button, something else on the page changes.

Why Separate?

Why do we separate content, appearance, and behaviour? And why do we need three different languages for these things?

Hopefully it will become more clear as the course goes on, but…

Why Separate?

These tasks need different languages because the jobs are so different.

It would be awkward to express “this is a paragraph” and “paragraphs are green” with the same language. Each of the three languages we'll see is good at its job.

Why Separate?

We will often use the same CSS and JavaScript on multiple HTML pages.

If we write them in separate files, it's easier to reuse them. For example, use the same appearance information (CSS) for all of the pages (HTML) on our site, and make sure they have a unified look.

Why Separate?

Using the same CSS and JavaScript across our web sites can be faster: users only have to download the appearance/behaviour information once, even if they go to 10 different pages.
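
A back-of-the-envelope sketch of the savings. The file sizes are invented for illustration, and the calculation assumes the browser keeps (caches) a shared file after downloading it the first time.

```python
# Hypothetical sizes in kilobytes, chosen only for illustration.
PAGE_KB = 10     # one HTML page
STYLE_KB = 40    # the site's CSS
SCRIPT_KB = 60   # the site's JavaScript

def total_download_kb(pages_visited, shared_files):
    """Total kilobytes downloaded while visiting the given number of pages."""
    if shared_files:
        # CSS and JavaScript live in separate files and are fetched only once.
        return pages_visited * PAGE_KB + STYLE_KB + SCRIPT_KB
    # CSS and JavaScript are repeated inside every page.
    return pages_visited * (PAGE_KB + STYLE_KB + SCRIPT_KB)

print(total_download_kb(10, shared_files=True))
print(total_download_kb(10, shared_files=False))
```

With these made-up numbers, ten pages cost 200 kB with shared files and 1100 kB without.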

Why Separate?

It will be easier to maintain and update: there's only one place to fix a problem.

Maybe we want separate people to handle each part: an author for the HTML, a designer for the CSS, and a programmer for the JavaScript.