A markup language is a way to describe a document, using only plain (unformatted) text.
Whatever we want to say about the content (what it is, formatting, etc) has to be expressed using plain text.
An example markup language: wikitext.
Wikitext is a simple markup language meant to be fast to learn and to produce pages with relatively simple formatting. Wikipedia pages are generated from their wikitext dialect.
Some example wikitext markup:
Line with an '''important''' word.
* point one
* another point
When displayed that would look like:
Line with an important word.
- point one
- another point
But we aren't really concerned with wikitext: it's just a good first example of markup.
The markup language we care about is HTML.
To make a markup language work, we need somewhere to write the markup language code, and some way to turn that into our final output.
HTML (and CSS and JS) files contain only characters: no formatting. That is, they are text files.
Other things that are text files: CSV, markdown, Java, Python (and every other programming language).
Creating text files is done in a text editor. These are simple tools that let you enter characters, and are designed for working with programming languages and other plain text files.
Different text editors will display text with different fonts/​colours: those don't matter since those aren't part of the file. All that ends up being saved are the characters you type.
You can find links to download a text editor in the Study Guide. You'll need one. Word processors are not appropriate.
Basically: every text editor does fundamentally the same job. Find one you like.
HTML files are text files, saved with the extension .html.
As long as you name the file something.html, it will be treated as HTML.
Here are the basic contents for an HTML file:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<title></title>
</head>
<body>
</body>
</html>
We can type (or copy-and-paste) that into a text editor, save as a .html file, and open it to see a blank page in the web browser.
The pieces of code in <…> are HTML tags.
Tags come in opening and closing pairs, like <body>…</body> (except for a few we'll talk about later).
The stuff between the opening and closing tags is the contents.
The opening tag + contents + closing tag form an element.
Demo:
- Add some text to the <title>. Reload.
- In the <body>, add a <p> element with some text. Reload.
Notice that all tags are nested together so that one is either entirely inside or outside another. They cannot partially overlap.
The three lines below show, in order: one element inside another; one element after another; and opening/closing tags incorrectly overlapped.
<x><y></y></x>
<x></x><y></y>
<x><y></x></y>
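As a concrete illustration of the nesting rule, here is a sketch of what the demo page might look like after both steps. The title and paragraph text are placeholders invented for this example, not the text used in the lecture.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<!-- placeholder title text: <title> sits entirely inside <head> -->
<title>My First Page</title>
</head>
<body>
<!-- placeholder paragraph: <p> opens and closes entirely inside <body> -->
<p>This is some text in a paragraph.</p>
</body>
</html>
```

Every element here either contains another completely or comes after it; none of them overlap.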
Let's look at the things we saw in that first .html file…
- <!DOCTYPE html>: the document type declaration, which indicates that we're using HTML5. Not actually an HTML tag, but some meta-information about the document.
- <html>…</html>: the document tag that contains the entire HTML page. Starts first and ends last.
- <head>…</head>: information about the page. It will only contain the <meta> and <title> for now.
- <meta charset="UTF-8" />: the character encoding for the file, which indicates how the text is encoded into bits. For us, this tag will always appear exactly like this.
- <title>…</title>: the main title for the page. Used in the browser window's title bar, for bookmarks, etc.
- <body>…</body>: the actual content of the page. Displayed in the main part of the browser window.
Those tags will be on every page, arranged just like that.
But, we'll need some more tags to put actual content on our page…
- <h1>: the main heading on the page. There should probably be exactly one on every page, with the same text as the <title>.
- <p>: a paragraph. Should be used to enclose each paragraph on the page.
- <em>: emphasized text. Part of a paragraph (or heading or similar) that you want to give emphasis: important words.
- <ul>: an unordered list. A bulleted list where the order of items isn't important. Can contain only list items.
- <li>: a list item. Must go in a <ul> (or other list structure).
Now we have enough to recreate the wikitext example from earlier in HTML:
<p>Line with an <em>important</em> word.</p>
<ul>
<li>point one</li>
<li>another point</li>
</ul>
Notice that all of the formatting must be done with HTML (and later CSS). That is, a line break in the HTML code has no effect on the browser display.
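To illustrate that last point, the two fragments in this sketch differ only in where the line breaks fall in the code; a browser should display both as the same single-line paragraph. The layout of the second fragment is made up for the example.

```html
<!-- written on one line -->
<p>Line with an <em>important</em> word.</p>

<!-- the same paragraph spread across several lines of code -->
<p>
Line with an
<em>important</em>
word.
</p>
```

To get content on separate lines in the display, we need separate elements (two <p> elements, or list items), not extra line breaks in the code.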
If you look in the HTML reference we're using (or any good HTML documentation), you'll see that the HTML tags are defined by their meaning, not appearance.
e.g. <em> is described as something like “emphasized text”, not “italics”.
When choosing tags for content, pay attention to the meaning of tags, not what they look like. We'll make our content look the way we want later.
The <meta> element we used in the first HTML page we had:
<meta charset="UTF-8" />
What's the charset="UTF-8" part? None of the other tags had that.
Here, charset is an attribute. Attributes change the behaviour or meaning of an element.
Attributes have a value: UTF-8 in this example.
Another example: there is an <abbr> tag to indicate abbreviations.
<abbr>SFU</abbr>
… but it's not very interesting to say that this is an abbreviation. We'd also like to specify what it's an abbreviation for.
The title attribute can be used with <abbr> to give the full expansion of the abbreviation:
<abbr title="Simon Fraser University">SFU</abbr>
Browsers generally show the title as a tooltip if you mouse-over the text: SFU.
Another example: the lang attribute can be used on any element to specify the natural language of the content.
There are language codes for (every?) language that you're likely to need.
For example, we probably should start every page with this document tag, because we're writing English:
<html lang="en">
This will let search engines correctly categorize the page, let the browser offer appropriate automatic translations, etc.
If we switch languages somewhere, we can specify that as well.
<p>He demanded <q lang="fr">Quoi?</q> and I was shocked.</p>
The <a> tag is used to create a link, and the href attribute gives the destination.
<a href="http://www.sfu.ca/">Simon Fraser University</a>
The value for href is a URL. More later about what values can go there (besides complete absolute URLs).
There was one other weird thing about our meta tag: it didn't seem to be closed:
<meta charset="UTF-8" />
i.e. there was no </meta> anywhere.
The <meta> tag is an empty tag: one that has no contents.
Since it's empty, it must be closed right away. The /> is how we will specify that. It's a signal that the tag is closing now and there are no contents.
Older versions of HTML (the XML-based XHTML) allowed you to explicitly close it right away, but this is not allowed in HTML5:
<meta charset="UTF-8"></meta>
… but this is probably the right way to think about it: a tag that closes as soon as it opens.
Another empty tag: <img />, which is used to insert an image onto a page.
<img src="group.jpg" alt="a group of people" />
The src attribute is used to give the location where the image can be found. In this case, a file in the same directory as the .html file. (Also a URL: again, more later.)
As we said: HTML tags focus on the meaning or purpose or type of content they hold.
HTML is a semantic markup language. Semantic: to do with meaning.
Everything we have expressed in HTML has been semantic: <p> = paragraph, <h1> = level-1 heading, <em> = text that's emphasized, etc.
The opposite: visual or presentational markup, as you would often use in MS Word or similar, where you specify “bold” or “16 pt font”. Word's “styles” are generally semantic.
When writing HTML, your job is to worry about the meaning, not the appearance, of the content.
Describe the meaning of the content you have as best possible with the tags available. Pay attention to the meanings given in the reference.
We will worry about appearance later with CSS.
CSS is where you can express things like “all <p> elements should appear with font X, left-justified, red”.
Why bother with a semantic markup language and another layer for presentation?
This should make it easier to maintain and update the site: if we want every <p> to look different, we just update one thing in the CSS.
Semantic markup should help search engines: they can extract a lot of meaning from just the markup and better categorize your pages.
It will make it easier to adapt pages to different situations.
For example, on a mobile browser, you might want a different font size, or arrangement of the content. We can express this in CSS, but the semantic content doesn't change: a heading is still a heading.
Let's put together a more realistic example: we will use HTML to mark up a recipe.
Parts of the page we need to decide on markup for: serving size (“serves 4”), introduction (“This is a recipe adapted from my seriouseats.com…”, etc), ingredients, steps.
Things I expect to do in the demo:
- <h2>s.
- <cite>s.
- <section>s to identify parts of the page.
Results will be posted on the course web site.
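One possible outcome of that demo, as a rough sketch only: the recipe name, cited source, ingredients and steps below are all invented for illustration, and the version posted on the course web site may look quite different.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>Simple Pancakes</title>
</head>
<body>
<h1>Simple Pancakes</h1>
<p>Serves 4.</p>
<!-- <cite> marks the title of the work the recipe was adapted from (hypothetical title) -->
<p>This is a recipe adapted from <cite>A Hypothetical Cookbook</cite>.</p>
<section>
<h2>Ingredients</h2>
<ul>
<li>2 cups flour</li>
<li>2 eggs</li>
<li>1 cup milk</li>
</ul>
</section>
<section>
<h2>Steps</h2>
<!-- <ol> is an ordered list: used here because the order of the steps matters -->
<ol>
<li>Combine ingredients.</li>
<li>Cook until done.</li>
</ol>
</section>
</body>
</html>
```

Each <section> groups one part of the page under its own <h2>, while the single <h1> repeats the <title>.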
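Two problems are going to start happening as we start making real pages and trying to match meaning to markup:
1. Tags that are not specific enough: several pieces of content use the same tag but mean different things.
2. Content that doesn't match the meaning of any tag in HTML.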
For (1), we will want to make them look different later with stylesheets.
e.g. a list that is the table of contents, vs the recipe steps. Both are (ordered) lists, but we might want to style the intro and servings differently.
We can distinguish instances of the same tag with class or id attributes. Both attributes let you refine the meaning of tags.
<p id="serving">Serves four.</p>
The value for class and id can be any word we want: here, we're suggesting that this paragraph contains something about “servings”.
In this example, we are indicating that there is something different about the last list item:
<ol>
<li>Combine ingredients</li>
<li>Bake until done.</li>
<li class="optional">Garnish.</li>
</ol>
The word optional suggests (to people, not the browser) how it's different.
In any case, the value for class and id should be a meaningful word.
The difference: an id value must be unique on a page.
We can have multiple things with class="optional" on a page, but never more than one id="serving" per page.
You should choose appropriately: if it's a piece of content that will only occur once, use id. Otherwise, class.
We will (soon) be able to use class and id to select certain tags for appearance changes in CSS.
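A small sketch putting the two attributes side by side. The “Add chocolate chips” step is invented here just to show a class value being reused; the rest follows the examples above.

```html
<!-- id: only one element on the page may have id="serving" -->
<p id="serving">Serves four.</p>

<ol>
<li>Combine ingredients.</li>
<!-- class: several elements may share the same class value -->
<li class="optional">Add chocolate chips.</li>
<li>Bake until done.</li>
<li class="optional">Garnish.</li>
</ol>
```

If a second paragraph about servings were ever needed on the same page, id="serving" would have to become a class instead.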
We solved one of the problems we were having above (tags not specific enough), but not “content that doesn't match the meaning of tags in HTML”.
How do we handle content that doesn't match any of the tags in HTML?
First step: have a good look in the HTML Reference: there might be a tag you didn't know about that will work.
e.g. maybe we want to distinguish the quantities of ingredients (“2 cups”) from the item (“flour”). I can't find any tag I think is relevant.
But if still nothing fits, there are two generic tags that can be used: <div> and <span>.
These have no meaning on their own and should be given one by a meaningful class or id.
So our example might get markup like:
<li><span class="quantity">2 cups</span> flour</li>
The difference: <div> is block-level content (or flow content).
That is, it can go inside the <body> directly, and is generally presented below any previous content. Other block-level tags we have seen: <h1>, <p>, <ul>.
The <span> is inline (or phrasing) content: it goes inside a block-level tag and is generally beside the next/previous content.
Other inline tags we have seen: <em>, <abbr>, <a>, <q>, <img>.
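A sketch showing both generic tags together. The class value “ingredients” and the second list item are assumptions made up for this example; “quantity” is the class used above.

```html
<!-- <div>: generic block-level element, placed directly in the <body> -->
<div class="ingredients">
<ul>
<!-- <span>: generic inline element, sitting beside the text inside a list item -->
<li><span class="quantity">2 cups</span> flour</li>
<li><span class="quantity">1 cup</span> milk</li>
</ul>
</div>
```

The <div> starts below whatever came before it, while each <span> stays on the same line as the text around it.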
General rules for HTML:
- Inline content goes inside block-level content.
- Block-level content goes inside the <body> and (sometimes) other blocks.
Another example: suppose we want a collection of social media links like this:
Follow us:
First, consider existing tags for the job: maybe <p> or <section> or <footer>? No, not really.
So, we reach for a generic tag: this is probably block-level content, so <div>, with an appropriate class.
<div class="social">Follow us:
<a href="http://twitter.com"><img src="twitter.svg" alt="Twitter" /></a>
⋮
</div>
Summary:
- Look for an existing tag whose meaning fits the content.
- If nothing fits, use a generic tag.
- Give it a class or id value that suggests some meaning (maybe for any tag, but definitely for a generic tag).
We have heard many rules about HTML. e.g.
- <ul> must contain only <li>s.
But what if we get something wrong?
The web browser will not tell us: it will do its best to display broken HTML, so it “works” on as many pages as possible.
But we don't really know if the next browser will “correct” our mistakes the same way. Will our site be horribly broken in another browser? What about when Google tries to index it?
Solution: actually write correct HTML. If the browser isn't going to give us feedback when we make a mistake, we need something else for that.
An HTML validator will take our HTML and check against the rules of HTML.
We can use the HTML validator online by going to https://validator.w3.org/.
If our file has been uploaded, we can give its URL. If not, we can upload the file from our computer or copy-and-paste the code from the text editor.
Expectation from now on: any HTML you produce for the course should pass the validator without errors.
(Warnings are okay, but it's probably not a bad idea to fix them.)
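As a sketch of the kind of mistake the validator should report, the first list below breaks two rules we have seen (text directly inside a <ul>, and overlapping tags); the second is a corrected version. The content is made up, and the exact wording of the validator's messages is not shown here.

```html
<!-- INVALID: text directly inside <ul>, and the <em> and <li> tags overlap -->
<ul>
point one
<li>another <em>point</li></em>
</ul>

<!-- corrected: only <li> inside <ul>, and <em> closes before </li> -->
<ul>
<li>point one</li>
<li>another <em>point</em></li>
</ul>
```

A browser will probably display both lists in some reasonable way, which is exactly why we need the validator to point out the first one.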
An overall principle for any computer system where two things have to communicate is the robustness principle:
Be conservative in what you do, be liberal in what you accept from others.
Web browsers are holding up their end of the robustness principle by doing their best with not-quite-right HTML: “be liberal in what you accept”.
But if we (as HTML authors) rely on that behaviour, we're taking a risk that the next tool won't handle our mistakes the same way.
We need to follow our end of the deal: “be conservative in what you [send]”. If we do, we maximize the chances that everything works together.
Validating our HTML is part of that. Producing semantically-meaningful markup is another.
We have seen two contexts where we can give a URL in HTML:
<a href="http://www.sfu.ca/">SFU</a>
<img src="http://example.com/img.jpg" alt="a group" />
… and there are a few more that we'll see later.
The URLs that start with a scheme/protocol (like http://) are absolute URLs.
These indicate where the content is on the web (or elsewhere with another protocol), and take you to the same place no matter where you start on the web.
The other option is relative URLs which indicate a location relative to the URL of the current page.
We have actually seen one of these:
<img src="group.jpg" alt="a group of people" />
There are a few things you can do with relative URLs.
Resources in the same folder/directory as the .html file: just give the file name.
<img src="group.jpg" alt="a group of people" />
That refers to the image group.jpg in the same folder as the HTML file.
A relative URL can move into a folder: suppose we have an .html file beside a directory images and want to refer to an image file in there.
Give the name of the directory, slash, and the name of the file:
<img src="images/vacation.jpg" alt="our trip" />
Or, if we create a directory named courses for some of our HTML and put a file cmpt165.html in there, we can get to it with:
<a href="courses/cmpt165.html">CMPT 165</a>
We can also move up a level and out of a directory. The special directory name .. means “go up a level”.
e.g. to go from the cmpt165.html page back up to the menu, one level up in the directory hierarchy:
<a href="../menu.html">The Menu</a>
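Suppose we have files organized like this (a layout consistent with the links below):
menu.html
courses/
    cmpt165.html
    math.html
    last-semester/
        geog.html
    summer/
        poster.svg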
On cmpt165.html, these make sense:
<a href="math.html">Math</a>
<a href="last-semester/geog.html">Geography</a>
<a href="../menu.html">Menu</a>
<img src="summer/poster.svg" alt="project poster" />
The biggest benefit of relative URLs: they will work on your computer, and after you have uploaded to the server, and if you upload your pages to a different server.
Absolute URLs aren't as flexible (and that's their strength too: they work the same everywhere).
Both absolute and relative URLs can be used anywhere HTML expects a URL.
Notes:
- Directories in a URL are always separated by a forward slash, /.
- URLs are case-sensitive: page.html and Page.html are different URLs.
Advice: keep your filenames lower-case and replace spaces with a dash or underscore.
There are a few characters we have seen with a special meaning in HTML code: <, >, ".
We can't just type those everywhere since they mean start of tag, end of tag, attribute value. What if we want to have them appear on a page?
HTML character references are used to refer to characters we can't type.
Character references start with & and end with ;. e.g. &lt; is the character reference for <. This HTML:
7 &lt; 10
… appears as:
7 < 10
Another example:
<p>I like the <code>&lt;abbr&gt;</code> element.</p>
When displayed:
I like the <abbr> element.
These are the characters that have a special job in HTML, and it's worth remembering their references:
Character | Reference |
---|---|
< | &lt; |
> | &gt; |
& | &amp; |
" | &quot; |
' | &apos; |
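A short sketch using the references from that table; the sentences and the title text are invented for the example. Each reference should display as the character it names.

```html
<!-- displays as: Use <em> for emphasis & "title" for tooltips. -->
<p>Use &lt;em&gt; for emphasis &amp; &quot;title&quot; for tooltips.</p>

<!-- inside an attribute value, &quot; lets a double quote appear without ending the value -->
<p><abbr title="the &quot;HyperText Markup Language&quot;">HTML</abbr></p>
```

Typing a plain & or < in these spots might still display correctly in a browser, but the references are what the rules of HTML ask for.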
There are also references for characters you just don't have on your keyboard.
See the Character Reference Chart.
e.g.
<p>Select File &rarr; Open. Ol&eacute;!</p>
Becomes:
Select File → Open. Olé!
There are many more characters than there are named references. For those, you can use numeric character references.
Numeric references can refer to any character from the Unicode character set. See the Unicode Character Table.
Unicode is designed to represent all written languages: about 140k characters. This includes math symbols, emoji, etc.
Named references look like &name;.
Numeric references look like &#1234;.
If you need a numeric reference, have a look at a Unicode Character Table or Unicode Character Search.
For example, this uses the Thai baht symbol, which doesn't have a named reference:
<p>The exchange rate is &#3647;25 &asymp; $1.</p>
The exchange rate is ฿25 ≈ $1.
Assuming your browser displays like mine, the baht symbol is in a different font because the font used in these slides doesn't contain that character.
This can be a problem with fonts on the web: do your users have the same fonts that support the same characters as you?
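Returning to numeric references: they can also be written in hexadecimal, with an x after the #, which is handy because Unicode tables usually list code points in hexadecimal. A small sketch (the “Price” sentence is made up): the baht sign is U+0E3F, and 0E3F in hexadecimal is 3647 in decimal, so both paragraphs should display the same.

```html
<!-- decimal numeric reference -->
<p>Price: &#3647;25</p>

<!-- the same character as a hexadecimal numeric reference -->
<p>Price: &#x0E3F;25</p>
```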
Emoji are also Unicode characters. You can copy-and-paste them, or use character references:
&#128521; &#128077; &#127829;
😉 👍 🍕
Some character combinations (in some fonts) have more beautiful combined forms: ligatures.
flip fizz official tuft bær
The same text without ligatures:
flip fizz official tuft baer
Ligatures are generally used automatically by the browser. They are also applied to emoji to create combinations that can represent a huge variety of “characters”.
For example, the skin tone modifiers (characters 127995 to 127999) with the waving hand (128075):
<p>&#128075; - &#127999; - &#127995;</p>
<p>&#128075;&#127999; - &#128075;&#127995;</p>
👋 - 🏿 - 🏻
👋🏿 - 👋🏻
The emoji “ligatures” allow very flexible combinations: female astronaut with medium-light skin tone:
&#128105;&#127996;&#8205;&#128640;
👩🏼‍🚀
This is also used for flags, by “encoding” the country code with the “Regional Indicator Symbol Letters”. Here, the encoding for C, A (CA = Canada), D, E (DE = Deutschland = Germany):
&#127464;&#127462; &#127465;&#127466;
🇨🇦 🇩🇪
Combining emoji doesn't seem to work everywhere. Here is a separate page of the emoji examples.
Screenshots: Firefox on Linux, Chrome on Android, Firefox on Windows 10, Chrome on Windows 10, Edge on Windows 10.
Typing these as character references isn't fun, but it's possible. It's easier to copy-and-paste from somewhere else if you can.
<p>Option 1: &#129305;&#127997;. Option 2: 🤙🏽.</p>
Option 1: 🤙🏽. Option 2: 🤙🏽.