What is ..., and where can I learn more about it?
Where can I find a list of all the current HTML tags?
How can I show HTML examples without them being interpreted as part of my document?
How do I put a ... character in my HTML?
Should I put quotes around attribute values?
How can I include comments in HTML?
How can I avoid using the whole URL?
Should I end my URLs with a slash?
How can I check for errors?
What is a DOCTYPE? Which one do I use?
HTML stands for Hypertext Markup Language, and it's the language that most Webpages are written in (though there are others that we'll explore later on). HTML defines the structure of a Webpage, whereas additional languages are used to style the page.
W3C's HTML 4.01 Recommendation URL: http://www.w3.org/TR/html4/
WDG's HTML 4.0 Reference URL: http://www.htmlhelp.com/reference/html40/
Jukka Korpela's "Getting Started With HTML" URL: http://www.cs.tut.fi/~jkorpela/html-primer.html
CSS stands for Cascading Style Sheets. CSS is a language used for styling HTML documents. If you want to change your Webpage's fonts, colors, or layout, you'll use CSS to do it.
W3C's CSS Level 2 Recommendation URL: http://www.w3.org/TR/CSS2/
WDG's Guide to Cascading Style Sheets URL: http://www.htmlhelp.com/reference/css/
The HTML Writers Guild's CSS FAQ URL: http://www.hwg.org/resources/faqs/cssFAQ.html
SGML stands for Standard Generalized Markup Language, and it's a language used to define the syntax of markup languages. HTML is an SGML application (a markup language defined in SGML).
W3C's "On SGML and HTML" URL: http://www.w3.org/TR/html401/intro/sgmltut.html
XML stands for Extensible Markup Language, which is another language used to define the syntax of markup languages. XML is a subset of SGML, and is designed to represent arbitrary structured data in a text format.
XHTML stands for Extensible Hypertext Markup Language, which is a reformulation of HTML as an XML application. Because it is an XML application, the syntax requirements of XHTML are more restrictive than those of HTML. Otherwise, XHTML 1.0 mirrors the functionality of HTML 4.01.
W3C's XHTML 1.0 Recommendation URL: http://www.w3.org/TR/xhtml1/
SSI stands for Server-Side Include. SSIs allow various directives (e.g., to include the content of another file) to be embedded within Web documents. The Web server processes SSI directives each time a document that uses SSI is retrieved. Documents that use SSI are often identified with a .shtml filename extension, but there is no "SHTML" language as such. Implementation details vary among Web servers; consult your server documentation for details.
SSI Documentation for the Apache server URL: http://www.apache.org/docs/mod/mod_include.html
CGI stands for Common Gateway Interface, which is a standard interface between external programs and Web servers. Unlike static HTML documents, CGI programs can produce dynamic information based on form data submitted by the user, on information in a database, or on any other data available to the program.
WDG's CGI Programming FAQ URL: http://www.htmlhelp.com/faq/cgifaq.html
The current recommendation is XHTML 1.0, which is a reformulation of HTML 4.01 as an XML 1.0 application. HTML 4.01 is an update with minor corrections to HTML 4.0. HTML 4 extends HTML 3.2 to include support for frames, internationalization, style sheets, advanced tables, and more. The new markup introduced by HTML 4 is not well supported by current browsers, but much of it can be used safely in non-supporting browsers.
Recommended materials on HTML 4:
W3C's official HTML 4.01 Recommendation: http://www.w3.org/TR/html4/
http://www.htmlhelp.com/reference/html40/: a handy HTML 4.0 reference, with notes on using poorly supported features safely
Some materials on browser-specific versions of HTML:
http://www.blooberry.com/indexdot/html/supportkey/a.htm -- Brian Wilson's checklist of browser support for HTML tags and attributes
Within the HTML example, first replace the "&" character with "&amp;" everywhere it occurs. Then replace the "<" character with "&lt;" and the ">" character with "&gt;" in the same way.
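The order of these replacements matters, and a short sketch may make that concrete. The Python function below is illustrative only: the name escape_html_example and the sample markup are made up for this example and are not part of any tool mentioned in this FAQ.

```python
import html


def escape_html_example(text: str) -> str:
    """Escape an HTML example so that it is shown as text (hypothetical helper)."""
    # Replace "&" first. If "<" were replaced first, the "&" in the newly
    # added "&lt;" would itself be turned into "&amp;lt;".
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


sample = '<A HREF="page.html?a=1&b=2">link</A>'
escaped = escape_html_example(sample)
print(escaped)

# The standard library's html.escape performs the same three replacements
# when called with quote=False.
assert escaped == html.escape(sample, quote=False)
```

In practice the standard library function is probably the better choice; the hand-written version is shown only to make the ordering visible.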
The next Q&A addresses the more general issue of representing arbitrary characters in HTML documents.
The answer to the previous question addressed the special case of the less-than (<), greater-than (>), and ampersand (&) characters. In general, the safest way to write HTML is in (7-bit) US-ASCII, expressing characters from the upper half of the 8-bit code with HTML entities.
Working with 8-bit characters can also be successful in many practical situations: Unix and MS-Windows (using Latin-1), and also Macs (with some reservations).
The available characters are those in ISO-8859-1, listed at URL: http://www.htmlhelp.com/reference/charset/. On the Web, these are the only characters widely supported. In particular, characters 128 through 159 as used in MS Windows are not part of the ISO-8859-1 code set and will not be displayed as Windows users expect. This includes the em dash, en dash, curly quotes, bullet, and trademark symbol; neither the actual character nor &#nnn; is correct. (See the last paragraph of this answer for more about those characters.)
On platforms whose own character code isn't ISO-8859-1, such as MS DOS and Macs, there may be problems: you'd have to use text transfer methods that convert between the platform's own code and ISO-8859-1 (e.g., Fetch for the Mac), or convert separately (e.g., GNU recode). Using 7-bit ASCII with entities avoids those problems, and this FAQ is too small to cover other possibilities in detail. Mac users: see the notes at the above URL.
If you run a Web server (httpd) on a platform whose own character code isn't ISO-8859-1, such as a Mac or an IBM mainframe, it's the job of the server to convert text documents into ISO-8859-1 code when sending them to the network.
If you want to use characters outside of the ISO-8859-1 repertoire, you must use HTML 4 rather than HTML 3.2. See the HTML 4.01 Recommendation at URL: http://www.w3.org/TR/html4/.
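As a rough illustration of keeping a document in 7-bit US-ASCII, the Python sketch below replaces every non-ASCII character with a numeric character reference of the form &#nnn;. The function name to_ascii_with_references is hypothetical. Note that the number used is the character's Unicode code point, so an em dash becomes &#8212; and not a value in the 128 through 159 range; references above 255 fall outside ISO-8859-1 and therefore assume HTML 4.

```python
def to_ascii_with_references(text: str) -> str:
    """Return 7-bit ASCII text with other characters written as &#nnn; (hypothetical helper)."""
    # The "xmlcharrefreplace" error handler substitutes each character that
    # ASCII cannot encode with a decimal reference to its Unicode code point.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_with_references("caf\u00e9"))   # e with acute accent -> caf&#233;
print(to_ascii_with_references("a\u2014b"))    # em dash -> a&#8212;b
```

This sketch does not touch "&", "<", or ">", which still need the treatment described in the previous answer.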
It is never wrong to quote attribute values, and many people recommend quoting all attribute values even when the quotation marks are technically optional. XHTML 1.0 requires all attribute values to be quoted. Like previous HTML specifications, HTML 4 allows attribute values to remain unquoted in many circumstances (e.g., when the value contains only letters and digits). See URL: http://www.w3.org/TR/html4/intro/sgmltut.html#attributes for the exact rules.
Be careful when your attribute value includes double quotes, for instance when you want ALT text like "the "King of Comedy" takes a bow" for an image. Humans can parse that to know where the quoted material ends, but browsers can't. You have to code the attribute value specially so that the first interior quote doesn't terminate the value prematurely. There are two main techniques:
Escape any quotes inside the value with &#34; so you don't terminate the value prematurely: ALT="the &#34;King of Comedy&#34; takes a bow". (&quot; is not part of the formal HTML 3.2 spec, though most current browsers support it.)
Use single quotes to enclose the attribute value: ALT='the "King of Comedy" takes a bow'.
Both these methods are correct according to the spec and are supported by current browsers, but both were poorly supported in some earlier browsers. The only truly safe advice is to rewrite the text so that the attribute value need not contain quotes, or to change the interior double quotes to single quotes, like this: ALT="the 'King of Comedy' takes a bow".
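For anyone generating markup from a program, the two techniques can be combined in a small helper. The Python sketch below is one possible approach, not a standard routine: the name quote_attribute is made up, and it assumes that the numeric reference &#34; is used for an interior double quote when the value contains both kinds of quote.

```python
def quote_attribute(value: str) -> str:
    """Return an attribute value wrapped in quotes (hypothetical helper)."""
    # Escape ampersands first so that the references added below stay intact.
    value = value.replace("&", "&amp;")
    if '"' not in value:
        # No interior double quotes: plain double quoting is enough.
        return '"' + value + '"'
    if "'" not in value:
        # Interior double quotes only: enclose the value in single quotes.
        return "'" + value + "'"
    # Both kinds of quote occur: escape the double quotes numerically.
    return '"' + value.replace('"', "&#34;") + '"'


print("ALT=" + quote_attribute('the "King of Comedy" takes a bow'))
# prints: ALT='the "King of Comedy" takes a bow'
```

Given the caveat about earlier browsers, rewriting the text to avoid interior quotes remains the most conservative option; the helper only covers the cases where that is not possible.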
A comment declaration starts with "<!", followed by zero or more comments, followed by ">". A comment starts and ends with "--", and does not contain any occurrence of "--" between the beginning and ending pairs. This means that the following are all legal HTML comments:
<!-- Hello -->
<!-- Hello -- -- Hello-->
<!------ Hello -->
But some browsers do not support the full syntax, so we recommend you follow this simple rule to compose valid and accepted comments:
An HTML comment begins with "<!--", ends with "-->" and does not contain "--" or ">" anywhere in the comment.
See URL: http://www.htmlhelp.com/reference/wilbur/misc/comment.html for a more complete discussion.
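The simple rule is easy to check mechanically. The Python sketch below is a hypothetical checker (the name is_safe_comment is not from any tool named here) that accepts only comments following the rule. As an extra, conservative assumption of this sketch, it also rejects comment text that ends with a hyphen, since that would put a third hyphen in front of the closing "-->".

```python
def is_safe_comment(comment: str) -> bool:
    """Check a comment against the simple rule (hypothetical helper)."""
    # The shortest possible comment is "<!---->": 4 + 3 characters.
    if len(comment) < 7:
        return False
    if not comment.startswith("<!--") or not comment.endswith("-->"):
        return False
    inner = comment[4:-3]
    # The text between the delimiters must not contain "--" or ">".
    if "--" in inner or ">" in inner:
        return False
    # Assumption of this sketch: also reject a trailing "-" before "-->".
    return not inner.endswith("-")


print(is_safe_comment("<!-- Hello -->"))             # True
print(is_safe_comment("<!-- Hello -- -- Hello-->"))  # False: legal, but not "safe"
print(is_safe_comment("<!------ Hello -->"))         # False: legal, but not "safe"
```

The second and third comments are legal according to the full syntax; the checker rejects them only because they go beyond the simple rule.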
The URL structure defines a hierarchy similar to a file system's hierarchy of subdirectories or folders. The segments of a URL are separated by slash characters ("/"). When navigating the URL hierarchy, the final segment of the URL (i.e., everything after the final slash) is similar to a file in a file system. The other segments of the URL are similar to the subdirectories and folders in a file system.
A relative URL omits some of the information needed to locate the referenced document. The omitted information is assumed to be the same as for the base document that contains the relative URL. This reduces the length of the URLs needed to refer to related documents, and allows document trees to be accessed via multiple access schemes (e.g., "file", "http", and "ftp") or to be moved without changing any of the embedded URLs in those documents.
Before the browser can use a relative URL, it must resolve the relative URL to produce an absolute URL. If the relative URL begins with a double slash (e.g., //www.htmlhelp.com/faq/html/), then it will inherit only the base URL's scheme. If the relative URL begins with a single slash (e.g., /faq/html/), then it will inherit the base URL's scheme and network location.
If the relative URL does not begin with a slash (e.g., all.html, ./all.html or ../html/), then it has a relative path and is resolved as follows.
The browser strips everything after the last slash in the base document's URL and appends the relative URL to the result.
Each "." segment is deleted (e.g., ./all.html is the same as all.html, and ./ refers to the current "directory" level in the URL hierarchy).
Each ".." segment moves up one level in the URL hierarchy; the ".." segment is removed, along with the segment that precedes it (e.g., foo/../all.html is the same as all.html, and ../ refers to the parent "directory" level in the URL hierarchy).
Some examples may help make this clear. If the base document is <URL:http://www.htmlhelp.com/faq/html/basics.html>, then all.html and ./all.html refer to <URL:http://www.htmlhelp.com/faq/html/all.html>. ./ refers to <URL:http://www.htmlhelp.com/faq/html/>. ../ refers to <URL:http://www.htmlhelp.com/faq/>. ../cgifaq.html refers to <URL:http://www.htmlhelp.com/faq/cgifaq.html>. ../../reference/ refers to <URL:http://www.htmlhelp.com/reference/>.
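These examples can be reproduced with any library that implements relative URL resolution. The Python sketch below uses the standard library's urllib.parse.urljoin; the names BASE, RELATIVE_URLS, and resolve_all are chosen only for this illustration.

```python
from urllib.parse import urljoin

# The base document from the examples above.
BASE = "http://www.htmlhelp.com/faq/html/basics.html"

RELATIVE_URLS = [
    "all.html",
    "./all.html",
    "./",
    "../",
    "../cgifaq.html",
    "../../reference/",
    "/faq/html/",                    # inherits scheme and network location
    "//www.htmlhelp.com/faq/html/",  # inherits only the scheme
]


def resolve_all(base, relatives):
    """Map each relative URL to the absolute URL it resolves to."""
    return {rel: urljoin(base, rel) for rel in relatives}


for rel, absolute in resolve_all(BASE, RELATIVE_URLS).items():
    print(f"{rel:30} -> {absolute}")
```

The program never contacts a server, which matches the point that resolution happens entirely on the client side.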
Please note that the browser resolves relative URLs, not the server. The server sees only the resulting absolute URL. Also, relative URLs navigate the URL hierarchy. The relationship (if any) between the URL hierarchy and the server's filesystem hierarchy is irrelevant.
For a full discussion of the proper form of URLs, see URL: http://www.w3.org/Addressing/.
The URL structure defines a hierarchy similar to a filesystem's hierarchy of subdirectories or folders. The segments of a URL are separated by slash characters ("/"). When navigating the URL hierarchy, the final segment of the URL (i.e., everything after the final slash) is similar to a file in a filesystem. The other segments of the URL are similar to the subdirectories and folders in a filesystem.
When resolving relative URLs (see the answer to the previous question), the browser's first step is to strip everything after the last slash in the URL of the current document. If the current document's URL ends with a slash, then the final segment (the "file") of the URL is null. If you remove the final slash, then the final segment of the URL is no longer null; it is whatever follows the final remaining slash in the URL. Removing the slash changes the URL; the modified URL refers to a different document and relative URLs will resolve differently.
For example, the final segment of the URL http://www.htmlhelp.com/faq/html/ is empty; there is nothing after the final slash. In this document, the relative URL all.html resolves to http://www.htmlhelp.com/faq/html/all.html (an existing document). If the final slash is omitted, then the final segment of the modified URL http://www.htmlhelp.com/faq/html is "html". In this (nonexistent) document, the relative URL all.html would resolve to http://www.htmlhelp.com/faq/all.html (another nonexistent document).
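The effect of the missing slash can be seen with a two-line experiment. The Python sketch below again relies on urllib.parse.urljoin from the standard library; resolve_from is a hypothetical wrapper added only to name the two arguments.

```python
from urllib.parse import urljoin


def resolve_from(document_url: str, relative_url: str) -> str:
    """Resolve relative_url as if it appeared in the document at document_url."""
    return urljoin(document_url, relative_url)


with_slash = resolve_from("http://www.htmlhelp.com/faq/html/", "all.html")
without_slash = resolve_from("http://www.htmlhelp.com/faq/html", "all.html")

print(with_slash)     # http://www.htmlhelp.com/faq/html/all.html
print(without_slash)  # http://www.htmlhelp.com/faq/all.html
```

The only difference between the two calls is the final slash of the document URL, yet the results point to different levels of the URL hierarchy.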
When they receive a request that is missing its final slash, web servers cannot ignore the missing slash and just send the document anyway. Doing so would break any relative URLs in the document. Normally, servers are configured to send a redirection message when they receive such a request. In response to the redirection message, the browser requests the correct URL, and then the server sends the requested document. (By the way, the browser does not and cannot correct the URL on its own; only the server can determine whether the URL is missing its final slash.)
This error-correction process means that URLs without their final slash will still work. However, this process wastes time and network resources. If you include the final slash when it is appropriate, then browsers won't need to send a second request to the server.
The exception is when you refer to a URL with just a hostname (e.g., http://www.htmlhelp.com). In this case, the browser will assume that you want the main index ("/") from the server, and you do not have to include the final slash. However, many regard it as good style to include it anyway.
For a full discussion of the proper form of URLs, see URL: http://www.w3.org/Addressing/.
Various software is available to find errors in your web documents automatically. HTML validators are programs that check HTML documents against a formal definition of HTML syntax and then output a list of errors. Validation is important to give the best chance of correctness on unknown browsers (both existing browsers that you haven't seen and future browsers that haven't been written yet).
HTML linters (checkers) are also useful. These programs check documents for specific portability problems, including some caused by invalid markup and others caused by common browser bugs. Linters may pass some invalid documents, and they may fail some valid ones.
All validators are functionally equivalent; while they may have different reporting styles, they will find the same errors given identical input. Different linters are programmed to look for different problems, so their reports will vary significantly from each other. Also, some programs that are called validators (e.g. the "CSE HTML Validator") are really linters/checkers. They are still useful, but they should not be confused with real HTML validators.
When checking a site for errors for the first time, it is often useful to identify common problems that occur repeatedly in your markup. Fix these problems everywhere they occur (with an automated process if possible), and then go back to identify and fix the remaining problems.
While checking for errors in the HTML, it is also a good idea to check for hypertext links which are no longer valid. There are several link checkers available for various platforms which will follow all links on a site and return a list of the ones which are non-functioning.
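The first step of any link checker is to collect the links in a document. The Python sketch below shows only that step, using the standard library's html.parser module; the names LinkCollector and collect_links are hypothetical, and a real link checker would go on to resolve and request each URL, which this sketch does not attempt.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the HREF values of all A elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(html_text: str) -> list:
    """Return the link targets found in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


sample = '<P><A HREF="all.html">All</A> <A NAME="top">anchor</A> <a href="../cgifaq.html">CGI</a></P>'
print(collect_links(sample))  # ['all.html', '../cgifaq.html']
```

The collected values are typically relative URLs, so they would need to be resolved against the document's own URL before being checked.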
You can find a list of validators, linters, and link checkers at URL: http://www.htmlhelp.com/links/validators.htm. Especially recommended is the use of an SGML-based validator such as the WDG HTML Validator URL: http://www.htmlhelp.com/tools/validator/ or W3C HTML Validation Service URL: http://validator.w3.org/.
According to HTML standards, each HTML document begins with a DOCTYPE declaration that specifies which version of HTML the document uses. The DOCTYPE declaration is useful primarily to SGML-based tools like HTML validators, which must know which version of HTML to use in checking the document's syntax. Browsers generally ignore DOCTYPE declarations.
See URL: http://www.htmlhelp.com/tools/validator/doctype.html for information on choosing an appropriate DOCTYPE declaration.
Note that the public identifier section of the DOCTYPE declaration is case sensitive. Some versions of Netscape Composer are known to insert the lower-case "-//w3c//dtd html 4.0 transitional//en", rather than the correct mixed-case "-//W3C//DTD HTML 4.0 Transitional//EN".
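A case-sensitive comparison is enough to catch this kind of mistake. The Python sketch below is a hypothetical check, not part of any validator mentioned here: the function name check_public_id is made up, and KNOWN_PUBLIC_IDS is a deliberately short, incomplete sample of W3C public identifiers.

```python
# A small, incomplete sample of public identifiers (assumption of this sketch).
KNOWN_PUBLIC_IDS = (
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD HTML 4.01 Frameset//EN",
    "-//W3C//DTD HTML 4.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
)


def check_public_id(public_id: str) -> str:
    """Classify a public identifier as ok, wrongly cased, or unknown."""
    # The comparison is exact, because public identifiers are case sensitive.
    if public_id in KNOWN_PUBLIC_IDS:
        return "ok"
    # A match that succeeds only when case is ignored points to a casing error.
    for known in KNOWN_PUBLIC_IDS:
        if public_id.lower() == known.lower():
            return "wrong case, expected: " + known
    return "unknown"


print(check_public_id("-//w3c//dtd html 4.0 transitional//en"))
# prints: wrong case, expected: -//W3C//DTD HTML 4.0 Transitional//EN
```

An "unknown" result here means only that the identifier is missing from the sample list, not that it is invalid.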