While I was writing the footnotes plugin for b2evolution, I discovered a loophole, or rather an ambiguity, in the HTML 4.01 specification which causes many, if not all, browsers to behave incorrectly. It has to do with the <base> tag and how self-referencing anchors/URI fragments are resolved.
What is the base tag? According to the W3C’s HTML 4.01 specification, the base tag “specifies an absolute URI that acts as the base URI for resolving relative URIs.” So if index.html lives on www.yourserver.com and declares <base href="http://www.yahoo.com/">, a relative reference in it such as image.jpg gets resolved to www.yahoo.com/image.jpg instead of www.yourserver.com/image.jpg. This is how the “Save webpage” feature of web browsers manipulates relative URIs.
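The effect of the base tag on a relative URI can be sketched with Python's standard urllib.parse.urljoin, which implements generic relative reference resolution. This is only an illustration, not what any browser runs internally; the http:// scheme and the variable names are assumptions added for the example.

```python
from urllib.parse import urljoin

# URLs from the example above; the http:// scheme is assumed.
document_url = "http://www.yourserver.com/index.html"
base_href = "http://www.yahoo.com/"

# Without a <base> element, the document's own URL acts as the base URI.
without_base = urljoin(document_url, "image.jpg")

# With <base href="http://www.yahoo.com/">, the declared base is used instead.
with_base = urljoin(base_href, "image.jpg")

print(without_base)
print(with_base)
```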
Now what about self-referencing anchors/fragments? Well, everyone knows anchors. They basically go like this: <a name="anchor_a">foo</a>, which can be referenced on the same page using <a href="#anchor_a">bar</a>. The text after the crosshatch (#) sign is called the fragment identifier; the entire thing is called, logically, a same-document reference. If you specify a URI before the fragment identifier (e.g. <a href="http://www.something.com/index.html#anchor_a">), the link takes you to the anchor in the document at that URI (another page) instead of to an anchor on the same page. More information can be found in RFC 2396.
Now we know that if a reference contains nothing before the fragment identifier, it is meant to refer to the same document. Now the problem: where should those same-document references point when the base is changed using the base tag? Should they still point to the current document, should they point to the anchor in the document located at the base URI, or should they do something else? Keep in mind how same-document references are actually used.
Follow up:
Let’s take a look again at the HTML 4.01 specs section 12.4:
Relative URIs [bold mine] are resolved according to a base URI, which may come from a variety of sources. The BASE element allows authors to specify a document’s base URI explicitly.
Also, section 12.2 of the HTML 4.01 specs gives examples of values for the href attribute of the a element:
- An absolute URI: http://www.mycompany.com/one.html#anchor-one
- A relative URI: ./one.html#anchor-one or one.html#anchor-one
- When the link is defined in the same document [bold mine]: #anchor-one
Note that relative URIs are the ones that get resolved against the base URI. Which brings us to the question: are same-document references part of relative URI resolution?
If we look again at RFC 2396, section G.4 explains some modifications/clarifications to RFC 1808:
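To make the three forms from section 12.2 concrete, here is a small sketch that tells them apart by looking at which URI components are present. The function name classify_reference and the three labels are made up for this illustration; they do not come from either specification.

```python
from urllib.parse import urlsplit


def classify_reference(href: str) -> str:
    """Label an href as absolute, relative, or same-document (illustrative only)."""
    parts = urlsplit(href)
    if parts.scheme:
        # A scheme such as "http" makes the reference absolute.
        return "absolute"
    if not (parts.netloc or parts.path or parts.query):
        # Nothing but an optional fragment: it refers to the current document.
        return "same-document"
    return "relative"


for href in (
    "http://www.mycompany.com/one.html#anchor-one",
    "./one.html#anchor-one",
    "one.html#anchor-one",
    "#anchor-one",
):
    print(href, "->", classify_reference(href))
```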
RFC 1808 (Section 4) defined an empty URL reference (a reference containing nothing aside from the fragment identifier) as being a reference to the base URL. Unfortunately, that definition could be interpreted, upon selection of such a reference, as a new retrieval action on that resource [italics mine]. Since the normal intent of such references is for the user agent to change its view of the current document [bold mine] to the beginning of the specified fragment within that document [bold mine], not to make an additional request of the resource, a description of how to correctly interpret an empty reference has been added in Section 4.
Unfortunately, the document referenced in the HTML 4.01 specs is RFC 1808 (check section 12.4.1) and not RFC 2396, so readers are led to the very misinterpretation that RFC 2396 warns about. To make it worse, the HTML 4.01 specs give no clarification or example of how same-document references should be treated; hence the ambiguity.
This quote also tells us how same-document references are used: don’t we use them to jump between sections of the original page?
RFC 2396 also explicitly states the behavior of same-document references in section 4.2:
A URI reference that does not contain a URI is a reference to the current document. In other words, an empty URI reference within a document is interpreted as a reference to the start of that document, and a reference containing only a fragment identifier is a reference to the identified fragment of that document. Traversal of such a reference should not result in an additional retrieval action.
So, in conclusion, we can say that same-document references in the <a> tag are not part of the relative URI resolution.
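That reading of section 4.2 can be sketched as a tiny resolver: a reference made up of nothing but a fragment stays on the current document, and everything else is resolved against the base. The function resolve_href and its parameters are hypothetical names for this illustration, and the URLs assume an http:// scheme; this is my interpretation written out as code, not an implementation taken from any browser or from the RFC.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def resolve_href(href, document_url, base_href=None):
    """Resolve href, keeping fragment-only references on the current document."""
    parts = urlsplit(href)
    is_same_document = not (parts.scheme or parts.netloc or parts.path or parts.query)
    if is_same_document:
        # Section 4.2 reading: ignore the base and stay on the current document.
        current, _ = urldefrag(document_url)
        return current + ("#" + parts.fragment if parts.fragment else "")
    # Ordinary relative (or absolute) reference: resolve against the base URI.
    return urljoin(base_href or document_url, href)


doc = "http://www.something.com/articles/2007/01/12/1.html"
base = "http://www.something.com/"

print(resolve_href("#foo", doc, base))       # stays on the current document
print(urljoin(base, "#foo"))                 # plain resolution against the base
print(resolve_href("image.jpg", doc, base))  # relative URIs still follow the base
```

The second line printed is what you get when the fragment-only reference is pushed through ordinary base resolution, which is exactly the behavior in question.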
I tested how Firefox 2.0.x, Opera 9.2x and Internet Explorer 7 treat same-document references in the presence of a base tag. Result: they do the wrong thing. They direct you not to the anchor on the same page, but to the anchor of the URI resolved against the base! That is, if you have this code:
HTML:
<head>
…
<base href="http://www.something.com/" />
</head>
<body>
…
<a name="foo">FUBAR</a>
…
<a href="#foo">bar</a>
Clicking on “bar” won’t lead you to FUBAR; instead, it will take you to www.something.com/#foo (you’re lucky if this happens to be the same page)!
So how do we fix this? We explicitly state the URI. So if your document resides at www.something.com/articles/2007/01/12/1.html, we do something like this:
HTML:
<head>
…
<base href="http://www.something.com/" />
</head>
<body>
…
<a name="foo">FUBAR</a>
…
<a href="http://www.something.com/articles/2007/01/12/1.html#foo">bar</a>
However, this would lead the browser into retrieving the same page again (i.e. a page refresh), which may break some effects (MooTools’ Smooth Scroll, for example).
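If the pages are generated by a script, the workaround can be applied automatically. Here is a rough sketch that rewrites every fragment-only href so it carries the document's own URI. The function name absolutize_fragment_links is made up for this example, and the regular expression is a deliberate simplification that only handles double-quoted href attributes; a real plugin would want a proper HTML parser. It has the same drawback described above.

```python
import re

# Simplification: matches only double-quoted hrefs that start with "#".
_FRAGMENT_HREF = re.compile(r'href="#([^"]*)"')


def absolutize_fragment_links(html: str, document_url: str) -> str:
    """Prefix fragment-only hrefs with the document's own URI."""
    return _FRAGMENT_HREF.sub(
        lambda match: f'href="{document_url}#{match.group(1)}"', html
    )


page = '<a name="foo">FUBAR</a> <a href="#foo">bar</a>'
doc_url = "http://www.something.com/articles/2007/01/12/1.html"
rewritten = absolutize_fragment_links(page, doc_url)
print(rewritten)
```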
But look at the brighter side: At least they (the browsers) all behave in the same wrong way.
This has been discussed on the W3C’s mailing list.
RFC 2396
W3C’s HTML 4.01 Specification
RFC 3986
The “debate” at the Mozilla Bugzilla on whether to fix this or not and how to fix this in Firefox.