Access me... #4

...display the text right — right..?

plain text…�

Image: an unrecognized word presented as question marks.

You've probably seen web pages where parts of the text are presented as question marks or something else that is hard to make out: text turned into lines of ugly glyphs, which makes the term "plain text" a bit ridiculous. Someone has made a mistake, but I'm not sure they will admit responsibility for it.

I see these question marks (�) all the time on Norwegian web sites, most often made by someone "who knows exactly what they're doing". Yes, they probably do, but I don't think they know much about proper encoding for the world wide web.

I see that some apply their knowledge from print, and expect everyone around the world to use the same encoding they have learned to use for their own newspapers, brochures and so on. The results pop up on web pages all around, so some obviously still see the web as an extension of print. It isn't, and I'm glad this approach isn't mine anymore.

encoding…?

I've spent half my life encoding text and code for software, and with the help of software. That's more than 25 years with these bits and bytes, from "machine code" to "high level". My mind works at the bit-switch level, so I think it's about time I figure out what I've wasted all this time on.

I see ISO-8859-1, Windows-1252 and US-ASCII around in meta tags in web pages, and I know the basics: one byte gives us 256 different values for letters and glyphs. The first 128 (7 bits) are, more or less, basic US-ASCII, and the last 128 (the ones that need the 8th bit) are used in different ways for the letters and glyphs needed by languages other than basic US English.
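That split can be seen with Python's built-in codecs. This is a minimal sketch, and the sample strings are just examples: plain English fits in 7-bit US-ASCII, while æ, ø and å only exist in the upper half (bytes 128-255) of an 8-bit encoding such as ISO-8859-1.

```python
# Plain English stays within 7 bits: every byte is below 128.
english = "plain text".encode("ascii")
print(all(b < 128 for b in english))  # True

# The Norwegian letters are not part of US-ASCII at all...
try:
    "æøå".encode("ascii")
except UnicodeEncodeError:
    print("æ, ø and å are not US-ASCII")

# ...but ISO-8859-1 gives each of them a single byte from the upper half.
latin1 = "æøå".encode("iso-8859-1")
print(list(latin1))  # [230, 248, 229]
```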

There is one "map" for each major language or group of languages, so the encoded text, or whatever is "mapped", can be presented as it is supposed to be. So all we have to do is tell our visitors which "map" to use. I guess that's what those meta tags are for, if they exist in a page. Without them, it's anybody's guess which language "map" their browser should use.
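To see why the "map" matters, here is one and the same byte read with three different 8-bit maps. A small Python sketch; the Greek (ISO-8859-7) and Cyrillic (Windows-1251) encodings are simply examples picked for contrast.

```python
# One byte value, three "maps": 0xE6 decoded with three 8-bit encodings.
raw = bytes([0xE6])
for codec in ("iso-8859-1", "iso-8859-7", "windows-1251"):
    print(codec, raw.decode(codec))
# iso-8859-1 æ
# iso-8859-7 ζ
# windows-1251 ж
```

The byte itself carries no information about which map was meant; only the declaration (or a guess) decides what the visitor sees.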

Oh, but that's easy, isn't it? Let's just use the right one for the language. My fellow Norwegians use whatever suits them, or nothing at all. Visitors just have to change encoding (or rather decoding) until they hit the right one. And then they'll have to change it again and again as they visit other sites. They can even leave the guesswork to their browser: it may just get it right.
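A wrong guess goes wrong in two recognizable ways. In this Python sketch (the word is just an example), UTF-8 bytes read with an 8-bit map turn each Norwegian letter into two odd glyphs, while 8-bit bytes read as UTF-8 are invalid and get replaced by U+FFFD, the very question mark (�) this page started with.

```python
word = "blåbær"  # Norwegian for "blueberry"

# UTF-8 bytes read with the ISO-8859-1 "map": each letter becomes two glyphs.
print(word.encode("utf-8").decode("iso-8859-1"))  # blÃ¥bÃ¦r

# ISO-8859-1 bytes read as UTF-8: the bytes are not valid UTF-8,
# so the decoder substitutes the replacement character U+FFFD.
print(word.encode("iso-8859-1").decode("utf-8", errors="replace"))  # bl�b�r
```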

No guesswork needed here…

This site doesn't use any language-based 8-bit "maps", yet the text comes out right anyway, most of the time. The simple reason is that our pages are encoded without any of these 8-bit "maps". We use as many bytes per character as needed (one to four), and convert everything that isn't basic US-ASCII into numeric entities based on Unicode. The encoding is UTF-8, and its "map" is much larger than that of any 8-bit encoding. In fact, it covers every character in Unicode, no matter what...
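How many bytes UTF-8 spends on a character depends on how far into Unicode the character sits. A short Python sketch with a few example characters:

```python
# UTF-8 uses as many bytes as each character needs: from 1 up to 4.
for ch in ("a", "æ", "€", "𝄞"):
    print(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "byte(s)")
# a U+0061 1 byte(s)
# æ U+00E6 2 byte(s)
# € U+20AC 3 byte(s)
# 𝄞 U+1D11E 4 byte(s)
```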

Problem solved, and it has been for a long time. I tell browsers that this page uses UTF-8 encoding, and make sure my pages are actually encoded according to that standard. Newer browsers should be able to interpret UTF-8, and my æ, ø and å will at least look the way they are supposed to, even if many visitors have no idea how these strange letters are pronounced in my native language (xml:lang="no"). But maybe their browser can handle that too, if they so wish?
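The important part is that the declaration and the actual bytes agree. Below is a rough Python sketch of such a check; `declared_charset` and `bytes_match_declaration` are hypothetical helper names, and the regex is a simple stand-in, not a real HTML parser.

```python
import re

def declared_charset(page: bytes):
    """Return the first charset/encoding named in the page, or None."""
    m = re.search(rb'(?:charset|encoding)\s*=\s*["\']?([A-Za-z0-9_-]+)', page)
    return m.group(1).decode("ascii").lower() if m else None

def bytes_match_declaration(page: bytes) -> bool:
    """True if the page's bytes really decode with the charset it declares."""
    charset = declared_charset(page)
    if charset is None:
        return False  # nothing declared: the browser has to guess
    try:
        page.decode(charset)
        return True
    except (LookupError, UnicodeDecodeError):
        return False

page = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="no" lang="no">'
    '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
    "<title>blåbær</title></head><body><p>æ, ø og å</p></body></html>"
)
print(bytes_match_declaration(page.encode("utf-8")))       # True
print(bytes_match_declaration(page.encode("iso-8859-1")))  # False
```

The second call is the typical broken page: it says UTF-8, but was saved with an 8-bit map.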

UTF-8 is identical to US-ASCII for the first 128 characters, so it doesn't throw any surprises at me with the low-numbered characters, and I don't spend much extra time on encoding or entities. It's all handled by software, so why don't all web pages come with UTF-8 encoding? Well, I guess the people behind them haven't caught up with Unicode yet.
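The "no surprises" part can be checked directly. A small Python sketch: the first 128 code points encode to exactly the same single byte as in US-ASCII, and every byte of a multibyte character has the high bit set, so it can never be mistaken for an ASCII character such as < or &.

```python
# Code points 0-127 encode to the same single byte as in US-ASCII.
print(all(chr(i).encode("utf-8") == bytes([i]) for i in range(128)))  # True

# Every byte of a multibyte sequence is 0x80 or higher.
print(all(b >= 0x80 for b in "æøå€".encode("utf-8")))  # True
```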

That's right: I can solve my own minor encoding problems by using entities. Most Western languages can be "secured" this way, because we don't have many "extra" characters. Most pages that fail are written in plain English, yet the very few "extras" they contain aren't taken care of properly. Examples: broken page.
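"Securing" text with entities can be sketched with Python's standard `xmlcharrefreplace` error handler, which replaces every character outside US-ASCII with a decimal numeric entity. The sample text is just an example; the result is pure ASCII, so it reads the same under any ASCII-compatible "map".

```python
import html

text = "Blåbærsyltetøy, 25 €"
# Every character outside US-ASCII becomes a decimal numeric entity.
safe = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(safe)                        # Bl&#229;b&#230;rsyltet&#248;y, 25 &#8364;
print(html.unescape(safe) == text)  # True: a browser turns it back into the same text
```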

no more excuses…

These question marks have got to go. There's no need for them unless a visitor explicitly forces them onto web pages. Web developers who force them on their visitors are not ready for the world wide web.

It's slightly unclear to me how different browsers handle my <?xml version="1.0" encoding="utf-8"?>. Firefox makes direct use of it, while Opera and IE/Win don't seem to notice it. That doesn't really matter, since UTF-8 can represent every character the available 8-bit encodings can, and more. Thus the result looks all right as long as I do it right at my end, in most newer browsers at least.
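What an XML-aware parser does with that declaration can be sketched with Python's ElementTree (built on expat), used here only as a stand-in for a browser in XML mode. The bytes are the same UTF-8 in both runs; only the declared encoding changes, and the parser trusts the declaration.

```python
import xml.etree.ElementTree as ET

body = "<p>blåbær</p>".encode("utf-8")  # the actual bytes are UTF-8

for declared in ("utf-8", "iso-8859-1"):
    doc = f'<?xml version="1.0" encoding="{declared}"?>'.encode("ascii") + body
    # The parser decodes the bytes with whatever encoding the declaration names.
    print(declared, ET.fromstring(doc).text)
# utf-8 blåbær
# iso-8859-1 blÃ¥bÃ¦r
```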

I found this page about Unicode for developers. If you develop, design, or carve out web pages and use any encoding other than Unicode UTF-8 (or maybe no encoding at all), you should read it and catch up.

accessible text…

This is as much about accessibility as anything I can write about. I think I'll just keep telling HTML Tidy numeric-entities: true, so my pages get proper numeric Unicode entities wherever they're needed. If someone likes to circle around this which-encoding issue, they may just do so. I think I have solved it at my end.
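The named-to-numeric rewrite that option asks for can be sketched in Python. This is not Tidy's code: `to_numeric_entities` is a hypothetical helper, and keeping &amp;, &lt;, &gt; and &quot; named follows how the Tidy option is described, as an assumption.

```python
import re
from html.entities import name2codepoint

# Assumption: these built-in entities stay in named form, as with Tidy's option.
BUILT_IN = {"amp", "lt", "gt", "quot"}

def to_numeric_entities(markup: str) -> str:
    """Rewrite named entities such as &aelig; as numeric ones such as &#230;."""
    def replace(m):
        name = m.group(1)
        if name in BUILT_IN or name not in name2codepoint:
            return m.group(0)  # leave built-ins and unknown names untouched
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)

print(to_numeric_entities("bl&aring;b&aelig;r &amp; r&oslash;mme"))
# bl&#229;b&#230;r &amp; r&#248;mme
```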

It does take me slightly more than one second to encode each page properly, so some may regard it as too time-consuming. I guess I can live with that, since my (x)html code gets a check-up at the same time. Valid web pages are usually more accessible, and it also helps when visitors can actually read them.

You may go up to access me #1.
You may also look at tips and browser options.

sincerely, Georg

Hageland 04.mar.2005
last rev: 20.apr.2005



UTF-8 is accessible...
...it's even universal.





Resources:


HTML Tidy
