OddThinking » The World of Upper- & Lower-Case Mappings

The World of Upper- & Lower-Case Mappings

Filed by: Julian on October 25th 2005

I’m no linguist, nor do I play one on TV, but it can be fun to have a dig around in their world. I’ve been doing a bit of research.

Ethnologue is an interesting database of human languages and worth a wander through.

However, it doesn’t contain what I was looking for: a list of human languages (well, more strictly, scripts) that have both majuscules and minuscules, better known as “upper-case” and “lower-case” to the likes of me. Certainly the Latin script, as used by the Germanic languages, does.

What I learnt from the Unicode FAQ was that:

- Most scripts don’t have cases.
- The mappings from upper-to-lower-case are not one-to-one.
- “For example, both a sigma and a final sigma upper-case to a capital sigma.” – Unicode FAQ
- Some mappings are locale-dependent. UPPER("i") != "I" in Turkish!
- Some lower-case Unicode glyphs can’t map to a single upper-case glyph. For example, the ﬂ ligature (U+FB02), when converted to upper-case, should end up taking two characters.
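
To make those points concrete, here is a minimal sketch in Python 3. The language is my choice for illustration and the variable names are made up; nothing here comes from the Unicode FAQ itself. It assumes only that Python's built-in `str.upper()` applies Unicode's default, locale-independent case mappings, which is why it cannot show the Turkish behaviour directly.

```python
# Python's str.upper() uses Unicode's default case mappings and ignores locale.
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
CAPITAL_SIGMA = "\u03a3"
FL_LIGATURE = "\ufb02"  # LATIN SMALL LIGATURE FL

# Two different lower-case letters share a single upper-case letter,
# so the mapping is not one-to-one.
both_sigmas_collide = SIGMA.upper() == FINAL_SIGMA.upper() == CAPITAL_SIGMA

# One lower-case character becomes two upper-case characters.
ligature_upper = FL_LIGATURE.upper()

# The default mapping gives a dotless capital I. A Turkish-aware mapping
# would give U+0130 (capital I with a dot above) instead, but str.upper()
# has no locale parameter, so that needs a locale-aware library.
default_upper_i = "i".upper()

print(both_sigmas_collide, ligature_upper, len(ligature_upper), default_upper_i)
```

The ligature case is the one most likely to bite in practice: any code that assumes `len(s.upper()) == len(s)` is quietly wrong for strings like this.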

There are a couple of conclusions:

- When dealing with non-English scripts, avoid assumptions like UPPER(x) == UPPER(LOWER(x)).
- Getting a computer to change the case of a character is a non-trivial operation.
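
As a rough illustration of the first conclusion, the sketch below checks the UPPER(x) == UPPER(LOWER(x)) assumption in Python 3. The helper name `upper_survives_round_trip` is hypothetical, and the example relies on Python's default (locale-independent) mappings, so other languages or libraries may behave differently.

```python
def upper_survives_round_trip(text: str) -> bool:
    """Return True if UPPER(text) == UPPER(LOWER(text)) for this string."""
    return text.upper() == text.lower().upper()


# U+0130 is the Turkish capital I with a dot above.
DOTTED_CAPITAL_I = "\u0130"

# Plain English text behaves as expected.
english_ok = upper_survives_round_trip("Hello")

# Lower-casing U+0130 yields "i" plus a combining dot (U+0307); upper-casing
# that yields "I" plus the combining dot, which is not the original character.
turkish_ok = upper_survives_round_trip(DOTTED_CAPITAL_I)
round_tripped = DOTTED_CAPITAL_I.lower().upper()

print(english_ok, turkish_ok, [hex(ord(ch)) for ch in round_tripped])
```

The round-tripped string looks nearly identical on screen, but it is two code points rather than one, so a naive equality test fails.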

(Why am I going on about typography and linguistics? Bear with me. I’m building up a framework for an argument. I’ll come back to this later on.)

3 Comments
Categories: Geek, S/W Dev
Tags: case, internationalisation, linguistics, typography

Comments

Comment by alastair on October 25, 2005

I find Unicode fascinating just for the window it provides onto the range of human written expression. As you have seen, even the most basic English assumptions (like the characteristics of upper- and lower-case letters) do not hold for the non-English-speaking world.

Comment by Alan Green on October 26, 2005

Still bearing with you, Julian. But if you don’t come to the point soon, my unsatiated curiosity will develop into an intellectual vacuum, and my head will implode.
