Introduction
Let’s spend a moment exploring the world of case-sensitivity. I am, of course, not talking about the skills of a baggage-handler. I am talking about the way software deals with the differences between upper-case and lower-case characters.
Particularly relevant examples include:
- identifiers in programming languages.
- filenames in file systems.
- URLs in web-servers.
- search terms in search engines.
There is a logical hierarchy of ways in which software can treat case. I will talk through each of them below.
- Case Sensitivity
- Case Insensitivity
- Case Preserving
- Non-Case Preserving
- Fixed Style
- Dictionary Definition
- Last Use
- Asymmetric Case Sensitivity
(Why am I going on about the different types of case-sensitivity with a neutral point-of-view? Bear with me. I’m building up a framework for an argument. I’ll come back to this later on.)
Case Sensitivity
Many software systems are case-sensitive. The character string “foo” and the character string “Foo” are not considered to be equivalent.
Examples include:
- Programming languages: C/C++, Java, Python, PHP, Modula-3, Perl
- File Systems: UFS
- Web Servers: Apache
- Search engines: grep
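Python's built-in string comparison, and therefore its dictionaries, is case-sensitive. That makes it a convenient way to sketch this behaviour. In the toy example below, the `files` dictionary is a hypothetical stand-in for a case-sensitive directory, and “foo” and “Foo” coexist as two different names.

```python
# A toy, case-sensitive name table: keys are compared exactly as typed.
files = {}
files["foo"] = "first file"
files["Foo"] = "second file"  # a separate entry; "foo" is not overwritten

print(len(files))        # 2
print(files.get("FOO"))  # None: no entry is spelt exactly "FOO"
```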
Case Insensitivity
Conversely, many software systems are case-insensitive. The character string “foo” and the character string “Foo” are considered to be equivalent.
Examples include:
- Programming languages: LISP, BASIC, Ada, Eiffel, Pascal
- File Systems: NTFS, FATxx
- Web Servers: IIS
- Search engines: most search engines, including Google, AltaVista and Yahoo.
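A case-insensitive comparison can be sketched by converting both strings to a common case before comparing them. The helper `equal_ignoring_case` below is hypothetical. It uses Python's `str.casefold()`, which is intended for caseless matching and handles some cases that a plain `lower()` misses, such as the German “ß”. Real systems generally apply their own case-mapping rules, so treat this as an approximation rather than any particular product's behaviour.

```python
def equal_ignoring_case(a: str, b: str) -> bool:
    # casefold() maps both strings to a common caseless form before comparing.
    return a.casefold() == b.casefold()

print(equal_ignoring_case("foo", "Foo"))         # True
print(equal_ignoring_case("foo", "FOO"))         # True
print(equal_ignoring_case("straße", "STRASSE"))  # True: casefold maps "ß" to "ss"
```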
In some circumstances (particularly with search engines or the execution of scripts), this suffices to characterise the processing. However, in many circumstances, the software processing a text string has the opportunity to convert its case into a canonical form. The following sections outline the options.
Case Preserving
If the case of the original text is not modified, the software is said to be case preserving.
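A system can be both case-insensitive and case-preserving: lookups ignore case, but each name is stored exactly as the user typed it. The `CasePreservingDirectory` class below is a hypothetical sketch of that combination. When an existing file is written under a different capitalisation, the sketch keeps the first spelling. That choice is an assumption of the sketch.

```python
class CasePreservingDirectory:
    """Hypothetical directory: lookups ignore case, stored names keep their case."""

    def __init__(self):
        # Maps the case-folded name to (name as first created, contents).
        self._entries = {}

    def create(self, name, contents):
        key = name.casefold()
        if key in self._entries:
            # Same file under another spelling: keep the original name (an assumption).
            original, _ = self._entries[key]
            self._entries[key] = (original, contents)
        else:
            self._entries[key] = (name, contents)

    def open(self, name):
        # Any capitalisation finds the file.
        return self._entries[name.casefold()][1]

    def listing(self):
        # Names are reported with the case they were created with.
        return [original for original, _ in self._entries.values()]


d = CasePreservingDirectory()
d.create("ReadMe.txt", "hello")
print(d.open("README.TXT"))  # hello
print(d.listing())           # ['ReadMe.txt']
```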
Examples include:
Non-Case Preserving
If the software modifies the case of the original text to put it into a canonical form, it is non-case preserving.
Such software can be further classified into the following sub-categories.
Fixed Style
If the software always converts the text to a fixed style (whether upper-case, lower-case, title-case, sentence-case, etc.), then I refer to it here as using a fixed-style canonical form.
Examples include:
- File Systems: FAT12 and FAT16 (Upper-case)
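A fixed-style canonical form is the simplest of the non-case-preserving schemes to sketch: every name is converted to one style before it is stored or looked up. The helpers below (`canonical_form`, `save`, `open_file`) are hypothetical and use upper-case, in the spirit of the FAT example. Real FAT short names also have length and character restrictions, which this sketch ignores.

```python
def canonical_form(name: str) -> str:
    # Fixed style: always upper-case, whatever the user typed.
    return name.upper()

stored = {}

def save(name: str, contents: str) -> None:
    # The user's capitalisation is discarded at this point (non-case preserving).
    stored[canonical_form(name)] = contents

def open_file(name: str) -> str:
    # Lookups are canonicalised the same way, so any capitalisation works.
    return stored[canonical_form(name)]

save("ReadMe.txt", "hello")
print(list(stored))             # ['README.TXT']
print(open_file("readme.TXT"))  # hello
```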
Dictionary Definition
In some situations, each identifier or word may have an official declaration, e.g. identifiers in Ada or proper nouns in spelling dictionaries. Each reference to the object can be transformed to match the declaration.
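Here is a sketch of the dictionary-definition approach. It assumes a hypothetical list of declarations whose spelling is authoritative. Each identifier in the source text is looked up case-insensitively and replaced with its declared spelling, and words with no declaration are left as typed. The tokenising regular expression is a simplification that does not skip string literals or comments.

```python
import re

# Hypothetical declarations: the spelling given here is the official one.
declarations = ["MaxCount", "Total_Sum"]
canonical = {name.casefold(): name for name in declarations}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def normalise_references(source: str) -> str:
    """Rewrite each declared identifier (matched ignoring case) to its declared spelling."""
    def respell(match):
        word = match.group(0)
        return canonical.get(word.casefold(), word)
    return IDENTIFIER.sub(respell, source)

print(normalise_references("total_sum := MAXCOUNT + 1;"))
# Total_Sum := MaxCount + 1;
```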
Last Use
This is a special category to describe an early IDE for Microsoft’s BASIC family. (I believe it was QuickBASIC, but it may have been Visual Basic 1.0 for DOS, or even QBASIC.) When an identifier was typed in with an unexpected capitalisation, all previous references to the identifier would be modified to match the most recent capitalisation.
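The Last Use behaviour can be modelled by remembering the most recent spelling of each identifier and re-spelling the whole program every time a line is typed. `LastUseEditor` below is a hypothetical model, not a reconstruction of how the Microsoft IDE was actually implemented. For simplicity, it treats every word, keywords included, as an identifier.

```python
import re

class LastUseEditor:
    """Hypothetical editor model: all lines use the most recently typed spelling."""

    IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

    def __init__(self):
        self.lines = []
        self.latest = {}  # case-folded identifier -> most recently typed spelling

    def type_line(self, line):
        # Record the spelling of every identifier on the newly typed line...
        for word in self.IDENT.findall(line):
            self.latest[word.casefold()] = word
        self.lines.append(line)
        # ...then rewrite every line, old and new, to use the latest spellings.
        self.lines = [self.IDENT.sub(self._respell, text) for text in self.lines]

    def _respell(self, match):
        word = match.group(0)
        return self.latest.get(word.casefold(), word)


ed = LastUseEditor()
ed.type_line("counter = 1")
ed.type_line("PRINT Counter")
print(ed.lines)  # ['Counter = 1', 'PRINT Counter']
```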
Asymmetric Case Sensitivity
Another way of dealing with case, especially in searches, is to use what I have dubbed “asymmetric case-sensitivity”.
Under this system, if a user searched for “foo”, it would match “foo”, “Foo” or “FOO”; it is case-insensitive.
However, if the user searched for “Foo” (that is, the user has gone to the effort of specifying that some letters are upper-case), then it will only match “Foo” and not “foo” or “FOO”.
This system suffers from theoretical limits (you can’t do a case-sensitive search for “foo”) and user-friendliness issues (it is hard for a new user to work out that this will be the behaviour), but in practice, for an experienced user, it can prove to be quite natural.
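The rule can be sketched as a literal-text search that switches on case-sensitivity only when the pattern contains an upper-case letter. `asymmetric_search` is a hypothetical helper built on Python's standard `re` module. It is meant to show the rule and is not modelled on any particular editor's source.

```python
import re

def asymmetric_search(pattern: str, text: str) -> list[str]:
    """Find literal occurrences of pattern using asymmetric case-sensitivity.

    An all-lower-case pattern matches case-insensitively; a pattern that
    contains any upper-case letter must match exactly.
    """
    flags = 0 if any(ch.isupper() for ch in pattern) else re.IGNORECASE
    return re.findall(re.escape(pattern), text, flags)

text = "foo Foo FOO"
print(asymmetric_search("foo", text))  # ['foo', 'Foo', 'FOO']
print(asymmetric_search("Foo", text))  # ['Foo']
```

The sketch also exhibits the theoretical limit: no pattern you can pass will match “foo” exactly while rejecting “Foo”.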
Examples include:
- Emacs
Miscellaneous Pedantic Notes:
- Many of the software examples here have modes, options or variants, or various quirks under certain circumstances, that alter their handling of case. The classifications here describe their typical or default behaviour.
- I use the term case-smashing as synonymous with non-case preserving, and the term case-folding as synonymous with case insensitivity. However, the definitions in the normally definitive Jargon File (viz. fold case and smash case) are somewhat conflated, so I have not relied on them here.
- Strictly, it is the Windows operating system, not the file system, that gives NTFS its case-insensitivity.
- Further Reading:
- Wikipedia articles on Case sensitivity, Case Sensitivity, Case preservation and Comparison of File Systems
- Merd’s Language Comparison
Comment by Sunny Kalsi on October 26, 2005
skills of a baggage-handlers indeed!.
Comment by Sunny Kalsi on October 26, 2005
It quite suffers from theoretically limits.
[Editor’s note: Sunny is referring to a typo, which I have now corrected. Thanks, Sunny.]
Comment by alastair on October 26, 2005
In “a” text editor? How about the text editor? Emacs uses the “Asymmetric” case sensitivity in its search commands.
[Editor’s note: Alastair is referring to a section of the document which has since been updated.]
To your list of case-insensitive file systems, add HFS Plus (used on MacOS 8+). Interestingly, they recently created a variant of it called HFSX which can be case-insensitive (depending on configuration). It is case-preserving.
Comment by Julian on October 26, 2005
Alastair,
Emacs! Of course! I was afraid it might have actually been Exco! I have updated the article to fill in this info.
Re: HFS Plus
Ahh! That explains why, during my research, I was reading conflicting rumours about Mac OS and case-sensitivity. Article updated. Thanks.
Comment by Alan Green on October 27, 2005
The vim name for the asymmetric case sensitivity when finding is smartcase. Being more user-friendly than emacs, vim does not enable this option by default.
Comment by Alan Green on October 27, 2005
No, I was wrong. Vim’s smartcase is something else. You think I would have learnt my lesson about posting before my first cup of coffee.
Comment by Julian on October 27, 2005
But Alan, your link to the vim document perfectly described smartcase in the same way as asymmetric case-sensitivity. Perhaps you shouldn’t post after your fourth cup of coffee! 🙂
By the way, I was willing to take on the case-sensitive versus case-insensitive controversy, but even I would dare not tread into the “vi versus emacs” debate!