If you aren’t into programming language design, you can skip this.
For the rest of us, I want to talk about enumerated types. It is a discussion I have with myself frequently.
Simple Modelling Challenge
Consider how you would model this decision at your local chip shop, in your favourite programming language.
Milkshakes can be chocolate, vanilla or strawberry – that’s all they sell. They can be small, medium or large.
How to Assess The Result?
Here are some questions for you:
Does your language ensure that the milkshake can only be one of those flavours? Will it permit someone to say a milkshake is blueberry flavoured?
Does your language support the concept that large is greater than medium? (Conversely, can you say that Vanilla and Strawberry do not support the greater-than relationship?)
Can you loop over the flavours?
Do you have to specify an internal integer representation? Or will your language assign values for you?
Can you learn that internal integer representation for use as a key? What about for serialisation (which requires you to be able to go the other way, too, from integer to enumerated value)?
Can you override the internal integer representation? e.g. to map large to its size in millilitres.
Is there a text representation available so you can easily print “Strawberry” rather than 2?
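As a rough illustration of how these questions play out, here is a minimal sketch using Python's standard-library enum module (available from Python 3.4). The class names, the millilitre values and the describe helper are assumptions made up for this example, not part of any particular chip shop's system.

```python
from enum import Enum, IntEnum


class Flavour(Enum):
    # Plain Enum: members have no ordering, so "<" between flavours raises TypeError.
    CHOCOLATE = 1
    VANILLA = 2
    STRAWBERRY = 3


class Size(IntEnum):
    # IntEnum: members behave like integers, so "large > medium" works.
    # The values are overridden with hypothetical sizes in millilitres.
    SMALL = 250
    MEDIUM = 400
    LARGE = 600


def describe(flavour, size):
    # Python will not check argument types for us, so check explicitly.
    if not isinstance(flavour, Flavour) or not isinstance(size, Size):
        raise TypeError("not a valid milkshake")
    return "{} {} ({} ml)".format(size.name.lower(), flavour.name.lower(), size.value)
```

With these definitions, iterating with list(Flavour) yields the three members in definition order, Flavour(2) maps an integer back to Flavour.VANILLA, and Flavour(4) raises ValueError rather than inventing a blueberry milkshake.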
Some I Prepared Earlier
Here are what I believe the answers are for several classic languages – some have more than one way to implement this. [Source: Memory and Wikipedia]
Language (Technique) | Type-Checked | Order Available? | Order Mandatory? | Iterable? | Visible integer representation? | Serialisable? | Has Auto-Inc value | Has Overridable value | Has Text Representation |
---|---|---|---|---|---|---|---|---|---|
Pascal | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE [3] | TRUE | FALSE | FALSE |
C (Enum) | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE [6] | TRUE | TRUE | FALSE |
C++ (Enum) | TRUE | TRUE | TRUE | FALSE | TRUE | FALSE [6] | TRUE | TRUE | FALSE |
C/C++ (Const) | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE |
Java | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
Python (Dict) | Manual [2] | TRUE [5] | TRUE | TRUE | TRUE | FALSE [4] | TRUE | TRUE | TRUE |
Python (Set) | Manual [2] | FALSE | N/A | TRUE | FALSE | FALSE [4] | N/A | FALSE | TRUE |
Python (Enum Class [1]) | Manual [2] | TRUE | TRUE | TRUE? | TRUE | FALSE [4] | FALSE | TRUE | FALSE |
SQL | TRUE? | TRUE | TRUE | FALSE? | TRUE? | TRUE | TRUE | FALSE | TRUE? |
Ada | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
C# | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
1 I frequently use this Enum class I found on the web. I frequently regret it, because I keep forgetting it doesn’t offer everything Ada did (in 1983!).
2 Python is dynamically typed. You can write, for example, assert flavour in milkshake_flavours, but it won’t be checked for you, so you need to do it explicitly.
3 Standard Pascal doesn’t support it but several implementations, and derived languages, do.
4 It is possible to create back-mappings, but they are look-ups, not trivial type-coercions.
5 By comparing the values from the dictionary yourself: CUP_SIZE['large'] > CUP_SIZE['small']. Without manual sorting, iteration will not be in this order.
6 De-serialising a value known to be good can be done for free – a coercion in C, or a cast in C++. Detecting whether such a value is good is hard.
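To make the Python footnotes concrete, here is a minimal sketch of the dict and set techniques. The names CUP_SIZE and milkshake_flavours come from the footnotes; the integer values and the helpers CUP_SIZE_BY_VALUE, check_flavour and sizes_in_order are assumptions invented for the illustration.

```python
# Dict technique: names mapped to integers chosen by hand (values are hypothetical).
CUP_SIZE = {'small': 1, 'medium': 2, 'large': 3}

# Set technique: membership only, with no order and no integer representation.
milkshake_flavours = {'chocolate', 'vanilla', 'strawberry'}

# Back-mapping for de-serialisation: a look-up table, not a type coercion.
CUP_SIZE_BY_VALUE = {value: name for name, value in CUP_SIZE.items()}


def check_flavour(flavour):
    # The explicit check that the language will not perform for us.
    if flavour not in milkshake_flavours:
        raise ValueError("unknown flavour: {!r}".format(flavour))
    return flavour


def sizes_in_order():
    # Sort the names by their integer values rather than relying on dict order.
    return sorted(CUP_SIZE, key=CUP_SIZE.get)
```

The ordering comparison is then CUP_SIZE['large'] > CUP_SIZE['small'], and going from a stored integer back to a name is CUP_SIZE_BY_VALUE[3], which costs a dictionary look-up each time.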
Conclusion
Sometimes, I miss Ada.
[Updated to include C#, separate C++ enums and indicate Java’s overridable nature, iterability and text representation, according to help from commenters. Thank you.]
Comment by Mr Rohan on July 24, 2010
The Java one is overridable – since Enum is effectively a class you can have whatever method you want to get whatever output you want .. I’ve used it previously to have 1 set of IDs from the DB, one set of IDs (being the std enum values), and another set of “translated” values …
Comment by Alastair on July 24, 2010
Julian, you undermine your credibility by lumping C and C++ together. They are different languages. And enumerations are one of the points of difference. Given:
enum Flavour { Chocolate, Strawberry, Vanilla };
enum Size { Small, Medium, Large };
int main()
{
    enum Flavour mine = Chocolate;
    enum Flavour yours = Medium;
    return 0;
}
It compiles as a C program:
$ gcc -o enums enums.c
$ ./enums
$
But not as C++:
$ g++ -o enums enums.cpp
enums.cpp: In function 'int main()':
enums.cpp:7: error: cannot convert 'Size' to 'Flavour' in initialization
Comment by configurator on July 24, 2010
What about C#?
TRUE / TRUE / TRUE / TRUE / TRUE / TRUE / TRUE / TRUE / TRUE
Comment by Julian on July 24, 2010
Alastair,
Oops. I knew C++ had a different Enum type, and I originally had them as two rows, but they came out as identical, so I merged them.
I now see that I made a mistake in the early draft under the type-checking column, so I merged prematurely.
Will fix.
Comment by Julian on July 24, 2010
Rohan,
Are you saying that you had one enumerated type with three mappings? Or three unrelated enumerated types, with three different mapping techniques?
Was the mapping from the database affecting just the internal representation or also the members of the Enumerated type (I assume not: Java being statically typed and all.)
Interesting; it raises the question further of how far you can go before you have pushed beyond the meaning of enumerated type.
Comment by Julian on July 24, 2010
Configurator,
Thank you. Excluded purely from ignorance – I have never used C#.
I have updated according to the comments from above. Both Java (assuming iterability) and C# now are at the Ada level. So looks like I have just been choosing the wrong languages (for my favourite enumerated type behaviour) for the past couple of decades.
Comment by Alastair on July 24, 2010
I have a mild concern with your evaluation criteria. The auto-increment feature can, depending on your usage, come into conflict with the serialisable feature. If, for example, you add a new value in the middle of the list, you end up breaking your serialisation, because all subsequent values will have a different integer representation. So any language which claimed to provide both features would have to also provide stability of the integer representations, and I’m not aware of any that do.
Besides, I don’t really think C or C++’s enums satisfy the serialisation criterion anyway. Sure, an enumerated type can be automatically promoted to an integral type, but you can’t (generally) go the other way. As you correctly note, this is a key requirement for serialisation. In C++ anyway, if you were deserialising, you’d have to first check whether the integral value was in fact valid for the enumeration, and then cast it explicitly. This sounds like a FALSE to me.
And speaking of which, Python’s type checking (or lack thereof) surely deserves a FALSE for the type checking criterion? Or possibly “N/A”?
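The conflict between auto-increment and serialisation can be sketched in a few lines. This is only an illustration, written with Python's standard-library enum module (enum.auto needs Python 3.6 or later); SizeV1, SizeV2 and the REGULAR member are hypothetical names standing for two releases of the same type.

```python
from enum import Enum, auto


class SizeV1(Enum):
    # First release: values are assigned automatically as 1, 2, 3.
    SMALL = auto()
    MEDIUM = auto()
    LARGE = auto()


class SizeV2(Enum):
    # Later release: a hypothetical new member is added in the middle,
    # so every later member silently gets a different integer.
    SMALL = auto()
    REGULAR = auto()
    MEDIUM = auto()
    LARGE = auto()


# Serialise with the old definition, de-serialise with the new one.
stored = SizeV1.MEDIUM.value
reloaded = SizeV2(stored)
```

Here reloaded ends up as SizeV2.REGULAR, not SizeV2.MEDIUM: the integer on disk is still valid, so nothing fails, but it now means something else.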
Comment by Julian on July 24, 2010
Alastair,
Fair criticisms. Let me defend myself, while still admitting you are right.
One of the situations that causes me to think about enumerated types is when I have a text file (as opposed to, for example, a Python pickle file) with a representation of each of the enumerated instances – e.g. 3, 2 and 1 for the three milkshake sizes. When the only purpose of the enumerated type is to represent these persistent values, it is inviting to have the internal representation the same as in the file.
(There is a small voice in my head yelling at me that this is an inappropriate reason for overriding the compiler’s preferred representation, but I am going to plough on anyway.)
In any language, I can create a manual mapping function from the file representation to the enumerated values, but in some languages, that operation is “free” (e.g. a coercion or even a cast, but not a conversion or a lookup). I was trying to come up with a column header that indicated that concept, and “Serialisable?” was the best I could come up with. It’s not a great fit.
I agree using such a system for serialisation is fragile to additions to the enumerated type with auto-inc, but such concepts would generally not be mixed, because you would want to specify the representation for each new value explicitly.
Your comment about C++ deserialising is valid. I had to go and check. It can be done in C but not C++. At least the cast is “free”, though. I’ll update the chart.
It remains true that, in C, you need to manually check that the number is valid, which is non-trivial. But if you are sure it is okay, it’ll be coerced.
You are also right about Python having no type-checking. I’d sometimes miss that.
However, what Python does offer is an easy way to manually check if a parameter (for example) is, in fact, a member of the right enumerated type. That isn’t available at all in C. Explicit, and slow, type-checking doesn’t seem to be a good solution to me, but it’s better than nothing. I wanted to highlight that.
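For the text-file case, a minimal sketch of matching the internal representation to the file representation might look like this. It assumes Python's standard-library enum module (Python 3.4 or later), assumes that 3, 2 and 1 stand for large, medium and small respectively, and uses a hypothetical parse_sizes helper.

```python
from enum import IntEnum


class Size(IntEnum):
    # Explicit values chosen to match the persistent file representation.
    # Which number means which size is an assumption for this example.
    LARGE = 3
    MEDIUM = 2
    SMALL = 1


def parse_sizes(text):
    """Parse a comma-separated line such as '3, 2, 1' into Size members."""
    sizes = []
    for field in text.split(','):
        # Size(n) validates as it converts: it raises ValueError for unknown n.
        sizes.append(Size(int(field)))
    return sizes
```

Writing the values back out needs no mapping, since an IntEnum member already is an integer; reading them in goes through a validating look-up, so it is explicit but not quite “free”.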
Comment by Julian on July 24, 2010
I have used what I learned here, to create a new Python class that better meets my needs. I won’t publish it until it has been used in anger.
Comment by configurator on July 25, 2010
“So looks like I have just been choosing the wrong languages (for my favourite enumerated type behaviour) for the past couple of decades”
Don’t worry – Java is (almost) never the right choice.
I strongly recommend C# – in its latest versions it has become quite a powerful language with many functional-style features (although it does have some ugly features). The BCL is reliable, well-designed and pretty comprehensive to complement that.
Comment by Sunny Kalsi on July 26, 2010
How does Java not have a text representation for enums? I thought it would give the name by default, and you could override the toString() function in the enum class. Java should be all trues. It’s iterable also (using Enum.values()).
Comment by Julian on July 27, 2010
Sunny,
I did more reading (my Java knowledge is both one-dimensional and dated) and you are right. You can indeed simply print the enumerated value to get its text representation. Thanks for confirming the iterability too.
I updated the chart again to make Java all TRUE. I wondered again how I managed to avoid for the past decade or two all the languages that have what I want in an enumerated type. 🙁
p.s. That chart really needs some style-sheet improvements to make it readable. Sorry!
Comment by John Y. on July 28, 2010
How? Easy. You picked languages that were better in other ways. Presumably, these languages were sufficiently better in those other ways, or conferred advantages even more important than “Julian-perfect” enumerated types. (Frankly, I found enumerated types cute when I discovered them in Pascal, and then soon realized they were one of the least critical language features for my taste and purposes. Clearly, many programmers and language designers agree with me on that point.)
Comment by configurator on August 10, 2010
Seems like Java’s enum ordinals are not overridable. You can add extra properties – but the ordinal value (and the only one used in enum.valueOf) doesn’t seem to be available.
Comment by John Y. on May 1, 2013
For anyone stumbling upon this post, be aware that Python is getting close to adding an enum module to its standard library (see PEP 435). Of course, if it happens, it would appear in the upcoming 3.x release of Python. The Python 2 series is feature-frozen at 2.7.
Regardless of when or if the above PEP is accepted, you can get substantially the same functionality from Barry Warsaw’s flufl.enum, which works on both Python 2.7 and Python 3.2+, according to the package documentation. (The PyPI entry lists 2.6 as well, but I am not sure if this information is up to date. Presumably older versions of the package did work with 2.6, and they’re still available.)
Comment by John Y. on May 13, 2013
Oy. Well, PEP 435 was indeed accepted and is now slated for inclusion in 3.4. However, the big discussion and hashing out of details has left it different enough from flufl.enum that if you really want flufl-style enums, you’ll still need to use flufl.enum.
Basically, there emerged a philosophical difference over whether enum values should be of a different type than the enum itself; e.g. should Color.red and Color.blue be of type Color, or should they be of some separate EnumMember type? The former view (adopted by PEP 435) has implications that severely restrict subclassing; the latter view (taken by flufl.enum) was ultimately found unattractive by Python’s creator.
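A short sketch of what the adopted view looks like in practice, assuming the standard-library enum module as shipped from Python 3.4 onwards; the Color members and the try_subclass helper are made up for the illustration.

```python
from enum import Enum


class Color(Enum):
    red = 1
    blue = 2


# Under the PEP 435 design, each member is an instance of the enum class itself.
member_is_color = isinstance(Color.red, Color)
member_type = type(Color.red)


def try_subclass():
    # Extending an enum that already defines members is rejected with TypeError.
    try:
        class MoreColor(Color):
            green = 3
    except TypeError as exc:
        return str(exc)
    return None
```

Calling try_subclass() returns the error message rather than None, which is the subclassing restriction in action; the exact wording of the message varies between Python versions.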
Comment by Julian on May 13, 2013
Over the past few years, I have watched my opinion of Python 3.x change from “Wouldn’t use even if I could.” to “Can’t use; too many dependencies not available.” to “Hmmm… PIL is available? I think that is the last missing piece. Still no reason to upgrade” to “Ooh, that would be nice. I wish I could justify upgrading.”
This is definitely in the “that would be nice” category.
Comment by Julian on May 13, 2013
Which brings me to the migration question (and I am making no commitments to migrate any time soon):
The migration path that most attracts me is to migrate low-level modules to Python 3 one by one (with Python2to3 plus hand editing until the unit tests pass), then declare the Python 3.x version to be the master, delete the original, regenerate it with Python 3to2, and check that the unit tests still pass.
Slowly the Python 3 version will be constructed while maintaining a Python 2 version working at all times. Eventually, there will be one code-base, directly running Py 3 and acting as source for an auto-generated Py2 version. At that point, production can migrate to Python 3, and the auto-generated version can be allowed to wither.
My question being: If I use Python 3.4 enum types, will Python 3to2 be able to convert them meaningfully into Python 2.7 code?
Comment by Alastair on October 12, 2013
Just for the record, C++11 has strongly-typed enumerations, which are similar to the weakly-typed ones but probably deserve a TRUE in the type-checked column.