Hmm, they seem to have chosen to avoid names to the choices in the union, joining C++ variants and (sort of) TypeScript unions: unions are effectively just defined by a collection of types.
Other languages have unions with named choices, where each name selects a type and the types are not necessarily all different. Rust, Haskell, Lean4, and even plain C unions are in this category (although plain C unions are not discriminated at all, so they’re not nearly as convenient).
Hi there! One of the C# language designers here, working on unions. We're interesting in both forms! We decided to go with this first as felt there was the most value here, and we could build the named form on top of this. In no way are we thinking the feature is done in C#15. But it's part of our ongoing evolution.
If you're interested, i can point you to specs i'm writing that address the area you care about :)
I think it would be a considerable improvement to allow duck-typing over the top. implicitly defined interface that includes exact member matches, something like that.
Hi! Love to see that C# language designers are here on HN :)
Just wanted to add just another opinion on a few things.
I think many people already mentioned it, but I also don't feel to good about non-boxed unions not being the default. I'd personally like the path of least resistance to lead to not boxing. Having to opt-in like the current preview shows it looks like a PITA that I'd quickly become tired of.
Also, ad-hoc union types could be really nice. At least in Python those are really nice, stuff like def foo(x: str | int) is just very nice. If I had to first give this union type a name I'd enjoy it way less.
But I'm aware that you are trying your best to find a good trade-off and I'm sure I don't know all the implications of the things I wish you'd do. But I just wanted to mention those to have another data point you can weigh into your decision process.
> Having to opt-in like the current preview shows it looks like a PITA that I'd quickly become tired of.
My belief is that we will have a union struct just like we have record and record struct. Where you can simply say you want value-type, non-boxing, behavior by adding the struct keyword. This feels very nice to me, and in line with how we've treated other similar types. You'll only have to state this on the decl point, so for each union type it's one and done.
> I think many people already mentioned it, but I also don't feel to good about non-boxed unions not being the default. I'd personally like the path of least resistance to lead to not boxing. Having to opt-in like the current preview shows it looks like a PITA that I'd quickly become tired of.
The problem is that the only safe way for the compiler to generate non-boxed unions would require non-overlapping fields for most value types.
Specifically the CLR has a hard rule that it must know with certainty where all managed pointers are at all times, so that the GC can update them if it moves the referenced object. This means you can only overlap value types if the locations of all managed pointers line up perfectly. So sure, you can safely overlap "unmanaged" structs (those that recursively don't contain any managed pointers), but even for those, you need to know the size of the largest one.
The big problem with the compiler doing any attempt to overlap value types is that if the value types as defined at compile time may not match the definitions at runtime, especially for types defined in another assembly. A new library version can add more fields. This may mean one unmanaged struct has become too big to fit in the field, or that two types that were previously overlap compatible are not anymore.
Making the C# compiler jump though a bunch of hoops to try to determine if overlapping is safe and even then leaving room for an updated library at runtime to crash the whole things means that the compiler will probably never even try. I guess the primitive numeric types could be special cased, as their size is known and will never change.
Again, very rough. We go round and round on things. Loving to decompose, debate and determine how we want to tackle all these large and interesting areas :)
Careful not to mix unions with sum types, though. The key distinction is that the latter are disjunct sets, even if you "sum" together the same type twice, you can always tell which "way" you went.
An example that may show the difference: if you have a language with nullable types, then you basically have a language with union types like String|Null, where the Null type has a single value called null and String can not be null.
Now if you pass this around a function that itself may return null, then your type coalesces to String|Null still (you still get a nullable string, there is no doubly nullable). This is not true for Maybe/Option whatever you call types, where Some(None) (or Optional.of(Optional.empty())) is different from None only.
Rich Hickey once made a case that sort of became controversial in some FP circles, that the former can sometimes be preferred (e.g. at public API surfaces), as in for a parameter you take a non-nullable String but for returns you return a String|Null. In this case you can have an API-compatible change widening the input parameters' type, or restricting the return type - meanwhile with sum types you would have to do a refactor because Maybe String is not API compatible with String.
> Careful not to mix unions with sum types, though. The key distinction is that the latter are disjunct sets, even if you "sum" together the same type twice, you can always tell which "way" you went.
This is a really good point. I'd love to be able to have a sum type of two strings ("escaped" and "unescaped"); or any two kinds of the same type really, to model two kinds of the same type where one has already passed some sort of validation and the other one hasn't.
Edit to add: I figure what I want is for enums to be extended such that different branches are able to carry different properties.
Edit again (I should learn to think things through before posting. sorry): I suppose it can be faked using a union of different wrapper types, and in fact it might be the best way to do it so then methods can take just one of the types in arguments and maybe even provide different overloads.
Not sure, but I think C++ actually does allow std::variant with multiple choices using the same type. You might not be able to distinguish between them by type (using get()), but you can by position (get<0>(), get<1>(), ...)
I haven’t tried this, and I don’t intend to, because visitors and similar won’t work (how could they?) and I don’t want to have to think about which is choice 2 and which is choice 7.
The C and Rust union types are extremely sharp blades, enough so that I expect the average Rust beginner doesn't even know Rust has unions (and I assume you were thinking of Rust's enum not union)
I've seen exactly one Rust type which is actually a union, and it's a pretty good justification for the existence of this feature, but one isn't really enough. That type is MaybeUninit which is a union of a T and the empty tuple. Very, very, clever and valuable, but I didn't run into any similarly good uses outside that.
Unions can be used as a somewhat safer (not safe by any means but safer), more flexible, and less error-prone form of transmute. Notably you can use unions to transmute between a large type and a smaller type.
That is essentially the motivation, primarily in the context of FFI where matching C's union behaviour using transmute is tricky and error-prone.
There are rare cases where all attributes of the C union are valid at the same time. Say you have a 32-bit RGBA color value and you want to access the individual 8 bit values. You can make a union of an 32 bit int and a struct that contains 4x 8 bit integers.
Also you can manually tag them and get s.th. more like other high level languages. It will just look ugly.
Yes. I once wanted C unions limited to fully mapped type conversions, where any bit pattern in either type is a valid bit pattern in the other. Then you can map two "char" to "int". Even "float". But pointer types must match exactly.
If you want disjoint types, something like Pascal's discriminated variants or Rust's enums is the way to go. It's embarrassing that C never had this.
Many bad design decision in C come from the fact that originally, separate compilation was really dumb, so the compiler would fit in small machines.
I do not agree. MaybeUninit is without any doubt more valuable than the C FFI use
I can't even think of any prominent C FFI problems where I'd reach for the union's C representation. Too many languages can't handle that so it seems less useful at an FFI edge.
OS APIs for one, at least there are some Win32 calls that take unions if I remember correctly.
One of the reasons .NET had Managed C++, replaced by C++/CLI (nowadays C++20 compliant, minus modules), is exactly that P/Invoke (and RCW/CCW) cannot represent everything.
Which they don't want to expose on .NET type system directly.
In Rust? The two I'm big fans of, CompactString and ColdString do not use unions although historically CompactString did so and it still has a dependency on smallvec's union feature
ColdString is easier to explain, the whole trick here is the "Maybe this isn't a pointer?" trick, ColdString might be a single raw pointer onto your heap with the rest of the data structure at the far end of the pointer, this case is expensive because nothing about the text lives inline, but... the other case is that your entire text was hidden in the pointer, on modern hardware that's 8 bytes of text, at no overhead, awesome.
CompactString is more like a drop-in replacement, it's much bigger, the same size as String, so 24 bytes on modern hardware, but that's all SSO, so text like "This will all fit nicely" fits inline, yet the out-of-line case has the usual affordances such as capacity and length in the data structure. This isn't doing the "Maybe this isn't a pointer?" trick but is instead relying on knowing that the last byte of a UTF-8 string can't have certain values by definition.
I realise that I don't do the best job of explaining ColdString here. After all most 8 byte strings of UTF-8 text could equally be a pointer so, why can this work?
All ColdStrings which look like 8 bytes of UTF-8 text really are 8 bytes of UTF-8 text, just the type label on those 8 bytes isn't "[u8; 8]" an array of 8 bytes but instead "mut *u8" a raw pointer. "Validate" for example is 8 bytes of ASCII, thus UTF-8, and Rust is OK with us just saying we want a pointer on a 64-bit machine with those bytes. It's not a valid pointer, but it is a pointer and Rust is OK with that, we just need to be careful never to [unsafely] dereference the pointer because it's invalid
OK, so there are two cases left: First, what if there are fewer bytes of text? Zero even?
Since there are fewer than 8 bytes of text we can use the whole first byte to signal how many of the remainder are text, we use the UTF-8 over-long prefix indicator in which the top five bits of the byte are all set, bytes 0xF8 through 0xFF for this, there are eight of these bytes corresponding to our 8 lengths 0 through 7 inclusive. Because it's over-long this indicator isn't itself a valid UTF-8 prefix. Again we can pretend this is a pointer while knowing it's invalid.
Lastly, the seemingly trickiest problem, what if the string didn't fit inline? We use a heap allocation to store the text prefixed by a variable size integer length and we insist this allocation is aligned to 4 bytes. This means a valid pointer to our allocation has zeroes for the bottom two bits, then we rotate that pointer so those bottom two bits are at the top of the first byte position (depending on machine word layout) and we set the top bit. This is now always invalid UTF-8 because it has the continuation marker - the top bit is set but the next is not, which cannot happen in the first byte of any UTF-8 text, and so our code can detect this and reverse the transformation to get back a valid pointer using the strict provenance APIs if this marker is present.
I would love to see a page that has cross-language comparisons of how different structures work. Eg “unions: differences between langs. Enums: …” while grouping together the different design choices, as you do in this one case.
I suppose an LLM could do a pretty good job at this.
It's trying to generalize - we might have exactly one T, fine, or a collection of T, and that's more T... except no, the collection might be zero of them, not at least one and so our type is really "OneOrMoreOrNone" and wow, that's just maybe some T.
it's for type purists, because sometimes you want the first element of the list but if you do that you will get T? which is stupid if you know that the list always holds an element, because now you need to have an unnecessary assertion to "fix" the type.
The NonEmptyList in Cats is a product (struct/tuple) type, though; I assume the Haskell version is the same. The type shown in the blog post is a sum (union) type which can contain an empty enumerable, which contradicts the name OneOrMore. The use described for the type in the post (basically a convenience conversion funnel) is different and makes sense in its own right (though it feels like kind of a weak use case). I'm not sure what a good name would've been illustratively that would've been both accurate and not distracting, though.
Well you are right of course, I just wanted to explain what they wanted to show. Of course the type would be wrong if the second entry in itself is an empty list. I just wanted to explain the reasoning what they tried to accomplish
They could’ve done the Either type which would’ve been more correct or maybe EitherT (if the latter is even possible)
I don't think they were trying to accomplish the same thing as the Scala/Haskell version; these are just two completely different things that happen to share a name because the blog post gave the example a name that is confusing when read literally. The purpose of the Cats version is “there is always a head element”. The purpose of the union in the blog post is more like “this can be a collection, but many callers will be thinking of it as a single element, so don't put the burden on them to convert it”. I do think it's a weak case for them in a type theory sense (I would tend to position that kind of implicit conversion elsewhere in the language), but I can also see it being motivating to a large class of developers…
… wait, I've made a different mistake here while trying to explain the difference, haven't I? I was describing it as a sum type, but it's not really a sum type, it's really just set-theoretic union, right?
Which also means OneOrMore is unsound in a different way because it doesn't guarantee that T and IEnumerable are disjoint; OneOrMore
But now you can only call methods that are available for both T and IEnumerable, you have no way of knowing which it actually is. (You would know if it were sum types)
If I understand correctly, it’s actually OneOrOneOrMoreOrNone. Because you have two different distinguishable representations of “one”.
The only reason to use this would be if you typically have exactly one, and you want to avoid the overhead of an enumeration in that typical case. In other words, AnyNumberButOftenJustOne.
I have the same gripe about IsNullOrWhiteSpace which is basically an extension to IsNullOrEmpty, so it really should be named IsNullOrEmptyOrWhiteSpace.
I haven't read this in detail but I expect it to be the same kind of sealed type that many other languages have. It doesn't cover ad-hoc unions (on the fly from existing types) that are possible in F# (and not many non-FP languages with TypeScript being the most notable that does).
The problem with ad-hoc unions is that without discipline, it invariably ends in a mess that is very, very hard to wrap your head around and often requires digging through several layers to understand the source types.
In TS codebases with heavy usage of utility types like Pick, Omit, or ad-hoc return types, it is often exceedingly difficult to know how to correctly work with a shape once you get closer to the boundary of the application (e.g. API or database interface since shapes must "materialize" at these layers). Where does this property come from? How do I get this value? I end up having to trace through several layers to understand how the shape I'm holding came to be because there's no discrete type to jump to.
This tends to lead to another behavior which is lack of documentation because there's no discrete type to attach documentation to; there's a "behavioral slop trigger" that happens with ad-hoc types, in my experience. The more it gets used, the more it gets abused, the harder it is to understand the intent of the data structures because much of the intent is now ad-hoc and lacking in forethought because (by its nature) it removes the requirement of forethought.
"I am here. I need this additional field or this additional type. I'll just add it."
This creates a kind of "type spaghetti" that makes code reuse very difficult.
So even when I write TS and I have the option of using ad-hoc types and utility types, I almost always explicitly define the type. Same with types for props in React, Vue, etc; it is almost always better to just explicitly define the type, IME. You will thank yourself later; other devs will thank you.
Yeah, Typescript feels like it had has arrived at the point where someone needs to write “Typescript: the good parts” and explains all of the parts of the language you probably shouldn’t be using.
Hi there! One of the C# language designers here, working on unions.
We're very interesting in this space. And we're referring to it as, unsurprisingly, 'anonymous unions' (since the ones we're delivering in C#15 are 'nominal' ones).
An unfortunate aspect of lang design is that if you do something in one version, and not another, that people think you don't want the other (not saying you think that! but some do :)). That's definitely not the case. We just like to break things over many versions so we can get the time to see how people feel about things and where are limited resources can be spent best next. We have wanted to explore the entire space of unions for a long time. Nominal unions. Anonymous unions. Discriminated unions. It's all of interest to us :)
Well, there is also the issue that some things get designed and then abandoned even thought some improvements were expected, dynamic typing from DLR, expression trees, for example.
=> named sum type implicitly tagged by it's variant types
but not "sealed", as in no artificial constraints like that the variant types need to be defined in the "same place" or "as variant type", they can be arbitrary nameable types
> unions enable designs that traditional hierarchies can’t express, composing any combination of existing types into a single, compiler-verified contract.
To me that "compiler-verified" maps to "sealed", not "on the fly". Probably.
Their example is:
public union Pet(Cat, Dog, Bird);
Pet pet = new Cat("Whiskers");
- the union type is declared upfront, as is usually the case in c#. And the types that it contains are a fixed set in that declaration. Meaning "sealed" ?
I think your "sealed" is misleading here, as that is used for sum types in similar languages (java).
As the language designer notes in the comments, these are named unions, as opposed to anonymous ones, but they are also working on the latter.
"Sealed" is probably not the correct word to use here, as it would be sealed in both case (it doesn't really make sense to "add" a type to the A | B union). The difference is that you have to add a definition and name it.
var someUser = new { Name = "SideburnsOfDoom", CommentValue = 3 };
What type is someUser ? Not one that you can reference by name in code, it is "anonymous" in that regard. But the compiler knows the type.
A type can be given at compile-time in a declaration, or generated at compile-time by the compiler like this. But it is still "Compiler-verified" and not ad-hoc or at runtime.
the type (Dog, Cat) pet seems similar, it's known at compile-time and won't change. A type without a usable name is still a type.
Is this "ad-hoc"? It depends entirely on what you mean by that.
I mean that Cat, Dog and Bird don't have to inherit from the union, you can declare a union of completely random types, as opposed to saying "Animal has three subtypes, no more, no less", which is what F# does more or less.
I love it, but I see a downside, though: unions are currently implemented as structs that box value types into a Value property of type object. So there can be performance implications for hot paths.
That is not what C# has just added to the language though. These union types so far are just wrappers over an object field which gets downcasted.
F# offers actual field sharing for value-type (struct) unions by explicitly sharing field names across cases, which is as far as you can push it on the CLR without extra runtime support.
Yes, there's a compat-shim in the stdlib/runtime, but not in the language syntax. E.g. it by-definition won't do escape-analysis and optimize discriminated value-types with the first-class keyword.
Sad part is, is that ad hoc unions probably won’t make it into v1. That is probably one of the only feature why I like typescript. Because I can write result types in a good way without creating thousands of sub types. It’s even more important when using Promises and not having checked exceptions.
Is this the last of the F# features to be migrated into C#?
What a missed opportunity. I think really F# if you combine all of its features, and what it left out, was the way. Pulling them all into C# just makes C# seem like a big bag of stuff, with no direction.
F#'s features, and also what it did not included, gave it a style and 'terseness', that still can't really be done in C#.
I don't really get it. Was a functional approach really so 'difficult'? That it didn't continue to grow and takeover.
> reduces the amount of dead boilerplate code other languages struggle with.
given that most of the thinks added seem more inspired by other languages then "moved over" from F# the "other languages struggle with" part makes not that much sense
like some languages which had been ahead of C# and made union type a "expected general purpose" feature of "some kind":
- Java: sealed interfaces (on high level the same this C# features, details differ)
- Rust: it's enum type (but better at reducing boilerplate due to not needing to define a separate type per variant, but being able to do so if you need to)
- TypeScript: untagged sum types + literal types => tagged sum types
- C++: std::variant (let's ignore raw union usage, that is more a landmine then a feature)
either way, grate to have it, it's really convenient to represent a TYPE is either of TYPES relationship. Which are conceptually very common and working around them without proper type system support is annoying (but very viable).
I also would say that while it is often associated with functional programing it has become generally expected even if you language isn't functional. Comparable to e.g. having some limited closure support.
In isolation, yes, I agree with you. But in the context of the cornucopia of other "carefully evaluated" features mixed into the melting pot, C# is a nightmare of language identities - a jack of all trades, master of none, choose your dialect language. No thanks.
> C# is a nightmare of language identities - a jack of all trades, master of none, choose your dialect language.
I honestly have no idea where you would get this idea from. C# is a pretty opinionated language and it's worst faults all come from version 1.0 where it was mostly a clone of Java. They've been very carefully undoing that for years now.
It's a far more comfortable and strict language now than before.
If it’s not for you I guess that is ok. But from your comment I would also deduct that you never professionally used it.
After so many different languages it’s the only one I always comeback to.
The only things that I wish for are: rusts borrow-checker and memory management. And the AOT story would be more natural.
Besides that, for me, it is the general purpose language.
Yes, C# is a jack of all trades and can be used at many things. Web, desktop mobile, microservices, CLI, embedded software, games. Probably is not fitted for writing operating systems kernels due to the GC but most areas can be tackled with C#.
C# is a perfect example of feature envy, but because "Java sucks" C# must be the best thing ever in the world of computing. Orthogonality and coherence be damned.
Absolutely agree. Modern C# language design feels very much lacking in vision or direction. It's mostly a bunch of shiny-looking language features being bolted on, all in ways that make the language massively more complex.
Was this needed? Was this necessary? It's reusing an existing keyword, fine. It's not hard to understand. But it adds a new syntax to a language that's already filled to the brim, just to save a few keystrokes?
Try teaching someone C# nowadays. Completely impossible. Really, I wish they would've given F# just a tenth of the love that C# got over the years. It has issues but it could've been so much more.
I personally like the direction C# is taking. A multi-paradigm language with GC and flexibility to allow you to write highly expressive or high performance code.
Better than a new language for each task, like you have with Go (microservices) and Dart (GUI).
I'm using F# on a personal project and while it is a great language I think the syntax can be less readable than that of C#. C# code can contain a bit too much boilerplate keywords, but it has a clear structure. Lack of parenthesis in F# make it harder to grasp the structure of the code at a glance.
I would think about it more as them including features other more general purpose languages with a "general" style have adopted then "migrating F# features into C#, as you have mentioned there are major differences between how C# and F# do discriminated sum types.
I.e. it look more like it got inspired by it's competition like e.g. Java (via. sealed interface), Rust (via. enum), TypeScript (via structural typing & literal types) etc.
> Was a functional approach really so 'difficult'?
it was never difficult to use
but it was very different in most aspects
which makes it difficult to push, sell, adapt etc.
that the maybe most wide used functional language (Haskel) has a very bad reputation about being unnecessary complicated and obscure to use with a lot of CS-terminology/pseudo-elitism gate keeping doesn't exactly help. (Also to be clear I'm not saying it has this properties, but it has the reputation, or at least had that reputation for a long time)
To me it makes sense because C# is a very general purpose language that has many audiences. Desktop GUI apps, web APIs, a scripting engine for gaming SDKs, console apps.
It does each reasonably well (with web APIs being where I think they truly shine).
> Was a functional approach really so 'difficult'
It is surprisingly difficult for folks to grasp functional techniques and even writing code that uses Func, Action, and delegates. Devs have no problem consuming such code, but writing such code is a different matter altogether; there is just very little training for devs to think functionally. Even after explaining why devs might want to write such code (e.g. makes testing much easier), it happens very, very rarely in our codebase.
Union is almost a net positive to C# in my opinion.
But I do agree. C# is heading to a weird place. At first glance C# looks like a very explicit language, but then you have all the hidden magical tricks: you can't even tell if a (x) => x will be a Func or Expression[0], or if a $"{x}"[1] will actually be evaluated, without looking at the callee's signature.
Microsoft's management has always behaved as if it was a mistake to have added F# into Visual Studio 2010, and being stuck finding a purpose for it.
Note that most of its development is still by the open source community and its tooling is an outsider for Visual Studio, where everything else is shared between Visual Basic and C#.
With the official deprecation of VB, and C++/CLI, even though the community keeps going with F#, CLR has changed meaning to C# Language Runtime, for all practical purposes.
Also UWP never officially supported F#, although you could get it running with some hacks.
Similarly with ongoing Native AOT, there are some F# features that break under AOT and might never be rewritten.
211 comments
Other languages have unions with named choices, where each name selects a type and the types are not necessarily all different. Rust, Haskell, Lean4, and even plain C unions are in this category (although plain C unions are not discriminated at all, so they’re not nearly as convenient).
I personally much prefer the latter design.
If you're interested, i can point you to specs i'm writing that address the area you care about :)
Just wanted to add just another opinion on a few things.
I think many people already mentioned it, but I also don't feel to good about non-boxed unions not being the default. I'd personally like the path of least resistance to lead to not boxing. Having to opt-in like the current preview shows it looks like a PITA that I'd quickly become tired of.
Also, ad-hoc union types could be really nice. At least in Python those are really nice, stuff like
def foo(x: str | int)is just very nice. If I had to first give this union type a name I'd enjoy it way less.But I'm aware that you are trying your best to find a good trade-off and I'm sure I don't know all the implications of the things I wish you'd do. But I just wanted to mention those to have another data point you can weigh into your decision process.
> Having to opt-in like the current preview shows it looks like a PITA that I'd quickly become tired of.
My belief is that we will have a
union structjust like we haverecordandrecord struct. Where you can simply say you want value-type, non-boxing, behavior by adding thestructkeyword. This feels very nice to me, and in line with how we've treated other similar types. You'll only have to state this on the decl point, so for each union type it's one and done.> I think many people already mentioned it, but I also don't feel to good about non-boxed unions not being the default. I'd personally like the path of least resistance to lead to not boxing. Having to opt-in like the current preview shows it looks like a PITA that I'd quickly become tired of.
The problem is that the only safe way for the compiler to generate non-boxed unions would require non-overlapping fields for most value types.
Specifically the CLR has a hard rule that it must know with certainty where all managed pointers are at all times, so that the GC can update them if it moves the referenced object. This means you can only overlap value types if the locations of all managed pointers line up perfectly. So sure, you can safely overlap "unmanaged" structs (those that recursively don't contain any managed pointers), but even for those, you need to know the size of the largest one.
The big problem with the compiler doing any attempt to overlap value types is that if the value types as defined at compile time may not match the definitions at runtime, especially for types defined in another assembly. A new library version can add more fields. This may mean one unmanaged struct has become too big to fit in the field, or that two types that were previously overlap compatible are not anymore.
Making the C# compiler jump though a bunch of hoops to try to determine if overlapping is safe and even then leaving room for an updated library at runtime to crash the whole things means that the compiler will probably never even try. I guess the primitive numeric types could be special cased, as their size is known and will never change.
Again, very rough. We go round and round on things. Loving to decompose, debate and determine how we want to tackle all these large and interesting areas :)
I admit that I haven’t actually used C# in 20 years or so.
An example that may show the difference: if you have a language with nullable types, then you basically have a language with union types like String|Null, where the Null type has a single value called
nulland String can not be null.Now if you pass this around a function that itself may return
null, then your type coalesces to String|Null still (you still get a nullable string, there is no doubly nullable). This is not true for Maybe/Option whatever you call types, where Some(None) (or Optional.of(Optional.empty())) is different from None only.Rich Hickey once made a case that sort of became controversial in some FP circles, that the former can sometimes be preferred (e.g. at public API surfaces), as in for a parameter you take a non-nullable String but for returns you return a String|Null. In this case you can have an API-compatible change widening the input parameters' type, or restricting the return type - meanwhile with sum types you would have to do a refactor because Maybe String is not API compatible with String.
> Careful not to mix unions with sum types, though. The key distinction is that the latter are disjunct sets, even if you "sum" together the same type twice, you can always tell which "way" you went.
This is a really good point. I'd love to be able to have a sum type of two strings ("escaped" and "unescaped"); or any two kinds of the same type really, to model two kinds of the same type where one has already passed some sort of validation and the other one hasn't.
Edit to add: I figure what I want is for enums to be extended such that different branches are able to carry different properties.
Edit again (I should learn to think things through before posting. sorry): I suppose it can be faked using a union of different wrapper types, and in fact it might be the best way to do it so then methods can take just one of the types in arguments and maybe even provide different overloads.
I've seen exactly one Rust type which is actually a union, and it's a pretty good justification for the existence of this feature, but one isn't really enough. That type is MaybeUninit which is a union of a T and the empty tuple. Very, very, clever and valuable, but I didn't run into any similarly good uses outside that.
That is essentially the motivation, primarily in the context of FFI where matching C's union behaviour using transmute is tricky and error-prone.
Also you can manually tag them and get s.th. more like other high level languages. It will just look ugly.
If you want disjoint types, something like Pascal's discriminated variants or Rust's enums is the way to go. It's embarrassing that C never had this.
Many bad design decision in C come from the fact that originally, separate compilation was really dumb, so the compiler would fit in small machines.
As for C, it is a sharp blade on its own.
I can't even think of any prominent C FFI problems where I'd reach for the union's C representation. Too many languages can't handle that so it seems less useful at an FFI edge.
One of the reasons .NET had Managed C++, replaced by C++/CLI (nowadays C++20 compliant, minus modules), is exactly that P/Invoke (and RCW/CCW) cannot represent everything.
Which they don't want to expose on .NET type system directly.
ColdString is easier to explain, the whole trick here is the "Maybe this isn't a pointer?" trick, ColdString might be a single raw pointer onto your heap with the rest of the data structure at the far end of the pointer, this case is expensive because nothing about the text lives inline, but... the other case is that your entire text was hidden in the pointer, on modern hardware that's 8 bytes of text, at no overhead, awesome.
CompactString is more like a drop-in replacement, it's much bigger, the same size as String, so 24 bytes on modern hardware, but that's all SSO, so text like "This will all fit nicely" fits inline, yet the out-of-line case has the usual affordances such as capacity and length in the data structure. This isn't doing the "Maybe this isn't a pointer?" trick but is instead relying on knowing that the last byte of a UTF-8 string can't have certain values by definition.
unions. I guess I'm not very up to dateAll ColdStrings which look like 8 bytes of UTF-8 text really are 8 bytes of UTF-8 text, just the type label on those 8 bytes isn't "[u8; 8]" an array of 8 bytes but instead "mut *u8" a raw pointer. "Validate" for example is 8 bytes of ASCII, thus UTF-8, and Rust is OK with us just saying we want a pointer on a 64-bit machine with those bytes. It's not a valid pointer, but it is a pointer and Rust is OK with that, we just need to be careful never to [unsafely] dereference the pointer because it's invalid
OK, so there are two cases left: First, what if there are fewer bytes of text? Zero even?
Since there are fewer than 8 bytes of text we can use the whole first byte to signal how many of the remainder are text, we use the UTF-8 over-long prefix indicator in which the top five bits of the byte are all set, bytes 0xF8 through 0xFF for this, there are eight of these bytes corresponding to our 8 lengths 0 through 7 inclusive. Because it's over-long this indicator isn't itself a valid UTF-8 prefix. Again we can pretend this is a pointer while knowing it's invalid.
Lastly, the seemingly trickiest problem, what if the string didn't fit inline? We use a heap allocation to store the text prefixed by a variable size integer length and we insist this allocation is aligned to 4 bytes. This means a valid pointer to our allocation has zeroes for the bottom two bits, then we rotate that pointer so those bottom two bits are at the top of the first byte position (depending on machine word layout) and we set the top bit. This is now always invalid UTF-8 because it has the continuation marker - the top bit is set but the next is not, which cannot happen in the first byte of any UTF-8 text, and so our code can detect this and reverse the transformation to get back a valid pointer using the strict provenance APIs if this marker is present.
This type is tomtomwombat's idea, credit to them:
https://github.com/tomtomwombat/cold-string*
> The C and Rust union types are extremely sharp blades
Sure, but the comparable Rust feature is enum, not union.
I suppose an LLM could do a pretty good job at this.
- enum: essentially just a typed uint tag collection
- union: the plain data structure that can contain any of several types
- tagged union: combines enums and unions, so you can dispatch on its tag to get one of the union types
Read from this section and they appear in order:
https://ziglang.org/documentation/master/#enum
It's trying to generalize - we might have exactly one T, fine, or a collection of T, and that's more T... except no, the collection might be zero of them, not at least one and so our type is really "OneOrMoreOrNone" and wow, that's just maybe some T.
https://hackage.haskell.org/package/oneormore or scala: https://typelevel.org/cats/datatypes/nel.html
it's for type purists, because sometimes you want the first element of the list but if you do that you will get T? which is stupid if you know that the list always holds an element, because now you need to have an unnecessary assertion to "fix" the type.
They could’ve done the Either type which would’ve been more correct or maybe EitherT (if the latter is even possible)
… wait, I've made a different mistake here while trying to explain the difference, haven't I? I was describing it as a sum type, but it's not really a sum type, it's really just set-theoretic union, right?
Which also means OneOrMore is unsound in a different way because it doesn't guarantee that T and IEnumerable are disjoint; OneOrMore
OneOrMorewas an example of usinguniontypes.You are free to call it
public union Some(T, IEnumerable) > so our type is really "OneOrMoreOrNone"
If I understand correctly, it’s actually OneOrOneOrMoreOrNone. Because you have two different distinguishable representations of “one”.
The only reason to use this would be if you typically have exactly one, and you want to avoid the overhead of an enumeration in that typical case. In other words, AnyNumberButOftenJustOne.
And you’re exactly right.
It’s not “one or more.”
It’s “one or not one.”
Need two or not two.
> OneOrMoreOrNone
So IEnumerable ? What's up with wrapping everything into fancy types just to arrive at the exact same place.
The problem with ad-hoc unions is that without discipline, it invariably ends in a mess that is very, very hard to wrap your head around and often requires digging through several layers to understand the source types.
In TS codebases with heavy usage of utility types like
Pick,Omit, or ad-hoc return types, it is often exceedingly difficult to know how to correctly work with a shape once you get closer to the boundary of the application (e.g. API or database interface since shapes must "materialize" at these layers). Where does this property come from? How do I get this value? I end up having to trace through several layers to understand how the shape I'm holding came to be because there's no discrete type to jump to.This tends to lead to another behavior which is lack of documentation because there's no discrete type to attach documentation to; there's a "behavioral slop trigger" that happens with ad-hoc types, in my experience. The more it gets used, the more it gets abused, the harder it is to understand the intent of the data structures because much of the intent is now ad-hoc and lacking in forethought because (by its nature) it removes the requirement of forethought.
This creates a kind of "type spaghetti" that makes code reuse very difficult.So even when I write TS and I have the option of using ad-hoc types and utility types, I almost always explicitly define the type. Same with types for props in React, Vue, etc; it is almost always better to just explicitly define the type, IME. You will thank yourself later; other devs will thank you.
> ad-hoc unions (on the fly from existing types) that are possible in F#
Are you sure? This is a feature of OCaml but not F# IIUIR
Edit: https://github.com/fsharp/fslang-suggestions/issues/538
> It doesn't cover ad-hoc unions
Yes and no. C# unions aren’t sealed types, that’s a separate feature. But they are strictly nominal - they must be formally declared:
Which isn’t at all the same as saying: It is the same as the night and day difference between tuples and nominal records.We're very interesting in this space. And we're referring to it as, unsurprisingly, 'anonymous unions' (since the ones we're delivering in C#15 are 'nominal' ones).
An unfortunate aspect of lang design is that if you do something in one version, and not another, that people think you don't want the other (not saying you think that! but some do :)). That's definitely not the case. We just like to break things over many versions so we can get the time to see how people feel about things and where are limited resources can be spent best next. We have wanted to explore the entire space of unions for a long time. Nominal unions. Anonymous unions. Discriminated unions. It's all of interest to us :)
union([],*) , i.e.=> named sum type implicitly tagged by it's variant types
but not "sealed", as in no artificial constraints like that the variant types need to be defined in the "same place" or "as variant type", they can be arbitrary nameable types
> unions enable designs that traditional hierarchies can’t express, composing any combination of existing types into a single, compiler-verified contract.
To me that "compiler-verified" maps to "sealed", not "on the fly". Probably.
Their example is:
public union Pet(Cat, Dog, Bird);
Pet pet = new Cat("Whiskers");
- the union type is declared upfront, as is usually the case in c#. And the types that it contains are a fixed set in that declaration. Meaning "sealed" ?
As the language designer notes in the comments, these are named unions, as opposed to anonymous ones, but they are also working on the latter.
"Sealed" is probably not the correct word to use here, as it would be sealed in both case (it doesn't really make sense to "add" a type to the A | B union). The difference is that you have to add a definition and name it.
(Dog, Cat) pet = new Cat();
So without defining the union with an explicit name beforehand.
someUser? Not one that you can reference by name in code, it is "anonymous" in that regard. But the compiler knows the type.A type can be given at compile-time in a declaration, or generated at compile-time by the compiler like this. But it is still "Compiler-verified" and not ad-hoc or at runtime.
the type (Dog, Cat) pet seems similar, it's known at compile-time and won't change. A type without a usable name is still a type.
Is this "ad-hoc"? It depends entirely on what you mean by that.
int or string. Not sure what happened with it though.Here is visual layout if anyone is interested - https://vectree.io/c/memory-layout-tagging-and-payload-overl...
objectfield which gets downcasted.F# offers actual field sharing for value-type (struct) unions by explicitly sharing field names across cases, which is as far as you can push it on the CLR without extra runtime support.
* https://devblogs.microsoft.com/dotnet/csharp-15-union-types/...
This is essentially "sealed interface" from Java/Kotlin or "enum" in Rust
One thing I miss here (and admittedly I only skimmed through the post so if I missed this, please do correct me) is "ad hoc" unions.
It would be great to be able to do something like
Without having to declare the union first. Basically OneOf<...> but built-in and more integratedWhat a missed opportunity. I think really F# if you combine all of its features, and what it left out, was the way. Pulling them all into C# just makes C# seem like a big bag of stuff, with no direction.
F#'s features, and also what it did not included, gave it a style and 'terseness', that still can't really be done in C#.
I don't really get it. Was a functional approach really so 'difficult'? That it didn't continue to grow and takeover.
So far everything that was added to C# very much reduces the amount of dead boilerplate code other languages struggle with.
Really give it an honest try before you judge it based on the summation of headlines.
> reduces the amount of dead boilerplate code other languages struggle with.
given that most of the thinks added seem more inspired by other languages then "moved over" from F# the "other languages struggle with" part makes not that much sense
like some languages which had been ahead of C# and made union type a "expected general purpose" feature of "some kind":
- Java: sealed interfaces (on high level the same this C# features, details differ)
- Rust: it's enum type (but better at reducing boilerplate due to not needing to define a separate type per variant, but being able to do so if you need to)
- TypeScript: untagged sum types + literal types => tagged sum types
- C++: std::variant (let's ignore raw union usage, that is more a landmine then a feature)
either way, grate to have it, it's really convenient to represent a
TYPE is either of TYPESrelationship. Which are conceptually very common and working around them without proper type system support is annoying (but very viable).I also would say that while it is often associated with functional programing it has become generally expected even if you language isn't functional. Comparable to e.g. having some limited closure support.
> C# is a nightmare of language identities - a jack of all trades, master of none, choose your dialect language.
I honestly have no idea where you would get this idea from. C# is a pretty opinionated language and it's worst faults all come from version 1.0 where it was mostly a clone of Java. They've been very carefully undoing that for years now.
It's a far more comfortable and strict language now than before.
The only things that I wish for are: rusts borrow-checker and memory management. And the AOT story would be more natural.
Besides that, for me, it is the general purpose language.
>a jack of all trades
Yes, C# is a jack of all trades and can be used at many things. Web, desktop mobile, microservices, CLI, embedded software, games. Probably is not fitted for writing operating systems kernels due to the GC but most areas can be tackled with C#.
Just look at this feature: https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/cs...
Was this needed? Was this necessary? It's reusing an existing keyword, fine. It's not hard to understand. But it adds a new syntax to a language that's already filled to the brim, just to save a few keystrokes?
Try teaching someone C# nowadays. Completely impossible. Really, I wish they would've given F# just a tenth of the love that C# got over the years. It has issues but it could've been so much more.
Better than a new language for each task, like you have with Go (microservices) and Dart (GUI).
I'm using F# on a personal project and while it is a great language I think the syntax can be less readable than that of C#. C# code can contain a bit too much boilerplate keywords, but it has a clear structure. Lack of parenthesis in F# make it harder to grasp the structure of the code at a glance.
> big bag of stuff, with no direction.
also called general purpose, general style langue
> that still can't really be done in C#
I would think about it more as them including features other more general purpose languages with a "general" style have adopted then "migrating F# features into C#, as you have mentioned there are major differences between how C# and F# do discriminated sum types.
I.e. it look more like it got inspired by it's competition like e.g. Java (via. sealed interface), Rust (via. enum), TypeScript (via structural typing & literal types) etc.
> Was a functional approach really so 'difficult'?
it was never difficult to use
but it was very different in most aspects
which makes it difficult to push, sell, adapt etc.
that the maybe most wide used functional language (Haskel) has a very bad reputation about being unnecessary complicated and obscure to use with a lot of CS-terminology/pseudo-elitism gate keeping doesn't exactly help. (Also to be clear I'm not saying it has this properties, but it has the reputation, or at least had that reputation for a long time)
It does each reasonably well (with web APIs being where I think they truly shine).
It is surprisingly difficult for folks to grasp functional techniques and even writing code that usesFunc,Action, and delegates. Devs have no problem consuming such code, but writing such code is a different matter altogether; there is just very little training for devs to think functionally. Even after explaining why devs might want to write such code (e.g. makes testing much easier), it happens very, very rarely in our codebase.But I do agree. C# is heading to a weird place. At first glance C# looks like a very explicit language, but then you have all the hidden magical tricks: you can't even tell if a (x) => x will be a Func or Expression[0], or if a $"{x}"[1] will actually be evaluated, without looking at the callee's signature.
[0]: https://learn.microsoft.com/en-us/dotnet/csharp/advanced-top...
[1]: https://learn.microsoft.com/en-us/dotnet/csharp/advanced-top...
Note that most of its development is still by the open source community and its tooling is an outsider for Visual Studio, where everything else is shared between Visual Basic and C#.
With the official deprecation of VB, and C++/CLI, even though the community keeps going with F#, CLR has changed meaning to C# Language Runtime, for all practical purposes.
Also UWP never officially supported F#, although you could get it running with some hacks.
Similarly with ongoing Native AOT, there are some F# features that break under AOT and might never be rewritten.
A lost opportunity indeed.
>Is this the last of the F# features to be migrated into C#? >What a missed opportunity.
Not adding functional features to F# doesn't mean F# would have gained more usage. And if someone wants to use F#, no one is stopping him or her.
Even outside of C#, this trend seems to show up everywhere — trying to reduce boilerplate while keeping things safe.
The only thing I wish now is for someone to build a functional Web framework for C#.
Or is it becoming a ball-of-mud/bad language compared to its contemporaries?
(Honest questions. I have never used .NET much. I'm curious)
As much as I hate Microsoft, I admit they are doing great things for C#