Algebraic Data Types

An overview of ADTs with F# & OCaml

May 07, 2025

Introduction

Algebraic Data Types (ADTs) form a cornerstone of type theory and functional programming languages, offering a means to construct robust type systems by giving authors the ability to compose complex data types from simpler ones. When we say algebraic, we mean to define operations between type constructors and algebraic operations. In other words, product types correspond to multiplication (Cartesian products of sets of values), while sum types correspond to addition (disjoint unions of sets of values). By using sums and products to treat types algebraically, domains can be clearly and expressively represented, and handled exhaustively through pattern matching, a control flow construct common in functional languages, particularly those that share roots in the ML syntax and even in languages like Gleam and Rust. This exhaustiveness allows leveraging strong static type systems (such as those based on Hindley-Milner type inference) to catch errors at compile time and ensure correctness.

Foundations of ADTs: Sum and Product Types

Let’s dive deeper into the two fundamental constructs of ADTs:

Product Types
Sum Types

Product Types

Product types combine multiple values, called fields, into a single composite type. In many languages these are tuples and records (seen most commonly as key value pairs). Using set theory, we say that the set of values that make up a product type is the Cartesian product of those values. For example, a 2-tuple type of an integer and a boolean can represent any ordered pair of an integer and true, as well as an integer and false.

Sum Types

Sum types are represented by a group of related types, each of which is called a constructor. A constructor is a named variant that can carry any number of values of any valid type. In many programming languages these are called enums, or unions. The total number of values for a sum type is the sum of all possible values of its variants, i.e. the disjoint union of the sets of values of its variants (in set theory).

By combining sums and products, developers can model intricate data invariants directly in the type system. Consider a commonly implemented data structure: a binary tree. Usually a node, or leaf in a tree, is represented by its data, and pointers to its children. There are two possible cases for a binary tree that we can represent for a tree, either empty, or a root node with two subtrees. The node itself can be a product type which demonstrates the relationship between a type definition and the shape of its data.

Pattern Matching and Type Safety

Any variant of a sum type (a constructor) and its combinations are valid patterns. We check values against possible patterns using one of functional programming’s most compelling and powerful features, made possible by ADTs: Pattern Matching. This allows us to deconstruct values of constructors, and then bind inner fields to names (think destructuring assignment or tuple unpacking in JavaScript, and Python, respectively).

What makes this construct so powerful?

Exhaustiveness checking: The compiler warns if not all variants are handled, preventing runtime “match failure” errors, ensuring correctness
Clarity of intent: Each pattern branch explicitly names the data shape it handles, improving readability.
Conciseness: Nested data can be deconstructed succinctly, reducing boilerplate.

Let’s return to our binary tree from the previous section. When we match against variants of our tree type in OCaml for example, the compiler ensures that any additional variant forces compile time updates to every expression. This is a safety guarantee provided by the compiler that reminds authors to cover every case in every match.

ADTs in OCaml

Let’s dive deeper into OCaml’s support for ADTs. These are provided by variant types (sum types) and records (product types). Here’s an example of how we might represent a tree in OCaml:

Here, 'a tree is what we call a parametric ADT, meaning it can hold values of any type 'a. The empty constructor carries no data, whereas Node carries a 3-tuple/triple: a value of type 'a and two subtrees of the same type. Using pattern matching, we can deconstruct the values to the following:

Records (called structs in languages like Rust and C++) provide named fields, improving self‐documentation:

This definition models geometric shapes with explicit field names. OCaml’s compiler enforces exhaustive matching on these variants, ensuring that any new shape variant must be accounted for in all uses of a match statement.

Parametric Polymorphism and Functoriality

OCaml’s ADTs support polymorphism, allowing definitions to be generic over types (e.g., the ‘a parameter in 'a tree). This parametricity allows for code reuse: the same tree‐processing functions work for trees containing any node type, like integer trees, string trees, or even complex user‐defined types (record or tuples!). Under the hood, the Hindley–Milner type system infers the most general types for functions operating on ADTs, sparing the programmer from repetitive annotations and preserving type safety.

ADTs in F#

F# mirrors OCaml’s ADT capabilities via discriminated unions, coupled with records for product types. In fact F# is often called Microsoft/.NET’s version of OCaml.

A binary tree in F# can be defined as:

This declaration is semantically identical to the OCaml variant. Pattern matching in F# uses the match…with construct:

For records and discriminated unions with named fields, F# allows:

Again we see that pattern matching can extract fields by name:

FSharp’s discriminated unions also support recursive definitions, union cases with multiple fields, and can model complex hierarchies without boilerplate classes or inheritance.

Enhancements: Active Patterns and Union Case Testing

Beyond basic ADTs, F# introduces active patterns, allowing custom deconstruction logic for values that may not be direct ADTs (e.g., strings or numbers). Active patterns further generalize pattern matching and maintain ADT‐like clarity across diverse data sources. Additionally, the :? operator enables union case testing in object‐oriented contexts, bridging ADTs with .NET interoperability. Note that F# is not a functional language, just a functional-first programming language.

Comparative Discussion

OCaml and F# share the ML-syntax lineage and nearly identical ADT semantics, whereby each language leverages ADTs to eliminate a wide class of runtime errors and to encode domain logic directly in type definitions. This encourages a design style where correctness emerges from the type checker rather than from comprehensive test suites alone. However, there are subtle differences:

Syntax: OCaml uses type ... = … with constructors preceded by |, whereas F# uses a similar syntax but encourages naming union fields directly.
Tooling: F# benefits from the .NET ecosystem, integrating seamlessly with C# libraries and Visual Studio tooling.
- OCaml’s ecosystem is more language‐centric, with jbuilder/dune and opam managing builds and packages.
Interoperability: F# ADTs compile to corresponding .NET types, which can be consumed by other CLR languages, sometimes requiring attribute annotations. In contrast OCaml ADTs compile to native code optimized for performance in different runtimes. Both languages have alternative compilers that allow compiling to JavaScript with Fable for F# and tools like Melange for OCaml (and js_of_ocaml).

Parametric ADTs and GADTs

Generalized Algebraic Data Types (abbreviated GADTs) extend ADTs by allowing constructors to specify more precise return types, enabling richer invariants and encoding of logical constraints in types. In OCaml, GADT syntax uses type ... = with explicit return type annotations, seen below:

Here, the type parameter of expr can vary with each constructor, allowing heterogeneous expression trees that preserve type correctness at compile time. Pattern matching on GADTs refines type information within branches, avoiding unsafe casts and facilitating advanced type‐driven programming.

F# also provides GADT‐like functionality via inline functions and the static resolution of type parameters. Researchers continue to explore seamless GADT integration in F# as part of language evolution efforts.

Applications and Benefits

ADTs shine in scenarios requiring:

Domain modeling: Encoding business rules and workflows as sum‐of‐product types, such as HTTP response status codes, ASTs for compilers, or finite state machines.
Error handling: Representing success/failure cases explicitly via Result<'T,'Error> or custom error unions, avoiding unchecked exceptions.
Configuration and parsing: Defining precise schemas for configuration files or network protocols, with the compiler ensuring exhaustive parsing logic. If you’re familiar with the technique of using recursive descent for parsing syntax trees, you’ll find that matching against collections of tokens allows for a clean, non-regex based traversal.
Data transformations: Building safe, type‐checked pipelines where each stage maps one ADT to another, guaranteeing all cases are addressed.

The algebraic nature of types also aligns with equational reasoning, enabling formal verification techniques and equational shortcuts in refactoring.

Conclusion

Algebraic Data Types represent a powerful abstraction in typed functional programming, seamlessly combining sum and product constructs to model complex domains. OCaml and F# both offer first‐class support for ADTs—via variants and discriminated unions respectively—paired with exhaustive pattern matching and robust type inference. Advanced extensions such as GADTs further blur the line between types and values, unlocking expressive design patterns and compile‐time guarantees. As software systems grow in complexity and correctness becomes paramount, ADTs stand out as an essential tool in the functional programmer’s toolkit, fostering concise, safe, and declarative code.

Summary

ADTs blend product types (records, tuples) and sum types (variants, unions) to model complex data
Pattern matching on ADTs enforces exhaustive handling and clear deconstruction of data
OCaml defines ADTs via type ... = | ... and supports parametric polymorphism and GADTs natively
F# uses discriminated unions and records, integrates with .NET, and offers advanced features like active patterns
GADTs extend ADTs by refining constructor return types, enabling richer invariants and type‐driven design
Benefits include safer domain modeling, exhaustive error handling, and support for formal reasoning and verification

References

Clarkson, Michael R. . “3.9. Algebraic Data Types — OCaml Programming: Correct + Efficient + Beautiful.” Accessed May 1, 2025. https://cs3110.github.io/textbook/chapters/data/algebraic_data_types.html.
Madhavapeddy, Anil, Yaron Minsky, and Yaron Minsky. Real World OCaml: Functional Programming for the Masses. Second edition. Cambridge, United Kingdom New York, NY: Cambridge University Press, 2022.
W, Scott. “Algebraic Type Sizes and Domain Modelling | F# for Fun and Profit.” Accessed May 6, 2025. https://fsharpforfunandprofit.com/posts/type-size-and-design/.
Chrichton, Will. “CS 242: Algebraic Data Types.” Accessed May 1, 2025. https://stanford-cs242.github.io/f19/lectures/03-2-algebraic-data-types.html.
Dollard, Kathleen. “Pattern Matching - F#.” Accessed May 2, 2025. https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/pattern-matching.

Pattern Matched

Discussion about this post