The compile-time deinterleaving interface of luminance-0.40

luminance, type-safe, deinterleaving

2020-07-19 19:39:00 UTC, by Dimitri Sabadie


luminance-0.40 is finally out, after several months of hard work. A blog post is available here if you’ve missed it.

Among all the new features of luminance-0.40, there is one that I think is worth explaining in detail, because it’s… pretty cool. If you like type states, refinement typing and, more generally, strong typing, you should enjoy this blog article.

While working on luminance-0.40, I spent two weeks working on a feature that is going to have a huge beneficial impact on how people use luminance. I’ve been wanting to see what people think about it for a long time, because I think it’s a very powerful feature that lets you do low-level memory stuff in a safe way via the type system. Without further ado, let’s dig in.

The Tess type

Before luminance-0.40, you could gather vertices inside a type called Tess (it stands for tessellation). That type is responsible for holding vertices to render them later. They might represent point clouds, triangle-based meshes, lines, etc. It contains several properties, but we are going to focus on three: the vertices, the indices and the instance data.

In luminance-0.39, vertices, indices and instances can be retrieved, either read-only or read-write, via a mechanism of GPU slicing: you ask luminance to give you a type through which you can get a &[T] or a &mut [T].

However… we have several problems. First, the type is Tess and is completely monomorphic: it has no type parameter. What it means is that if you create a Tess that contains vertices of type SimpleVertex with indices of type u8, that information is not encoded in the type system; it is actually encoded as a value that is read and checked at runtime! When you ask for a slice, for instance to mutate the content of the Tess, you have to use a function like Tess::as_slice<V> where V: Vertex, which expects you to pass the type of stored vertices (in our case, SimpleVertex). What happens if someone passes the wrong type? Well, currently, luminance checks at runtime that the stored type is the right one, but those checks are wasted work and the API is not very elegant.

The same applies to indices and instance data: you don’t see them in the type. What happens if you slice the indices with u32? Runtime checks.
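To make the difference concrete, here is a minimal sketch of the two approaches. It is not luminance’s actual code: UntypedTess and TypedTess are hypothetical names, the data lives in plain CPU memory instead of GPU buffers, and std::any::Any stands in for whatever runtime check luminance-0.39 really performs.

```rust
use std::any::Any;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SimpleVertex {
    pub position: [f32; 2],
}

/// Hypothetical "old style" tessellation: the vertex type is erased,
/// so it can only be checked when a slice is requested.
pub struct UntypedTess {
    vertices: Box<dyn Any>,
}

impl UntypedTess {
    pub fn new<V: 'static>(vertices: Vec<V>) -> Self {
        UntypedTess { vertices: Box::new(vertices) }
    }

    /// The caller has to name the vertex type; a wrong guess is only
    /// detected at runtime and reported as `None`.
    pub fn as_slice<V: 'static>(&self) -> Option<&[V]> {
        self.vertices.downcast_ref::<Vec<V>>().map(Vec::as_slice)
    }
}

/// Hypothetical "new style" tessellation: the vertex type is part of the
/// tessellation type, so there is nothing left to check at runtime.
pub struct TypedTess<V> {
    vertices: Vec<V>,
}

impl<V> TypedTess<V> {
    pub fn new(vertices: Vec<V>) -> Self {
        TypedTess { vertices }
    }

    /// No type argument and no failure case: asking for the wrong
    /// element type is a compilation error at the call site.
    pub fn as_slice(&self) -> &[V] {
        &self.vertices
    }
}
```

With the untyped version, as_slice::<u32>() on a tessellation of SimpleVertex compiles and fails at runtime; with the typed version, the same mistake does not compile at all.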

Now, there’s also the problem of building. When you build a Tess, you have to pass the vertices to the TessBuilder type, using functions like TessBuilder::add_vertices. It works, but it’s not practical. More importantly: if you call that function several times, you will create something that is called a deinterleaved memory data structure. Let’s digress a bit about that.

Digression on interleaved and deinterleaved data structures

Interleaving and deinterleaving make sense when we talk about several objects / items of the same type, laid out in memory, typically in arrays. Imagine a type, Vertex, such as:

struct Vertex {
  position: Point3D,
  color: RGBA,
}

If you take a vector of Vertex (Vec<Vertex>), you get a contiguous array of properties in memory. If you represent that in memory, you have something similar to this for two vertices (padding omitted):

[x0, y0, z0, r0, g0, b0, a0, x1, y1, z1, r1, g1, b1, a1]

We say that the memory is interleaved, because you’re going to alternate between fields when iterating in memory. Everything is interleaved. This kind of memory is what happens when you put a struct in an array, slice, etc. and is perfectly suited for most situations (even on the GPU). However, there is a small (yet important) detail: when you iterate (for instance in a for loop) on that array, you’re going to ask your CPU / GPU to load a bunch of values at once. That is going to fill your cache lines. If at each iteration you need to use every field of the vertex, then that situation is pretty convenient, because you’re going to have a bunch of fields cached ahead (cache hits).

However… imagine a loop that only needs to access the position field. What’s going to happen is that your CPU / GPU will still load the same data in cache lines: now you get colors in the cache that you don’t need and your loop will make more cache misses. It would be better to fill the cache lines only with positions. Imagine we had, instead:

[x0, y0, z0, x1, y1, z1]
[r0, g0, b0, a0, r1, g1, b1, a1]

Those two vectors can then be used independently for each need. Because we only need positions, we can simply use the first vector, the one holding positions. Now, when the CPU / GPU loads something in the cache, it caches many more values that we are actually going to use: we get more cache hits and it’s party time.

That kind of memory layout is called deinterleaved memory. The way we typically do that is by simply moving the fields out of the struct and making one array per field.

People tend to use two terms to describe both layouts: AoS (array of structures, the interleaved layout) and SoA (structure of arrays, the deinterleaved layout).

So… months ago, I realized that and decided I needed a better plan. Especially, in luminance-0.39, the way you handle slicing deinterleaved data is… well, nonexistent. You cannot slice such data because it was never supported.
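Here is a small sketch of both layouts on the CPU side. The field types are an assumption: plain arrays replace the Point3D and RGBA types used earlier, and Vertices, deinterleave and sum_x are hypothetical names made up for the illustration.

```rust
/// AoS: one struct per vertex, fields alternate in memory.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vertex {
    pub position: [f32; 3],
    pub color: [f32; 4],
}

/// SoA: one vector per field, each field is contiguous in memory.
#[derive(Debug, Default, PartialEq)]
pub struct Vertices {
    pub positions: Vec<[f32; 3]>,
    pub colors: Vec<[f32; 4]>,
}

/// Turn the interleaved representation into the deinterleaved one.
pub fn deinterleave(vertices: &[Vertex]) -> Vertices {
    Vertices {
        positions: vertices.iter().map(|v| v.position).collect(),
        colors: vertices.iter().map(|v| v.color).collect(),
    }
}

/// A loop that only needs positions: with the SoA layout, it never
/// drags colors through the cache.
pub fn sum_x(positions: &[[f32; 3]]) -> f32 {
    positions.iter().map(|p| p[0]).sum()
}
```

Calling sum_x(&deinterleave(&vertices).positions) walks a buffer that contains nothing but positions, which is exactly the situation described above.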

The Tess type… revisited

The new type is the following:

pub struct Tess<B, V, I = (), W = (), S = Interleaved>
where ABunchOfThings;

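As you can see, there are a lot of new things there: B is the backend type, V the vertex type, I the index type, W the vertex instance data type and S the storage layout, either Interleaved (the default) or Deinterleaved.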

You will find the same type variables with the TessBuilder type.

The magic of type systems

The cool thing about that change is how it enabled me to yield much, much better APIs. Consider the previous API to create a deinterleaved Tess:

let direct_deinterleaved_triangles = TessBuilder::new(&mut surface)
  .add_vertices(TRI_DEINT_POS_VERTICES)
  .add_vertices(TRI_DEINT_COLOR_VERTICES)
  .set_mode(Mode::Triangle)
  .build()
  .unwrap();

Notice the two add_vertices. There is no type information checking and ensuring that:

Now, the new API looks like this:

let direct_deinterleaved_triangles = surface
  .new_deinterleaved_tess::<Vertex, ()>()
  .set_attributes(&TRI_DEINT_POS_VERTICES[..])
  .set_attributes(&TRI_DEINT_COLOR_VERTICES[..])
  .set_mode(Mode::Triangle)
  .build()
  .unwrap();

If you try to call set_vertices (the name got changed from add_vertices to set_vertices) on the builder you get from new_deinterleaved_tess, you will get a compilation error, because you cannot set vertices on deinterleaved tessellations: you need to set attribute vectors. The set_attributes method knows that you are doing that for a Vertex, so it can check the input data you pass and ensure it contains values whose type is a field type used in Vertex. If not, you get a compilation error.

Most importantly: because of how vertices work in luminance, a field type is unique to a vertex: it doesn’t make sense to use the VertexPosition type twice. If you end up in such a situation, it means that your Semantics type lacks another variant (remember: vertex fields are basically semantics-based attributes). That uniqueness makes it possible to automatically find out where exactly the data you provide needs to go inside the GPU tessellation.
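If you wonder how a builder can reject set_vertices at compile-time, here is a minimal type-state sketch. It is an assumption about the general technique, not luminance’s real builder: Builder, HasField and the counters are hypothetical, and nothing is uploaded to the GPU.

```rust
use std::marker::PhantomData;

/// Marker types selecting the memory layout at the type level.
pub struct Interleaved;
pub struct Deinterleaved;

/// Hypothetical marker trait: "the vertex type has a field of type T".
pub trait HasField<T> {}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct VertexPosition(pub [f32; 2]);
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct VertexColor(pub [u8; 3]);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vertex {
    pub position: VertexPosition,
    pub color: VertexColor,
}

impl HasField<VertexPosition> for Vertex {}
impl HasField<VertexColor> for Vertex {}

/// Hypothetical builder; it only counts what it receives.
pub struct Builder<V, S> {
    vertex_count: usize,
    attribute_sets: usize,
    _marker: PhantomData<(V, S)>,
}

impl<V, S> Builder<V, S> {
    pub fn new() -> Self {
        Builder { vertex_count: 0, attribute_sets: 0, _marker: PhantomData }
    }

    pub fn vertex_count(&self) -> usize {
        self.vertex_count
    }

    pub fn attribute_sets(&self) -> usize {
        self.attribute_sets
    }
}

/// Whole vertices can only be set on interleaved builders.
impl<V> Builder<V, Interleaved> {
    pub fn set_vertices(mut self, vertices: &[V]) -> Self {
        self.vertex_count = vertices.len();
        self
    }
}

/// Attribute vectors can only be set on deinterleaved builders, and only
/// for types that are fields of the vertex type.
impl<V> Builder<V, Deinterleaved> {
    pub fn set_attributes<T>(mut self, attributes: &[T]) -> Self
    where
        V: HasField<T>,
    {
        self.vertex_count = attributes.len();
        self.attribute_sets += 1;
        self
    }
}
```

With this sketch, Builder::<Vertex, Deinterleaved>::new().set_vertices(..) does not compile because the method does not exist for that type state, and set_attributes(&[0u32]) does not compile because Vertex does not implement HasField<u32>.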

The super cool part is that you can now slice deinterleaved tessellations by simply asking for:

In our case, we have a deinterleaved tessellation, which means we cannot slice whole vertices. If you try to get a slice of Vertex, you will get a compilation error. However, we can retrieve slices of vertex fields. The way we do this is super simple: we simply call the tess.vertices() or tess.vertices_mut() methods. It will infer the type of slice you are asking for and automatically slice the right GPU buffer. This is all possible because each field type is unique within the vertex type.

let positions = tess.vertices().unwrap(); // you have to check for errors normally
let colors = tess.vertices().unwrap(); // you have to check for errors normally

Remark: you need to have the types of positions and colors inferred by setting / reading / passing them around, or you will have to put type ascriptions so that vertices() knows what fields you are referring to.

Deinterleaving compile-time dispatching

So let’s dig a bit into how all this works. The first thing you need to know is that deinterleaving — the raw concept — is really simple. If you have a type such as:

struct Foo {
  a: A,
  b: B,
  c: C,
}

We say that Vec<Foo> is the interleaved representation of the collection. The deinterleaved representation needs to have three vectors of fields:

vec_a: Vec<A>
vec_b: Vec<B>
vec_c: Vec<C>

In order for a Vertex type to be valid in deinterleaving contexts, we need to have that tuple of vectors representation. First, we need a mapping from Vec<Foo> to (Vec<A>, Vec<B>, Vec<C>). This is the role of two traits: Deinterleave<T> and TessVertexData<S>.

Deinterleave<T> gives, for T, the field rank for T. For instance:

This is mandatory so that we know exactly which GPU buffers will need to be read / written to when creating tessellations and slicing them. You don’t have to implement Deinterleave<T> yourself: luminance-derive does that automatically for you when you create a Vertex type. Also, you might be tempted to think that this rank will be used inside the Vertex type to retrieve data, but since you cannot pass whole vertices… nah. Also, you shouldn’t assume ranks based on the order of field declarations in the struct (rustc can re-order fields).
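As an illustration, here is what such a trait and its implementations could look like when written by hand. The trait shape follows what is described in this article (an unsafe trait with a RANK constant of type usize), but the exact definition in luminance may differ; the rank values and the rank_of helper are assumptions made up for the example.

```rust
/// Sketch of the trait: for a vertex type implementing it, gives the
/// rank of the field of type `T`.
///
/// # Safety
///
/// Implementors must make sure the vertex type really has a field of
/// type `T` and that `RANK` designates the buffer holding that field.
pub unsafe trait Deinterleave<T> {
    const RANK: usize;
}

pub struct VertexPosition(pub [f32; 2]);
pub struct VertexColor(pub [u8; 3]);

pub struct Vertex {
    pub position: VertexPosition,
    pub color: VertexColor,
}

// Hand-written here; in luminance this is the job of luminance-derive.
// The rank values are picked for the example.
unsafe impl Deinterleave<VertexPosition> for Vertex {
    const RANK: usize = 0;
}

unsafe impl Deinterleave<VertexColor> for Vertex {
    const RANK: usize = 1;
}

/// Hypothetical helper: resolves the rank from types only, so the
/// result is a compile-time constant.
pub fn rank_of<V, T>() -> usize
where
    V: Deinterleave<T>,
{
    <V as Deinterleave<T>>::RANK
}
```

Asking for rank_of::<Vertex, u32>() is rejected by the compiler, because Vertex has no implementation of Deinterleave<u32>.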

Next is TessVertexData<S>. For a given storage S, it associates an input type (the type of data a tessellation will receive at creation time) with the implementor type. The easy one is TessVertexData<Interleaved> for V where V: Vertex. The associated type is simply Vec<V>, because interleaved geometry simply stores the vertices as a vector of the whole vertex type. Simple.
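The interleaved case can be sketched like this. Only the trait name and the Vec<V> association come from the description above; the Vertex trait is reduced to an empty marker, and the associated type name Data and the single_vertex helper are assumptions.

```rust
/// Marker type for the interleaved layout.
pub struct Interleaved;

/// Stand-in for luminance's Vertex trait, reduced to a marker here.
pub trait Vertex {}

/// For a storage `S`, gives the input data type of the implementor.
/// The name of the associated type is an assumption.
pub trait TessVertexData<S> {
    type Data;
}

/// Interleaved geometry is a plain vector of whole vertices.
impl<V> TessVertexData<Interleaved> for V
where
    V: Vertex,
{
    type Data = Vec<V>;
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SimpleVertex {
    pub position: [f32; 2],
}

impl Vertex for SimpleVertex {}

/// Hypothetical helper showing that the associated type resolves to
/// `Vec<SimpleVertex>`.
pub fn single_vertex(
    v: SimpleVertex,
) -> <SimpleVertex as TessVertexData<Interleaved>>::Data {
    vec![v]
}
```

Because the storage is a type parameter of the trait and not of the implementor, the same vertex type can get a second implementation for a deinterleaved marker with a completely different associated type.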

It gets more complicated when we talk about deinterleaved geometry. TessVertexData<Deinterleaved> for V where V: Vertex has its associated type set to… Vec<DeinterleavedData>. Indeed: there is no simple way with the current stable Rust (and even nightly) to know the full type of Vertex fields at compile-time here. However, don’t get it wrong: that Vec is not a vector of vertices. It’s a typically small set of attributes (vectors too). If you look at the definition of DeinterleavedData, you get this:

#[derive(Debug, Clone)]
pub struct DeinterleavedData {
  raw: Vec<u8>,
  len: usize,
}

Yep. Type erasure at its finest. When you pass deinterleaved data, the data is type-erased and passed as a big blob of bytes, glued with its original size (so that we don’t have to store typing information – this will be needed when slicing vertex attributes).
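To show what that erasure can look like, here is a sketch that fills such a struct from a typed slice and reads an element back. The constructor and accessors are hypothetical (luminance’s own API may look different), len is assumed to be the number of elements, and the element types are assumed to be Copy and free of padding bytes.

```rust
use std::mem::{size_of, size_of_val};

#[derive(Debug, Clone)]
pub struct DeinterleavedData {
    raw: Vec<u8>,
    len: usize,
}

impl DeinterleavedData {
    /// Erase the element type: copy the items as raw bytes and remember
    /// how many items there were. Assumes `T` has no padding bytes.
    pub fn from_slice<T: Copy>(items: &[T]) -> Self {
        let byte_len = size_of_val(items);
        // SAFETY: `items` is a valid slice spanning `byte_len` bytes, and
        // `T` is assumed padding-free, so every byte is initialized.
        let bytes = unsafe {
            std::slice::from_raw_parts(items.as_ptr() as *const u8, byte_len)
        };
        DeinterleavedData { raw: bytes.to_vec(), len: items.len() }
    }

    /// Number of erased elements.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Size of the blob in bytes.
    pub fn byte_len(&self) -> usize {
        self.raw.len()
    }

    /// Read one element back.
    ///
    /// # Safety
    ///
    /// `T` must be the type the data was created from.
    pub unsafe fn read<T: Copy>(&self, index: usize) -> Option<T> {
        if index >= self.len || self.raw.len() != self.len * size_of::<T>() {
            return None;
        }
        // The byte buffer is only guaranteed to be 1-aligned, hence the
        // unaligned read.
        unsafe {
            let ptr = self.raw.as_ptr().add(index * size_of::<T>()) as *const T;
            Some(ptr.read_unaligned())
        }
    }
}
```

Reading back is unsafe for the same reason implementing the dispatch traits is: once the type is gone, only the caller can promise that the bytes really are what the type parameter says.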

Implementing slicing with these traits and data types is now possible: we can add another trait that we will use to slice vertices, for instance (the backend::VertexSlice trait in luminance) and based on the type of T in Deinterleave<T>, we can go and grab the GPU buffer we want. For instance, in both the OpenGL and WebGL luminance backends, buffers are stored in a Vec<_>, so in order to know which one we need to look up, we simply use the <Vertex as Deinterleave<T>>::RANK constant value (a usize). You need two other traits for vertex instance data (InstanceSlice) and vertex indices (IndexSlice), and you’re good to go.
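The rank-based lookup can be sketched on the CPU side as follows. This is not the backend code: Tess here is a hypothetical stand-in that stores its buffers as Vec<Box<dyn Any>> instead of GPU buffers, and the Deinterleave definition is the assumed shape described in this article.

```rust
use std::any::Any;
use std::marker::PhantomData;

/// Assumed shape of the trait: rank of the field of type `T`.
pub unsafe trait Deinterleave<T> {
    const RANK: usize;
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct VertexPosition(pub [f32; 3]);
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct VertexColor(pub [u8; 3]);

pub struct Vertex {
    pub position: VertexPosition,
    pub color: VertexColor,
}

unsafe impl Deinterleave<VertexPosition> for Vertex {
    const RANK: usize = 0;
}

unsafe impl Deinterleave<VertexColor> for Vertex {
    const RANK: usize = 1;
}

/// Hypothetical CPU-side tessellation: one buffer per vertex field,
/// stored in rank order.
pub struct Tess<V> {
    buffers: Vec<Box<dyn Any>>,
    _vertex: PhantomData<V>,
}

impl Tess<Vertex> {
    pub fn new(positions: Vec<VertexPosition>, colors: Vec<VertexColor>) -> Self {
        let buffers: Vec<Box<dyn Any>> = vec![Box::new(positions), Box::new(colors)];
        Tess { buffers, _vertex: PhantomData }
    }
}

impl<V> Tess<V> {
    /// The requested field type `T` selects the buffer through `RANK`;
    /// `T` is usually inferred from how the result is used.
    pub fn vertices<T: 'static>(&self) -> Option<&[T]>
    where
        V: Deinterleave<T>,
    {
        self.buffers
            .get(<V as Deinterleave<T>>::RANK)?
            .downcast_ref::<Vec<T>>()
            .map(Vec::as_slice)
    }
}
```

Writing let positions: &[VertexPosition] = tess.vertices().unwrap(); is then enough for the compiler to pick the buffer at rank 0, and asking for a slice of Vertex or u32 is a compilation error because no matching Deinterleave implementation exists.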

Wrap it up

So, checking deinterleaving at compile-time, in luminance, is done by:

A small note though. luminance (actually, I do that with all my code, whatever the language) considers a lot of operations unsafe. In this case, implementing Deinterleave<T> is unsafe. The reason is pretty obvious: you can implement Deinterleave<Whatever>, even if your vertex type doesn’t have a Whatever field. Doing so would yield hazardous / undefined behavior when slicing the GPU buffer. luminance-derive takes care of implementing the unsafe interface for you.

I hope you have learned something new or gotten some ideas. Keep the vibes!