This article serves as an announcement of a new crate: proc-macro-faithful-display.

Enhancing faith in displaying Rust tokens

The problem

I’ve been working with procedural macros in Rust for a few weeks now, in order to write my glsl-quasiquote crate. I quickly discovered that most token types cannot be inspected. An Ident, for instance, has no method to get the underlying identifier as a &str. The only way to get at it is to allocate a String through its Display implementation, as in:

let ident_str = format!("{}", ident);

Now, imagine you have a fully formatted stream of GLSL tokens flowing in. I recently blogged about the fact that Rust has some specific rules that prevent you from writing preprocessor pragmas. The only way to support them was to implement a hack (explained in that blog article).

Lately, I was working on some not-so-trivial glsl transformations in another project when I hit a parse error. I was surprised, because the glsl crate contains more than 130 unit tests about parsing. So I thought “Well, okay, it’s alright, I might have forgotten a very specific case not covered in my nom implementation. I’ll just get a small reproducible sample and use it as a new integration test in glsl”. That kind of reasoning is very important to me: if you find a bug, the first thing to do (if you plan to fix it) is to write a unit test that fails. When that test finally succeeds, you know you have solved your problem. And if someone later provides a contribution that breaks the test again, you will have shielded the project against a regression. Seriously, I know it’s boring but: write tests.

So I wrote a test, a very simple one (this is actually an integration test, since I’ve found the error in a project using glsl). And… the test passed. I was so surprised: “So that code breaks in my crate, but succeeds in the integration test suite?! Well, wait. That means that the parser is actually correct. The problem might come from the way the parser is used… or the input.”

So my second reaction was to have a look at where the parser was called. It was in a glsl! procedural macro invocation – in my project, I hardcoded a vertex shader with glsl-quasiquote. And then I remembered: in order to parse the input Rust tokens, glsl-quasiquote has to use Display on them to yield a String usable by the glsl parser. The parser is okay – i.e. the integration test passes – but the input string… loses information.

If you have taken a look at the integration test just above, you should have noticed an interesting construction:

void main() {
  float x = 1. * .5;
}

This is not a semantically valid vertex shader, but it should parse. And it does parse. But if you write that vertex shader with glsl-quasiquote, what you actually write is this:

let vertex_shader = glsl!{
  void main() {
    float x = 1. * .5;
  }
};

The Rust tokenizer will eat the tokens and the Display implementation will render the stream, inserting spaces wherever it sees fit. It will not respect the original whitespace. It might yield something like this:

void main ( ) { float x = 1. * . 5 ; }

Look closely: our .5 GLSL floating point constant (it equals 0.5) has become . and 5. rustc has turned it into two tokens instead of a single one. This is because this float format is not allowed in Rust: the leading dot is lexed as its own punctuation token, the same way it would be in a method call.
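To make the loss concrete, here is a minimal sketch that only mimics the behaviour described above. The function naive_render is hypothetical and is not how the real Display implementation works; it just joins token texts with a single space, which is enough to show why .5 no longer survives as one constant.

```rust
/// Hypothetical helper: renders token texts separated by a single space,
/// ignoring where the tokens originally sat in the source.
/// This imitates the lossy output shown above; it is NOT the actual
/// Display implementation of the proc-macro token types.
fn naive_render(tokens: &[&str]) -> String {
    tokens.join(" ")
}

/// The tokens a Rust-like lexer would produce for `1. * .5;`,
/// written out by hand (an assumption made for illustration).
fn lexed_expression() -> Vec<&'static str> {
    vec!["1.", "*", ".", "5", ";"]
}
```

Once the dot and the 5 are separate tokens, nothing in the token texts alone says whether they were adjacent in the source, so any rendering that ignores positions has to guess.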

The solution

So, after realizing that, I simply added a note to the documentation stating that numbers cannot be written this way, and that you have to use a leading 0. I really disliked documenting that exception, but I didn’t have any solution at the time.

A few days later, I was thinking about glsl variable interpolation and something clicked. In order to implement preprocessor pragmas, I had used a trick with Span, which gave me positional information about tokens. And I realized that I could use the same kind of trick to implement a function that would yield an object implementing Display while respecting the original layout / formatting!

Basically, a trait – FaithfulDisplay – is implemented for all flavours of Rust tokens.

All the implementations carefully carry around the previous Span in order to pad enough spaces and newlines before they get formatted into the Formatter. This makes it possible to reconstruct the input layout.
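The padding idea can be sketched without touching proc-macro at all. The following is an illustration under stated assumptions, not the crate’s actual code: Tok and Faithful are hypothetical names, and the line and col fields stand in for the positional information a Span would provide (1-based lines, 0-based columns).

```rust
use std::fmt;

/// Hypothetical stand-in for a token plus the start position of its span.
/// Lines are 1-based and columns are 0-based (an assumption of this sketch).
#[derive(Debug, Clone)]
struct Tok {
    text: &'static str,
    line: usize,
    col: usize,
}

/// Hypothetical wrapper whose Display impl pads every token with the
/// newlines and spaces needed to put it back at its original position.
struct Faithful<'a>(&'a [Tok]);

impl fmt::Display for Faithful<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Start on the line of the first token, at column 0.
        let mut line = match self.0.first() {
            Some(first) => first.line,
            None => return Ok(()),
        };
        // Column just after the previously written token.
        let mut col = 0;

        for tok in self.0 {
            if tok.line > line {
                // Emit the missing newlines and restart the column count.
                for _ in line..tok.line {
                    f.write_str("\n")?;
                }
                line = tok.line;
                col = 0;
            }
            // Pad with spaces up to the token's original column.
            for _ in col..tok.col {
                f.write_str(" ")?;
            }
            f.write_str(tok.text)?;
            col = tok.col + tok.text.chars().count();
        }
        Ok(())
    }
}
```

With tokens positioned as in the shader above, the . at column 17 and the 5 at column 18 are written with no space between them, so Faithful(&tokens).to_string() gives back .5 rather than . 5.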

All this code is packaged in the proc-macro-faithful-display crate. You can find the documentation here. I use the crate in glsl-quasiquote and tested it there. It notably allowed me to remove the whole preprocessor trick, because newlines are now taken into account.

Is this viable?

Honestly, I have no idea. At the time of writing, it works like a charm. If you find any bug or weird behavior, please open an issue here. Keep in mind that proc-macro hasn’t yet been stabilized and that the Span type is accessible through a feature gate, so all of this might change. However, I strongly think that:

Either Rust will keep that Span type because it is really helpful (its first purpose was to provide nice error reporting).
Or they’ll ditch it, but since quasiquoting is important and wanted (have a look at @bodil’s awesome typed-html crate for instance), something to replace it might come next.

Keep the vibes!


rust, proc-macro

Sat Dec 1 19:51:00 2018 UTC