On dealing with owning and borrowing in public interfaces

ownership, borrowing, API

2018-11-04 01:30:00 UTC, by Dimitri Sabadie


Disclaimer: this blog post is about Rust and some of its intrinsic semantics, along with software design and architecture – especially for public interfaces like APIs. However, you may find it interesting to apply the concepts to any language that supports those ideas directly or indirectly.

I’ve been writing some example code lately to add to the documentation of some crates of mine. I write a lot of code that creates new objects that need other objects in order to be built. Most of the APIs you can see around tend to love the borrow principle – and I do. The idea is simple:

Depending on the crate you look at, the authors and how they see things, you might find a lot of ways to pass that string to your constructor. Let’s get technical. In particular, I want this blog post to give people a hint about how they should shape their APIs by driving the types with semantics.

The base code

struct Person {
  name: String
}

Very simple to get started. We have a Person type that carries the name of that person. We don’t want to expose the internals of the type because we might add a lot of other things in there and allowing people to pattern-match a Person would break their code when we add more things.

impl Person {
  pub fn new(name: String) -> Person {
    Person { name }
  }
}

That works. This constructor relies on move semantics: it expects you to move a String in, and that String becomes the name field of the Person value without being copied. The contract is clear: the client must either allocate a String or hand over one already allocated by someone else. This can get a bit tedious. Imagine:

let someone = Person::new("Patrick Bateman"); // this won’t compile

Person::new("Patrick Bateman") doesn’t typecheck because "Patrick Bateman" has type &'static str here. It’s not a String.

Drive the allocation from within your code

So how can we fix the code above to have it compile?

let someone = Person::new("Patrick Bateman".to_owned()); // okay now

"Patrick Bateman".to_owned() makes use of ToOwned::to_owned to clone the string into a freshly allocated String. You could also have used Into::into. However, we’re still requiring clients of our API to perform the allocation (or to move a String in), which is not ideal. A typical solution to this is to take a borrow and clone it when an owned version is needed.

impl Person {
  pub fn new(name: &str) -> Person {
    Person {
      name: name.to_owned()
    }
  }
}

This API enables the current code to compile:

let someone = Person::new("Patrick Bateman"); // okay because 'static is a subtype of 'a

We’re all good now, right? Not exactly: what happens if we try to pass the initial String we had when we moved something in?

let x: String = …;
let someone = Person::new(x); // not okay now since it’s a type mismatch

To fix this, we need to get a &str out of a String. Since String implements Deref<Target = str>, it’s quite straightforward:

let x: String = …;
let someone = Person::new(&x); // okay, here &x derefs to &str

You can also use the more explicit String::as_str method.
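As a small sketch of those options, assuming the borrowing constructor from this section (the Person definition is repeated so the snippet stands alone, and borrow_three_ways is just a helper name I made up), here are the usual ways to get a &str out of a String:

```rust
struct Person {
    name: String,
}

impl Person {
    pub fn new(name: &str) -> Person {
        Person { name: name.to_owned() }
    }
}

// Hypothetical helper, only here to show the call styles side by side.
fn borrow_three_ways() -> (String, Vec<Person>) {
    let x = String::from("Patrick Bateman");
    let people = vec![
        Person::new(&x),         // deref coercion: &String is accepted as &str
        Person::new(x.as_str()), // the explicit method
        Person::new(&x[..]),     // a full-range slice, same result
    ];
    // x was only borrowed, so it can still be used (here, returned).
    (x, people)
}
```

Each call clones the borrowed data into a new String, and x stays usable afterwards.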

A nice trick here to be able to pass both &str and String is to use the AsRef trait.

impl Person {
  pub fn new<N>(name: N) -> Person where N: AsRef<str> {
    Person {
      name: name.as_ref().to_owned()
    }
  }
}

In this case, AsRef<str> is implemented for both &str (it just returns itself) and String (it just derefs, which is what the as_str method does).

However, you can still see a problem here. If we keep using x after the call to Person::new (by passing &x), the clone is justified and we can move on. If we don’t need x afterwards, we’re just wasting an opportunity to move it in instead of allocating: the constructor clones even when it is handed an owned String.
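To make that wasted opportunity concrete, here is a minimal sketch that compares heap addresses. The Person definition is repeated so the snippet stands alone, and reuses_buffer is a helper name I made up for the demonstration:

```rust
struct Person {
    name: String,
}

impl Person {
    pub fn new<N>(name: N) -> Person
    where
        N: AsRef<str>,
    {
        Person { name: name.as_ref().to_owned() }
    }
}

// Hypothetical helper: true when the Person ends up owning the caller's buffer.
fn reuses_buffer() -> bool {
    let x = String::from("Patrick Bateman");
    let original = x.as_ptr(); // address of the caller's heap buffer
    // x is moved into `new`, but `new` only reads it, clones it and drops it.
    let someone = Person::new(x);
    someone.name.as_ptr() == original
}
```

reuses_buffer() returns false here: the String was handed over, yet a second buffer was allocated while the first one was still alive, and the first one was then freed.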

Drive the allocation from your code and allow for moving in

Clearly, to me, the perfect API would enable you to pass borrows and ask the API code to clone for you or just accept the memory region you provide (i.e. you move something in). The idea is then to accept either &str or String in our case, and clone only a &str… There are several traits and types that provide that feature.

ToOwned + Cow

Cow – for Clone on write – is a very interesting concept. It encodes that you’re either borrowing some data or that you already own it. See it as:

enum Cow<'a, T> where T: 'a + ?Sized + ToOwned {
  Borrowed(&'a T),
  Owned(<T as ToOwned>::Owned)
}

impl<'a, T> Cow<'a, T> where T: 'a + ?Sized + ToOwned {
  pub fn into_owned(self) -> <T as ToOwned>::Owned {
    match self {
      Cow::Borrowed(b) => b.to_owned(),
      Cow::Owned(o) => o
    }
  }
}

Now, the interesting part: it’s possible to go from &'a str to Cow<'a, str> and from String to Cow<'a, str> by using Into implementors. That enables us to write this:

use std::borrow::Cow;

impl Person {
  pub fn new<'a, N>(name: N) -> Person where N: Into<Cow<'a, str>> {
    Person { name: name.into().into_owned() }
  }
}

This code will move in a String if you passed one and clone if you passed a &str. The following lines of code compile and work like a charm:

let _ = Person::new("Patrick Bateman");

let dawg = "Dawg";
let _ = Person::new(format!("Doggo {}", dawg));

What’s interesting here is that all cases are covered:

There’s just a little nit: Cow::into_owned obviously pattern-matches on the variant, inducing a small yet present runtime overhead. We tend to prefer Cow<_> when the decision to clone has to be dispatched dynamically, at runtime, while in our case it’s more of a static choice (which version of the API function to use).
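Here is a minimal sketch of that behavior, assuming the Cow-based constructor from this section. The Person definition is repeated so the snippet stands alone; is_borrowed and keeps_buffer are helper names I made up for the demonstration:

```rust
use std::borrow::Cow;

struct Person {
    name: String,
}

impl Person {
    pub fn new<'a, N>(name: N) -> Person
    where
        N: Into<Cow<'a, str>>,
    {
        Person { name: name.into().into_owned() }
    }
}

// Hypothetical helper: tells which Cow variant an argument converts to.
// This is the tag that into_owned has to inspect at runtime.
fn is_borrowed<'a, N>(name: N) -> bool
where
    N: Into<Cow<'a, str>>,
{
    matches!(name.into(), Cow::Borrowed(_))
}

// Hypothetical helper: true when the Person ends up owning the caller's buffer.
fn keeps_buffer() -> bool {
    let x = format!("Doggo {}", "Dawg");
    let original = x.as_ptr(); // address of the caller's heap buffer
    // x becomes Cow::Owned(x), and into_owned hands it back untouched.
    let someone = Person::new(x);
    someone.name.as_ptr() == original
}
```

A &str converts to the Borrowed variant and gets cloned by into_owned, while a String converts to the Owned variant and keeps its buffer, so keeps_buffer() returns true.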

Into, as simple as it gets

If you look at the Into<String> implementors, you’ll find impl Into<String> for String – actually, this is a blanket implementor – and impl<'a> Into<String> for &'a str. That implements the same semantics as Cow<str>, but at the type level, removing any remaining runtime overhead.

The new code is even simpler:

impl Person {
  pub fn new<N>(name: N) -> Person where N: Into<String> {
    Person { name: name.into() }
  }
}
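A quick sketch of what this constructor accepts, with the Person definition repeated so the snippet stands alone. The helper names keeps_buffer and accepted_inputs are mine, and the Box<str> case is only there to illustrate that other standard types convert into a String as well:

```rust
struct Person {
    name: String,
}

impl Person {
    pub fn new<N>(name: N) -> Person
    where
        N: Into<String>,
    {
        Person { name: name.into() }
    }
}

// Hypothetical helper: true when the Person ends up owning the caller's buffer.
fn keeps_buffer() -> bool {
    let x = format!("Doggo {}", "Dawg");
    let original = x.as_ptr(); // address of the caller's heap buffer
    // Into<String> for String is the identity conversion: x is moved as-is.
    let someone = Person::new(x);
    someone.name.as_ptr() == original
}

// Hypothetical helper showing a few argument types side by side.
fn accepted_inputs() -> Vec<Person> {
    let boxed: Box<str> = "Patrick Bateman".into();
    vec![
        Person::new("Patrick Bateman"),               // &str: allocates and copies
        Person::new(String::from("Patrick Bateman")), // String: moved in
        Person::new(boxed),                           // Box<str>: converts into String too
    ]
}
```

The choice between cloning and moving is made by the compiler when it picks the Into<String> implementation, so there is no variant to inspect at runtime.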

Obviously, there are drawbacks:

Let’s draw a conclusion

What I wanted to highlight in this blog post is that &_, AsRef<_>, Cow<_>, and Into<_> all have different semantics that can be used to encode different contracts in public interfaces.

This list gives you a good idea of which one to use in your public interfaces. Read-only? Owned data? Read and maybe write? All those semantics are visible through those types and traits, and you should definitely try to wrap your head around using all of them. Of course, if you’re just writing a small utility function that needs to borrow the .name of a Person, passing in a &Person seems completely sound. Most of the time, if you don’t know where the data comes from, be as inclusive and generic as possible.
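As a rough sketch of two of those contracts (both function names, initial and underscored, are made up for the example and are not part of any crate), a read-only helper can simply borrow a Person, while a "read and maybe write" function can return a Cow so that it only allocates when it really has to change something:

```rust
use std::borrow::Cow;

struct Person {
    name: String,
}

// Read-only: borrowing the whole Person is enough to look at its name.
fn initial(person: &Person) -> Option<char> {
    person.name.chars().next()
}

// Read and maybe write: the input is returned untouched when there is nothing
// to change, and a new String is allocated only when a space gets replaced.
fn underscored(name: &str) -> Cow<'_, str> {
    if name.contains(' ') {
        Cow::Owned(name.replace(' ', "_"))
    } else {
        Cow::Borrowed(name)
    }
}
```

Here the Cow is doing what it is best at: the decision to allocate depends on the data, so it genuinely has to be taken at runtime.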

I hope you liked this small article, and as always, keep the vibes.