DI tricks in C++: ref or ptr?

Dependency injection, AKA “DI”, AKA “passing arguments”, is commonly used in modern software design. You’ll see this approach quite often in C# and Java applications, mainly because these languages have specific support for interfaces, and interfaces are a big part of DI in practice. C++, however, has no interface keyword; it supports bona fide multiple inheritance instead, so you could say C++ has “interfaces,” but only by convention. Here is a C++ “interface” type:

#include <ostream>

template <typename T>
class ISerializer
{
public:
    virtual void serialize(const T& value, std::ostream& out) = 0;

    virtual ~ISerializer() = default;
};

This is a pure abstract base class with a virtual destructor. Voilà, it’s an interface type! This one represents a way to serialize a given data type T to an output stream.

If we want to map from DI in C#/Java to DI in C++, we need two things: one of those interface-like classes we just saw and the knowledge to use it correctly. On the second point, we need to be aware that we have to use reference semantics to get proper virtual behavior. That means that the following pattern is almost always wrong in a C++ program:

// WRONG WRONG WRONG
ISerializer<MyData> ser = get_data_serializer();
ser.serialize(MyData{}, std::cout);

The problem is subtle: the serializer is held by value. Copying a derived object into a variable of its base class type is called object slicing (as in, we are “slicing off” the derived class information and leaving behind only the base class). Because ISerializer is abstract, the compiler will actually reject this exact snippet, but with a non-abstract base class the same pattern compiles without complaint and quietly calls the base implementation. Either way, it will thwart your attempts to achieve real polymorphism, which is usually what you need when programming against interfaces.
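To see slicing in action, here is a minimal sketch. It assumes a hypothetical non-abstract base class, Serializer, and a derived JsonSerializer (neither is part of the article’s code), because a base with pure virtual functions would not compile when taken by value at all.

```cpp
#include <string>

// Hypothetical, non-abstract base so that the by-value copy compiles.
struct Serializer
{
    virtual ~Serializer() = default;
    virtual std::string name() const { return "base"; }
};

struct JsonSerializer : Serializer
{
    std::string name() const override { return "json"; }
};

// Takes the argument by value: only the Serializer part is copied in,
// so the JsonSerializer override is sliced away.
std::string name_by_value(Serializer s)
{
    return s.name();
}

// Takes the argument by reference: the dynamic type is preserved.
std::string name_by_reference(const Serializer& s)
{
    return s.name();
}
```

Passing the same JsonSerializer object to both functions gives “base” from the first and “json” from the second, which is the whole difference between value and reference semantics here.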

One common fix is to use a smart pointer, say, std::unique_ptr, to wrap our interface. This technique is demonstrated in Vlad Rișcuția’s helpful article, “Dependency Injection in C++.” The bad example above can thus be rewritten correctly as follows:

// OK; the smart pointer gives us the ref semantics we need
std::unique_ptr<ISerializer<MyData>> ser = get_data_serializer();
ser->serialize(MyData{}, std::cout);
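The article does not show what get_data_serializer looks like, so the following is only a guess at its shape. MyData’s single field, the MyDataTextSerializer class, and the to_text helper are all hypothetical names invented for this sketch.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

template <typename T>
class ISerializer
{
public:
    virtual void serialize(const T& value, std::ostream& out) = 0;
    virtual ~ISerializer() = default;
};

// Hypothetical data type; the article never defines MyData.
struct MyData
{
    int id = 0;
};

// Hypothetical concrete serializer hidden behind the factory.
class MyDataTextSerializer final : public ISerializer<MyData>
{
public:
    void serialize(const MyData& value, std::ostream& out) override
    {
        out << "id=" << value.id;
    }
};

// The factory returns the interface type by smart pointer, so the
// caller never sees (or slices) the concrete class.
std::unique_ptr<ISerializer<MyData>> get_data_serializer()
{
    return std::make_unique<MyDataTextSerializer>();
}

// Small helper that exercises the factory against a string stream.
std::string to_text(const MyData& value)
{
    std::ostringstream out;
    get_data_serializer()->serialize(value, out);
    return out.str();
}
```

The one heap allocation inside std::make_unique is the cost the next paragraph talks about.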

This approach is tried and true. Yet it has a cost that C++ programmers especially are wary of: dynamic allocation! In many scenarios, this is not a cost worth worrying about. There are a dozen other things that tend to crop up in large C++ systems which are more expensive (e.g., string handling). But there is something to be said about seemingly unnecessary heap allocation. If my object graph can live entirely on the stack, why must I use the heap?

Let’s look at a somewhat realistic example. Imagine a Config class that contains zero or more named ConfigSections:

class ConfigSection
{
public:
    using Values = std::map<std::string, std::string>;

    ConfigSection(std::string&& name);

    const std::string& name() const noexcept;

    const std::string& value(const std::string& key) const;

    Values::const_iterator begin() const noexcept;
    Values::const_iterator end() const noexcept;

    void insert(std::string&& key, std::string&& value);

    void remove(const std::string& key);

private:
    std::string m_name;
    Values m_values;
};

class Config
{
public:
    using Sections = std::map<std::string, ConfigSection>;

    Config();

    ConfigSection& section(const std::string& name);

    Sections::iterator begin() noexcept;
    Sections::iterator end() noexcept;

    Sections::const_iterator begin() const noexcept;
    Sections::const_iterator end() const noexcept;

    void insert(ConfigSection&& section);

    void remove(const std::string& name);

private:
    Sections m_sections;
};

This might be a reasonable way to represent configuration loaded from an INI file. And in fact, maybe it would be useful to support serializing this data back out to a stream with our ISerializer interface above:

class ConfigSectionIniSerializer final : public ISerializer<ConfigSection>
{
public:
    void serialize(const ConfigSection& value, std::ostream& out) final;
};

class ConfigIniSerializer final : public ISerializer<Config>
{
public:
    ConfigIniSerializer() noexcept;

    void serialize(const Config& value, std::ostream& out) final;

private:
    ConfigSectionIniSerializer m_inner;
};

This implementation would work fine but only if we never need to customize the serialization mechanism. This is because we have a hardcoded dependency on the concrete ConfigSectionIniSerializer:

// default construct the m_inner serializer -- not injectable!
ConfigIniSerializer::ConfigIniSerializer() noexcept : m_inner{}
{}

void ConfigIniSerializer::serialize(const Config& value, std::ostream& out)
{
    for (const auto& s : value)
    {
        out << '\n';
        m_inner.serialize(s.second, out); // by-value dependency
    }
}

As I mentioned before, by-ref semantics are generally what we need for polymorphic behavior — especially in a DI context. So maybe we could rework the outer Config serializer like this:

class ConfigIniSerializer final : public ISerializer<Config>
{
public:
    ConfigIniSerializer(ISerializer<ConfigSection>& inner) noexcept;

    void serialize(const Config& value, std::ostream& out) final;

private:
    ISerializer<ConfigSection>& m_inner;
};

This is okay: we can now inject different implementations. However, we have a new requirement: the inner ConfigSection serializer must outlive the outer Config serializer that refers to it, i.e., their lifetimes must be strictly nested. In a large enough program, this becomes quite hard to manage. We could always swap out the reference for a std::unique_ptr, but this gets us back to the original dilemma. Sometimes it’s fine to use a reference (in a small, constrained scope, say) and other times we would rather use a smart pointer.
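Here is a rough sketch of the “small, constrained scope” case, where a reference is perfectly fine. ConfigSection and Config are simplified stand-ins for the article’s classes, and HeaderOnlySerializer and serialize_with_stub are hypothetical names made up for this example.

```cpp
#include <map>
#include <ostream>
#include <sstream>
#include <string>

template <typename T>
class ISerializer
{
public:
    virtual void serialize(const T& value, std::ostream& out) = 0;
    virtual ~ISerializer() = default;
};

// Simplified stand-ins for the article's classes (assumptions for this sketch).
struct ConfigSection
{
    std::string name;
    std::map<std::string, std::string> values;
};
using Config = std::map<std::string, ConfigSection>;

class ConfigIniSerializer final : public ISerializer<Config>
{
public:
    explicit ConfigIniSerializer(ISerializer<ConfigSection>& inner) noexcept
        : m_inner(inner)
    {}

    void serialize(const Config& value, std::ostream& out) override
    {
        for (const auto& s : value)
        {
            out << '\n';
            m_inner.serialize(s.second, out);
        }
    }

private:
    ISerializer<ConfigSection>& m_inner; // non-owning: must outlive *this
};

// Hypothetical injected implementation that only writes the section header.
class HeaderOnlySerializer final : public ISerializer<ConfigSection>
{
public:
    void serialize(const ConfigSection& value, std::ostream& out) override
    {
        out << '[' << value.name << ']';
    }
};

std::string serialize_with_stub(const Config& config)
{
    HeaderOnlySerializer inner;         // declared first, destroyed last
    ConfigIniSerializer outer{ inner }; // safe: inner outlives outer
    std::ostringstream out;
    outer.serialize(config, out);
    return out.str();
}
```

Inside one function the declaration order guarantees the nesting. The trouble starts when outer is stored somewhere that lives longer than inner, at which point the reference dangles with no warning.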

Normally a problem like this would call for a templated serializer. Perhaps there is a good way to make that work, but I see a much easier solution. Why not try std::variant? Essentially, we need a type that can hold a reference or a smart pointer, and std::variant seems perfect for that job. There is one catch: a std::variant cannot hold a reference. Here, that’s only a minor problem, because we can store a raw pointer instead:

#include <memory>
#include <utility>
#include <variant>

template <typename T>
class RefOrPtr
{
public:
    RefOrPtr(T& ref) noexcept : m_value{ &ref }
    {}

    RefOrPtr(std::shared_ptr<T> p) noexcept : m_value{ std::move(p) }
    {}

    RefOrPtr(std::unique_ptr<T> p) noexcept : m_value{ std::move(p) }
    {}

    T* operator->()
    {
        if (std::holds_alternative<T*>(m_value))
        {
            return std::get<T*>(m_value);
        }
        else if (std::holds_alternative<std::shared_ptr<T>>(m_value))
        {
            return std::get<std::shared_ptr<T>>(m_value).get();
        }
        else
        {
            return std::get<std::unique_ptr<T>>(m_value).get();
        }
    }

private:
    std::variant<T*, std::shared_ptr<T>, std::unique_ptr<T>> m_value;
};

The RefOrPtr class wraps a variant over a raw pointer, a std::shared_ptr, or a std::unique_ptr. Note that the raw pointer is always initialized from a real reference, so we haven’t lost sight of our original goal (the pointer is but a minor implementation detail!). Now, instead of tying ourselves to one of these three options, we can have it all! And with judicious use of arrow operator overloading, we have a uniform access pattern to get at the internally held ref-or-pointer.
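As a possible variation (not what the article’s sample does), the three-way branch can be folded into a single std::visit call, and the accessors can be made const in the same shallow way the standard smart pointers are. The get and operator* members, the Counter type, and the bump function are additions invented for this sketch.

```cpp
#include <memory>
#include <type_traits>
#include <utility>
#include <variant>

template <typename T>
class RefOrPtr
{
public:
    RefOrPtr(T& ref) noexcept : m_value{ &ref } {}
    RefOrPtr(std::shared_ptr<T> p) noexcept : m_value{ std::move(p) } {}
    RefOrPtr(std::unique_ptr<T> p) noexcept : m_value{ std::move(p) } {}

    // Pointer-like (shallow) constness, as with the standard smart pointers.
    T* get() const
    {
        return std::visit(
            [](const auto& held) -> T*
            {
                using Held = std::decay_t<decltype(held)>;
                if constexpr (std::is_pointer_v<Held>)
                    return held;       // T*: already a raw pointer
                else
                    return held.get(); // shared_ptr or unique_ptr
            },
            m_value);
    }

    T* operator->() const { return get(); }
    T& operator*() const { return *get(); }

private:
    std::variant<T*, std::shared_ptr<T>, std::unique_ptr<T>> m_value;
};

// Hypothetical payload type used only to exercise the wrapper.
struct Counter
{
    int n = 0;
};

// Accepts a reference, a shared_ptr, or a unique_ptr at the call site.
int bump(RefOrPtr<Counter> counter)
{
    return ++counter->n;
}
```

Calling bump with a plain Counter object modifies that object in place, while calling it with a freshly made smart pointer modifies a Counter that the wrapper itself keeps alive.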

class ConfigIniSerializer final : public ISerializer<Config>
{
public:
    ConfigIniSerializer(RefOrPtr<ISerializer<ConfigSection>>&& inner) noexcept;

    void serialize(const Config& value, std::ostream& out) final;

private:
    RefOrPtr<ISerializer<ConfigSection>> m_inner;
};

// ...
void ConfigIniSerializer::serialize(const Config& value, std::ostream& out)
{
    for (const auto& s : value)
    {
        out << '\n';
        // using `->` operator to access the interface method
        m_inner->serialize(s.second, out);
    }
}

Because the RefOrPtr may contain a std::unique_ptr, it is a move-only type. Here we take it by rvalue reference in the serializer constructor (to be std::move’d into place in the implementation). RefOrPtr is also implicitly convertible from any of its variant alternatives (because it defines single-argument non-explicit constructors). This means it’s pretty simple to use as a drop-in replacement.
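To make the drop-in claim concrete, here is a compact sketch of all three injection styles at the call site. ConfigSection and Config are again simplified stand-ins, and HeaderOnlySerializer, SectionSerializer, to_ini, and the three with_* functions are hypothetical names; the complete version lives in the dinject sample.

```cpp
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <variant>

template <typename T>
class ISerializer
{
public:
    virtual void serialize(const T& value, std::ostream& out) = 0;
    virtual ~ISerializer() = default;
};

template <typename T>
class RefOrPtr
{
public:
    RefOrPtr(T& ref) noexcept : m_value{ &ref } {}
    RefOrPtr(std::shared_ptr<T> p) noexcept : m_value{ std::move(p) } {}
    RefOrPtr(std::unique_ptr<T> p) noexcept : m_value{ std::move(p) } {}

    T* operator->()
    {
        if (auto* raw = std::get_if<T*>(&m_value)) return *raw;
        if (auto* shared = std::get_if<std::shared_ptr<T>>(&m_value)) return shared->get();
        return std::get<std::unique_ptr<T>>(m_value).get();
    }

private:
    std::variant<T*, std::shared_ptr<T>, std::unique_ptr<T>> m_value;
};

// Simplified stand-ins for the article's classes (assumptions for this sketch).
struct ConfigSection { std::string name; };
using Config = std::map<std::string, ConfigSection>;
using SectionSerializer = ISerializer<ConfigSection>;

class HeaderOnlySerializer final : public SectionSerializer
{
public:
    void serialize(const ConfigSection& value, std::ostream& out) override
    {
        out << '[' << value.name << ']';
    }
};

class ConfigIniSerializer final : public ISerializer<Config>
{
public:
    ConfigIniSerializer(RefOrPtr<SectionSerializer>&& inner) noexcept
        : m_inner{ std::move(inner) }
    {}

    void serialize(const Config& value, std::ostream& out) override
    {
        for (const auto& s : value)
        {
            out << '\n';
            m_inner->serialize(s.second, out);
        }
    }

private:
    RefOrPtr<SectionSerializer> m_inner;
};

std::string to_ini(ConfigIniSerializer& outer, const Config& config)
{
    std::ostringstream out;
    outer.serialize(config, out);
    return out.str();
}

std::string with_reference(const Config& config)
{
    HeaderOnlySerializer inner;          // caller keeps ownership
    ConfigIniSerializer outer{ inner };  // converts via RefOrPtr(T&)
    return to_ini(outer, config);
}

std::string with_shared(const Config& config)
{
    std::shared_ptr<SectionSerializer> inner = std::make_shared<HeaderOnlySerializer>();
    ConfigIniSerializer outer{ inner };  // shared ownership
    return to_ini(outer, config);
}

std::string with_unique(const Config& config)
{
    std::unique_ptr<SectionSerializer> inner = std::make_unique<HeaderOnlySerializer>();
    ConfigIniSerializer outer{ std::move(inner) }; // outer now owns it
    return to_ini(outer, config);
}
```

Note that each smart pointer is declared with the interface type. As far as I can tell, passing std::make_unique<HeaderOnlySerializer>() straight to the constructor would need two user-defined conversions in a row (derived pointer to base pointer, then pointer to RefOrPtr), and C++ does not chain those implicitly.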

What do you think? Is RefOrPtr a DI trick or DI treat?

For a complete implementation of all the code mentioned above, check out the dinject sample on GitHub.
