The Pitchfork Layout (PFL)

URL: https://mizux.github.io/pitchfork.html Project: ISO/IEC JTC1/SC22/WG21 14882: C++ - Standards

Introduction

The phrase “it doesn’t matter, just be consistent” has really harmed the C++ community more than it has helped. “A foolish consistency,” if you will. Developers’ desire to keep their codebase unique and “special” has held back the development of tooling when every user insists that a tool support their own custom style.

For some things, like where you put the opening brace, the difference might truly be insignificant. For other things, this isn’t the case.

One such thing is the layout of a project on a filesystem.

Incongruous layouts not only harms tooling, they create unnecessary disparity, create friction during on-boarding of contributors, and present a huge burden to beginners for a task that shouldn’t even take more than a few seconds of thought.

For virtually all projects, they are some (maybe improper) subset of a few key components:

Compilable source files
Public headers
Private headers
Source files containing entry points (main() functions)
Documentation files
Tests
Samples and examples
External libraries which have been embedded within the project structure
“Add-ons” to the source (e.g. language bindings, optional plugins, platform bindings)

In addition, it is very common to see projects which subdivide themselves into submodules. These submodules may also have dependencies between each-other. Submodules greatly increase the complexity potential, but aren’t completely out of the question. A project layout standard must be able to handle subdivided projects.

Background and Terminology

The ideas and concepts presented herein are not novel. Rather, they are built upon the work of others. Some come from professional idea and experience, while others come from community convention, which has gradually converged on to prefer certain patterns while decrying others.

Physical vs. Logical Layout

This document builds heavily upon the work of John Lakos, who has spear-headed much work on the aspects of physical design. Much work has been put into evangelizing in the area of logical design, but physical design to this day remains a topic that (in this author’s opinion) is too often overlooked when designing and implementing systems, as well as the teaching thereof.

John Lakos’s 1996 book Large-Scale C++ Software Design, to this day, remains as the book on physical design in C++.

So what’s the difference between physical and logical design?

Logical design pertains closely to programming-language-level aspects of the design of a system.

Physical design pertains to the aspects of design outside of what is encoded by the language.

The source unit dealt with by the C and C++ standards is the translation unit. This corresponds to a source file which has had all preprocessor directives resolved. This is the closest that the language standards get to discussing physical design of systems.

Physical design, despite its name, still lives within the digital space, of course. It concerns the placement and naming of files and directories within a filesystem, as well as how to coordinate communication between physical units where the relationship between their filesystem locations is not deterministic.

This document does not address much in the way of logical design. Rather, it is specifically written to address many common questions and resolve common answers for physical design.

The Physical Component

The most fundamental unit of physical design is the Component.

A component should correspond to exactly two files: A header and a source file. For some rare cases it may be beneficial to use more than a single source file in a single physical component, but it may be a sign that further refactoring is needed.

Additionally, a component may have a test source file, but it is not considered a part of the physical component.

Logical/Physical Coherence

This document assumes that developers will work to maintain logical/physical coherence. There must be a coherent relationship between the logical and physical components of a system, and a single logical component must not be spread across multiple physical components. Multiple logical components should not appear within the same physical component.

Note: Logical components can include more than an individual class or function. For example, it may include classes representing private data (for the PIMPL pattern), or friend classes and functions. Blindly splitting every class and function into individual physical components is not required nor recommended.

Physical components must not form cyclic dependencies.

Example of friendship

// list.hpp
namespace acme {

template <typename T>
class list;

namespace detail {

// Class implementing a "list iterator"
template <typename T>
class list_iterator {
public:
    // Public default constructor
    list_iterator() = default;

private:
    // A private constructor used to initialize the iterator to the proper state
    explicit list_iterator(list<T>&);

    // Permit our list class to construct us with the private constructor
    friend class list<T>;
};

}

template <typename T>
class list {
public:
    using iterator = detail::list_iterator<T>;

    // The iterator class is a friend, giving it access to our internals
    friend class detail::list_iterator<T>;
};

}

The list.cpp is empty other than a single include directive:

// list.cpp
#include "list.hpp"

The purpose of the otherwise empty list.cpp is to ensure that the list.hpp file will compile in isolation, ensuring that the header has the necessary #include directives and/or forward declarations.

Despite having two distinct language-level class templates, we have a single logical component. We say that these two class templates are “co-located.”

We also have two source files, a list.hpp and list.cpp, but they together form a single physical component.

Example using PIMPL

// connection.hpp
#include <memory>

namespace acme {

// Forward-declare the private member data type
namespace detail { class connection_ipml; }

// The wrapper class
class connection {
private:
    // The actual PIMPL
    std::unique_ptr<detail::connection_impl> _impl;

public:
    connection(const std::string& address);
    ~connection();
};

}

And a corresponding connection.cpp:

#include "connection.hpp"

namespace acme::detail {

class connection_impl {
    // Whatever we need for the connection...
};

}

namespace {

std::unique_ptr<acme::detail::connection_impl>
init_private_data(const std::string& address) {
    // Initialize and return the connection data...
}

}

acme::connection::~connection() = default;
acme::connection::connection(const std::string& address)
    : _impl{ init_private_data(address) }
{}

Again we have two different (non-template) classes, and a single logical component. In the header, the implementation detail class is only forward declared, not fully defined. In the connection.cpp we provide the definition for the detail class.

Packages, Projects, Modules, and Submodules

The terms defined in prior sections were heavily borrowed from John Lakos’s work. His work also often uses the term package to refer to a collection of source components. This would roughly correspond to the entire source repository, including build files, support files, documentation, scripts, and data.

Unfortunately, the term “package” has been heavily overloaded. These days, “package” most often refers to a unit of distribution of software, rather than the software itself. For this reason, this document prefers the term “project,” and will use it instead of “package,” although they mean the same thing.

In a similar vein, the term module has also been heavily overloaded. In the wake of upcoming C++ module specifications and implementations, the word “module” will be avoided as to avoid ambiguity and confusion as a “module” corresponds with neither a package/project nor a physical component.

Still, a term is needed to refer to the subdivisions of a large project into smaller elements. For example, Qt is an enormous framework of many interconnected pieces. To refer to these pieces Qt uses the term “module”, which has already been excluded.

For lack of a better unqualified term, the best this author can find is a qualified term: Submodule. This term will be used to denote separate sections of a project which can be consumed on as-needed basis. See the submodules for more information.

Project Files

PFL prescribes several files that should be present in the root of the project:

A README.md file should be present. It should be easily readable in plaintext, but may use “enhanced” plaintext like Markdown or similar. It should contain a description of the contents of the directory and subdirectories.
A LICENSE file must be present for projects that wish to redistribute themselves. It must be plaintext (ie. not enhanced with markup).

Tool-support files, such as .gitignore and .clang-format, may be present in this directory.

Other files in the root directory must be pertinent to the build system of the project. Other files should not appear in the root of the project.

Project Directories

PFL prescribes several directories that should appear at the root of the project tree. Not all of the directories are required, but they have an assigned purpose, and no other directory in the filesystem may assume the role of one of these directories. That is, these directories must be the ones used if their purpose is required.

Other directories should not appear at the root.

Note: If you have a need not fulfilled by a PFL directory listed below, that is a bug in this specification, and I would love to hear from you! Before reporting, double-check that what you need isn’t listed below and in the following sections.

build/ A special directory that should not be considered part of the source of the project. Used for storing ephemeral build results. must not be checked into source control. If using source control, must be ignored using source control ignore-lists.
docs/ Directory for project documentation.
src/ Main compilable source location. must be present for projects with compiled components that do not use submodules.

In the presence of include/, also contains private headers.
include/ Directory for public headers. may be present. may be omitted for projects that do not distinguish between private/public headers. may be omitted for projects that use submodules.
tests/ Directory for tests.
ci/ Directory for CI/CD stuff.
examples/ Directory for samples and examples.
external/ Directory for packages/projects to be used by the project, but not edited as part of the project.
tools/ Directory containing development utilities, such as build and refactoring scripts
extras/ Directory containing extra/optional submodules for the project.
data/ Directory containing non-source code aspects of the project. This might include graphics and markup files.
libs/ Directory for main project submodules.

Top-Level Directories

Pitchfork specifies several top-level directories. Other directories should not be present in the root directory, except for what is required by other tooling.

build/

This directory is not required, but its name should be reserved.

The build/ directory is special in that it must not be committed to a source control system. A user downloading the codebase should not see a build/ directory present in the project root, but one may be created in the course of working with the software. The _build/ directory is also reserved.

Note: Some build systems may commandeer the build/ directory for themselves. In this case, the directory _build/ should be used in place of build/.

The build/ directory may be used for ephemeral build results. Other uses of the directory are not permitted.

Creation of additional directories for build results in the root directory is not permitted.

Note: Although multiple root directories are not allowed, the structure and layout of the build/ directory is not prescribed. Multiple subdirectories of build/ may be used to hold multiple build results of different configuration.

docs/

This directory is not required.

The docs/ directory is designated to contain project documentation. The documentation process, tools, and layout is not prescribed by this document.

include/

Note: The include/ and src/ directories are very closely related. Be sure to also read its section in addition to this one.

The purpose of the include/ directory is to hold public API headers.

The include/ directory should not be used if using merged header placement.

See Library Source Layout.

src/

Note: The src/ and include/ directories are very closely related. Be sure to also read its section in addition to this one.

The purpose and content of src/ depends on whether the project authors choose to follow merged header placement or separate header placement.

See Library Source Layout.

tests/

This directory is not required.

The tests/ directory is reserved for source files related to (non-unit) tests for the project.

The structure and layout of this directory is not prescribed by this document.

A project which can be embedded in another project (such as via external/), must disable its tests/ directory if it can detect that it is being built as an embedded sub-project.

Project maintainers must provide a way for consumers to disable the compilation and running of tests, especially for the purpose of embedding.

examples/

This directory is not required.

The examples/ directory is reserved for source files related to example and sample usage of the project. The structure and layout of this directory is not prescribed by this document.

Project maintainers must provide a way for consumers to disable the compilation of examples and samples.

external/

This directory is not required.

The external/ directory is reserved for embedding of external projects. Each embedded project should occupy a single subdirectory of external/.

external/ should not contain files other than those required by tooling.

This directory may be automatically populated, either partially or completely, by tools (eg. git submodules) as part of a build process. In this case, projects must declare the auto-populated subdirectories as ignored by relevant source control systems.

Subdirectories of external/ should not be modified as part of regular project development. Subdirectories should remain as close to their upstream source as possible.

data/

This directory is not required.

The data/ directory is designated for holding project files which should be included in revision control, but are not explicitly code. For example, graphics and localization files are not code in the same sense as the rest of the project, but are good candidates for inclusion in the data/ directory.

The structure and layout of this directory is not prescribed by this document.

tools/

This directory is not required.

The tools/ directory is designated for holding extra scripts and tools related to developing and contributing to the project. For example, turn-key build scripts, linting scripts, code-generation scripts, test scripts, or other tools that may be useful to a project develop.

The contents of this directory should not be relevant to a project consumer.

libs/

The libs/ directory must not be used unless the project wishes to subdivide itself into submodules. Its presence excludes the src/ and include/ directories.

extras/

This directory is not required.

extras/ is a submodules.

Library Source Layout

A library source tree refers to the layout of source code files that comprise a single library, which is a collection of code that is exposed to the library’s consumer.

Header File Placement

This document supports two different methods of placing headers in a single library: separate and merged. These two methods are mutually exclusive within a single library source tree.

Separate Header Placement

In separated placement, there are two source directories, include/ and src/. The include/ directory is designated to contain the public headers of the library, while the src/ directory is designated to contain the compilable source code and private headers.

Note: Not all projects will necessarily have private headers.

In separate placement, a single physical component is split between the two directories. The relative path to the parent directory of a compilable source file in the src/ directory must be equivalent to the relative path to the parent directory of the header in the include/ directory that corresponds to the compilable source file.

Example

Given a physical component of a header file meow.hpp and a source file meow.cpp, we might place the header at include/cat/sounds/meow.hpp. The relative path from include/ to the parent directory of meow.hpp is cat/sounds.

Thus, we can compose the path to the compilable source file as joining the source directory name src, the relative path cat/sounds, and the filename meow.cpp to get the path src/cat/sounds/meow.cpp. The following layout results:

<root>/
    include/
        cat/
            sounds/
                meow.hpp
    src/
        cat/
            sounds/
                meow.cpp

Note: The purpose of the deterministic header/file path relationship is to aid both tools and human viewers in understanding and manipulating the source directory structure.

Note: The relative paths of these physical components is not arbitrary. See source directory layout.

Consumers of a library using separated header layout should be given the path to the include/ directory as the sole include search directory for the library’s public interface. This prevents users from being able to #include paths which exist only in the src/ directory.

The library itself should be compiled with both its include/ and src/ directories as include search directories. This ensures that the library itself can access all files within both source directories.

Merged Header Placement

In merged header placement, there is a single source directory, src/.

Much like with separated placement, the relative path from the source directory to the parent of directory of the files of a physical component must be the same. This implies that the files of a physical component will always be sibling files in the same directory.

Example

Given a physical component with header hiss.hpp and hiss.cpp, and a relative path of cat/sounds, the resulting layout is defined:

<root>/
    src/
        cat/
            sounds/
                hiss.hpp
                hiss.cpp

Test Placement

This document distinguishes between unit tests, and other tests. Unit tests are tests that roughly correspond to a single single unit of the source code. This may be a physical component, public API, or combination thereof. The distinguishing of a unit test has implications on where it may be placed.

Merged Test Placement

Optional but recommended is to use merged test placement. In this method, a unit test should have exactly one compilable source file, and that filename stem should be the same as the filename stem of the physical component under test, with a .test appended to it. For example, a test for the physical component comprised of meow.hpp and meow.cpp will be named meow.test.cpp. This unit test source file should be placed in the same directory as a compilable source file of the physical component under test. Therefore, when the unit has a compilable source file, the unit test source file will appear as a sibling of the compilable source file.

Separate Test Placement

If not using merged tests, all tests should be placed within the tests/ top-level directory. There are no mandates on the layout within tests/.

Source Directory Layout

For the purposes of this section, the include/ and src/ top-level directories are both included in the definition of “source directories.” They are root of the library source tree. They are named as such because they contain the primary “source files” of the source language (C and/or C++).

No non-source-code files will should be placed or generated in any subdirectories of a source directory. That is, the root of a source directory may contain non-source-code files, but no child directories should.

Conversely, no source-code files should be placed in the root of a source directory. That is, all source files must have qualified paths relative to the root of their source directory.

Header files and source files should correspond to a logical component of the project. For example, a geometry library might contain a circle class along a single header and (optional) source file to represent it. If no other logical components appear in that header and/or source file, the logical component can be said to be the “main component” of the corresponding source component. The main logical component may be a class, function, or some grouping thereof.

The layout of the source tree should closely correspond to the namespace structure of the project.

In C, there is no language-level concept of a namespace, but there is the convention of qualifying globally visible identifiers with a “pseudo” namespace. For example, a libfoo might define a foo_create(), where the prefix foo_ acts as the “namespace” for the identifier. The namespace for these purposes can be said to be foo.

In C++, which has a language-level namespace, the need to qualify identifiers in this way is not necessary (when using C++ linkage). Instead, these qualifiers are put in namespaces.

Given that each logical component has a namespace, we can associate that namespace with the physical component in which it is defined. This namespace can then be used to generate a qualified path by which the physical component can be found. In this way, physical components can be considered “content-addressable”.

Source files should be placed in a directory relative to the source directory where the relative path is composed by joining the elements of the component namespace as intermediate directories. The stem of the source filename should correspond to the name of the logical component which it declares or defines.

Example

Given a logical component geo::shapes::circle, we can use the qualifying namespace to generate the relative path geo/shapes, and the component’s leaf name as a basis for the filename of circle. Thus, the full path to the sources defining the logical component are geo/shapes/circle.hpp and geo/shapes/circle.cpp.

If using separate header placement, the path from the project or submodule root to the header will be include/geo/shapes/circle.hpp, and the path to the compiled source file will be src/geo/shapes/circle.cpp.

If using merged header placement, then the header will appear as a sibling in src/geo/shapes/circle.hpp.

The unit test for this component will appear at src/geo/shapes/circle.test.cpp.

In some cases, it may be advantageous to separate the compiled source file into multiple compiled source files while maintaining a single header file. In this case, the stem of the source file should begin the same as if it were not subdivided, then qualified with a . separating the distinguishing characteristic of the source file.

Example

Given a logical component geo::shapes::circle, and two extremely complex member functions circumference() and area(), we might split the implementations of these methods into separate translation units. The resulting physical component will have the following appearance:

<root>
    src/geo/shapes/
        circle.hpp
        circle.cpp
        circle.circumference.cpp
        circle.area.cpp
        circle.test.cpp

Submodules

Very large projects (eg. Qt, Boost, JUCE, LLVM) will benefit from the concept of submodules.

Submodule A subdivision of a larger project which can be consumed as-needed. Contains its own source trees, tests, data, and documentation.

Note: Splitting a project into submodules should be considered very carefully. It is an extremely heavy tool with subtleties that often trip people up. Very few projects warrant subdividing themselves in this way. Most projects will do just fine with multiple namespaces and directories within their source tree. Don’t reach for this tool when namespaces and subdirectories will suite you just fine. Converting a project to/from a submodule layout is a very cumbersome task.

The following rules must be taken into consideration when considering or using submodules:

Submodules are not themselves standalone projects.
They should not pretend to be entire projects.
They cannot be consumed independent of the rest of the project.
They should not be versioned separately from the project.
They cannot further subdivide into sub-submodules.

Submodule Root

Submodule Root A directory whose child directories are submodules.

Submodule roots include:

lib/
extras/

Each subdirectory of a submodule root must correspond to exactly one submodule.

Submodule Directory

A submodule is represented as a subdirectory of the project which may contain the following directories:

docs/ for submodule documentation
include/ for submodule includes (if splitting headers)
src/ for submodule sources
tests/ for submodule tests
data/ for submodule data
examples/ for examples

Note: Most of the top level directories are absent from this list.

Submodules directories should not contain other files or directories except those required by tooling.

Submodule libs/

The libs/ directory is for main submodules. It is a submodule root.

When the libs/ directory is present, the top-level src/ and include/ directories must not be present. Instead of having a root source tree, a project using libs/ for submodules should instead refactor itself such that the project has a common basis submodule upon which other submodules will may depend.

The main difference between libs/ and extras/ is that the libs/ submodules found in libs/ should be built by default, although a consumer may opt-out of the submodules on an as-needed basis.

Submodule extras/

The extras/ directory is designated for containing additional submodules for the project which build upon the main component(s). This may include submodules that are not part of the project’s “default” build, or otherwise impose special requirements to be used.

For example, the following might be candidates for extra/ rather than regular components:

“Language bindings” or extra libraries that provide integrations of the project with programming languages or runtimes different from its own.
“Platform bindings” or extra libraries (plugins) that integrate the project with a particular platform. For example, a windowing library that needs to understand how to talk with Windows, Quartz, X11, and Wayland would include its platform integration implementations in this directory.
“Contributed” submodules. Additional submodules that are contributed by the project’s users and included in upstream, but are not officially supported by the project.
Optional submodules that require additional dependencies, or may be prohibitive to include for all users. For example, Qt’s Webkit module is prohibitively time consuming to build, and it requires the presence of dependencies that are only required exactly for that one component.

Build Systems

This document does not mandate any particular build system. The only requirements is that the chosen build system support the layout herein defined.

Libraries

For library projects, a source tree should correspond to exactly one library. That is, at most one one linkable result, and exactly one public include directory. A single #include tree must not require linking more than one library to access all symbols exported from that tree.

Note: Submodules, having their own source tree, may each contain a library that can be linked and consumed independently.

A single source tree should not vary its public interface based on anything other than the target platform. This has several big implications, including (but not limited to) the following:

A library should not offer the user controls for tweaking its public interface.
A library should not change its public interface based on the presence/absence of external software.

References

Here few references:

License

Apache 2. See the LICENSE file for details.