
Cloud Network Design: VPC peering

2022-07-08

Design considerations in the physical datacenter

In the physical datacenters that I’ve worked in, we often spearheaded large initiatives to improve east-west traffic. Racks in the same row communicating with each other would degrade performance in racks unaffiliated with the traffic in question, either due to hair-pinning or a distribution switch that just wasn’t up to the task.

From a pure bandwidth/latency perspective it would be ideal if every node1 had a direct connection to every other node. If you set aside the scaling issues one might encounter with immense route tables, the sheer port count, and the cable-management headaches for a moment, a full mesh gets you:

  • Traffic is isolated to the participating parties. You can’t affect your neighbor.
  • Bandwidth is high. Links aren’t shared.
  • Latency is as low as it gets. There are no intermediate hops.
  • The network is robust. If one link goes down only two nodes are affected, and even then only a subset of their traffic.
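
To put a number on the “sheer port count” problem: the link count of a full mesh grows quadratically with the node count. A quick back-of-the-envelope sketch (fullMeshLinks is just n choose 2):

```typescript
// Back-of-the-envelope: a full mesh of n nodes needs one dedicated
// link per pair of nodes, i.e. n * (n - 1) / 2.
const fullMeshLinks = (n: number): number => (n * (n - 1)) / 2;

console.log(fullMeshLinks(4));  // 6
console.log(fullMeshLinks(40)); // 780: a single 40-server row is already unmanageable
```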

Since we can’t cable every server to every server, we typically achieve some of the same benefits by moving to a spine-and-leaf architecture (a form of Clos networking) with ECMP. We get the following benefits:

  • Improving latency by reducing the number of hops
  • Improving overall throughput by removing bottlenecks and leveraging active-active links
  • Scaling is a simple operation: you add more switches.

To the cloud

Despite the fact that the cloud allows infrastructure to be stood up quickly with pre-existing automation tools, one still has to take into account the same design considerations as in a physical data center. Maybe your application is a simple CRUD app that only needs to speak to external customers, and north-south is all you really care about. But maybe you have an application that has a large persistence layer, or needs to share an inordinate amount of state.

Since we’re in the cloud, switch ports and cables are no longer problems we have to deal with. There are limits2 to peering, particularly the number of peering connections. The limits do set an upper bound, but even within that bound, connecting everything by hand would still be tedious and a management headache. Luckily for us we can automate this process with modern IaC tools and frameworks, meaning we can get several of the aforementioned benefits with a very simple design and minimal upkeep.

A side note on AWS’s transit gateway

Back in 2018, AWS released the transit gateway, which is, generally speaking, a hub-and-spoke architecture. While it has its use cases, and I’m sure many are bound to disagree with me, it seems like a step back in a few ways.

Not to mention, AWS did without it for what, over a decade? GCP still doesn’t have a 1-to-1 offering like the transit gateway3. The TGW feels like it was crafted for management via the AWS console.

While transit gateways have their use cases, VPC peering has a lot to offer, especially when paired with some custom automation:

  • Separating all of the configuration components such as routing and firewall rules
    • It’s much harder to make a site-wide mistake when managing separate rules from a central place
  • Improving overall throughput
  • With automation we avoid the headache of managing all of these peers.

That said, there are limitations:

  • We don’t have transitive peering, so we are forced to go all in on a mesh4
  • As stated earlier, the number of peers per VPC has an upper bound
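
The second limitation is easy to quantify: in a full mesh every VPC peers with every other VPC, so each VPC carries n - 1 peerings. A small sketch checking a planned topology against GCP’s default per-VPC limit (the fitsDefaultLimit helper is mine, not a provider API):

```typescript
// In a full mesh each VPC carries n - 1 peerings. Sanity-check a
// planned topology against GCP's default per-VPC peering limit of 25
// (the limit can be raised via a quota request).
const GCP_DEFAULT_PEERING_LIMIT = 25;

const peeringsPerVpc = (vpcCount: number): number => vpcCount - 1;

const fitsDefaultLimit = (vpcCount: number): boolean =>
  peeringsPerVpc(vpcCount) <= GCP_DEFAULT_PEERING_LIMIT;

console.log(fitsDefaultLimit(20)); // true  (19 peerings per VPC)
console.log(fitsDefaultLimit(30)); // false (29 peerings per VPC)
```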

Building a solution

I’ll step through a POC using cdktf and GCP. If you want to see the source for this, you can find it here5.

While there is an initial lift in coding, particularly in transforming your configuration into the form usable by terraform, the day-to-day maintenance burden is lower thanks to a sane configuration in a central location. The whole get-shit-done, MVP mentality of work is often used by folks as an excuse to cut corners.

Rather than quickly throw something up with off-the-shelf automation tools with little regard for future flexibility, why not come up with a simple, maintainable in-house design?

NOT EVERYTHING HAS TO BE A COMMUNITY-SUPPORTED MODULE. In fact, I’d argue that for your foundational layers of infrastructure it can be worth your while to do some of the heavy lifting yourself.

  1. You have a complete understanding of the dependencies
  2. You control all future changes
  3. You probably only need a subset of functionality so you can keep your implementation simple
  4. You aren’t at the mercy of any upstream changes.

A contrived example

Defining your configuration, aka a schema

Rather than think about how the tools available to you expect their format defined6, I prefer to think in terms of how it would be preferable to modify and manage. Less “how do I make this work” and more “how do I want this to work.”

In networks I think in terms of topologies;

And with CDKTF we can enforce an actual type on our configuration.

// (Assumed) key unions for this excerpt; the full source defines these.
type Vpcs = 'infra' | 'management' | 'sales';
type Subnets = 'app' | 'db';
type Cidr = string;
type Gateway = string;
type VpcAllocation = {
  subNets: Record<Subnets, { cidr: Cidr; gateway: Gateway }>;
};
type Topo = Record<Vpcs, VpcAllocation>;
readonly topo: Topo = {
  infra: {
    subNets: {
      app: {
        cidr: '10.92.0.0/22',
        gateway: '10.92.0.1',
      },
      db: {
        cidr: '10.92.4.0/24',
        gateway: '10.92.4.1',
      },
    },
  },
  management: {
    subNets: {
[snip]

in peers I think of mappings;

Now, this looks like it could be cut down further, as there are always two ends to a peering; one would be tempted to specify each peering only once. But by specifying the peering on each end:

  1. It’s easy to look up a given VPC and see what it’s peered with.
  2. We don’t have to come up with a policy/system for determining on which network the peering is specified. (Alphabetical? What if we add a new network that upsets that order?)
  3. There are technically two parts to a peering and we specify them both explicitly. IMO this is easy to reason about.

readonly peers: Peers = {
  management: ['infra', 'sales'],
  infra: ['management'],
  sales: ['management'],
};
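
A nice side effect of declaring both ends is that symmetry is trivially checkable before anything reaches terraform. A sketch (the Vpcs union and the asymmetries helper are assumptions for illustration):

```typescript
// (Assumed) union type matching the example topology.
type Vpcs = 'infra' | 'management' | 'sales';
type Peers = Record<Vpcs, Vpcs[]>;

// Because every peering is declared on both ends, we can assert the
// config is symmetric: if A lists B, then B must list A.
const asymmetries = (peers: Peers): string[] => {
  const errors: string[] = [];
  for (const vpc of Object.keys(peers) as Vpcs[]) {
    for (const other of peers[vpc]) {
      if (!peers[other].includes(vpc)) {
        errors.push(`${vpc} -> ${other} has no matching ${other} -> ${vpc}`);
      }
    }
  }
  return errors;
};

const peers: Peers = {
  management: ['infra', 'sales'],
  infra: ['management'],
  sales: ['management'],
};
console.log(asymmetries(peers)); // []
```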

and in firewall rules I think of traditional firewall chains

In the firewall rules I am using terms that closely map to the GCP terraform provider, although I have added an additional sourceNetworks field that I can match on to pull cidr blocks out of the live config.

readonly firewall: Record<Vpcs, FwConfig> = {
  management: {
    ingress: [
      {
        name: "ping",
        allow: true,
        priority: 1000,
        protocol: 'icmp',
        sourceNetworks: {infra: 'app'} // here
      },
      {
        name: "catchall",
        allow: false,
        priority: 10001,
        protocol: 'all',
        sourceRanges: ["0.0.0.0/0"]
      },

    ],
    egress: [
      {
        name: "ping",
        allow: true,
        priority: 1000,
        protocol: 'icmp',
      },
      {
        name: "catchall",
        allow: false,
        priority: 10001,
        protocol: 'all',
      },
    ],
  },
  infra: {
    egress: [
      {
        name: "ping",
        allow: true,
[snip]

making our representation useful

types to leverage

Now that we’ve defined the types that are easy to modify, we can define the types that are easy to work with in the language.

export type SubnetConfig = PartialRecord<
  Subnets,
  { config: ComputeSubnetworkConfig }
>;

export type NetworkConfig = Record<
  Vpcs,
  {
    project: string;
    subnets: SubnetConfig;
  }
>;

And then we can wrap it all up in a constructor, allowing us to have our cake and eat it too: we have a type that’s easy to define, and a type that’s easy to work with.
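
One assumption worth calling out: the getKeys helper used throughout these snippets isn’t built in. It’s presumably a thin typed wrapper around Object.keys, something like:

```typescript
// Object.keys is typed as string[], which would break the typed map
// callbacks below; this wrapper preserves the key union.
// (Assumed helper; the real implementation lives in the linked source.)
const getKeys = <T extends object>(obj: T): (keyof T)[] =>
  Object.keys(obj) as (keyof T)[];

const sample = { app: 1, db: 2 };
console.log(getKeys(sample)); // ['app', 'db'], typed as ('app' | 'db')[]
```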

  constructor(project: Project) {
    const l = getKeys(this.topo).map((j) => {
      const subnets = this.topo[j].subNets;
      return {
        [j as Vpcs]: {
          project: project.name,
          subnets: getKeys(subnets)
            .map((k) => {
              return {
                [k]: {
                  config: {
                    project: project.name,
                    ipCidrRange: subnets[k].cidr,
                    gatewayAddress: subnets[k].gateway,
                  },
                },
              } as SubnetConfig;
            })
            .reduce((obj, i) => {
              return { ...obj, ...i };
            }),
        },
      } as NetworkConfig;
    });
    this.config = l.reduce((obj, i) => {
      return { ...obj, ...i };
    });
  }

using the representation

And now that our easy-to-maintain representation has been transformed into one that is a bit easier to program with, all of our tasks become simple. We can even stash the returned resources on the object itself for easy consumption by other modules.

I won’t go into the details of everything; if you want the nitty gritty, see the source I linked to above.
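
Still, the overall shape is worth sketching. A stripped-down, hypothetical stand-in for the real CDKTF construct, showing how each build step stashes its output on the instance for later steps to reference:

```typescript
// Hypothetical, simplified stand-ins for the real CDKTF resources.
type Net = { name: string };
type Subnet = { name: string; network: string };

class NetworkStack {
  readonly vpcs: Record<string, Net>;
  readonly subnets: Record<string, Subnet[]>;

  constructor(topo: Record<string, string[]>) {
    // Networks are built first and stashed on the instance...
    this.vpcs = Object.fromEntries(
      Object.keys(topo).map((v): [string, Net] => [v, { name: v }]),
    );
    // ...so the subnet step can refer back to them, the way the real
    // subnetworks() method reads this.vpcs[v].name.
    this.subnets = Object.fromEntries(
      Object.entries(topo).map(([v, subs]): [string, Subnet[]] => [
        v,
        subs.map((s) => ({ name: `${v}-${s}`, network: this.vpcs[v].name })),
      ]),
    );
  }
}

const stack = new NetworkStack({ infra: ['app', 'db'] });
console.log(stack.subnets.infra[1].name); // infra-db
```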

networks

networks(i: GcpNetworkingConfig) {
  return getKeys(i.config)
    .map((v: Vpcs) => {
      const rv = new ComputeNetwork(this, v, {
        name: v,
        project: i.config[v].project,
        autoCreateSubnetworks: false,
        deleteDefaultRoutesOnCreate: true,
      });
      return { [v]: rv } as Record<Vpcs, ComputeNetwork>;
    })
    .reduce((obj, i) => {
      return { ...obj, ...i };
    });
}

and subnets

subnetworks(i: GcpNetworkingConfig) {
  return getKeys(i.config)
    .map((v: Vpcs) => {
      return {
        [v]: getKeys(i.config[v].subnets)
          .map((s: Subnets) => {
            const c = assert(i.config[v].subnets[s]).config;
            return {
              [s]: new ComputeSubnetwork(this, `${v}-${s}`, {
                ...c,
                name: `${v}-${s}`,
                network: this.vpcs[v].name,
              }),
            } as Record<Subnets, ComputeSubnetwork>;
          })
          .reduce((obj, i) => {
            return { ...obj, ...i };
          }),
      } as Record<Vpcs, Record<Subnets, ComputeSubnetwork>>;
    })
    .reduce((obj, i) => {
      return { ...obj, ...i };
    });
}

and firewalls

OK, admittedly this one’s a bit beefier and shows some of TypeScript’s warts.7

firewall(i: GcpNetworkingConfig) {
  return getKeys(i.firewall)
    .map((v) => {
      return getKeys(i.firewall[v])
        .map((direction) => {
          const ad = (i: boolean): 'allow' | 'deny' => {
            return i == true ? 'allow' : 'deny';
          };
          return i.firewall[v][direction]
            .map((j) => {
              const sourceRanges =
                j.sourceRanges != undefined ? j.sourceRanges : [];

              const sourceNetworks =
                j.sourceNetworks != undefined
                  ? getKeys(j.sourceNetworks).map((net) => {
                      return this.subnets[net][
                        // typechecker ain't that great
                        assert(assert(j.sourceNetworks)[net])
                      ].ipCidrRange;
                    })
                  : [];
              return {
                [v]: {
                  [direction]: {
                    [j.name]: new ComputeFirewall(
                      this,
                      `${v}-${direction}-${ad(j.allow)}-${j.name}`,
                      {
                        name: `${v}-${direction}-${j.name}`,
                        project: this.vpcs[v].project,
                        [ad(j.allow)]: {
                          protocol: j.protocol,
                          ports: j.ports,
                        },
                        priority: j.priority,
                        sourceTags: j.sourceTags,
                        sourceRanges: [
                          ...sourceRanges,
                          ...sourceNetworks,
                        ],
                        network: this.vpcs[v].id,
                        direction: direction.toUpperCase(),
                        logConfig: {
                          metadata: 'INCLUDE_ALL_METADATA',
                        },
                      },
                    ),
                  },
                },
              } as Record<
                Vpcs,
                Record<Direction, Record<string, ComputeFirewall>>
              >;
            })
            .reduce((obj, i) => {
              return { ...obj, ...i };
            });
        })
        .reduce((obj, i) => {
          return { ...obj, ...i };
        });
    })
    .reduce((obj, i) => {
      return { ...obj, ...i };
    });
}

You get the idea

So yeah, peering is good and TypeScript is alright for this sort of thing. It feels a bit heavier than dealing with HCL when you take into account the dev environment and initial setup, but dealing with it day to day is less painful, despite the fact that the tools in the TS/JS ecosystem are resource hogs.

Additionally, I could imagine defining an abstract class with methods like network, firewall, etc. that accept a config block and set up the proper constructs. Then you could have a class per cloud provider that implemented the methods, created the proper constructs, and mapped the fields properly, keeping your config somewhat agnostic between providers. Naturally there’d be edge cases all over the place, but it’s a good thought exercise for practicing clean design.
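
As a rough sketch of what that might look like (every name here is hypothetical, not from the linked source):

```typescript
// Hypothetical provider-agnostic config types.
type SubnetSpec = { cidr: string; gateway: string };
type NetworkSpec = { name: string; subnets: Record<string, SubnetSpec> };

// The base class fixes the provider-agnostic surface...
abstract class CloudNetwork {
  abstract network(spec: NetworkSpec): void;
  abstract firewall(spec: NetworkSpec): void;
}

// ...and each provider maps it onto its own constructs.
class GcpNetwork extends CloudNetwork {
  created: string[] = [];
  network(spec: NetworkSpec): void {
    // Would instantiate ComputeNetwork/ComputeSubnetwork here.
    this.created.push(spec.name);
  }
  firewall(_spec: NetworkSpec): void {
    // Would map the agnostic rules onto ComputeFirewall fields.
  }
}

const gcp = new GcpNetwork();
gcp.network({
  name: 'infra',
  subnets: { app: { cidr: '10.92.0.0/22', gateway: '10.92.0.1' } },
});
console.log(gcp.created); // ['infra']
```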


  1. In this context, when I say node I mean server, appliance, anything. Not network nodes in the strict sense. ↩︎

  2. Currently it defaults to 50 for AWS and 25 for GCP, although those limits can be increased. ↩︎

  3. Although it’s likely not needed, since the way GCP segregates infrastructure by project is a lot less heavy-handed than the way AWS namespaces by account. The account is such a brutal namespace. ↩︎

  4. Well, not exactly; we’re not going to peer VPCs that don’t need to talk to each other, which satisfies security requirements and helps keep us under the VPC peering limits. ↩︎

  5. md5sum, gpg signature ↩︎

  6. Not that there isn’t a time and place to think about your output format ↩︎

  7. While my approach could likely be improved, not being able to type-narrow via a ternary is quite a shame. ↩︎