GCP Network Design: The Basics
This is a write-up of best-practice networking basics for Google Cloud Platform, covering what I've learned over the last year while working on a large company's migration to the cloud from AWS. I hope it comes in handy for anyone new to designing networking in and to Google Cloud, or for those who are just generally interested in the details of the networking architecture stitching their projects together.
Overview
The Google Cloud Platform (the AWS-like division of the larger "Google Cloud", from here on referred to as just 'GCP') organizational resource layout generally looks like this: cloud resources live in a VPC, which is part of a project, which is organized into folders or subfolders.
The flow is: create a folder for each business unit (BU) or department in your organization. From there, create a project for nearly everything else. This includes creating a dedicated project for the shared VPC to live in. Nearly every distinct use case gets its own project, even the resources specific to Interconnect. I mean everything!
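As a rough illustration, here's what that layout might look like in Terraform with the google provider. This is only a sketch: the org ID, folder name, and project IDs below are all hypothetical.

```hcl
# Hypothetical org layout: one folder per BU/department, projects underneath,
# including a dedicated project for the shared VPC and one for Interconnect.
resource "google_folder" "engineering" {
  display_name = "engineering"
  parent       = "organizations/123456789012" # your org ID
}

resource "google_project" "shared_vpc_host" {
  name       = "shared-vpc-host"
  project_id = "shared-vpc-host-123"          # dedicated home for the shared VPC
  folder_id  = google_folder.engineering.name
}

resource "google_project" "interconnect" {
  name       = "interconnect"
  project_id = "interconnect-prj-123"         # even Interconnect gets its own project
  folder_id  = google_folder.engineering.name
}
```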
Shared VPC
First and foremost, Google recommends using their 'Shared VPC' architecture when laying the foundational architecture. To understand what that means, imagine that all of your networking components (VPC subnets, routing, firewall rules, VPN terminations, physical interconnects, NAT gateways) live in a single console view and/or Terraform workspace that only your network administration team controls. Individual team projects are then created independently while "sharing" the subnets that your network team has already created and designated for use.
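In Terraform terms, the attachment itself is small. A minimal sketch, assuming the google provider and hypothetical project IDs:

```hcl
# Designate the network team's project as the Shared VPC host...
resource "google_compute_shared_vpc_host_project" "host" {
  project = "shared-vpc-host-123"
}

# ...then attach each team's project as a 'service' project that
# consumes the host project's subnets.
resource "google_compute_shared_vpc_service_project" "team_a" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = "team-a-123"
}
```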
Advantages of Shared VPC
- No need to create new subnets for every team/project. Teams instead share the network you've created in the shared VPC by being 'attached' to it as what's referred to as a 'service' project
- Central point of control for network teams
Disadvantages of Shared VPC
- GCP VPC peering currently has a limitation where the quotas on forwarding rules and VM instances are counted across a network and all of its directly peered networks combined, and cannot be exceeded
- Some services need subnets that aren't shared VPC subnets
Unlike in AWS, a VPC is a global construct and not a regional one. There's also no supernet that you need to specify for it during creation. That means you can create subnets for the VPC in whatever regions you want, using whatever IP ranges you want, and they'll be able to communicate with each other, with Andromeda as the magical, ubiquitous backend data plane.
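A minimal sketch of that, with hypothetical names and ranges: one global VPC, two subnets in different regions, and no parent CIDR anywhere.

```hcl
# One global VPC; subnets are created per region with arbitrary ranges.
resource "google_compute_network" "shared" {
  name                    = "shared-vpc"
  project                 = "shared-vpc-host-123"
  auto_create_subnetworks = false    # define subnets yourself; no supernet required
  routing_mode            = "GLOBAL" # lets Cloud Routers advertise every region's subnets
}

resource "google_compute_subnetwork" "us_west1" {
  name          = "us-west1-subnet"
  project       = "shared-vpc-host-123"
  region        = "us-west1"
  network       = google_compute_network.shared.id
  ip_cidr_range = "10.10.0.0/20"
}

resource "google_compute_subnetwork" "us_east4" {
  name          = "us-east4-subnet"
  project       = "shared-vpc-host-123"
  region        = "us-east4"
  network       = google_compute_network.shared.id
  ip_cidr_range = "10.20.0.0/20" # any range you like; no parent CIDR to fit inside
}
```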
Dedicated Interconnects
A Dedicated Interconnect location connects you not just to that region but to the whole continent. For example, if you connect to GCP via Dedicated Interconnect in Equinix SV1 (Silicon Valley), you don't just have private IP connectivity to us-west1 but to every region in North America. This stands in contrast to AWS, where the opposite is true: where you connect is what you have access to (unless using Direct Connect gateways, but I digress).
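On the GCP side, the moving parts are a Cloud Router and a VLAN attachment riding the physical Interconnect. A rough sketch, again assuming the Terraform google provider; the ASN, names, and interconnect URL are hypothetical:

```hcl
# The Cloud Router lives in the shared VPC host project, alongside the network.
resource "google_compute_router" "ic_west" {
  name    = "ic-router-us-west1"
  project = "shared-vpc-host-123"
  region  = "us-west1"
  network = google_compute_network.shared.id
  bgp {
    asn = 64514 # private ASN for the Cloud Router's side of the BGP session
  }
}

# VLAN attachment over the physical Dedicated Interconnect, which can sit
# in its own dedicated project and is referenced here by URL (hypothetical).
resource "google_compute_interconnect_attachment" "sv1" {
  name         = "equinix-sv1-vlan"
  project      = "shared-vpc-host-123"
  region       = "us-west1"
  router       = google_compute_router.ic_west.id
  type         = "DEDICATED"
  interconnect = "projects/interconnect-prj-123/global/interconnects/sv1-ic"
}
```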
Public peering is also a little strange with GCP. Unlike AWS, there's no concept of a public VIF to peer with Google publicly; GCP only really supports private VPC peering over an Interconnect. So, in order to access public GCP services (such as Google Cloud Storage), you need to either configure your Cloud Router to advertise Google's dynamic, ever-changing public IP ranges to you, or advertise a single public /30 for use with their 'restricted' Google APIs gateway together with a pretty tricky CNAME trick, the latter being what we ended up doing. This is something that Google clearly needs to work more on. If you qualify to publicly peer with Google properly at an IX, then it's almost certainly the easier way to go.
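For reference, that /30 is 199.36.153.4/30, the range behind restricted.googleapis.com; the trick is to have your Cloud Router advertise it to on-prem while your on-prem DNS CNAMEs *.googleapis.com to restricted.googleapis.com so API traffic follows that route. A sketch extending the Cloud Router from the previous example (same hypothetical names):

```hcl
# Advertise the restricted Google APIs /30 to on-prem over the Interconnect,
# on top of the usual VPC subnet advertisements. On-prem DNS then points
# *.googleapis.com at restricted.googleapis.com to steer API calls this way.
resource "google_compute_router" "ic_west" {
  name    = "ic-router-us-west1"
  project = "shared-vpc-host-123"
  region  = "us-west1"
  network = google_compute_network.shared.id
  bgp {
    asn               = 64514
    advertise_mode    = "CUSTOM"
    advertised_groups = ["ALL_SUBNETS"] # keep advertising the VPC subnets too
    advertised_ip_ranges {
      range       = "199.36.153.4/30"
      description = "restricted.googleapis.com"
    }
  }
}
```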
High Availability
When it comes time to design how you will physically interconnect your network with Google, you have a few standard options:

99.99%
"Four nines" availability for Dedicated InterConnect [generally speaking] translates to 52.60 minutes of permitted downtime per year, or 4.38 minutes per month, 1.01 minutes per week, 8.64 seconds per day.This is the recommended design for Google for hosting pretty much anything serious and mission critical. This implies that you already operate a network with PoPs in two different geographical regions, have redundant routers at each PoP, and connect them through some sort of backbone infrastructure.