AWS Transit VPC: Don't fear the CSR!


What a difference a few years make...or have they? 

Since my 2013 entry on the same topic, inter-regional AWS overlay networking has matured, at least ever so slightly. AWS have since provided a few more seamless ways to establish inter- and intra-regional VPC transport, though not all of them fully native.

VPC peering landed in 2014: a useful, natively integrated AWS product that disappoints only once you realize it's restricted to intra-regional connectivity. While great for linking VPCs within a single region, you're still out of luck if you want to interconnect beyond those regional boundaries (US-East-1 <-> EU-West-1). Its non-transitive nature also limits the designs you can use even within a single region, since full connectivity between all of your VPCs requires a full mesh (ten VPCs, for example, would need 45 peering connections).

Then, in 2016, came what AWS calls the Transit VPC to tackle that very real issue of inter-regional VPC resource connectivity, engineered on the principle that you (not AWS) carry the responsibility for support and appliance management (read: knowing how to scale) of your own little L3VPN core. Cloudy? Meh, not entirely.

2017 Update: As most people predicted, AWS have released inter-region peering, which may solve your use case if you have just a few VPCs to cross-connect. However, if you need large scale (100+ VPCs interconnected), better security visibility, or just want to understand the solution better, then please continue reading!

Here we'll take a look at the different aspects and requirements of Transit VPC - when to use it, and why.

Caveats

I'll point out again that AWS made a design decision with this solution to put the network engineering burden mostly onto you, the customer. This stands in contrast to AWS silently stitching your VPC networks together via some complex, hidden MPLS- or SDN-type solution that manages everything for you in the background.

Instead, virtual router appliances (from Cisco) are directly exposed in EC2, similar (at least in principle) to deploying an EMR cluster. As such, users who don't have first-hand network engineering and scaling experience may need some guidance on the topic of IP overlays.

While the networking aspects aren't overly complicated, it's still important to know how to right-size the appliances for anticipated throughput, troubleshoot things like BGP and IPsec, monitor them for patching, and so on.
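For the monitoring piece, one pattern worth knowing (and one the solution itself leans on via EC2 AutoRecovery, mentioned later) is a CloudWatch alarm that recovers an instance when its system status check fails. A minimal boto3 sketch, with the alarm name and instance ID purely illustrative:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # Alarm on the hypervisor-level status check; the "recover" action
    # restarts the instance on healthy hardware, keeping its instance ID,
    # private IPs, and Elastic IPs (which matters for VPN endpoints).
    cloudwatch.put_metric_alarm(
        AlarmName="csr-1-autorecover",  # illustrative name
        Namespace="AWS/EC2",
        MetricName="StatusCheckFailed_System",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        Statistic="Maximum",
        Period=60,
        EvaluationPeriods=2,
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:automate:us-east-1:ec2:recover"],
    )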

When to Actually Use Transit VPC

So, do you need to use a transit VPC? The solution can be quite beneficial if you can say yes to at least one of the following:
  • You need highly-available, inter-region VPC connectivity
  • You'd like an aggregation point for existing or planned DirectConnect VIFs and/or VPNs (including: remote user, colo, B2B, multi-cloud, remote/branch offices, etc.)
  • You require direct connectivity between some or all of your VPC subnets that reside in different AWS regions
  • You need new or planned globally deployed VPCs to be able to access resources across VPNs and VIFs back in a centralized location
These scenarios are certainly the most common ones that come up when helping customers plan their solutions. I'd also note that some of them are more advanced and require more hands-on configuration, something not really covered here but worth listing to give an idea of what's possible.

Hub

As the name suggests, this solution is deployed in a dedicated VPC in a region of your choice. When choosing that designated region, centrality is important: it's helpful to place the VPC in a region where you're already doing heavy VPC peering, terminating VGW VPNs, or, most importantly, terminating (or planning to terminate) DirectConnect/DX connections.

Once that's decided, the CloudFormation template automatically deploys two Cisco CSR instances in the public subnets of a new VPC. This is your hub.

The Cisco CSR 1000V, practically a virtual version of the ASR 1000, runs the robust IOS-XE operating system, which is capable of much more than simple SVTI tunnel termination. Once provisioned, these same instances can be configured for features like AnyConnect SSL VPN termination for remote users, serving as designated hubs in a DMVPN/FlexVPN topology, site-to-site VPN, and so on. And since they're the hubs for your AWS VPNs, connectivity to all sorts of internal and external resources is already well established.

Spokes

While the transit VPC acts as the "hub", all other new or existing VPCs can potentially be the "spokes". Generally speaking, spokes are what AWS calls the VPCs that are automatically connected back to the hub (and therefore to the other spokes) via VPN. Technically, a Transit VPC spoke is really just a tagged VGW attached to a VPC, though a VGW can also be left unattached and associated with something like a DirectConnect VIF to include it in the route exchange.

Connecting spoke VPCs is dead simple. After the solution's been deployed, all you need to do is create a VGW in the VPC you want tied in and apply a specific tag to it:

    [Key: "transitvpc:spoke", Value: "true"]

Then, as long as the VPC is in the same account (or in other linked accounts), the Lambda "poller" operation will discover it within a few minutes, create the dynamic VPC VPN, configure the Cisco CSRs, bring up the VPN tunnels, dynamically learn about the other networks through BGP, and finally enable IP connectivity.

VGW Poller Logic (image source: http://docs.aws.amazon.com/)
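For illustration, here's a minimal boto3 sketch of the spoke-side tagging, along with the kind of tag-based lookup the poller performs behind the scenes (the VPC ID and region are placeholders; the real discovery logic lives inside the solution's Lambda function):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create a VGW, attach it to the spoke VPC, and apply the magic tag.
    vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]["VpnGatewayId"]
    ec2.attach_vpn_gateway(VpcId="vpc-0123456789abcdef0", VpnGatewayId=vgw)
    ec2.create_tags(
        Resources=[vgw],
        Tags=[{"Key": "transitvpc:spoke", "Value": "true"}],
    )

    # Roughly what the poller does each cycle: find every tagged VGW.
    spokes = ec2.describe_vpn_gateways(
        Filters=[{"Name": "tag:transitvpc:spoke", "Values": ["true"]}]
    )["VpnGateways"]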


Thinking of Cost? Here's the Deal.

To be clear, this isn't a cheap solution.

September 2017 Update: Cisco have dropped the SW/License charges on CSR Marketplace listings by approximately 50%.
You will have to pay for both EC2 compute time and software licensing for your Cisco CSR instances. Since Cisco's name is on the solution, the majority of your cost will come from running these licensed instances and will vary with your throughput requirements. I've compiled an estimated projection for the different options, since the pricing calculator doesn't factor these costs in together.

*Note: The "Bring Your Own License" AMI option is not recommended due to licensing inventory headaches and software-restriction to 2.5Gbps.

Overall Thoughts

The solution itself, for one built around a third-party vendor, is impressively well-conceived by AWS. The entire thing deploys soup to nuts in under five minutes, notably with the following built in:
  • Redundancy (via N+1 BGP failover and EC2 AutoRecovery)
  • Lambda scheduled VGW poller operation
  • Lambda Cisco "configurator" operation
  • KMS storage for the private key used to authenticate the config automation
  • S3 bucket for VPN configuration storage
The deployment documentation, too, is very well thought out and easy to follow step by step. Once you've selected the region in which to install the Transit VPC, the entire operation is carried out via the supplied CloudFormation stack. [See my entry on deploying a CloudFormation template]
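If you'd rather script the launch than click through the console, a hedged boto3 sketch follows; the TemplateURL and parameter names here are illustrative, so take the real values from the solution's deployment guide:

    import boto3

    cfn = boto3.client("cloudformation", region_name="us-east-1")

    # Launches the Transit VPC stack. CAPABILITY_IAM is required because
    # the template creates IAM roles for the Lambda poller/configurator.
    cfn.create_stack(
        StackName="transit-vpc",
        TemplateURL="https://s3.amazonaws.com/example-bucket/transit-vpc.template",
        Parameters=[
            {"ParameterKey": "KeyName", "ParameterValue": "my-ec2-keypair"},
        ],
        Capabilities=["CAPABILITY_IAM"],
    )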

Conclusion

This post just wouldn't be complete without a somewhat cynical amendment.

With Cisco's CSR being one of the first commercially available virtual routers in the AWS Marketplace (not to mention a likely partnership of some kind), it makes sense that it's the center of this solution. However, I'd personally love to see Amazon add support for more virtual appliances capable of BGP over SVTI, such as the Juniper vSRX or Palo Alto VM-Series as commercial options and, for the most cost-conscious, open-source variants like VyOS or even pfSense.


Also, where's the security detail? For all the work that's gone into this, and all the value-brimming features available in vendor-branded appliances, I'm quite surprised that only base IP connectivity is provided. The security guy in me says that connecting networks across regions around the globe is a massively valuable opportunity to implement some sort of traffic inspection. Nobody is really talking about this yet, but I fully expect it to evolve rapidly, whether in this traditional form or through more NFV-like mechanisms.

With such a heavy price tag on the Cisco CSR, I've seen more than one customer walk away from this solution and roll their own, and I don't necessarily blame them. In my experience, the average AWS deployment consists of just one or two VPCs max, so less fancy solutions can easily make more sense.

Finally, since it's easy to take armchair pot shots on the Internet: AWS's end game should presumably be to integrate this natively so that these ugly plumbing bits are better abstracted, like we get with VPC peering. Call it "Inter-regional Peering". (Update: Annnnd they did.)

Corporate Plug: Need help designing Transit VPC? We specialize in helping customers navigate AWS. Contact Rackspace today!
