It’s easy to see the benefits of Upwind’s runtime solution for network mapping when it comes to security. Here, we want to talk about an additional use case – cost savings.

In the world of cloud computing, managing costs while ensuring optimal performance is a balancing act that many organizations face. An often overlooked aspect of this challenge lies in the costs associated with Network Address Translation (NAT) gateways, which can become especially costly when traffic to and from the Internet becomes substantial. This blog post explores a practical approach to reducing these costs, highlighting the effectiveness of Upwind in identifying and mitigating high NAT gateway charges.

Understanding NAT Gateway Costs

NAT gateways play a crucial role in operating your infrastructure on AWS, enabling instances in a private subnet to connect to the internet or other AWS services, without allowing inbound traffic initiated from the internet. While NAT gateways are essential for security and network architecture, their costs quickly escalate when data volume grows. Charges are primarily driven by two factors: the amount of data transferred through the gateway and the gateway’s hourly usage rate. As traffic increases, so do the costs, making it important for organizations to monitor and manage their NAT gateway usage efficiently.
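To make the two cost drivers concrete, here is a minimal sketch of the arithmetic. The rates are assumptions for illustration only (roughly the published us-east-1 list prices at the time of writing), and the function name is hypothetical. Check the current AWS pricing page for your region, and note that standard data transfer charges apply on top of these.

```python
# Rough NAT gateway cost estimate: hourly charge plus per-GB processing charge.
# Both rates are ASSUMPTIONS for illustration; look up current pricing for your region.
HOURLY_RATE_USD = 0.045   # assumed price per NAT gateway per hour
PER_GB_RATE_USD = 0.045   # assumed price per GB processed by the gateway


def nat_gateway_monthly_cost(gateways: int, gb_processed: float, hours: float = 730.0) -> float:
    """Estimate a month of NAT gateway charges (excluding data transfer out)."""
    hourly_charge = gateways * hours * HOURLY_RATE_USD
    processing_charge = gb_processed * PER_GB_RATE_USD
    return hourly_charge + processing_charge


if __name__ == "__main__":
    # Example: two gateways processing 10 TB (10,000 GB) in a month.
    print(f"${nat_gateway_monthly_cost(2, 10_000):,.2f}")
```

The hourly component is fixed per gateway, so at high volume the per-GB component dominates, which is why moving traffic off the gateway is the main lever.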

The Challenge: Unraveling High NAT Gateway Charges

Our journey began when we noticed unusually high NAT gateway charges in our infrastructure. We needed not only to identify the source of this spike but also to find a way to reduce costs without compromising the functionality or performance of our systems.

Identifying the Culprit

We could have followed the standard four-step approach that AWS outlines in its documentation on investigating NAT gateway costs:

  1. Enable VPC Flow Logs: Turn on VPC flow logs for visibility into the traffic passing through the network.
  2. Analyze Traffic: Find the most-used IPs, which requires sifting through extensive data.
  3. Identify the Source/Destination of the Data: Get lists of the IPs that communicate with each NAT gateway (if you have more than one gateway, that’s one more thing to aggregate).
  4. Drill Down: Look up the IPs in public databases to determine who owns them. The results are usually very general, such as “AWS” or “GitHub,” and drilling down further takes much more work.
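To give a sense of what step 2 involves, here is a minimal sketch that totals bytes per destination from flow log records. It assumes the default (version 2) flow log format with space-separated fields, and that records where the source address is the NAT gateway's private IP represent traffic leaving the gateway. The function name and the sample addresses are hypothetical.

```python
from collections import Counter

# Field positions in the default (version 2) VPC flow log format:
# version account-id interface-id srcaddr dstaddr srcport dstport
# protocol packets bytes start end action log-status
SRCADDR, DSTADDR, BYTES, FIELD_COUNT = 3, 4, 9, 14


def top_destinations(lines, nat_private_ip, limit=10):
    """Sum bytes per destination for records sent from the NAT gateway's private IP."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < FIELD_COUNT:
            continue  # malformed or truncated record
        if fields[SRCADDR] != nat_private_ip:
            continue  # header line, NODATA record, or traffic not leaving the gateway
        if not fields[BYTES].isdigit():
            continue
        totals[fields[DSTADDR]] += int(fields[BYTES])
    return totals.most_common(limit)


if __name__ == "__main__":
    sample = [
        "2 123456789012 eni-abc 10.0.0.220 203.0.113.5 44321 443 6 10 5000 1700000000 1700000060 ACCEPT OK",
        "2 123456789012 eni-abc 10.0.0.220 198.51.100.7 44400 443 6 4 1000 1700000000 1700000060 ACCEPT OK",
    ]
    for ip, total in top_destinations(sample, "10.0.0.220"):
        print(ip, total)
```

Even with this in place, the output is only a list of IP addresses. Mapping them to specific services and to the workloads that generate the traffic is the part that takes the most effort.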

Enter Upwind

Rather than taking this tedious and time-intensive approach, we used Upwind’s cloud security platform, which gave us precise insights into our network traffic out of the box and allowed us to pinpoint the most “chatty” resources within our AWS environment. Interestingly, we learned that AWS services were among the top internet egress destinations.

Upwind’s capability to differentiate between individual AWS services was a game-changer for our investigation. It helped us identify the EC2 endpoint, which carries API traffic for EC2 instances, autoscaling groups, and VPC resources, as a significant contributor to our NAT gateway costs. Our platform uses these AWS APIs to collect data from our customers’ accounts.

Here, for example, we see 5.7 MB/s going through the internet to the AWS EC2 service. Once Upwind identifies the service’s IPs across all the communication and aggregates the traffic, we can also see which workload uses the EC2 service the most.

In addition to helping us identify all of the traffic from our NAT gateway, Upwind also allowed us to see all other related traffic and its destinations, using our unique runtime DNS resolution. For example, here’s Argo CD’s internet egress communication:

Strategic Solution: Implementing an AWS Endpoint

Armed with this knowledge, we created a VPC endpoint for the EC2 service, designed to keep some of the traffic destined for the EC2 API within AWS instead of sending it through the NAT gateway. By doing so, we significantly reduced the load on our NAT gateway. We implemented this in the us-east-1 region, using a regional endpoint.

Creating a service endpoint can be done from the AWS console. To do this you’ll need:

  • The ID of the VPC you’d like to add it to
  • The AWS service you want to access internally (in our case it was com.amazonaws.us-east-1.ec2)
  • The endpoint type: Interface or Gateway (Gateway is available only for S3 and DynamoDB)
  • The private subnets you’d like to route the traffic from
  • A security group (it can allow all internal traffic or be fine-grained for selective access)
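The same inputs can also be passed to the AWS API. The following is a minimal sketch using boto3's `create_vpc_endpoint`, not the exact setup we ran. The helper names and the resource IDs in the example are hypothetical, and enabling private DNS is an assumption (it lets existing SDK calls resolve to the endpoint without code changes, and requires DNS support and DNS hostnames to be enabled on the VPC).

```python
def build_endpoint_request(vpc_id, subnet_ids, security_group_ids,
                           service_name="com.amazonaws.us-east-1.ec2"):
    """Assemble the arguments for an Interface endpoint (hypothetical helper)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Assumption: private DNS is wanted so the default service hostname
        # resolves to the endpoint's private IPs inside the VPC.
        "PrivateDnsEnabled": True,
    }


def create_endpoint(request, region="us-east-1"):
    """Create the endpoint; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party dependency, imported here so the builder works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(**request)
    return response["VpcEndpoint"]["VpcEndpointId"]


if __name__ == "__main__":
    # Placeholder IDs for illustration; substitute your own before calling create_endpoint.
    request = build_endpoint_request("vpc-0123456789abcdef0",
                                     ["subnet-0aaa", "subnet-0bbb"],
                                     ["sg-0ccc"])
    print(request)
```

Interface endpoints carry their own hourly and per-GB charges, so it is worth comparing them against the NAT gateway processing cost for the traffic you expect to move.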

There are also easy-to-use Terraform modules and CloudFormation resources for this.

The Outcome: Significant Cost Savings

By leveraging Upwind and implementing a targeted solution, we achieved substantial cost savings. The reduction in NAT gateway usage reduced our total cost by 40% with minimal work, using Upwind’s built-in detailed network insights.

Not everyone has a use case where EC2 APIs contribute significantly to data transfer costs, but in more common cases, such as S3, CloudWatch Logs, and DynamoDB, fine-grained service identification can be crucial.

Conclusion

There are many challenges to optimizing AWS infrastructure costs, but also opportunities for significant savings and efficiency. Tools like Upwind can transform the way organizations approach these challenges, offering clarity and actionable insights that lead to meaningful outcomes. As you continue to navigate the complexities of cloud infrastructure, we recommend you embrace the tools and strategies that empower you to achieve your goals, ensuring your infrastructure is not only performant but also cost-effective.