DNS forwarding in VyOS

A lot of small networks do not have their own DNS server, but it's not always desirable to just leave hosts to use an external third-party server either. That's why we've had DNS forwarding in VyOS for a long time, and we are going to keep it for the foreseeable future.

Experienced VyOS users already know all about it, but we should post something for newcomers too, shouldn't we?

Configuring DNS forwarding is very simple. Assuming you have "system name-server" set, all you need to do to forward requests from hosts behind eth0 to it is "set service dns forwarding listen-on eth0". Repeat this for every interface where you have clients, and you are done.

There are some knobs for telling the service to use or not use specific DNS servers though:

set service dns forwarding listen-on eth0

# Use name servers from "system name-server"
set service dns forwarding system

# Use servers received from DHCP on eth1 (typically an ISP interface)
set service dns forwarding dhcp eth1

# Use a hardcoded name server
set service dns forwarding name-server 192.0.2.10

You can also specify cache size:

set service dns forwarding cache-size 1000

One of the lesser-known features is the option to use different name servers for different domains. It can be used for a quick and dirty split-horizon DNS, or simply for sending queries for internal domains to an internal server while all other queries go to the usual upstream servers:

set service dns forwarding domain mycompany.local server 192.168.52.100
set service dns forwarding domain mycompany.example.com server 192.168.52.100
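To make the per-domain behaviour concrete, here is a small Python model of how a forwarder could pick upstream servers for a query name. It assumes that the most specific configured domain wins and that domains match on whole labels, which is how forwarders commonly behave; it is not the code VyOS runs, and `pick_upstream` is a hypothetical name.

```python
def pick_upstream(qname: str, domain_servers: dict, default_servers: list) -> list:
    """Return upstream servers for qname, preferring the most specific domain.

    domain_servers maps lowercase domain names (as in the "domain ... server"
    commands above) to lists of server addresses; default_servers stands in
    for the servers taken from "system", "dhcp" or "name-server".
    """
    labels = qname.rstrip(".").lower().split(".")
    # Try the full name first, then strip one leftmost label at a time,
    # so "a.b.mycompany.local" is checked before "mycompany.local".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in domain_servers:
            return domain_servers[suffix]
    return default_servers


domains = {
    "mycompany.local": ["192.168.52.100"],
    "mycompany.example.com": ["192.168.52.100"],
}
print(pick_upstream("intranet.mycompany.local", domains, ["192.0.2.10"]))  # ['192.168.52.100']
print(pick_upstream("www.example.com", domains, ["192.0.2.10"]))           # ['192.0.2.10']
```

Walking the name label by label is what keeps a name like notmycompany.local from accidentally matching mycompany.local.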

And that's all there is to it. DNS forwarding is not a big feature, but useful doesn't always equal complex.

Loopback and the dummies

"There is no place like 127.0.0.1", the old saying goes. The loopback interface is most often seen as the interface where the 127.0.0.1 address is assigned by default and where the 127.0.0.0/8 network is routed: just a way for programs on the same host to communicate over the network stack without an actual network. However, it has uses in a networked context as well.

Before we talk about those use cases, we need to discuss the interfaces themselves. In some OSes, such as Cisco IOS and many BSD derivatives, it is possible to create multiple loopbacks. The Linux kernel (and thus VyOS) historically allowed only one loopback (named "lo"), and this behaviour has become too traditional and relied upon to change overnight, so to support multiple loopbacks, a new interface type called "dummy" was added. Dummy interfaces are functionally identical to loopbacks, so the difference is mostly aesthetic.

This is how to set up a dummy interface: "set interfaces dummy dum0 address ...". If your problem does not require independent interfaces, you can also just add another address to the loopback.

So, why would one want to use a loopback/dummy interface instead of assigning another address to a physical NIC?

Case 1: tunnel endpoints

We have already talked about GRE/IPsec behind NAT and/or with dynamic addresses. GRE requires fixed local and remote endpoint addresses to work, but in a setup where dynamic addresses or NAT are involved, you do not have fixed addresses. The trick is to make up a pair of addresses specifically for this purpose, assign them to loopback or dummy interfaces, and use them as GRE endpoints.

Case 2: management addresses

Suppose you have a router A with two NICs, connected to networks B and C, which are also connected to each other, so that if any of the links fails, the network as a whole is still operational. However, if you choose the address of either the NIC in network B or the NIC in network C as the management address, it may become inaccessible if that NIC or its link fails, forcing you to manually fall back to the other address.

To prevent such situations, people often assign a dedicated management address to a loopback, create a DNS record for it, and advertise that address to all other routers. As long as there is at least one working path to the router, they do not need to worry about the addresses of physical NICs when they SSH to it, and they are free to change those addresses without updating DNS or memorizing the new ones.

Case 3: iBGP peer addresses

Since iBGP uses the same autonomous system number for all routers, it loses the ability to use the AS path for path selection and loop detection. This means that to keep the network loop-free, one has to set it up as a full mesh or use a route reflector.

If we use addresses of physical NICs for session endpoints, we run into the same problem as in the previous use case: a session goes down with the link even if there are other valid paths. A possible solution is to select dedicated addresses for iBGP sessions, assign them to loopbacks, and advertise them to all other routers through a link-state protocol such as OSPF.
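The argument behind cases 2 and 3 boils down to reachability, which can be checked with a toy graph model. This Python sketch is purely illustrative: the topology, the node names, and the rule that an interface address is only reachable while its own link is up are simplifying assumptions, and real routing protocols add convergence time on top.

```python
from collections import deque

# Toy topology for case 2: router R has one NIC in network B and one in
# network C, and B and C are also connected to each other.
LINKS = {("R", "B"), ("R", "C"), ("B", "C")}

# Where each candidate management address lives: an interface address is
# tied to one link, a loopback/dummy address is tied to the router itself.
ADDRESSES = {
    "nic-b": ("link", ("R", "B")),
    "nic-c": ("link", ("R", "C")),
    "loopback": ("node", "R"),
}


def reachable_nodes(up_links, start):
    """Breadth-first search over the links that are currently up."""
    neighbours = {}
    for a, b in up_links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_reachable(address, up_links, admin_location="B"):
    kind, where = ADDRESSES[address]
    nodes = reachable_nodes(up_links, admin_location)
    if kind == "link":
        # An interface address disappears together with its link.
        return where in up_links and where[0] in nodes
    return where in nodes


without_rb = LINKS - {("R", "B")}
print({name: is_reachable(name, without_rb) for name in ADDRESSES})
# {'nic-b': False, 'nic-c': True, 'loopback': True}
```

The loopback address only becomes unreachable when every path to the router is gone, which is exactly the property you want for management and iBGP session endpoints.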

Your use case?

If you know other cases when a network setup can be improved by using loopback or dummy interfaces, let us know!

Take a third option: site to site OpenVPN

I've written a long series of posts about setting up IPsec VPNs between NATed machines. As you've already seen, with some creative configuration it's possible, but is it always worth the sacrifice? Sometimes performance requirements, or lack of support for anything else on the other side, make it necessary, but if the other side is also a VyOS, or another open source system, there's an alternative.

While OpenVPN is usually associated with road warrior VPN setups, it is not limited to them. It does have a site-to-site option, and it's very quick and easy to set up. For some strange reason, that option is neglected by just about everyone who otherwise supports OpenVPN: in OpenWRT and OPNSense it's only possible to set it up through custom config options, while in Mikrotik RouterOS it's not possible to set it up at all. In VyOS we have an explicit option for it.

The advantages are that it takes very few commands to get a tunnel to work, and that it will work in any network where you can forward a single port, even if both sides are behind double NAT. The downside is performance: squeezing even 100 Mbit/s of encrypted traffic out of it can be impossible, and typical iperf figures are 10-20 Mbit/s. For many use cases that performance is more than enough, but if you plan to use the tunnel for storage replication or another high-traffic job, this option is definitely not for you, and you'll have to resort to IPsec.

How to use AS path matching in your BGP policies

AS path is one of the most fundamental attributes of (e)BGP advertisements. Its length is one of the first criteria in the best path selection algorithm (shorter is better), and it's also the sole mechanism of loop detection in eBGP (if a router sees its own AS number in the path, there's a loop). However, despite the important role it plays behind the scenes, it's rather underutilized in routing policies.

In many cases where prefix lists or specific route-map rule options such as next-hop would do, route filtering and modification based on AS path can do the job better.

Let's see how to use it.
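As a rough illustration of how AS path matching works, here is a Python model of the Cisco-style regular expressions that the Quagga/FRR routing daemons behind VyOS accept in AS path lists. The translation of the `_` metacharacter below is an approximation written for this example, the function names are hypothetical, and the AS numbers come from the documentation range. On the router itself you would write these patterns in AS path list rules, not in Python.

```python
from __future__ import annotations

import re

# Approximation of the "_" metacharacter: it matches the start or end of
# the path, a space, or the punctuation used for AS sets/confederations.
_DELIMITER = r"(?:^|$|[ ,{}()])"


def compile_as_path_regex(pattern: str) -> re.Pattern:
    """Translate an AS path regex into a Python regex (assumed semantics)."""
    return re.compile(pattern.replace("_", _DELIMITER))


def as_path_matches(pattern: str, as_path: list[int]) -> bool:
    """Check a pattern against an AS path written as space-separated ASNs."""
    path_string = " ".join(str(asn) for asn in as_path)
    return compile_as_path_regex(pattern).search(path_string) is not None


# "^$":      locally originated routes (empty AS path)
# "^64496_": routes received directly from AS 64496
# "_64511$": routes originated by AS 64511
print(as_path_matches("^$", []))                   # True
print(as_path_matches("^64496_", [64496, 64511]))  # True
print(as_path_matches("^64496_", [644960]))        # False
print(as_path_matches("_64511$", [64496, 64511]))  # True
```

The delimiter is what keeps `^64496_` from matching a path that merely starts with the digits 64496, such as one from AS 644960.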

Firewall groups today and tomorrow

Substantial work has been done by Marian Tudosoiu to bring IPv6 firewall groups to the current implementation of firewall configuration scripts even before we give it a complete rewrite. It's already merged into the current branch and is expected to be included in the 1.2.0-rc1 release. Now is probably a good time to write a post about using firewall groups for those who haven't used them yet.

Of course there's still a lot of work to be done, such as integrating groups into NAT, which likely does require a complete rewrite to be feasible.

The concept is simple enough: instead of creating multiple rules that only differ in one address or port number, you create a group with all those addresses and ports, and reference it in a rule.

VyOS has three group types: address groups, network groups, and port groups. In 1.1.8 they can only be used with IPv4 firewall rulesets, including "policy route" rules.

Let's create some groups:

set firewall group port-group ManagementPorts port 22
set firewall group port-group ManagementPorts port 23
set firewall group port-group ManagementPorts port 443

set firewall group address-group Servers address 10.10.0.10
set firewall group address-group Servers address 10.10.0.15
set firewall group address-group Servers address 10.10.0.20

set firewall group network-group TrustedNets network 192.168.5.0/24
set firewall group network-group TrustedNets network 172.18.19.128/25
set firewall group network-group TrustedNets network 10.20.30.144/32

Now we can create a ruleset that uses them. Let's make a rule that references nothing but groups:

set firewall name DMZ-In rule 10 action accept
set firewall name DMZ-In rule 10 protocol tcp
set firewall name DMZ-In rule 10 source group network-group TrustedNets
set firewall name DMZ-In rule 10 destination group port-group ManagementPorts
set firewall name DMZ-In rule 10 destination group address-group Servers

An important part is that you can modify groups on the fly without updating any rules.
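The "modify groups on the fly" property is easiest to see in a model. The Python sketch below mimics DMZ-In rule 10 with in-memory sets; the `GroupRule` class is hypothetical and only illustrates the matching logic described here, not how VyOS turns groups into kernel rules.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass

# Hypothetical in-memory copies of the groups defined above.
management_ports = {22, 23, 443}
servers = {"10.10.0.10", "10.10.0.15", "10.10.0.20"}
trusted_nets = {"192.168.5.0/24", "172.18.19.128/25", "10.20.30.144/32"}


@dataclass
class GroupRule:
    """A rule that, like DMZ-In rule 10, refers to groups rather than values."""
    action: str
    protocol: str
    source_network_group: set[str]
    destination_address_group: set[str]
    destination_port_group: set[int]

    def matches(self, protocol: str, src: str, dst: str, dport: int) -> bool:
        src_ip = ipaddress.ip_address(src)
        dst_ip = ipaddress.ip_address(dst)
        # The groups are consulted at match time, so editing a group
        # changes what the rule matches without touching the rule.
        return (
            protocol == self.protocol
            and any(src_ip in ipaddress.ip_network(net)
                    for net in self.source_network_group)
            and any(dst_ip == ipaddress.ip_address(addr)
                    for addr in self.destination_address_group)
            and dport in self.destination_port_group
        )


rule10 = GroupRule("accept", "tcp", trusted_nets, servers, management_ports)
print(rule10.matches("tcp", "192.168.5.7", "10.10.0.15", 22))     # True
print(rule10.matches("tcp", "172.18.19.200", "10.10.0.25", 443))  # False
servers.add("10.10.0.25")  # edit the group only, not the rule
print(rule10.matches("tcp", "172.18.19.200", "10.10.0.25", 443))  # True
```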

As you can see, groups are a simple concept that can be learnt in minutes. Once they are available for IPv6 and NAT, using them there will be very similar.

The night of the living dead protocols: RIPv2

RIP's name seems to have anticipated its ultimate fate. It used to stand for Routing Information Protocol before newer and better protocols killed it. Yet most routers in the world still support it, even though few people seriously consider using it, which makes it an undead protocol.

This is mainly for compatibility reasons. After all, if you have an old box at a remote location, connected to a 128kbps ISDN line or worse, that is working fine and is impractical to replace, but supports nothing but RIP, what can you do? Likewise, some modern small routers by Netgear or D-Link only support RIP, so if you can't just replace them, there's no other choice.

Besides, RIP remains a valuable teaching tool since it's conceptually simple, and understanding its limitations can help one understand newer routing protocols and their strengths.

What's bad about RIP?

There are some very good reasons to choose something other than RIP if you can. It's one of the oldest protocols in existence, and it was designed at a time when modern route selection techniques did not exist yet and networks were not big enough to warrant them. When RIP reached its scalability limits, it was impossible to retrofit those techniques into it, so it was replaced rather than upgraded. RIPv2, the latest version, replaced broadcast advertisements with multicast and added support for CIDR, but that's about it: the fundamental design problems are, well, fundamental.

The biggest issue is that in RIP, routers are only aware of themselves and their immediate peers. They are completely blind to the rest of the network. Link-state and path-vector protocols such as OSPF and eBGP are aware of the full topology (concrete or abstract), and can respectively reduce the full network graph to a loop-free tree or immediately recognize route advertisements that would create a loop.

The only information RIP advertisements include is the network, the next hop, and an integer metric value. No router has any idea how its peers are connected to one another, so there is no way to detect loops before they form.

RIP includes a number of solutions to this problem, and they all have a limiting effect on its scalability.

Split horizon

It's simple: do not advertise routes received from a peer back to that peer. It prevents the trivial loop where peers try to route traffic to networks they learnt about from someone else through you, but it cannot prevent wider loops. At least it has no effect on convergence time and doesn't create scalability limits either.
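Split horizon is essentially a filter on outgoing updates. Here is a minimal Python sketch, assuming a routing table that remembers which neighbour each route came from; the data layout and the `split_horizon` name are made up for this example.

```python
def split_horizon(routing_table: dict, peer: str) -> dict:
    """Return the routes to advertise to `peer` under split horizon.

    routing_table maps a prefix to (metric, learned_from), where
    learned_from is the neighbour the route came from, or None for
    connected networks.
    """
    return {
        prefix: metric
        for prefix, (metric, learned_from) in routing_table.items()
        if learned_from != peer  # never tell a peer about its own routes
    }


table = {
    "10.74.74.0/24": (2, "10.217.32.132"),
    "10.217.32.0/24": (1, None),
}
print(split_horizon(table, "10.217.32.132"))  # {'10.217.32.0/24': 1}
```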

Counting to infinity

Before the other mechanisms were developed, this one was the only measure for detecting unreachable or looping routes. To make sure a route that is not actually reachable will be eventually recognized as such, it was decided to choose a maximum value that represents "infinite" (unreachable) metric. Since this process already can take quite some time, the value had to be small. In RIP, the infinite metric was set to 16. This means if a network has paths longer than 15 hops, the protocol just stops working.
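The two-router case shows why the maximum had to be small. In this Python sketch, router A has just lost its directly connected network, router B still holds a stale route to it through A, and there is no split horizon, so each exchange adds one hop to whatever the other router advertised until both reach the infinite metric. The function name and the one-exchange-at-a-time model are simplifications for illustration.

```python
from __future__ import annotations

INFINITY = 16  # RIP's "unreachable" metric


def count_to_infinity(stale_metric: int = 2, infinity: int = INFINITY) -> list[tuple[str, int]]:
    """Simulate routers A and B bouncing a dead route (no split horizon).

    A has lost its directly connected network, so its own route is gone,
    but B still holds a stale route through A with `stale_metric` and
    advertises it. Each router then takes the other's advertisement plus
    one hop, capped at `infinity`, until both consider the network unreachable.
    """
    metrics = {"A": infinity, "B": stale_metric}
    updates = []
    speaker, listener = "B", "A"
    while metrics["A"] < infinity or metrics["B"] < infinity:
        metrics[listener] = min(metrics[speaker] + 1, infinity)
        updates.append((listener, metrics[listener]))
        speaker, listener = listener, speaker
    return updates


updates = count_to_infinity()
print(updates[:3])   # [('A', 3), ('B', 4), ('A', 5)]
print(len(updates))  # 15
```

With an infinite metric of 32, the same scenario takes 31 exchanges instead of 15, and since routers normally send full updates only periodically (every 30 seconds by default), every extra exchange costs real time.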

Reverse poisoning

Even with 16 as infinite metric, the process of counting to infinity can be slow. The next idea was to not just wait for unreachable routes to become known as unreachable naturally, but actively advertise them to your peers as unreachable if a peer that was advertising them goes down.

This still is not a complete solution: if a router first receives an unreachability advertisement from a router that is aware of the true situation, but later receives an update from a router that is not aware of it yet, it will start using the second, false advertisement.

Hold timer

The ultimate solution is to ignore, for some time, any advertisements for a network whose metric has recently increased, so that updates from routers that do not know the real situation yet are not accepted. This time shouldn't be too small: it should be similar to the time it takes for updates to propagate through the entire network. A common default value is 180 seconds. In other words, when convergence issues are prevented this way, even a network of just a few routers may take three minutes to converge.
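Here is a Python sketch of the hold timer as described above: once a route's metric goes up, every advertisement for it is ignored until the timer runs out. Real RIP implementations differ in the details (for example, in which updates they still accept during holddown), and the `Route` and `process_update` names are made up for the example.

```python
from dataclasses import dataclass

INFINITY = 16
HOLDDOWN_SECONDS = 180.0  # the common default mentioned above


@dataclass
class Route:
    metric: int
    next_hop: str
    holddown_until: float = 0.0  # advertisements are ignored until this time


def process_update(route: Route, neighbour: str, advertised_metric: int,
                   now: float, holddown: float = HOLDDOWN_SECONDS) -> Route:
    """Apply a simplified hold timer to one incoming advertisement."""
    if now < route.holddown_until:
        return route  # still holding down: ignore the advertisement entirely
    metric = min(advertised_metric + 1, INFINITY)
    if neighbour == route.next_hop:
        if metric > route.metric:
            # Bad news from our own next hop: accept it and start holding down.
            return Route(metric, neighbour, now + holddown)
        return Route(metric, neighbour)
    if metric < route.metric:
        return Route(metric, neighbour)  # strictly better path via someone else
    return route


route = Route(metric=2, next_hop="B")
route = process_update(route, "B", INFINITY, now=0)  # B reports the network as unreachable
route = process_update(route, "C", 1, now=10)        # a stale update from C is ignored
print(route)  # Route(metric=16, next_hop='B', holddown_until=180.0)
route = process_update(route, "C", 1, now=200)       # the timer has expired
print(route)  # Route(metric=2, next_hop='C', holddown_until=0.0)
```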

Configuration

So, suppose you are aware of all these issues but want or need to configure RIP nonetheless. It's pretty simple. First, enable RIP on all interfaces where you want to send and receive advertisements:


set protocols rip interface eth0
set protocols rip interface eth1

Networks that are configured on those interfaces will become a part of the RIP table automatically. If you want to advertise networks that are on other interfaces, you need to add them explicitly:

set protocols rip network 10.74.74.0/24

You can also use "redistribute" and "default-information originate" commands just like in all other routing protocols.

If everything is right, you will see something like this on the neighbor router:

vyos@vyos# run show ip rip 
Codes: R - RIP, C - connected, S - Static, O - OSPF, B - BGP
Sub-codes:
      (n) - normal, (s) - static, (d) - default, (r) - redistribute,
      (i) - interface

     Network            Next Hop         Metric From            Tag Time
R(n) 10.74.74.0/24      10.217.32.132         2 10.217.32.132     0 02:55
C(i) 10.217.32.0/24     0.0.0.0               1 self              0

If you remove the interface or the network statement, you will see the unreachable metric 16 for the duration of the hold timer, and only when the timer expires will the route disappear from your table completely:

vyos@vyos# run show ip rip 
Codes: R - RIP, C - connected, S - Static, O - OSPF, B - BGP
Sub-codes:
      (n) - normal, (s) - static, (d) - default, (r) - redistribute,
      (i) - interface

     Network            Next Hop         Metric From            Tag Time
R(n) 10.74.74.0/24      10.217.32.132        16 10.217.32.132     0 01:57

IP tunnels I have known and loved

Today we'll talk about the "classic" IP tunneling protocols.

GRE is often seen as a one size fits all solution when it comes to classic IP tunneling protocols, and for a good reason. However, there are more specialized options, and many of them are supported by VyOS. There are also rather obscure GRE options that can be useful.

All those protocols are grouped under "interfaces tunnel" in VyOS. Let's take a closer look at the protocols and options currently supported by VyOS.

MTU considerations

One issue that often comes up in tunneled setups is that of the MTU and MSS. Generally, the kernel is capable of setting the correct MTU on its own, and as long as end-to-end ICMP works, there should be no MSS issues either. But if you are in doubt, or simply curious what the total overhead of a tunnel will be, I made a tool for quickly calculating MTU and MSS for any combination of encapsulating and encapsulated protocols. Your contributions and corrections to it are always welcome.
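For a quick sanity check without the tool, the arithmetic is simple enough to sketch in Python. The header sizes below are the minimal ones (no IP options, no IPv6 extension headers, GRE over IPv4 without checksum or sequence number fields), so treat the results as upper bounds for your setup; the function names are made up for this example.

```python
# Per-packet overhead in bytes added by each encapsulation, assuming no IP
# options, no IPv6 extension headers, and GRE over IPv4 without checksum or
# sequence number fields. "gre-bridge" also carries the inner Ethernet header.
IPV4_HEADER = 20
IPV6_HEADER = 40
GRE_HEADER = 4
GRE_KEY_FIELD = 4
ETHERNET_HEADER = 14
TCP_HEADER = 20

ENCAPSULATION_OVERHEAD = {
    "ipip": IPV4_HEADER,
    "sit": IPV4_HEADER,
    "ip6ip6": IPV6_HEADER,
    "ipip6": IPV6_HEADER,
    "gre": IPV4_HEADER + GRE_HEADER,
    "gre-bridge": IPV4_HEADER + GRE_HEADER + ETHERNET_HEADER,
}


def tunnel_mtu(encapsulation: str, link_mtu: int = 1500, gre_key: bool = False) -> int:
    """Largest inner packet that fits into one outer packet on the link."""
    overhead = ENCAPSULATION_OVERHEAD[encapsulation]
    if gre_key:
        if not encapsulation.startswith("gre"):
            raise ValueError("only GRE supports tunnel keys")
        overhead += GRE_KEY_FIELD
    return link_mtu - overhead


def tcp_mss(encapsulation: str, inner: str = "ipv4", link_mtu: int = 1500,
            gre_key: bool = False) -> int:
    """MSS for TCP inside the tunnel (inner IPv4 or IPv6, no TCP options)."""
    inner_header = IPV4_HEADER if inner == "ipv4" else IPV6_HEADER
    return tunnel_mtu(encapsulation, link_mtu, gre_key) - inner_header - TCP_HEADER


print(tunnel_mtu("gre"), tcp_mss("gre"))  # 1476 1436
```

Under these assumptions a plain GRE tunnel over a 1500-byte link gives a 1476-byte MTU and a 1436-byte MSS, so clamping to 1400 leaves some headroom.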

If you want to do MSS clamping, here's an example:

set policy route MSS-CLAMP rule 10 protocol 'tcp'
set policy route MSS-CLAMP rule 10 set tcp-mss '1400'
set policy route MSS-CLAMP rule 10 tcp flags 'SYN'

set interfaces ethernet eth1 policy route MSS-CLAMP

Alternatively, you can insert a global rule like "iptables -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu" and make it persistent across reboots by placing it in /config/scripts/vyatta-postconfig-bootup.script

IPIP

This is the simplest tunneling protocol in existence. It is defined by RFC2003. It simply takes an IPv4 packet and sends it as the payload of another IPv4 packet. For this reason it doesn't really have any configuration options of its own.

An example:

set interfaces tunnel tun0 encapsulation ipip

set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 203.0.113.20
set interfaces tunnel tun0 address 192.168.100.200/24

If tunneling IPv4 traffic in IPv4 is really all you want, then it's a pretty good and a very lightweight choice.

IP6IP6

This is the IPv6 counterpart of IPIP. I'm not aware of an RFC that defines this encapsulation specifically, but it's a natural special case of the generic IPv6 tunneling mechanism described in RFC2473.

It's not likely that anyone will need it anytime soon, but it does exist.

An example:

set interfaces tunnel tun0 encapsulation ip6ip6

set interfaces tunnel tun0 local-ip 2001:db8:aa::1
set interfaces tunnel tun0 remote-ip 2001:db8:aa::2
set interfaces tunnel tun0 address 2001:db8:bb::1/64

IPIP6

I'm pretty sure in a few decades this is going to be a very useful protocol (though there are other proposals).

As the name implies, it's IPv4 encapsulated in IPv6, as simple as that.

An example:

set interfaces tunnel tun0 encapsulation ipip6

set interfaces tunnel tun0 local-ip 2001:db8:aa::1
set interfaces tunnel tun0 remote-ip 2001:db8:aa::2
set interfaces tunnel tun0 address 192.168.70.80/24

SIT (6in4)

I believe SIT stands for "Simple Internet Transition". This protocol is defined by RFC4213, but curiously neither that RFC nor any of its predecessors refers to it as SIT, so I have no idea where that nickname actually comes from (if you know its origin, tell me).

It encapsulates IPv6 packets in IPv4, as the 6in4 name suggests. Unlike the two previous protocols, it's very useful right now, as it's used by a number of IPv6 tunnel brokers such as Hurricane Electric's.

An example:

set interfaces tunnel tun0 encapsulation sit

set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 192.0.2.20
set interfaces tunnel tun0 address 2001:db8:bb::1/64

GRE

GRE stands for Generic Routing Encapsulation, and it lives up to its name as it can encapsulate many other protocols at more than one OSI layer. It is defined by RFC2784.

Due to kernel driver layout reasons, in VyOS it comes in two flavours: "gre" and "gre-bridge". The difference is that while "gre" is layer 3 only, "gre-bridge" is layer 2 and can encapsulate ethernet frames, thus it can be bridged with other interfaces to create datalink layer segments that span multiple remote sites. GRE is also unique in that it can encapsulate more than one protocol at the same time, so it's the only way to create dual stack IPv4 and IPv6 tunnels in a single interface.

Layer 3 GRE example:

set interfaces tunnel tun0 encapsulation gre

set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 192.0.2.20
set interfaces tunnel tun0 address 10.40.50.60/24
set interfaces tunnel tun0 address 2001:db8:bb::1/64

Layer 2 GRE example:

set interfaces bridge br0 

set interfaces tunnel tun0 encapsulation gre-bridge
set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 192.0.2.20
set interfaces tunnel tun0 parameters ip bridge-group bridge br0

set interfaces ethernet eth1 bridge-group br0

As you can see, the bridge-group option for tunnels is in a rather unusual place, different from all other interfaces. I can't remember why that is, and we may make that CLI more consistent in the future, even though it will take quite some effort to keep it backwards-compatible.

GRE is also the only classic protocol that allows creating multiple tunnels with the same source and destination due to its support for tunnel keys. Despite its name, this feature has nothing to do with security: it's simply an identifier that allows routers to tell one tunnel from another.

An example:

set interfaces tunnel tun0 encapsulation gre
set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 192.0.2.20
set interfaces tunnel tun0 address 10.40.50.60/24
set interfaces tunnel tun0 parameters ip key 10

set interfaces tunnel tun1 encapsulation gre
set interfaces tunnel tun1 local-ip 192.0.2.10
set interfaces tunnel tun1 remote-ip 192.0.2.20
set interfaces tunnel tun1 address 172.16.17.18/24
set interfaces tunnel tun1 parameters ip key 20

Conclusion

Classic IP tunneling protocols are often not very flexible, but a lot of the time they do their job very well, and they are easy to use in conjunction with IPsec. For a more modern and flexible option you may consider L2TPv3 or VXLAN, but that's a story for future posts.

NAT with a thousand faces

The familiar use cases for NAT are source NAT/masquerade for allowing private subnets access to the Internet, and port forwarding from the Internet to a host in a private network. However, there are more use cases that are less obvious, in part because they are defined by the relative size of the source/destination and translation address options.

One to one NAT

This is very common among cloud providers, but it's equally useful if your ISP is ready to give you an additional address but not a routable subnet.

Suppose your ISP gave you two addresses, 203.0.113.114 and 203.0.113.115. You use the .114 address for the router itself and want to map the .115 to a server inside your network that has 192.168.136.100 address.

Here's how to do it:

 interfaces {
     ethernet eth0 {
         address 203.0.113.114/24
         address 203.0.113.115/24
         ...
     }
 }
 nat {
     destination {
         rule 10 {
             inbound-interface eth0
             destination {
                 address 203.0.113.115
             }
             translation {
                 address 192.168.136.100
             }
         }
     }
     source {
         rule 10 {
             outbound-interface eth0
             source {
                 address 192.168.136.100
             }
             translation {
                 address 203.0.113.115
             }
         }
     }
 }

One to many NAT

If the network or range specified in the translation address is larger than the network in the source/destination address, connections from the same host will be translated to more than one address. In source NAT, this is only useful for a bizarre kind of conspicuous consumption, like buying a /24 subnet for yourself and using all of it for just your desktop.

In destination NAT, however, it can be used as a simple form of L3, non-application-aware load balancing.

Suppose you have eleven web servers with addresses from 192.168.136.100 to 192.168.136.110, and you want traffic sent to 203.0.113.115 to be balanced across them. Here's an example:

nat {
     destination {
         rule 10 {
             destination {
                 address 203.0.113.115
                 port 80,443
             }
             inbound-interface eth0
             protocol tcp
             translation {
                 address 192.168.136.100-192.168.136.110
             }
         }
     }
     source {
         rule 10 {
             outbound-interface eth0
             source {
                 address 192.168.136.100-192.168.136.110
             }
             translation {
                 address 203.0.113.115
             }
         }
     }
 }

Many to many NAT

What happens if the source/destination and translation networks are the same size, though? In that case, only the network part is translated, while the host part stays untouched.

This is useful for getting around subnet conflicts.

nat {
     destination {
         rule 10 {
             destination {
                address 192.168.136.0/24
             }
             inbound-interface eth0
             translation {
                 address 10.20.30.0/24
             }
         }
     }
     source {
         rule 10 {
             outbound-interface eth0
             source {
                 address 10.20.30.0/24
             }
             translation {
                 address 192.168.136.0/24
             }
         }
     }
 }
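The "network part translated, host part untouched" rule is plain address arithmetic. Below is a Python sketch of it using the standard `ipaddress` module; it models the behaviour described above rather than how netfilter implements it internally, and `translate_network` is a hypothetical name.

```python
from __future__ import annotations

import ipaddress


def translate_network(address: str, match_net: str, translation_net: str) -> str | None:
    """Swap the network part of `address`, keeping the host part.

    Returns None when the address is outside `match_net`, i.e. when the
    rule would not apply. Both networks must have the same prefix length.
    """
    match = ipaddress.ip_network(match_net)
    target = ipaddress.ip_network(translation_net)
    if match.prefixlen != target.prefixlen:
        raise ValueError("networks must be the same size")
    addr = ipaddress.ip_address(address)
    if addr not in match:
        return None
    host_part = int(addr) - int(match.network_address)
    return str(target.network_address + host_part)


# Destination rule 10: traffic arriving for 192.168.136.0/24 goes to 10.20.30.0/24.
print(translate_network("192.168.136.57", "192.168.136.0/24", "10.20.30.0/24"))  # 10.20.30.57
# Source rule 10: replies from 10.20.30.0/24 leave as 192.168.136.0/24.
print(translate_network("10.20.30.57", "10.20.30.0/24", "192.168.136.0/24"))     # 192.168.136.57
```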

If you know more variations, please let me know.

Configuration versioning and archiving in VyOS

Last time I promised "node copying/renaming, node comments, and other little known features of the VyOS CLI", but that post only covered copying/renaming and comments. It's time to fix that: today we'll discuss configuration versioning and archiving.

One of the great things about a config model where editing and committing are distinct stages is that it's feasible to execute some actions when the config is changed. In fact, you can execute arbitrary actions via pre/post-commit hooks, but there are built-in actions as well, namely configuration versioning and archiving to a remote location. This model, first introduced by JunOS, makes configuration a lot more manageable than older Cisco-style models.

This approach renders tools like RANCID or Oxidized redundant, since the system can make a snapshot of the running config when the change is made rather than periodically. Moreover, right on the router you can see who made each commit and view diffs between revisions.

An additional advantage of versioning is that even if you forget to save the config (or purposely powercycle a system with an unsaved config because you forgot to use commit-confirm), you can always recover the lost changes from the history.

Let's see how to use it.

Interaction between IPsec and NAT (on the same router)

I've just completed a certain unusual setup that involved NATing packets before they are sent to an IPsec tunnel, so I thought I'd write about this topic. Even in perfectly ordinary setups, the interaction between the two often catches people off guard, me included.

No, this is not a premature Friday post. The Friday post will be a continuation of the little-known features of the VyOS CLI.

Most routers these days have some NAT configured, so if you set up an IPsec tunnel, you need to understand the interaction between the two. Luckily, it's pretty simple.

Every network OS has a fixed packet processing order, and for a good reason. For example, source NAT has to be performed after routing because otherwise the OS will not know which outgoing interface must be used for the packet, and will not be able to determine which SNAT rule must be applied to that packet. Likewise, destination NAT must happen before routing if we want to be able to send incoming packets to the intended host — the routing decision depends on the new destination address.

Sometimes the order is less critical but reversing it would create inconvenience for network admins. For example, in Linux (and thus in VyOS), inbound firewall rules are processed after DNAT, so the destination address the firewall will see is the internal address, and you can easily setup a firewall that mentions private addresses on your WAN interface. If it was the other way around, then if you wanted to setup firewall rules for your private addresses, you would have to assign the firewall to the out direction of the LAN interface — not quite as logical or convenient, even if the end result is the same.

Where's IPsec in that processing flow and what are the implications of its position in it?

Let's revisit the complete diagram (image by Jan Engelhardt, CC-BY-SA):

The box you are looking for is "XFRM". In Linux, IPsec is not a special component, but a part of the XFRM framework that can do encryption among other things (it also does compression and header modification).

From the diagram we can see that the XFRM decode step (thus IPsec decryption) is before DNAT (NAT prerouting), and IPsec encryption is after SNAT (NAT postrouting). The implications of this are twofold: first, you need to be careful when setting up SNAT and IPsec on the same machine; second, you can apply NAT rules to traffic that will go to the tunnel if you really have to.

Avoiding adverse interaction

Suppose you have this config:

vyos@vyos# show vpn ipsec site-to-site 
 peer 192.0.2.150 {
     [SNIP]
     tunnel 1 {
         local {
             prefix 192.168.10.0/24
         }
         remote {
             prefix 10.10.10.0/24
         }
     }
 }

vyos@vyos# show nat source 
 rule 10 {
     outbound-interface eth0
     source {
         address 192.168.10.0/24
     }
     translation {
         address 203.0.113.134
     }
 }

What will happen to a packet sent by host 192.168.10.100 to host 10.10.10.200? Since SNAT is performed before IPsec, and the 192.168.10.100 source address matches the rule 10, the rule will be applied and the packet will go down the packet processing pipeline with source address 203.0.113.134, which does not match the IPsec policy from tunnel 1. The packet will be sent out of the eth0 interface, unencrypted, and destined to be dropped by the ISP due to its private destination address (or it will be sent to a wrong host, which is not any better).

In this case this order of packet processing seems to be a real hassle. There's a very easy workaround though: exclude packets with destination address 10.10.10.0/24 from SNAT, like this:

vyos@VyOS-AMI# show nat source 
 rule 5 {
     outbound-interface eth0
     destination {
         address 10.10.10.0/24
     }
     exclude
 }
 rule 10 {
     outbound-interface eth0
     source {
         address 192.168.10.0/24
     }
     translation {
         address 203.0.113.134
     }
 }
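The effect of rule 5 can be reproduced with a small Python model of the processing order described above: source NAT runs first, and only then is the packet checked against the tunnel's local and remote prefixes. The model ignores `outbound-interface` and everything else about real packet processing, and its names (`SourceNatRule`, `send`) are hypothetical; it exists only to show why the exclusion rule matters.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


def in_net(address: str, network: str | None) -> bool:
    """True if `address` is in `network`; a missing network matches anything."""
    return network is None or ipaddress.ip_address(address) in ipaddress.ip_network(network)


@dataclass
class SourceNatRule:
    number: int
    source: str | None = None
    destination: str | None = None
    translation: str | None = None
    exclude: bool = False


def apply_snat(rules: list[SourceNatRule], src: str, dst: str) -> str:
    """The lowest-numbered matching rule wins; "exclude" leaves the packet alone."""
    for rule in sorted(rules, key=lambda r: r.number):
        if in_net(src, rule.source) and in_net(dst, rule.destination):
            return src if rule.exclude else rule.translation
    return src


def send(rules, src, dst, local_prefix="192.168.10.0/24", remote_prefix="10.10.10.0/24"):
    """SNAT first, then the IPsec policy check on the translated packet."""
    new_src = apply_snat(rules, src, dst)
    encrypted = in_net(new_src, local_prefix) and in_net(dst, remote_prefix)
    return new_src, encrypted


rule10 = SourceNatRule(10, source="192.168.10.0/24", translation="203.0.113.134")
rule5 = SourceNatRule(5, destination="10.10.10.0/24", exclude=True)

print(send([rule10], "192.168.10.100", "10.10.10.200"))         # ('203.0.113.134', False)
print(send([rule5, rule10], "192.168.10.100", "10.10.10.200"))  # ('192.168.10.100', True)
```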

If you've set up IPsec and the SA is up, but for some reason packets don't get through, make sure that you didn't forget to exclude traffic to the remote network from NAT. It's easy to see with tcpdump whether packets are sent the wrong way or not.

Exploiting the interaction

So far we've only seen how this particular processing order can be bad for our setup. Can it be good for anything, then? Sometimes it seems like the Linux network stack was optimized to allow doing crazy things. Just a few days ago I ran into a case where it turned out to be beneficial.

Suppose you set up an IPsec tunnel to your partner, and it turns out you are both using the 192.168.10.0/24 subnet internally. Neither of you is willing to renumber your network to solve the problem cleanly, but some compromise must be made. The solution is to NAT packets before they are encrypted, which works as expected precisely because IPsec happens after SNAT.

For simplicity, let's assume only a single host from our network (internal address 192.168.10.45) needs to interact with a single host from the remote network (10.10.10.55). We will make up an intermediate address, 172.16.17.45, to stand for our host inside the tunnel: traffic from 192.168.10.45 to 10.10.10.55 is source-NATed to 172.16.17.45, and traffic arriving for 172.16.17.45 is destination-NATed back to 192.168.10.45.

The config looks like this:

vyos@vyos# show vpn ipsec site-to-site 
 peer 192.0.2.150 {
     [SNIP]
     tunnel 1 {
         local {
             prefix 172.16.17.45/32
         }
         remote {
             prefix 10.10.10.55/32
         }
     }
 }

vyos@vyos# show nat source 
 rule 10 {
     outbound-interface any
     destination {
         address 10.10.10.55
     }
     source {
         address 192.168.10.45
     }
     translation {
         address 172.16.17.45
     }
 }

vyos@vyos# show nat destination 
 rule 10 {
     destination {
         address 172.16.17.45
     }
     inbound-interface any
     translation {
         address 192.168.10.45
     }
 }
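To see why the order matters here, compare both orders in a Python model of this exact setup. It hardcodes the addresses from the config above and models only address matching; `ipsec_first` is a hypothetical switch for the order Linux does not use, and the function names are made up for the example.

```python
import ipaddress

TUNNEL_LOCAL = ipaddress.ip_network("172.16.17.45/32")
TUNNEL_REMOTE = ipaddress.ip_network("10.10.10.55/32")
REAL_HOST = "192.168.10.45"
STAND_IN = "172.16.17.45"
REMOTE_HOST = "10.10.10.55"


def snat(src: str, dst: str) -> str:
    """Source rule 10: our host talking to the remote host uses the stand-in."""
    return STAND_IN if (src, dst) == (REAL_HOST, REMOTE_HOST) else src


def dnat(dst: str) -> str:
    """Destination rule 10: decrypted traffic for the stand-in goes to the real host."""
    return REAL_HOST if dst == STAND_IN else dst


def matches_tunnel(src: str, dst: str) -> bool:
    return (ipaddress.ip_address(src) in TUNNEL_LOCAL
            and ipaddress.ip_address(dst) in TUNNEL_REMOTE)


def outbound(src: str, dst: str, ipsec_first: bool = False) -> tuple:
    """Return (source address on the wire, whether the packet gets encrypted)."""
    if ipsec_first:
        encrypted = matches_tunnel(src, dst)  # policy would see the original source
        return snat(src, dst), encrypted
    new_src = snat(src, dst)                  # Linux order: SNAT, then IPsec
    return new_src, matches_tunnel(new_src, dst)


print(outbound(REAL_HOST, REMOTE_HOST))                    # ('172.16.17.45', True)
print(outbound(REAL_HOST, REMOTE_HOST, ipsec_first=True))  # ('172.16.17.45', False)
print(dnat(STAND_IN))                                      # 192.168.10.45
```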

If IPsec was performed before source NAT, this kind of setup would be impossible.