Using the "policy route" and packet marking for custom QoS matches

There is only so much you can do in QoS rules to describe the traffic you want them to match. There's DSCP, source/destination, and protocol, and that's enough to cover most use cases. Most, but not all. Fortunately, QoS rules can also match packet marks, and that's what makes custom matches possible.

Packet marks are numeric values set by Netfilter rules. They are local to the router (they are never sent over the wire) and can be used as match criteria in other Netfilter rules and by many other components of the Linux kernel (ip, tc, and so on).

Suppose you have a few phones in the office and you want to prioritize their VoIP traffic. You could create a QoS match for each of them, but that's quite a lot of config duplication, which will only get worse as you add more phones. Wouldn't it be nice to group those addresses in one match? Sadly, QoS has no such option. The firewall can use address groups, though, so we can make the QoS rule match a packet mark (e.g. 100) and set that mark on traffic from the phones.

# show traffic-policy 
 priority-queue VoIP {
     class 7 {
         match SIP {
             mark 100
         }
         queue-type drop-tail
     }
     default {
         queue-type fair-queue
     }
 }

Now the confusing bit: where do we set the mark? Around Vyatta 6.5, an unfortunate design decision was made: "firewall modify" was moved under the overly narrow and not so obvious "policy route". Sadly, we are stuck with it for the time being, because the syntax is not easy to convert automatically on upgrade. But, odd name notwithstanding, it still does the job.

Let's create an address group and a "policy route" instance that sets the mark 100:

# show firewall group 
 address-group Phones {
     address 10.4.5.10
     address 10.4.5.11
     address 10.4.5.12
 }
[edit]
# show policy route 
 route VoIP {
     rule 10 {
         set {
             mark 100
         }
         source {
             group {
                 address-group Phones
             }
         }
     }
 }

Now we need to assign the "policy route" instance to our LAN interface (eth0 in this example) and the QoS ruleset to the outbound direction of our WAN interface (eth1):

set interfaces ethernet eth0 policy route VoIP
set interfaces ethernet eth1 traffic-policy out VoIP
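To make the division of labour concrete, here is a toy Python model of the setup (it is not how the kernel implements it): the "policy route" instance is a function that sets a mark based on the Phones group, and the traffic policy only ever looks at the mark. The addresses and the mark value come from the example above; the function names are hypothetical.

```python
import ipaddress

# Addresses from the "Phones" address group in the example above.
PHONES = {ipaddress.ip_address(a) for a in ("10.4.5.10", "10.4.5.11", "10.4.5.12")}
VOIP_MARK = 100


def policy_route(packet):
    """Hypothetical stand-in for 'policy route VoIP' on the LAN interface."""
    if ipaddress.ip_address(packet["src"]) in PHONES:
        packet["mark"] = VOIP_MARK
    return packet


def traffic_policy(packet):
    """Hypothetical stand-in for the 'VoIP' priority-queue on the WAN interface."""
    if packet.get("mark") == VOIP_MARK:
        return "class 7"
    return "default"


for src in ("10.4.5.11", "10.4.5.99"):
    pkt = policy_route({"src": src, "dst": "198.51.100.7"})
    print(src, "->", traffic_policy(pkt))
```

The point of the design is visible in the model: adding a phone only changes the address group, while the QoS class stays untouched.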

You can also take advantage of "policy route" ruleset options for time-based filtering or matching related connections. Besides, you can use it to set DSCP values in case your QoS setup is on a different router:

set policy route Foo rule 10 set dscp 46
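When checking the result in a packet capture, keep in mind that DSCP occupies the upper six bits of the IPv4 TOS byte (the IPv6 Traffic Class), with the lower two bits used for ECN. So DSCP 46 (Expedited Forwarding, the usual choice for voice) shows up as 0xb8. A small Python sketch of the conversion:

```python
def dscp_to_tos(dscp: int, ecn: int = 0) -> int:
    """Return the TOS/Traffic Class byte for a 6-bit DSCP and a 2-bit ECN value."""
    if not 0 <= dscp <= 63 or not 0 <= ecn <= 3:
        raise ValueError("DSCP must be 0-63 and ECN must be 0-3")
    return (dscp << 2) | ecn


def tos_to_dscp(tos: int) -> int:
    """Drop the two ECN bits to get the DSCP value back."""
    return tos >> 2


print(hex(dscp_to_tos(46)))  # 0xb8
print(tos_to_dscp(0xB8))     # 46
```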

Writing the new-style command definitions

Earlier I said that, starting from May 1st, new features written in Perl or using old-style templates will no longer be merged (if you already have such features working and in testing, you still have a chance to get them in, so hurry up!).

Now it's time to write step-by-step guides to the new style, and we'll start with command definitions.

History and motivation

Old-style command definitions (aka "templates") have quite a lot of design issues and have proved to be one of the worst deterrents for new contributors (right after Perl code).

If you are not familiar with them, let me remind you how they work. Suppose we need to create a command for a new interface type "silly" (that's like dummy... but also silly), and we start with the address option, "set interfaces silly silly0 address 192.0.2.1/24". Here's what we'd need to do:

  • Create directory structure interfaces/silly/node.tag/address
  • Put a node.def file under interfaces/, silly/, and address/, but not under node.tag (otherwise the directory will not be recognized as a command definition)
  • Write a bunch of "tags" such as "help: Silly interface name" in those node.def files
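To see how quickly the files pile up, here is a small Python sketch that lists the node.def files the steps above call for, for a single "address" option. The helper is hypothetical and not part of VyOS; "node.tag" stands for the value of a tag node (here, the interface name).

```python
def node_def_paths(path):
    """List the node.def files an old-style template needs for a CLI path.

    Hypothetical helper: every directory gets a node.def, except the
    "node.tag" directory that represents a tag node's value.
    """
    files = []
    for i, part in enumerate(path):
        if part == "node.tag":
            continue
        files.append("/".join(path[: i + 1] + ["node.def"]))
    return files


for f in node_def_paths(["interfaces", "silly", "node.tag", "address"]):
    print(f)
```

That is three files in three different directories for one option, and any of them may also embed shell code.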

There are a whole lot of problems with this approach:

  • To get the complete picture of commands of a component, you need to read a lot of files in multiple deeply nested directories
  • Every such file can contain embedded shell scripts, which means logic, rather than just data, can be scattered across dozens of files
  • You cannot check whether your node.def's are even syntactically correct without loading them into VyOS and trying them by hand

Some of these problems, such as the fragility of the data syntax, could possibly be fixed. The problems with data and logic scattering, however, are fundamental and cannot be cured without changing the approach.

A lot of design and development work went into the configuration mode command definitions for vyconf and thus VyOS 2.0. However, vyconf is not and will never be a drop-in replacement for the old configuration backend: to be compatible, it would have to reimplement the old unfortunate design decisions, which defeats the purpose. And since the plan is to rewrite VyOS 1.x.x gradually in the new style, so that we have an operational system at all times and can reuse the code in 2.0 with minimal changes, we need a way to use new-style command definitions alongside the old ones. As a compromise, we've made a converter from new-style definitions to the old style.

To learn how to use the new style and how much better it is, read on...

IP tunnels I have known and loved

Today we'll talk about the "classic" IP tunneling protocols.

GRE is often seen as a one-size-fits-all solution when it comes to classic IP tunneling protocols, and for good reason. However, there are more specialized options, and many of them are supported by VyOS. There are also some rather obscure GRE options that can be useful.

All those protocols are grouped under "interfaces tunnel" in VyOS. Let's take a closer look at the protocols and options currently supported by VyOS.

MTU considerations

One issue that often comes up in tunneled setups is that of MTU and MSS. Generally, the kernel is capable of setting the correct MTU on its own, and as long as end-to-end ICMP works, there should be no MSS issues either. But if you are in doubt, or simply curious what the total overhead of a tunnel will be, I made a tool for quickly calculating MTU and MSS for any combination of encapsulating and encapsulated protocols. Your contributions and corrections to it are always welcome.
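The arithmetic behind such calculations is simple subtraction. Here is a minimal Python sketch of it (not the tool itself), assuming a 1500-byte link MTU, no IP options, and no TCP options:

```python
# Header sizes in bytes, assuming no IP options and no TCP options.
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20
GRE_HEADER = 4  # basic GRE header (RFC 2784) without optional fields
GRE_KEY = 4     # extra bytes when a GRE key is configured (RFC 2890)


def tunnel_mtu(link_mtu: int, outer_ip_header: int, encap_header: int = 0) -> int:
    """MTU left for the inner packet after the outer headers are added."""
    return link_mtu - outer_ip_header - encap_header


def tcp_mss(mtu: int, inner_ip_header: int) -> int:
    """Largest TCP payload that fits into one packet of the given MTU."""
    return mtu - inner_ip_header - TCP_HEADER


ipip = tunnel_mtu(1500, IPV4_HEADER)                             # IPv4 in IPv4
gre_keyed = tunnel_mtu(1500, IPV4_HEADER, GRE_HEADER + GRE_KEY)  # keyed GRE over IPv4
print(ipip, tcp_mss(ipip, IPV4_HEADER))            # 1480 1440
print(gre_keyed, tcp_mss(gre_keyed, IPV6_HEADER))  # 1472 1412 (IPv6 inside)
```

For layer 2 GRE, the inner Ethernet header (14 bytes, more with VLAN tags) also has to be subtracted.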

If you want to do MSS clamping, here's an example:

set policy route MSS-CLAMP rule 10 protocol 'tcp'
set policy route MSS-CLAMP rule 10 set tcp-mss '1400'
set policy route MSS-CLAMP rule 10 tcp flags 'SYN'

set interfaces ethernet eth1 policy route MSS-CLAMP
Alternatively, you can insert a global rule like "iptables -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu" and make it persistent across reboots by placing it in /config/scripts/vyatta-postconfig-bootup.script.
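Both variants rewrite the MSS option in TCP SYN packets. The Python model below assumes the behaviour described in the iptables TCPMSS target documentation: a fixed value (like "set tcp-mss 1400") or the outgoing MTU minus 40 bytes for IPv4 (60 for IPv6) with --clamp-mss-to-pmtu, and in both cases the MSS is only lowered, never raised. It is a sketch of the logic, not kernel code.

```python
def clamp_mss(advertised_mss, set_mss=None, path_mtu=None, ipv6=False):
    """Model the MSS a TCPMSS-style rule leaves in a SYN packet.

    Give exactly one of set_mss (a fixed clamp value) or path_mtu
    (as with --clamp-mss-to-pmtu). The MSS is never increased.
    """
    if (set_mss is None) == (path_mtu is None):
        raise ValueError("give exactly one of set_mss or path_mtu")
    if path_mtu is not None:
        set_mss = path_mtu - (60 if ipv6 else 40)
    return min(advertised_mss, set_mss)


print(clamp_mss(1460, set_mss=1400))   # 1400
print(clamp_mss(1360, set_mss=1400))   # 1360, already smaller, left alone
print(clamp_mss(1460, path_mtu=1476))  # 1436, e.g. plain GRE over a 1500-byte link
```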

IPIP

This is the simplest tunneling protocol in existence. It is defined by RFC2003. It simply takes an IPv4 packet and sends it as the payload of another IPv4 packet. For this reason, it doesn't really have any configuration options of its own.

An example:

set interfaces tunnel tun0 encapsulation ipip

set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 203.0.113.20
set interfaces tunnel tun0 address 192.168.100.200

If tunneling IPv4 traffic in IPv4 is really all you want, then it's a pretty good and a very lightweight choice.
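To show just how thin IPIP is, here is a Python sketch that wraps an inner packet in a minimal 20-byte outer IPv4 header with protocol number 4 (IP-in-IP). It assumes no options, a TTL of 64, and an identification field of 0; the inner "packet" in the usage line is a placeholder byte string, not a real IPv4 packet.

```python
import ipaddress
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IP-in-IP (RFC 2003)


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum (RFC 1071) over an even-length header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate_ipv4(inner: bytes, src: str, dst: str, proto: int = IPPROTO_IPIP) -> bytes:
    """Prepend a minimal outer IPv4 header (no options, TTL 64, ID 0)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,             # version 4, header length 5 words (20 bytes)
        0,                # TOS
        20 + len(inner),  # total length
        0,                # identification
        0,                # flags and fragment offset
        64,               # TTL
        proto,            # protocol of the payload
        0,                # checksum placeholder, filled in below
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    checksum = ipv4_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:] + inner


# Placeholder inner packet: 20 zero bytes standing in for a real IPv4 packet.
outer = encapsulate_ipv4(bytes(20), "192.0.2.10", "203.0.113.20")
print(outer[:20].hex())
```

The whole cost of the tunnel is those 20 bytes per packet.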

IP6IP6

This is the IPv6 counterpart of IPIP. I'm not aware of an RFC that defines this encapsulation specifically, but it's a natural special case of the IPv6 encapsulation mechanisms described in RFC2473.

It's not likely that anyone will need it anytime soon, but it does exist.

An example:

set interfaces tunnel tun0 encapsulation ip6ip6

set interfaces tunnel tun0 local-ip 2001:db8:aa::1
set interfaces tunnel tun0 remote-ip 2001:db8:aa::2
set interfaces tunnel tun0 address 2001:db8:bb::1/64

IPIP6

I'm pretty sure in a few decades this is going to be a very useful protocol (though there are other proposals).

As the name implies, it's IPv4 encapsulated in IPv6, as simple as that.

An example:

set interfaces tunnel tun0 encapsulation ipip6

set interfaces tunnel tun0 local-ip 2001:db8:aa::1
set interfaces tunnel tun0 remote-ip 2001:db8:aa::2
set interfaces tunnel tun0 address 192.168.70.80
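With an IPv6 outer header, the only thing that tells the receiver what is inside is the Next Header field: 4 for an IPv4 payload (IPIP6) and 41 for an IPv6 payload (IP6IP6). A Python sketch of building that 40-byte outer header, assuming a traffic class and flow label of 0 and a hop limit of 64 (the inner packet is again a placeholder):

```python
import ipaddress
import struct

NEXT_HEADER_IPV4 = 4   # IPv4 payload (IPIP6)
NEXT_HEADER_IPV6 = 41  # IPv6 payload (IP6IP6)


def encapsulate_ipv6(inner: bytes, src: str, dst: str,
                     next_header: int, hop_limit: int = 64) -> bytes:
    """Prepend a fixed 40-byte IPv6 header; IPv6 has no header checksum."""
    first_word = 6 << 28  # version 6, traffic class 0, flow label 0
    header = struct.pack(
        "!IHBB16s16s",
        first_word,
        len(inner),  # payload length (everything after this header)
        next_header,
        hop_limit,
        ipaddress.IPv6Address(src).packed,
        ipaddress.IPv6Address(dst).packed,
    )
    return header + inner


inner = bytes(20)  # placeholder for an IPv4 packet, not a real one
outer = encapsulate_ipv6(inner, "2001:db8:aa::1", "2001:db8:aa::2", NEXT_HEADER_IPV4)
print(len(outer), outer[6])  # 60 4
```

Note that the outer header is twice the size of an IPv4 one, which matters for the MTU.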

SIT (6in4)

I believe SIT stands for "Simple Internet Transition". This protocol is defined by RFC4213, but curiously, neither that RFC nor any of its predecessors refer to it as SIT, so I have no idea where that nickname actually comes from (if you know its origin, tell me).

It encapsulates IPv6 packets in IPv4, as the name "6in4" suggests. Unlike the two previous protocols, it's very useful right now, as it's used by a number of IPv6 tunnel brokers such as Hurricane Electric.

An example:
set interfaces tunnel tun0 encapsulation sit

set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 192.0.2.20
set interfaces tunnel tun0 address 2001:db8:bb::1/64
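A receiving router tells the IPv4-based encapsulations apart by the protocol field of the outer IPv4 header: 4 for IPIP, 41 for SIT, 47 for GRE (IANA protocol numbers). That's also why a firewall in front of a tunnel broker endpoint has to allow protocol 41 rather than some TCP or UDP port. A simplified Python sketch of that demultiplexing step:

```python
# Protocol numbers carried in the outer IPv4 header's protocol field.
TUNNEL_PROTOCOLS = {
    4: "ipip (IPv4 in IPv4)",
    41: "sit (IPv6 in IPv4, 6in4)",
    47: "gre",
}


def classify_outer_ipv4(packet: bytes) -> str:
    """Name the classic tunnel type of an IPv4 packet from its protocol field."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return TUNNEL_PROTOCOLS.get(packet[9], "not a classic IP tunnel")


# A 20-byte IPv4 header with protocol 41 (other fields zeroed for brevity).
header = bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 64, 41, 0, 0]) + bytes(8)
print(classify_outer_ipv4(header))  # sit (IPv6 in IPv4, 6in4)
```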

GRE

GRE stands for Generic Routing Encapsulation, and it lives up to its name as it can encapsulate many other protocols at more than one OSI layer. It is defined by RFC2784.

Due to kernel driver layout reasons, in VyOS it comes in two flavours: "gre" and "gre-bridge". The difference is that while "gre" is layer 3 only, "gre-bridge" is layer 2 and can encapsulate ethernet frames, thus it can be bridged with other interfaces to create datalink layer segments that span multiple remote sites. GRE is also unique in that it can encapsulate more than one protocol at the same time, so it's the only way to create dual stack IPv4 and IPv6 tunnels in a single interface.
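The reason GRE can carry several protocols at once is the "protocol type" field in its header, which holds an EtherType for every packet: 0x0800 for IPv4, 0x86DD for IPv6, and 0x6558 (Transparent Ethernet Bridging) for whole Ethernet frames, as used by layer 2 GRE. A minimal Python sketch of the basic 4-byte header from RFC 2784, with no checksum or key:

```python
import struct

# EtherType values that go into the GRE "protocol type" field.
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
ETHERTYPE_TEB = 0x6558  # Transparent Ethernet Bridging: whole Ethernet frames


def gre_header(protocol_type: int) -> bytes:
    """Minimal 4-byte GRE header: no optional fields, version 0."""
    flags_and_version = 0
    return struct.pack("!HH", flags_and_version, protocol_type)


def gre_payload_type(header: bytes) -> int:
    """Read the protocol type back from a GRE header."""
    return struct.unpack("!HH", header[:4])[1]


# One tunnel, different payloads: each packet declares what it carries.
for proto in (ETHERTYPE_IPV4, ETHERTYPE_IPV6, ETHERTYPE_TEB):
    print(gre_header(proto).hex())
```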

Layer 3 GRE example:

set interfaces tunnel tun0 encapsulation gre

set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 192.0.2.20
set interfaces tunnel tun0 address 10.40.50.60/24
set interfaces tunnel tun0 address 2001:db8:bb::1/64

Layer 2 GRE example:

set interfaces bridge br0 

set interfaces tunnel tun0 encapsulation gre-bridge
set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 192.0.2.20
set interfaces tunnel tun0 parameters ip bridge-group bridge br0

set interfaces ethernet eth1 bridge-group bridge br0

As you can see, the bridge-group option for tunnels is in a rather unusual place, different from all other interfaces. I can't remember why that is, and we may make that CLI more consistent in the future, even though it will take quite some effort to keep it backwards compatible.

GRE is also the only classic protocol that allows creating multiple tunnels with the same source and destination due to its support for tunnel keys. Despite its name, this feature has nothing to do with security: it's simply an identifier that allows routers to tell one tunnel from another.

An example:

set interfaces tunnel tun0 encapsulation gre
set interfaces tunnel tun0 local-ip 192.0.2.10
set interfaces tunnel tun0 remote-ip 192.0.2.20
set interfaces tunnel tun0 address 10.40.50.60/24
set interfaces tunnel tun0 parameters ip key 10

set interfaces tunnel tun1 encapsulation gre
set interfaces tunnel tun1 local-ip 192.0.2.10
set interfaces tunnel tun1 remote-ip 192.0.2.20
set interfaces tunnel tun1 address 172.16.17.18/24
set interfaces tunnel tun1 parameters ip key 20
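On the wire, the key is an optional 32-bit field announced by the K bit in the GRE flags (RFC 2890). Here is a Python sketch of how a receiver could pick the right tunnel for a packet; the lookup table and its labels are hypothetical, and the only thing that tells the two tunnels apart is the key.

```python
import struct

GRE_FLAG_CHECKSUM = 0x8000  # C bit
GRE_FLAG_KEY = 0x2000       # K bit (RFC 2890)


def gre_header_with_key(protocol_type: int, key: int) -> bytes:
    """8-byte GRE header: only the K flag set, protocol type, 32-bit key."""
    return struct.pack("!HHI", GRE_FLAG_KEY, protocol_type, key)


def gre_key(header: bytes):
    """Return the key from a GRE header, or None if no key is present."""
    flags, _protocol_type = struct.unpack("!HH", header[:4])
    if not flags & GRE_FLAG_KEY:
        return None
    # If the C bit is set, 4 bytes of checksum and reserved field precede the key.
    offset = 8 if flags & GRE_FLAG_CHECKSUM else 4
    return struct.unpack("!I", header[offset:offset + 4])[0]


# Hypothetical tunnel table: same endpoints, told apart only by the key.
TUNNELS = {
    ("192.0.2.10", "192.0.2.20", 10): "tunnel with key 10",
    ("192.0.2.10", "192.0.2.20", 20): "tunnel with key 20",
}


def demux(local_ip: str, remote_ip: str, header: bytes) -> str:
    """Pick the tunnel for an incoming GRE packet."""
    return TUNNELS.get((local_ip, remote_ip, gre_key(header)), "no matching tunnel")


print(demux("192.0.2.10", "192.0.2.20", gre_header_with_key(0x0800, 20)))
```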

Conclusion

Classic IP tunneling protocols are often not very flexible, but much of the time they do their job very well, and they are easy to use in conjunction with IPsec. For a more modern and flexible option you may consider L2TPv3 or VXLAN, but that's a story for future posts.

First ProNet Portal drafts, new Partners and Social Media

Hello Community!

We are super excited that VyOS finally gets the traction that it deserves and we have a few interesting updates to share with you!

VyOS 1.2.0-rc1

It will be released within two or three weeks.

There are tons of new things. We are still working on the release notes, and you can always grab a rolling release from here. Of course, you are invited to add to the release notes if you spot a resolved issue we forgot, or if you want to expand the documentation of the new features.

More contributors!

New contributors are joining our effort, and VyOS is discussed more and more on different platforms. We would like to remind everybody that you are welcome to join and participate in our development collaboration platform.

Social Media & other communication channels

It's more important than ever to spread the word about VyOS.

This is the reason we’ve been adding profiles on just about every social network over time, and on behalf of the team I encourage you to follow/like/subscribe and participate!

Here is the list of social media accounts:

Twitter - https://twitter.com/vyos_dev

Facebook - https://www.facebook.com/vyosofficial

LinkedIn - https://www.linkedin.com/company/vyos

YouTube - https://www.youtube.com/channel/UCEjJx6j87szaiqtKDrMVb2Q

Instagram - https://www.instagram.com/vyosofficial/

Reddit - https://www.reddit.com/r/vyos/


We also have a forum, a development portal and several chat platforms for real-time communication:

IRC - irc.freenode.net - many people here

Slack - slack.vyos.io - Newly launched!

Rocket.Chat - chat.vyos.io - few people


ProNet web

See the document below. It’s an initial draft rather than anything set in stone, but its fundamental concepts are probably pretty close to what we’ll eventually deploy.

As always, all feedback is welcome: email it to pronet@vyos.io

New Partners

We’ve already received quite a few applications for participation in ProNet and continue to receive them.

They are all important for us and we are glad to see interest in VyOS among service providers.

I would like to give a special mention to two of our newly arrived partners:

Packet - The Promise of the Cloud Delivered on Bare Metal. Great folks with a super interesting offering: bare metal instances with hourly billing and a lot of interesting capabilities (a wide list of supported OSes, API, BGP peering, and more). Currently you can boot 1.1.8 by following this manual.

We have also agreed to work together to make VyOS 1.2 a natively supported OS on Packet.

Protectli - The first hardware vendor that showed interest in working with us!

Check out their appliances: they will be the first hardware appliances officially supported by VyOS 1.2, so you may want to order some. They also support OPNsense and other OSes; you can see the full list here.

P.S.

As always, those who read the article to the end get a 30% discount on merchandise via this link.

Stay tuned!

NAT with a thousand faces

The familiar use cases for NAT are source NAT/masquerade for allowing private subnets access to the Internet, and port forwarding from the Internet to a host in a private network. However, there are more use cases that are less obvious, in part because they are defined by the relative size of the source/destination and translation address options.

One to one NAT

This is very common among cloud providers, but it's equally useful if your ISP is ready to give you an additional address but not a routed subnet.

Suppose your ISP gave you two addresses, 203.0.113.114 and 203.0.113.115. You use the .114 address for the router itself and want to map the .115 to a server inside your network that has 192.168.136.100 address.

Here's how to do it:

 interfaces {
     ethernet eth0 {
         address 203.0.113.114/24
         address 203.0.113.115/24
         ...
     }
 }
 nat {
     destination {
         rule 10 {
             inbound-interface eth0
             destination {
                 address 203.0.113.115
             }
             translation {
                 address 192.168.136.100
             }
         }
     }
     source {
         rule 10 {
             outbound-interface eth0
             source {
                 address 192.168.136.100
             }
             translation {
                 address 203.0.113.115
             }
         }
     }
 }
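Why both rules? In netfilter, replies to a connection that came in through the DNAT rule are translated back automatically by connection tracking. The source NAT rule matters for connections the server opens itself: without it, they would be translated by whatever other source NAT rule covers the LAN (typically to the router's own .114 address), or not at all. A toy Python model of the two rules, using the example addresses:

```python
PUBLIC = "203.0.113.115"
PRIVATE = "192.168.136.100"


def dnat_inbound(packet):
    """Destination rule 10 on eth0: public address -> server."""
    if packet["dst"] == PUBLIC:
        return {**packet, "dst": PRIVATE}
    return packet


def snat_outbound(packet):
    """Source rule 10 on eth0: server -> public address."""
    if packet["src"] == PRIVATE:
        return {**packet, "src": PUBLIC}
    return packet


# A client on the Internet connects to the public address.
print(dnat_inbound({"src": "198.51.100.20", "dst": PUBLIC}))
# The server opens its own connection to the outside world.
print(snat_outbound({"src": PRIVATE, "dst": "198.51.100.30"}))
```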

One to many NAT

If the network or range specified in the translation address is larger than the network in the source/destination address, connections from the same host will be translated to more than one address. In source NAT, this is only useful for a bizarre kind of conspicuous consumption, like buying a /24 subnet for yourself and using it all for just your desktop.

In destination NAT, however, it can be used as a simple form of L3, non-application aware load balancing.

Suppose you have eleven web servers with addresses from 192.168.136.100 to 192.168.136.110 (the range includes both ends). You want traffic sent to 203.0.113.115 balanced across them. Here's an example:

nat {
     destination {
         rule 10 {
             destination {
                 address 203.0.113.115
                 port 80,443
             }
             inbound-interface eth0
             protocol tcp
             translation {
                 address 192.168.136.100-192.168.136.110
             }
         }
     }
     source {
         rule 10 {
             outbound-interface eth0
             source {
                 address 192.168.136.100-192.168.136.110
             }
             translation {
                 address 203.0.113.115
             }
         }
     }
 }
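A quick way to reason about such a setup is to enumerate the translation range: .100 to .110 is eleven addresses, and every one of them should be a working server, since this kind of balancing has no health checks. The Python sketch below does that and adds a toy backend choice based on a hash of the client address. The choice function is an assumption for illustration only; it does not reproduce the kernel's selection algorithm.

```python
import ipaddress
import zlib


def address_range(first: str, last: str):
    """All IPv4 addresses from first to last, inclusive."""
    start, end = ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    return [ipaddress.IPv4Address(i) for i in range(int(start), int(end) + 1)]


BACKENDS = address_range("192.168.136.100", "192.168.136.110")


def pick_backend(client_ip: str):
    """Toy model only: choose a backend from a CRC32 hash of the client address.

    In this model a given client always maps to the same server, and
    different clients spread across the range.
    """
    index = zlib.crc32(ipaddress.IPv4Address(client_ip).packed) % len(BACKENDS)
    return BACKENDS[index]


print(len(BACKENDS))  # 11
print(pick_backend("198.51.100.7"))
```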

Many to many NAT

What happens if the source/destination and translation networks are the same size, though? In that case, only the network part is translated, while the host part stays untouched.

This is useful for getting around subnet conflicts.

nat {
     destination {
         rule 10 {
             destination {
                address 192.168.136.0/24
             }
             inbound-interface eth0
             translation {
                 address 10.20.30.0/24
             }
         }
     }
     source {
         rule 10 {
             outbound-interface eth0
             source {
                 address 10.20.30.0/24
             }
             translation {
                 address 192.168.136.0/24
             }
         }
     }
 }
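The mapping described above is plain prefix arithmetic: keep the offset of the address inside its network and apply it to the other network. A Python sketch of that arithmetic using the standard ipaddress module (this shows the address mapping only, not how the kernel implements it):

```python
import ipaddress


def translate_prefix(address: str, from_net: str, to_net: str):
    """Swap the network part of an address, keeping the host part."""
    addr = ipaddress.ip_address(address)
    src = ipaddress.ip_network(from_net)
    dst = ipaddress.ip_network(to_net)
    if src.prefixlen != dst.prefixlen:
        raise ValueError("networks must be the same size")
    if addr not in src:
        return addr  # the rule does not match, address left alone
    host_part = int(addr) - int(src.network_address)
    return dst.network_address + host_part


# Destination rule: 192.168.136.0/24 -> 10.20.30.0/24
print(translate_prefix("192.168.136.57", "192.168.136.0/24", "10.20.30.0/24"))  # 10.20.30.57
# Source rule, the other way round
print(translate_prefix("10.20.30.57", "10.20.30.0/24", "192.168.136.0/24"))     # 192.168.136.57
```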

If you know more variations, please let me know.

Configuration versioning and archiving in VyOS

Last time I promised "node copying/renaming, node comments, and other little known features of the VyOS CLI", but the post only covered copying/renaming and comments, not the other features. It's time to fix that: today we'll discuss configuration versioning and archiving.

One of the great things about the config model with editing and commits being distinct stages is that it's feasible to execute some actions when the config is changed. In fact, you can execute arbitrary actions via pre/post-commit hooks, but there are built-in actions as well, namely configuration versioning and archiving to a remote location. This model, first introduced by JunOS, makes configuration a lot more manageable than older Cisco-style models.

This approach renders tools like RANCID or Oxidized redundant, since the system can make a snapshot of the running config when a change is made rather than periodically. Moreover, right on the router you can see who made this or that commit and view diffs between revisions.
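For readers who haven't seen one, a diff between two config revisions is an ordinary line-by-line diff. The Python snippet below does not use the VyOS commands; it only illustrates the idea with the standard library's difflib on two hypothetical revisions of a config.

```python
import difflib

# Two hypothetical revisions of the same config, oldest first.
older = """\
firewall {
    name Foo {
        rule 10 {
            action accept
        }
    }
}
"""
newer = """\
firewall {
    name Foo {
        rule 10 {
            action accept
            protocol tcp
        }
    }
}
"""

diff_text = "".join(difflib.unified_diff(
    older.splitlines(keepends=True),
    newer.splitlines(keepends=True),
    fromfile="older revision",
    tofile="newer revision",
))
print(diff_text)
```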

An additional advantage of versioning is that even if you forget to save the config (or purposely powercycle a system with an unsaved config because you forgot to use commit-confirm), you can always recover the lost changes from the history.

Let's see how to use it.

ProNet details and registration form

We are excited to see so much interest in the ProNet partner program.

While we collect applications from people and companies, we have already started working on the design of the ProNet web portal, and we need your input to make it match people's actual needs.


What would you like to see in the portal as business users, support and consulting providers, service providers, and hardware vendors?


The current concept includes:

  • Different partner profile types for consultants, support providers, and hardware vendors for easy filtering.

  • Search by service types and location.

  • Contributor, Sponsor, and Evangelist badges for people and companies that have contributed code, made donations, or written or given talks about VyOS, respectively.


Please send your suggestions to pronet@vyos.io. All authors of suggestions will get discounts on VyOS merchandise, and the authors of the best proposals will get a reward from us.

Let’s bring VyOS users and service providers closer to one another!


If you are interested in becoming a partner as an individual or a company, please fill in the registration form.


ProNet Announcement

Hello Community!

In the past few months there has been significant growth of interest in the VyOS project.

We’ve started getting more requests from companies looking for professional services such as feature development and support, and that is obviously a great thing for the project!

However, to satisfy the demand, we need to grow the support team, and rather than try to do everything ourselves, we decided to share the business opportunities with the community of people and companies who are using VyOS and willing to share their expertise.


In other words, it’s time to start building a partner network!

We decided to name it ProNet, because it's short and catchy.


At Sentrium, we’ll focus on developing and maintaining VyOS, expanding our cloud platform support, and offering custom feature development and developer support to our customers. Existing support contracts will continue to be fulfilled, but from now on we will not take on new support customers ourselves.


To offer a decent level of quality for support services, we are already talking to several companies that have shown interest in providing support services for certain territories, but that is only the beginning. If you are a freelancer with deep VyOS knowledge, or a company with expertise in networking and VyOS, and want to be part of ProNet, please drop us a line at pronet@vyos.io and we will follow up from there.


There are three levels: registered, professional, and enterprise.

While we are finishing the document on program perks and requirements, we can guarantee that all qualified partners in the first wave will get very exclusive conditions.

We plan to launch a partner locator website at https://pronet.vyos.io in Q2 2018, and we are open to suggestions: just comment here or on social media with your ideas about what you would like to see in the future portal, and we will be glad to consider them for implementation.

If you have read this far, you deserve a 15% discount on our merchandise: just use this link to get it applied automatically at checkout, or otherwise enter the PRONET code at checkout in our shop here.

Stay tuned!

UPD1:
Registration form is now available here


Interaction between IPsec and NAT (on the same router)

I've just completed a certain unusual setup that involved NATing packets before they are sent to an IPsec tunnel, so I thought I'd write about this topic. Even in perfectly ordinary setups, the interaction between the two often catches people off guard, me included.

No, this is not a premature Friday post. The Friday post will be a continuation of the little known features of the VyOS CLI.

Most routers these days have some NAT configured, so if you set up an IPsec tunnel, you need to understand the interaction between the two. Luckily, it's pretty simple.

Every network OS has a fixed packet processing order, and for a good reason. For example, source NAT has to be performed after routing because otherwise the OS will not know which outgoing interface must be used for the packet, and will not be able to determine which SNAT rule must be applied to that packet. Likewise, destination NAT must happen before routing if we want to be able to send incoming packets to the intended host — the routing decision depends on the new destination address.

Sometimes the order is less critical, but reversing it would create inconvenience for network admins. For example, in Linux (and thus in VyOS), inbound firewall rules are processed after DNAT, so the destination address the firewall sees is the internal address, and you can easily set up a firewall on your WAN interface that mentions private addresses. If it were the other way around and you wanted firewall rules for your private addresses, you would have to assign the firewall to the out direction of the LAN interface: not quite as logical or convenient, even if the end result is the same.

Where's IPsec in that processing flow and what are the implications of its position in it?

Let's revisit the complete diagram (image by Jan Engelhardt, CC-BY-SA):



The box you are looking for is "XFRM". In Linux, IPsec is not a special component but a part of the XFRM framework, which can do encryption among other things (it also does compression and header modification).

From the diagram we can see that the XFRM decode step (thus IPsec decryption) happens before DNAT (NAT prerouting), and the XFRM encode step (thus IPsec encryption) happens after SNAT (NAT postrouting). The implications are twofold: first, you need to be careful when setting up SNAT and IPsec on the same machine; second, you can apply NAT rules to traffic that will go to the tunnel if you really have to.

Avoiding adverse interaction

Suppose you have this config:

vyos@vyos# show vpn ipsec site-to-site 
 peer 192.0.2.150 {
     [SNIP]
     tunnel 1 {
         local {
             prefix 192.168.10.0/24
         }
         remote {
             prefix 10.10.10.0/24
         }
     }
 }

vyos@vyos# show nat source 
 rule 10 {
     outbound-interface eth0
     source {
         address 192.168.10.0/24
     }
     translation {
         address 203.0.113.134
     }
 }

What will happen to a packet sent by host 192.168.10.100 to host 10.10.10.200? Since SNAT is performed before IPsec, and the 192.168.10.100 source address matches rule 10, the rule will be applied and the packet will go down the packet processing pipeline with source address 203.0.113.134, which does not match the IPsec policy from tunnel 1. The packet will be sent out of the eth0 interface unencrypted, destined to be dropped by the ISP due to its private destination address (or to be sent to a wrong host, which is not any better).

In this case this order of packet processing seems to be a real hassle. There's a very easy workaround though: exclude packets with destination address 10.10.10.0/24 from SNAT, like this:

vyos@VyOS-AMI# show nat source 
 rule 5 {
     outbound-interface eth0
     destination {
         address 10.10.10.0/24
     }
     exclude
 }
 rule 10 {
     outbound-interface eth0
     source {
         address 192.168.10.0/24
     }
     translation {
         address 203.0.113.134
     }
 }

If you've set up IPsec and the SA is up, but for some reason packets don't get through, make sure that you didn't forget to exclude traffic to the remote network from NAT. It's easy to see with tcpdump whether packets are sent the wrong way or not.
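The failure mode is easy to reproduce in a toy model. The Python sketch below mirrors the order described above (source NAT first, then the IPsec policy match) with the example addresses; it is an illustration of the processing order, not of how XFRM is implemented.

```python
import ipaddress

LOCAL_NET = ipaddress.ip_network("192.168.10.0/24")
REMOTE_NET = ipaddress.ip_network("10.10.10.0/24")
PUBLIC_IP = ipaddress.ip_address("203.0.113.134")


def snat(packet, exclude_remote):
    """Source NAT on eth0, optionally with the 'rule 5 ... exclude' rule."""
    if exclude_remote and packet["dst"] in REMOTE_NET:
        return packet  # rule 5: leave traffic to the remote network alone
    if packet["src"] in LOCAL_NET:
        return {**packet, "src": PUBLIC_IP}  # rule 10
    return packet


def ipsec_policy_matches(packet):
    """Tunnel 1 policy: local 192.168.10.0/24 <-> remote 10.10.10.0/24."""
    return packet["src"] in LOCAL_NET and packet["dst"] in REMOTE_NET


def egress(packet, exclude_remote):
    packet = snat(packet, exclude_remote)  # SNAT happens first in this model...
    if ipsec_policy_matches(packet):       # ...then the IPsec policy is checked
        return "encrypted into tunnel 1"
    return f"sent in the clear from {packet['src']}"


pkt = {"src": ipaddress.ip_address("192.168.10.100"),
       "dst": ipaddress.ip_address("10.10.10.200")}
print(egress(pkt, exclude_remote=False))  # sent in the clear from 203.0.113.134
print(egress(pkt, exclude_remote=True))   # encrypted into tunnel 1
```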

Exploiting the interaction

So far we've only seen how this particular processing order can be bad for our setup. Can it be good for anything, then? Sometimes it seems like the Linux network stack was optimized to allow doing crazy things. Just a few days ago I ran into a case where it turned out to be beneficial.

Suppose you set up an IPsec tunnel to your partner, and it turns out you are both using the 192.168.10.0/24 subnet internally. Neither of you is willing to renumber your own network to solve the problem cleanly, but some compromise must be made. The solution is to NAT packets before they are encrypted, which works as expected precisely because IPsec happens after SNAT.

For simplicity let's assume only a single host from our network (internal address 192.168.10.45) needs to interact with a single host from the remote network (10.10.10.55). We will make up an intermediate 172.16.17.45 address and NAT the tunnel traffic to and from 10.10.10.55 host to actually be sent to the 192.168.10.45 host.

The config looks like this:

vyos@vyos# show vpn ipsec site-to-site 
 peer 192.0.2.150 {
     [SNIP]
     tunnel 1 {
         local {
             prefix 172.16.17.45/32
         }
         remote {
             prefix 10.10.10.55/32
         }
     }
 }

vyos@vyos# show nat source 
 rule 10 {
     outbound-interface any
     destination {
         address 10.10.10.55
     }
     source {
         address 192.168.10.45
     }
     translation {
         address 172.16.17.45
     }
 }

vyos@vyos# show nat destination 
 rule 10 {
     destination {
         address 172.16.17.45
     }
     inbound-interface any
     translation {
         address 192.168.10.45
     }
 }

If IPsec was performed before source NAT, this kind of setup would be impossible.
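Here is the same toy-model approach applied to this setup, with the addresses from the example (172.16.17.45 being the made-up intermediate address). Outbound, SNAT rewrites the source first, so the packet matches the /32 tunnel policy. Inbound, the destination NAT rule maps the intermediate address back to the real host; it is what lets the remote host open connections to us, while replies to connections our host opened are translated back by connection tracking anyway. The functions are illustrative only.

```python
import ipaddress

ip = ipaddress.ip_address
INTERNAL_HOST = ip("192.168.10.45")
REMOTE_HOST = ip("10.10.10.55")
INTERMEDIATE = ip("172.16.17.45")


def outbound(packet):
    """Source NAT rule 10 followed by the tunnel 1 policy check."""
    if packet["src"] == INTERNAL_HOST and packet["dst"] == REMOTE_HOST:
        packet = {**packet, "src": INTERMEDIATE}
    # Tunnel 1 policy: 172.16.17.45/32 <-> 10.10.10.55/32, checked after SNAT.
    if packet["src"] == INTERMEDIATE and packet["dst"] == REMOTE_HOST:
        return "tunnel 1", packet
    return "not tunneled", packet


def inbound(decrypted):
    """Destination NAT rule 10, applied to traffic that came out of the tunnel."""
    if decrypted["dst"] == INTERMEDIATE:
        return {**decrypted, "dst": INTERNAL_HOST}
    return decrypted


where, pkt = outbound({"src": INTERNAL_HOST, "dst": REMOTE_HOST})
print(where, pkt["src"])  # tunnel 1 172.16.17.45
reply = inbound({"src": REMOTE_HOST, "dst": INTERMEDIATE})
print(reply["dst"])       # 192.168.10.45
```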

Copying/renaming, node comments, and other little known features of the VyOS CLI

I promised not to write about either IPsec or NAT this time, so we'll discuss something else: the little known features of the VyOS CLI. Many people only ever use set/delete and commit, but there's more to it, and those features can save quite a bit of time.

The edit level (never write long node paths again)

You might have noticed that after every command, the CLI outputs a mysterious "[edit]" line. This is a side effect of the system that allows editing the config at any level.

By default, you are at the top level, so you have to specify the full path, such as "set firewall name Foo rule 10 action accept". However, to avoid writing or pasting long paths, you can set the edit level to any node with the "edit" command, such as "edit firewall name Foo". Once you are at some level, you can use relative node paths, such as "set rule 10 action accept" in this case.

To move between levels, you can use the "up" command to move one level up, or the "top" command to instantly move back to the top level.

Look at this session transcript:

dmbaturin@reki# edit firewall name Foo
[edit firewall name Foo]

dmbaturin@reki# set rule 10 protocol tcp
[edit firewall name Foo]

dmbaturin@reki# edit rule 10
[edit firewall name Foo rule 10]

dmbaturin@reki# set destination port 22
[edit firewall name Foo rule 10]

dmbaturin@reki# up
[edit firewall name Foo]

dmbaturin@reki# set rule 10 description "Allow SSH"
[edit firewall name Foo]

dmbaturin@reki# top
[edit]
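The transcript can be reproduced with a small toy model of the edit level: the level is a list of path components, relative paths are appended to it, "up" drops the last component, and "top" clears it. In the transcript, "up" removes "rule 10" as a whole, so the model treats a tag node together with its value as one level; which nodes are tag nodes is hardcoded here as a hypothetical set, and none of this is the real VyOS implementation.

```python
# Hypothetical: node names that take a value (tag nodes) in this toy model.
TAG_NODES = {"name", "rule"}


def split_levels(path):
    """Split a path into levels, keeping each tag node with its value."""
    words = path.split()
    levels, i = [], 0
    while i < len(words):
        if words[i] in TAG_NODES and i + 1 < len(words):
            levels.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            levels.append(words[i])
            i += 1
    return levels


class EditSession:
    """Toy model of the CLI edit level."""

    def __init__(self):
        self.levels = []  # empty means the top level, shown as "[edit]"

    def prompt(self):
        return "[" + " ".join(["edit"] + self.levels) + "]"

    def full_path(self, relative):
        """Prefix a relative path (as typed after 'set') with the edit level."""
        return " ".join(self.levels + [relative])

    def edit(self, relative):
        self.levels += split_levels(relative)

    def up(self):
        if self.levels:
            self.levels.pop()

    def top(self):
        self.levels = []


s = EditSession()
s.edit("firewall name Foo")
print(s.full_path("rule 10 protocol tcp"))  # firewall name Foo rule 10 protocol tcp
s.edit("rule 10")
print(s.prompt())                           # [edit firewall name Foo rule 10]
s.up()
print(s.prompt())                           # [edit firewall name Foo]
s.top()
print(s.prompt())                           # [edit]
```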