VyOS 1.2.0 development news in July

Despite the slow news season and the RAID incident that luckily slowed us down only for a couple of days, I think we've made good progress in July.

First, Kim Hagen got cloud-init to work, even though it didn't make it to the mainline image, and WAAgent required for Azure is not working yet. Some more work, and VyOS will get a much wider cloud platform support. He's also working on Wireguard integration and it's expected to be merged into current soon.

The new VRRP CLI and IPv6 support is another big change, but it's got its own blog post, so I won't stop there and cover things that did not get their own blog posts instead.

IPsec and VTI

While I regard VTI as the most leaky abstraction ever created and always suggest using honest GRE/IPsec instead, I know many people don't really have any choice because their partners or service providers are using it. In older StrongSWAN versions it used to just work.

Updating StrongSWAN to the latest version had an unforeseen and very unpleasant side effect: VTI tunnels stopped working. A workaround in form of "install_routes = no" in /etc/strongswan.d/charon.conf was discovered, but it has an equally bad side effect: site to site tunnels stop working when it's applied.

The root cause of the problem is that for VTI tunnels to work, their traffic selectors have to be set to 0.0.0.0/0 for traffic to match the tunnel, even though actual routing decision is made according to netfilter marks. Unless route insertion is disabled entirely, StrongSWAN thus mistakenly inserts a default route through the VTI peer address, which makes all traffic routed to nowhere.

This is a hard problem without a workaround that is easy and effective. It's an architectural problem in the new StrongSWAN, according to our investigation of its source code and its developer responses, there is simply no way to control route insertion per peer. One developer responded to it with "why, site to site and VTI tunnels are never used on the same machine anyway" — yeah, people are reporting bugs just out of curiosity.

While there is no clean solution within StrongSWAN, this definitely has been a blocker for the release candidate. Reimplementing route insertion with an up/down script proved to be a hard problem since there are lots of cases to handle and complete information about the intended SA may not always be available to scripts. Switching to another IKE implementation seems like an attractive option, but needs a serious evaluation of the alternatives, and a complete rewrite of the IPsec config scripts — which is planned, but will take a while because the legacy scripts is an unmaintainable mess.

I think I've found a workable (even if far from perfect workaround) — instead of inserting missing routes, delete the bad routes. I've made a test setup and it seems to work reasonably well. The obvious issue is that it doesn't prevent bad things from happening, but rather undoes the damage, so there may still be a brief traffic disruption when VTI tunnels go up. Another problem is a possible race condition between StrongSWAN inserting routes and the script deleting them, though I haven't seen it in practice yet and I hope it doesn't exist. But, at least you can now use both VTI and site to site tunnels on the same machine.

For people who want to use VTI exclusively, there is now "set vpn ipsec options disable-route-autoinstall" option that disables route insertion globally, thus removing the possible disruption, at cost of making site to site tunnels impossible to use. That option is disabled by default.

I hope it will be good enough until we find a better solution. Your testing is needed to confirm that it is!

On "run reset vrrp master"

I've just exorcised a ghost of the old VRRP CLI implementation — the "reset vrrp master" command. I thought it would go away with the vyatta-vrrp package, but in fact it was in vyatta-op. It made me remember that I was going to write about it in the original blog post, but somehow I forgot about it.

Here is why that command was not reimplemented. First, it never worked with preempt to begin with, and with preempt being the default, its usefulness was already limited.

A more serious reason, however, is that it was a rather horrible (even if ingenious) kludge. This is how it worked: first it tried to locate the VRRP group in keepalived.conf, then it would remove it from the config, restart keepalived, insert it again, and restart keepalived again. It sort of worked, but you can see how fragile this approach is. If anything at any stage would go wrong, it would leave VRRP in an inconsistent state.

A much cleaner and general way to do it is to just disable the VRRP group in conf mode (set high-availability vrrp group Foo disable) and commit the change.



New VRRP CLI is here (with IPv6 support)

Ever since I started with Vyatta, I've had a problem with commands for features unrelated to interfaces being defined inside interfaces. I'm sure the person who came up with that arrangement meant well and thought it would be familiar for Cisco and Juniper users, but the more I lived with it, the more I thought it creates more problems than it solves.

From the user perspective, it's hard to easily view the complete configuration of those features. It's also much harder to clone a feature config to another machine. And if you ever want to move some connection to a different NIC, things get even more fun.

For developers, however, it's even worse. First, it means commands for those features needs to be duplicated for every interface type, which makes adding new interfaces much harder. Second, configuration scripts end up more complex due to paths that can be nested quite deeply. Third, with the current config backend specifically, lack of nested end nodes can lead to very interesting tracking of the state to avoid repeated service restarts.

Until recently there was a token excuse for leaving unfortunate UI decisions alone — the difficulty of writing migration scripts. Luckily, it's no longer the case, so we can start cleaning it up. Ok, it is hard and you need to take care of many details, but at least you are not wrestling with a library that is simply inadequate for the task. Now we can go on a quest to remove excessive nesting and redesign the UI is an easier to use, more logical fashion.

VRRP looked a good feature to start the clean up with — we need to get  IPv6 VRRP support to work in the end, its scripts have accumulated quite some cruft, and, well, it really has nothing to do with interface settings since it's a protocol of its own implemented by a userspace daemon.

Today I've rolled out the new implementation and it is already in the latest rolling release image, ready for your testing. Let's walk through the changes.

Making first boot scripts just got easier (but building vyos-1x got a bit harder)

As you probably know already, we are working on integrating cloud-init into VyOS, which will allow us to support multiple cloud platforms, and get rid of the custom script for EC2. The hard part of this project is that just allowing cloud-init to do what it normally does in Debian would not produce desired results, we need to make it modify the config.

This raises a question when this should occur and how it should be done. Since modifying running config with scripts has its difficulties in the current backend, and even if it didn't, it still could potentially clash with user's commits, we thought we may want to modify the config.boot file before it's loaded instead.

One advantage is that once we have common functionality implemented, it can be reused not only in cloud-init, but also in the installer, and in custom first boot scripts if someone wants them.

To test this concept, I've added a library names vyos.initialsetup that includes a collection of functions for common settings such as user passwords and keys, host name, default route, name servers, and interface addresses.

Here's an example of a script you can run on your system for demonstration (adjust user name and do ssh-keygen if necessary):

#!/usr/bin/env python3

import vyos.configtree as vct
import vyos.initialsetup as vis

with open('/opt/vyatta/etc/config.boot.default') as f:
    config_string = f.read()

with open('/home/dmbaturin/.ssh/id_rsa.pub') as f:
    key_string = f.read()

config = vct.ConfigTree(config_string)

vis.set_user_password(config, 'vyos', 'qwerty')
vis.set_user_ssh_key(config, 'vyos', key_string)

# Default level is admin
vis.create_user(config, 'dmbaturin', password=None, key=key_string)

# Default type is ethernet
vis.set_interface_address(config, 'eth0', '192.0.2.10/24')

vis.set_default_gateway(config, '192.0.2.1')

vis.set_name_servers(config, ['203.0.113.10', '203.0.113.20'])

vis.set_host_name(config, 'vyos-test')

print(str(config))

The script will print a customized config based on the default config.

Building vyos-1x

This is the good thing. The bad, or rather somewhat inconvenient thing is that vyos-1x package build now depends on the libvyosconfig0 package that provides the library behind the vyos.configtree module, and it's essential for running unit tests for those modules.

You should add the "deb http://dev.packages.vyos.net/repositories/current/vyos/ current main" repository to the sources.list on your build machine and install libvyosconfig0 with APT, or simply take the file from the repo and install it by hand with dpkg.

I hope the increased reliability we gain from those unit test outweighs the inconvenience of additional setup.


Infrastructure failure resolved, downloads.vyos.io is back now

We have managed to resolve the infrastructure problems and bring the affected machines back intact. Now downloads.vyos.io and ci.vyos.net are back online, and our build hosts with all the build dependencies setup and uncommited code are back online too, so luckily we can resume the work from the point where it was stopped by the RAID issue.

We are not ready to give a complete post mortem on the actual issue yet (may not even be able to at all) since we have limited data and the service provider support was not exactly helpful. What we can say now is that what was thought to be RAID5 actually was a RAID1 with an additional hot spare drive, and the failure mode included a RAID controller glitch apart from drive failure. The hot spare drive is what allowed us to bring the data back intact.

While this issue was resolved without any data loss, it definitely prompts us to reconsider many things about our infrastructure, including backup strategy, deployment mechanisms, and service provider choice.

We are going to write new posts as we decide upon the options and roll out improvements. Infrastructure deployment is also a good area for contributions, and some people already offered help.

Infrastructure failure

Hi everyone,

We've had a catastrophic failure on one of our hosts: two drives in a RAID5 failed simultaneously. A number of VMs related to the development infrastructure are permanently lost and need to be restored since we didn't have backups of them.

The only piece public infrastructure affected by it is downloads.vyos.io. You can use the old packages.vyos.net server or one of the mirrors to download the stable (1.1.8) or historical VyOS releases.

The other pieces of infrastructure that need to be restored are the Jenkins server, the build machines, and the repositories server. We'll have to restore them before we can continue development. I hope we'll get it done by Monday.

Building VyOS images with custom packages just got simpler

While the new build scripts first introduced when we migrated the development branch to jessie made things much simpler for developers and for people who just want to build the latest VyOS image from source, building an image even simply with a package available from Debian Jessie repos but not present in the VyOS package set by default was still quite an ordeal for a person not familiar with live-build and the structure of our build scripts.

Well, until now. Yesterday I've added ./configure script options that should allow everyone to build a custom image without ever touching the plumbing of the build scripts.

The simplest example, building an image with packages available in Debian Jessie:

./configure --custom-packages "bsdgames robotfindskitten"
sudo make iso

A more interesting example, adding a package from a third party repo signed with its own key. In this case, salt-minion:

wget https://repo.saltstack.com/apt/debian/8/amd64/2017.7/SALTSTACK-GPG-KEY.pub
./configure --custom-apt-entry "deb http://repo.saltstack.com/apt/debian/8/amd64/2017.7 jessie main" --custom-packages "salt-minion" --custom-apt-key ./SALTSTACK-GPG-KEY.pub
sudo make iso

Of course it doesn't guarantee that your image will build or work, but at least it will get you to the debugging phase faster,

Versions mystery revealed

Over last few years, I saw many combinations and theories around VyOS, Vyatta, EdgeOS that I decided to shed light on this and also explain current and future VyOS versions


First was Vyatta

You can read the detailed history here at Wikipedia, and I just tell that all started back in 2006 (12 years ago!) and with release v6.0  in 2010 Vyatta Community renamed to Vyatta Core to highlight the fact that the Vyatta Subscription was no longer a separate product, but the open source version extended with proprietary add-ons. In retrospect, this is often seen as the beginning of the end. At the time the future of the open source Vyatta didn’t look as bleak though, and while since 6.0 Vyatta officially became “open core” (aka “freemium”), that release incorporated features formerly available only in the proprietary version, along with significant new features like image-based upgrade, and it seems like a sustainable model for a time.


In 2011 Ubiquiti Networks launched their EdgeMax line of products. EdgeOS™ is the essential part of the product line and is a fork of Vyatta Core 6.3 that exclusively runs on Cavium backed hardware produced by Ubiquiti Networks (EdgeRouter Lite, PoE, Pro) Since then they migrated to Debian 7 and replaced quagga with proprietary ZebOS™


In 2012 Vyatta was acquired by Brocade, and in April 2013 they renamed Vyatta Subscription Edition (VSE) to the Brocade Vyatta 5400 vRouter (and later also 5600) those a no longer open source. By that time all community resources were wiped, and future of Vyatta Core was obvious.


In late 2013 Daniil initiated fork of Vyatta Core 6.6 under VyOS name. It's a start of development of so-called old stable 1.1.x series which based on Debian 6 (patched all the way) the latest version is 1.1.8 and now in Extended support



fall of 2015, we began an upgrade of VyOS project in many senses, that is a time when 1.2.x started (decision to skip Debian 7 and go directly to Debian 8 which includes migration to systemd among other things)  enhancements and new development happening in the rolling version that we introduced some time ago.


Now that know the background, it’s a good time to describe the details on the new VyOS release model.


Rolling

You can already download binary images of the rolling release of VyOS 1.2.x here, which are snapshots of the current development state. All new features, refactoring of old code, improvements of the OS package base go there before they can become a part of a release candidate or a stable release.

Rolling release builds may break occasionally, and new features may not work as expected. They are meant for contributors and networking enthusiasts who are willing to help us test those features.


Release Candidate (RC)

About every six months, we will start a new release cycle. The cycle will begin with branching off the rolling release and preparing a release candidate. Release candidates are neither code nor feature frozen, as we fix bugs or design issues, or new features in rolling become stable and well tested, we may prepare a new RC that incorporates them.

Release candidates are supposed to be stable enough for non-critical production, but if you prefer stability over new features, you may want to stick with EPA or LTS.


Early Production Access (EPA)

After RC cycle is over and the release is sufficiently stable, it will become the basis for the next LTS release.

EPA is for customers and community members who want new features before they are available in the LTS, and are willing to help us weed out last bugs in those new features.


Long Term Support (LTS)

LTS version of VyOS will be receiving security patches and high priority fixes from rolling releases.

They are meant for enterprise and service provider users who value stability over everything.



We’d like to encourage everyone to install the least stable version to help us with testing.

As bonus 

VyOS to Debian version map



All product names, logos, and brands are property of their respective owners.


VyOS 1.2.0 development news in June

If we haven't written any new blog posts in a while, that's because we've all been busy with development. Here is what happened since the last status update.

RADIUS authentication and authorization

We've had nominal support for RADIUS authentication for system login for a long time, but it was essentially useless because it required that the user account to already exist in VyOS to actually work, it was just a password checking method.

Kim Hagen found a way to implement real support for it and it appears to work fine (but of course needs testing). Apart from authentication, his implementation also supports authorization through the priv-lvl attribute. Privilege level 15 is mapped to admin users, anything below it is operator. This seems to be the most common way to do that, after Cisco.

Correct naming of PPTP and L2TP interfaces works again

The idea to switch to the upstream PPPD turned out to be premature — while it does have support for pre-up scripts, it still makes an assumption that the interface name is not changed by pre-up script, so remote access VPN interfaces were named pppX, which broke the "run show vpn remote-access" commands, and would break zone-based firewalls of people who rely on that naming.

We've re-applied the old patches for it to the latest upstream PPPD and opened pull requests for them, so if they are merged, we can finally stop building our own, and if not, we have clean patches that should be easy to apply to future releases.

Writing migration scripts (and manipulating VyOS config files outside VyOS) just got easier

Long story short

VyOS 1.2.0-rolling (starting with the next nightly build) includes a library for parsing and manipulating config files without loading them into the system config. It can be used for automatically converting configs from old versions in case an incompatible change was made, and for standalone utilities. Motivation and history are discussed below.

Here is an example of interacting with the new library:
>>> from vyos import configtree

>>> c = configtree.ConfigTree("system { host-name vyos \n } interfaces { dummy dum0 { address 192.0.2.1/24 \n address 192.0.2.20/24 \n disable \n } }  /* version: 1.2.0 */")

>>> print(c.to_string())
system {
    host-name vyos
}
interfaces {
    dummy dum0 {
        address 192.0.2.1/24
        address 192.0.2.20/24
        disable { }
    }
}

 /* version: 1.2.0 */

>>> c.set(['interfaces', 'dummy', 'dum0', 'address'], value='293.0.113.3/32', replace=False)

>>> c.delete_value(['interfaces', 'dummy', 'dum0', 'address'], '192.0.2.1/24')

>>> c.delete(['interfaces', 'dummy', 'dum0', 'disable'])

>>> c.is_tag(['interfaces', 'dummy'])
True

>>> c.exists(['interfaces', 'dummy', 'dum0', 'disable'])
False

>>> c.list_nodes(['interfaces', 'dummy'])
['dum0']

>>> print(c.to_string())
system {
    host-name vyos
}
interfaces {
    dummy dum0 {
        address 192.0.2.20/24
        address 293.0.113.3/32
    }
}

 /* version: 1.2.0 */

As you can see, it largely mimics the API you get for the running config. The only notable differences are that the "set" method requires that you specify the path and the value separately, and to have nodes formatted as tag nodes (i.e. "ethernet eth0 { ..." as opposed to "ethernet { eth0 { ..." you need to mark them as such with "set_tag", unless they were originally formatted that way in the config you parsed.