VyOS 2.0 development digest #2

In the previous post we talked about the reasons for rewrite, design and implementation issues, and basic ideas. Now it's time to get to details. Today we'll mostly talk about command definitions, or rather interface definitions, since set commands is just one way to access the configuration interface.

Let's review the VyConf architecture (I included a few things in the diagram that we haven't discussed yet, ignore them for now):

At startup, VyConf will load the main config (or the fallback config, if that fails). But to know whether the config is valid, and to know what programs to call to actually configure the target applications, it needs additional data. We'll call that data "interface definitions" since it defines the configuration interface. Specifically, it defines:

  1. What config nodes (paths) are allowed (e.g. "interfaces ethernet", or "protocols ospf")
  2. What values are valid for that nodes (e.g. any IPv4 or IPv6 address for "system name-server")
  3. What script/program should be called when this or that part of the config is changed

The old way

Before we get to the new way, let's review the old way, the way it's one in the current VyOS implementation. In the current VyOS, those definitions are called "templates", no one remembers why.

This is a typical template file:
vyos@vyos# cat /opt/vyatta/share/vyatta-cfg/templates/interfaces/ethernet/node.tag/speed/node.def 
type: txt
help: Link speed
default: "auto"
syntax:expression: $VAR(@) in "auto", "10", "100", "1000", "2500", "10000"; "Speed must be auto, 10, 100, 1000, 2500, or 10000"
allowed: echo auto 10 100 1000 2500 10000

commit:expression: exec "\
	/opt/vyatta/sbin/vyatta-interfaces.pl --dev=$VAR(../@) \
	--check-speed $VAR(@) $VAR(../duplex/@)"

update: if [ ! -f /tmp/speed-duplex.$VAR(../@) ]; then
	   /opt/vyatta/sbin/vyatta-interfaces.pl --dev=$VAR(../@) \
	   	--speed-duplex $VAR(@) $VAR(../duplex/@)
	   touch /tmp/speed-duplex.$VAR(../@)
	fi

val_help: auto; Auto negotiation (default)
val_help: 10; 10 Mbit/sec
val_help: 100; 100 Mbit/sec
val_help: 1000; 1 Gbit/sec
val_help: 2500; 2.5 Gbit/sec
val_help: 10000; 10 Gbit/sec

We can spot a few issues with it already. First, the set of definitions is a huge directory tree where each directory represents a config node, e.g. "interfaces ethernet address" will be "interfaces/ethernet/node.tag/address/node.def". This makes them hard to navigate, you need to read a lot of small files to get the whole picture, and you need to do a lot of file/directory hopping to edit them. Now, try mass editing, or checking for mistakes before release...

Next, they use a custom syntax, which needs custom lexer and parser, and custom documentation (in practice, the source is your best bet, though I did write something of a syntax reference). No effortless parsing, no effortless analysis or transformation either.

Next, the value checking has its peculiarities. There is concept of type (it can be txt, u32, ipv4, ipv4net, ipv6, ipv6net, and macaddr), and there is "syntax:expression:" thing which partially duplicate each other. The types are hardcoded in the config backend and cannot be added without modifying it, even though they only define validation procedure and are not used in any other way. The "syntax:expression:" can be either "foo in bar baz quux", or "pattern $regex", or "exec $externalScript".

But, the "original sin" of those files is that they allow embedded shell scripts, as you can see. Mixing data with logic is rarely a good idea, and in this case it's especially annoying because a) you cannot test such code other than on a live system b) you have to read every single file in a package to get the complete picture, since any of them may have embedded shell.

Now to the new way.

Why services and not donations

This was originally intended as a reply to a comment in the VyOS 2.0 digest #1 post, but that would be a rather long reply to post in comments I guess. This question is brought up once every few months perhaps, but I never got to publishing my own position on it.

Dave Smarts asked there:

>Where / how do we donate? Is there an obvious personal vs. corp. donation system?

Contributions and commitment to development are best donations

Myself I think the best thing individuals can do for the project is contribute to the design and the code. For 2.0, we are especially in need of commited people who can join the core team, so we can work together and make it happen (given the amount of design work and CS heavy lifting it needs). The foundation needs to be at least somewhat complete before "casual" contributors can add small improvement, in the initial stage every developer needs to know the full context.

While the jessie migration work is more friendly to occasional contributors who can add a fix or two when they find a bug, it also needs a lot of dedicated and unrewarding work on cleaning up the mess we inherited and removing the assumptions that are no longer true in jessie.

While we are at it, there's an area that needs contributions especially badly: documentation. If you've got a minute, come to wiki.vyos.net and document some feature. A stab page with some config examples is better than no page at all. If you don't know what to write about, you can use the old Vyatta Core docs as a reference (just don't copy anything verbatim from there! They are not under an open source license, that would be a copyright violation).

Donations create legal issues and paperwork

To be fair, I have no idea how donations must be handled, in any country. It's only any obvious if we start a non-profit "VyOS Foundation", even then starting a non-profit is quite an undertake that comes with loads of paperwork and expenses. Remember PDPC (the foundation set up by Freenode)? It had to be dissolved because donations to Freenode couldn't cover the expenses of running it.

Additionally, whether legally required or not, I believe anyone who receives donations is morally required to publish a complete financial report and disclose what they received and what they spent it on. Non-profits are legally required to do that, in any case people who donated have the right to know what happened to their donations.

Donations create a moral problem

I'll be using Wikipedia as a model project that relies on donations.

For them, donations are unquestionably morally correct because the primary purpose of donations is to stay online, and what it takes to stay online is obvious (servers, hosting, bandwidth...). MediaWiki software (for the most part) and the content of Wikipedia are community effort, and people aren't donating towards better content, they are donating towards keeping the servers online, and as long as they are online, their expectations are met. Note that anything they do with cash left after the hosting bills is often criticized, such as the salaries of the paid foundation executives, and their research projects. But, at least the funds spent on hosting are unquestinably spent the intended way.

Donations towards development, however, are a lot like the part Wikimedia Foundation is criticized for. Donations create expectations (justifiably!), but in this case the expectations are left implicit, and open to interpretations. What exactly constitutes spending money towards development? If the donation is exactly to the VyOS project (even assuming VyOS Foundation did exist), who is eligible to receive shares of it? Is programming the only appropriate activity, or it's right to hire a graphics designer to improve the website? Myself, breaking the expectations of people who donate is the last thing I want, but without knowing their expectations it's impossible to avoid.

The other issue is common for donations and crowdfunding. What's our responsibility if we fail to do something in time, or at all?

For a small project, donations from individuals are not financially viable

VyOS is a small project, whether we like it or not. Making overly optimistic estimates, let's say we have 10 000 users, and if every hundredth user donates $10 a year, we are left with $1000. Just enough to pay the bills of one particularly modest maintainer for a month.

Why services are better

  • For companies, authorizing service purchase is normally a lot easier than authorizing a donation
  • You get real value for your money, rather than a vague promise
  • Since the obligations of each party are well defined, expectations will not be broken accidentally

What to do

If you want to contribute to VyOS as an individual, join the development. If you want to contribute financially, try to persuade your company to buy support from Sentrium: at the very least is will help me (dmbaturin) pay my bills, and if we get enough customers, build a team of fulltime (or at least part time) maintainers.

As an alternative way to get guaranteed developer time, if your company is using VyOS and you have people who already contributed to it or want to, consider allocating some of their paid time for it, we will be more than grateful if you do.

VyOS 2.0 development digest #1

I keep talking about the future VyOS 2.0 and how we all should be doing it, but I guess my biggest mistake is not being public enough, and not being structured enough.

In the early days of VyOS, I used to post development updates, which no one would read or comment upon, so I gave up on it. Now that I think of it, I shouldn't have expected much as the size of the community was very small at the time, and there were hardly many people to read it in the first place, even though it was a critical time for the project, and input from the readers would have been very valuable.

Well, this is a critical time for the project too, and we need your input and your contributions more than ever, so I need to get to fixing my mistakes and try to make it easy for everyone to see what's going on and what we need help with.

Getting a steady stream of contributions is a very important goal. While the commercial support thing we are doing may let the maintainers focus on VyOS and ensure that things like security fixes and release builds get guaranteed attention in time, without occasional contributors who add things they personally need (while maintainers may not, I think myself I'm using maybe 30% of all VyOS features any often) the project will never realize its full potential, and may go stale.

But to make the project easy to manage and easy to contribute to, we need to solve multiple hard problems. It can be hard to get oneself to do things that promise no immediate returns, but if you looks at it the other way, we have a chance to build a system of our dreams together. As of 1.1.x and 1.2.x (the jessie branch), we'll figure it out how to maintain it until we solve those problems, but that's for another post. Right now we are talking about VyOS 2.0, which gets to be a cleanroom rewrite.

Why VyOS isn't as good as it could be, and can't be improved

I considered using "Why VyOS sucks" to catch reader's attention. It's a harsh word, and it may not be all that true, given that VyOS in its current state is way ahead of many other systems that don't even have system-wide config consistency checks, or revisions, or safe upgrades, but there are multiple problems that are so fundamental that they are impossible to fix without rewriting at least a very large part of the code.

I'll state the design problems that cannot be fixed in the current system. They affect both end users and contributors, sometimes indirectly, but very seriously.

Design problem #1: partial commits

You've seen it. You commit, there's an error somewhere, and one part of the config is applied, while the other isn't. Most of the time it's just a nuisance, you fix the issue and commit again, but if you, say, change interface address and firewall rule that is supposed to allow SSH to it, you can get locked out of your system.

The worst case, however, is when commit fails at boot. While it's good to have SSH at least, debugging it can be very frustrating, when something doesn't work, and you have no idea why, until you inspect the running config and see that something is simply missing (if you run into it in VyOS 1.x, do "load /config/config.boot" and commit, this will either work or show you why it failed). It's made worse by lack of notifications about config load failure for remote users, you can only see that error on the console.

The feature that can't be implemented due to it is what goes by "commit check" in JunOS. You can't test if your configuration will apply cleanly without actually commiting it.

It's because in the scripts, the logic for consistency checking and generating real configs (and sometimes applying them too) is mixed together. Regardless of the backend issues, every script needs to be taken apart and rewritten to separate that logic. We'll talk more about it later.

Design problem #2: read and write operations disparity

Config reads and writes are implemented in completely different ways. There is no easy programmatic API for modifying the config, and it's very hard to implement because binaries that do it rely on specific environment setup. Not impossible, but very hard to do right, and to maintain afterwards.

This blocks many things: network API and thus an easy to implement GUI, modifying the config script scripts in sane ways (we do have the script-template which does the trick, kinda, but it could be a lot better).

Design problem #3: internal representation

Now we are getting to really bad stuff. The running config is represented as a directory tree in tmpfs. If you find it hard to believe, browse /opt/vyatta/config/active, e.g. /opt/vyatta/config/active/system/time-zone/node.val

Config levels are directories, and node values are in node.val files. For every config session, a copy of the active directory is made, and mounted together with the original directory in union mount through UnionFS.

There are lots of reasons why it's bad:

  • It relies on behaviour of UnionFS, OverlayFS or another filesystem won't do. We are at mercy of unionfs-fuse developers now, and if they stop maintaining it (and I can see why they may, OverlayFS has many advantages over it), things will get interesting for us
  • It requires watching file ownership and permissions. Scripts that modify the config need to run as vyattacfg group, and if you forget to sg, you end up with a system where no one but you (or root) can make any new commits, until you fix it by hand or reboot
  • It keeps us from implementing role-based access control, since config permissions are tied to UNIX permissions, and we'd have to map it to POSIX ACLs or SELinux and re-create those access rules at boot since the running config dir is populated by loading the config
  • For large configs, it creates a fair amount of system calls and context switches, which may make system run slower than it could

Design problem #3: rollback mechanism

Due to certain details (mostly handling of default values), and the way config scripts work too, rollback cannot be done without reboot. Same issue once made Vyatta developers revert activate/deactivate feature.

It makes confirmed commit a lot less useful than it should be, especially in telecom where routers cannot be rebooted at random even in maintenance windows.

Implementation problem #1: untestable logic

We already discussed it a bit. The logic for reading the config, validating it, and generating application configs is mixed in most of the scripts. It may not look like a big deal, but for the maintainers and contributors it is. It's also amplified by the fact that there is not way to create and manipulate configs separately, the only way you can test anything is to build a complete image, boot it, and painstakingly test everything by hand, or have expect-like tool emulate testing it by hand.

You never know if your changes may possibly work until you get them to a live system. This allows syntax errors in command definitions and compilation errors in scripts to make it into builds, and it make it into a release more than one time when it wasn't immediately apparent and only appread with certain combination of options.

This can be improved a lot by testing components in isolation, but this requires that the code is written in appropriate way. If you write a calculator and start with add(), sub(), mul() etc. functions, and use them in a GUI form, you can test the logic on its own automatically, e.g. does add(2,3) equal 5, and does mul(9, 0) equal 0, does sqrt(-3) raise an exception and so on. But if you embed that logic in button event handlers, you are out of luck. That's how VyOS is for the most part, even if you mock the config subsystem so that config read functions return the test data, you need to redo the script so that every function does exactly one thing testable in isolation.

This is one of the reasons 1.2.0 is taking so long, without tests, or even ability to add them, we don't even know what's not working until we stumble upon it in manual testing.

Implementation problem #2: command definitions

This is a design problem too, but it's not so fundamental. Now we use custom syntax for command definitions (aka "templates"), which have tags such as help: or type: and embedded shell scripts. There are multiple problem with it. For example, it's not so easy to automatically generate at least a command reference from them, and you need a complete live system for that, since part of the templates is autogenerated. The other issue is that right now some components feature very extensive use of embedded shell, and some things are implemented in embedded shell scripts inside templates entirely, which makes testing even harder than it already is.

We could talk about upgrade mechanism too, but I guess I'll leave it for another post. Right now I'd like to talk about proposed solutions, and what's being done already, and what kind of work you can join.

Sentrium Cyber Monday promotion: get some free hands on assistance hours with every VyOS commercial support plan

Hi everyone,

This summer,  Sentrium, an IT consulting company ran by some of long time VyOS developers and users, launched commercial support for VyOS, something that we think will make VyOS more attractive for enterprise and service provider networks and give the VyOS project some funding and ensure its sustainable development and growth.

We also would like to say huge thanks to all of our existing customers for the commitment to the VyOS project and for your trust in us! You are contributing to the VyOS development and we are really happy to see such interest from companies around the world. Thank you!

Based on these few months of experience with VyOS support, we decided to adjust our support plans based on customer feedback You can view the new plans at this web page: https://sentrium.io/vyos-commercial-support/

We tried to cover common use cases which we observing:

  • Small companies and nonprofit organizations with budget constraints
  • Companies with internal expertise which just need a “formal” support contract due to the business requirements.
  • Businesses who need phone support and hands-on assistance with strict SLAs for mission critical applications.

So, at a glance, how our support from vendors:

  • All support plans cover all routers in the company, no matter if this production instance, or you just spin up another VyOS in your lab. You don’t need get to pay more when your network grows, or wait to get support contracts for new routers.

  • Initial configuration and environment review is included in each plan, and this allows us to suggest config improvements and also shorten resolution times (we can go directly to issue, bypassing topology discovery)

  • We only employ people with real networking knowledge and do not do anything to artificially make response time appear shorter than it is (e.g. by sending a form reply). We do not offer guaranteed short response time for all plans, but you can be sure the first reply you get includes some useful information about your problem. For email support, relatively long response time also allows us to do some research about your problem, try a solution in a lab, or consult a maintainer or a contributor, if we cannot offer a useful response offhand.

From Cyber Monday and until end of the year we’ll be running a promotion: everyone who buys a support subscription also gets free hand on assistance hours. The basic plan comes with one free hour, the standard plan comes with four, and the production support plan comes with eight.

We know there are still a number of people running Vyatta Core and who may want to switch to VyOS. Switching to a new system is always a concern, so if you are using Vyatta Core, you can use the hands on assistance hours for migration to VyOS. But with all improvements and, most importantly, security fixes that have been added, and will be added by 1.1.8 and then 1.2.0, upgrade is very important.

If you are running Vyatta Core 6.5 or 6.6, it can be upgraded to VyOS as if it was a new Vyatta version, with a few minor caveats. If your version is 6.4 or earlier, there are some features that may require manual config rewrite, which shouldn’t take too long unless your config is particularly large. We have successfully upgraded versions as old as 6.0 to VyOS without any problems. If, no matter how unlikely, you are still using VC 5.0, things get interesting, since it didn’t have the image upgrade yet, but even that is doable and has been done before (though if you can, we suggest that you reinstall and we can help with migrating the config).

Some VyOS servers are down

Hi everyone,

Due to a hardware issue our hoster is having, some our servers are down. What's down is the wiki.vyos.net/forum.vyos.net host and the packages.vyos.net hosts.

The new website, vyos.io, is up, it's hosted in a different place. If you need to download VyOS images, you can use any of the mirrors, e.g. 0.de.mirrors.vyos.net.

Informal meeting of VyOS users in Barcelona

If you are in Barcelona this October, we have some news for you.

There is a plan to organize an informal meeting of VyOS users and all interested in open source routing, roughly at the same time as VMworld and the OpenStack Summit. 

The current idea is October the 17th at 19-00 for VMWorld week

and on 24th 19-00 on OpenStack Summit week, though it's not set in stone yet 

Who's in charge

Yuriy Andamasov (syncer on #vyos) is the initiator and the primary contact.
On behalf of Sentrium Yuriy and Santiago will take care about administrative questions.

Note: None of the maintainers planning to attend, so if you want to discuss specific issues with the code and contributing to it, it's likely not the best place to ask. 
Yuriy is one of VyOS infrastructure admins and the community manager, so feel free to discuss these issues with him. If you want to contribute to VyOS but don't know where to start, feel free to ask him and he can suggest some tasks to work on too.

What's planned

A walk around Barcelona places and have informal talks about VyOS, virtualization, networking and other IT and non-IT subjects. 
There is no plan to have a formal meeting with talks prepared in advance or anything else at this time, though if there is an interest in it, perhaps it's possible to arrange it.
The idea is for VyOS users and everyone interested in VyOS and open source routing in general, to come together and socialize over sightseeing/food/etc. in great city Barcelona

Want to participate?

Go to https://www.meetup.com/VyOS-Default-Route-Barcelona/events/234567678/, register, and leave your comments. If you want to sponsor laptop stickers, t-shirts, or anything else for the attendees, it will be appreciated. In exchange, we can put your company logo on the VyOS brochure that Yuriy will be distributing at VMworld and OpenStack Summit.

CVE-2016-5696, development meeting, and other things

TCP vulnerability and its fix

There was a vulnerability discovered in the implementation of TCP in Linux (CVE-2016-5696) that is remotely exploitable and allows an attacker to impersonate a connected user. For a hotfix on a live system, you can add sudo sysctl net.ipv4.tcp_challenge_ack_limit=1000000000 to /config/scripts/vyatta-postconfig-bootup.script. The fix will be included in the 1.1.8 release.

Development meetup

Today at 18:00 UTC we will hold a development meeting, everyone is invited to join. It was discussed in phabricator (https://phabricator.vyos.net/Q41) originally, and most people voted for Google Hangouts, so that's what we'll use. I'm not fond of it myself, but so be it.

If you want to participate, fill the form here: https://goo.gl/forms/PBarp2bPWvAncQtJ2 so that we know your email address and can send you an invite.

It will be a semi-structured format, the agenda is the following:

  • 1.1.8 maintenance release and what backports to include in it
  • 1.2.0 and how to go about verifying that it actually works
  • VyOS 2.0 (the clean rewrite) design principles
  • Strategies for attracting more contributors

Rocket Chat

Lately we've setup Rocket Chat , an open source chat platform with some interesting features, including voice support, offline logs, and more. If it works well, we can use it for future development meetings, and possibly other things as well. If you want to give it a try, visit https://chat.vyos.io

 

1.2.0 beta2

People keep asking about 1.2.0-beta2. The truth is, if we build some image, and call it 1.2.0-beta2, nothing will change really. You can always get the latest nightly build from http://dev.packages.vyos.net/iso/current/amd64/ and start testing it. Maybe when it's stable enough for at least non-critical production environments, we will call it beta2, but until then, nightly builds is really a better way to go.

VyOS shirts

A bit to our surprise, there was some interest in VyOS shirts (https://teespring.com/stores/vyos), but they also created a bit of confusion as to what it costs and where the profit goes.

The base cost of the shirt around $15/EUR 10, so the "retail price" is some $5 higher than the base cost. Any profit from it will be used for things directly related to VyOS, such as domain renewals and the like (if the shirts somehow become popular enough to pay for anything else, we'll see what else we can do with it).

We need to come up with some convenient way to make the ledger public so that everyone can see what we received what it was used for.


VyOS remote management library for Python

Someone on Facebook rightfully noted that lately there's been more work on the infrastructure than development. This is true, but that work on infrastructure was long overdue and we just had to do it some time. There is even more work on the infrastructure waiting to be done, though it's more directly related to development, like restructuring the package repos.

Anyway, it doesn't mean all development has stopped while we've been working on infrastructure. Today we released a Python library for managing VyOS routers remotely.

Before I get to the details, have a quick example of what using it is like:

import vymgmt

vyos = vymgmt.Router('192.0.2.1', 'vyos', password='vyos', port=22)

vyos.login()
vyos.configure()

vyos.set("protocols static route 203.0.113.0/25 next-hop 192.0.2.20")
vyos.delete("system options reboot-on-panic")
vyos.commit()

vyos.save()
vyos.exit()
vyos.logout()

If you want to give it a try, you can install it from PyPI ("pip install vymgmt"), it's compatible with both Python 2.7 and Python 3. You can read the API reference at http://vymgmt.readthedocs.io/en/latest/ or get the source code at https://github.com/vyos/python-vyos-mgmt .

Now to the details. This is not a true remote API, the library connects to VyOS over SSH and sends commands as if it was a user session. Surprisingly, one of the tricky parts was to find an SSH/expect library that can cope with VyOS shell environment well, and is compatible with both 2.7 and 3. All credit for this goes to our contributor who goes by Hochikong, who tried a whole bunch of them, settled with pexpect and wrote a prototype.

How the library is better than using pexpect directly, if it's a rather thin wrapper for it? First, it's definitely more convenient to just call set() or delete() or commit() than to format command strings yourself and take care of the sending and receiving lines.

Second, common error conditions are detected (through simple regex matching) and raise appropriate exceptions such as ConfigError (for set/delete failures) or CommitError for commit errors. There's also a special ConfigLocked exception (a subclass of CommitError) that is raised when commit fails due to another commit in progress, so you can recover from it by sleep() and retry. This may seem uncommon, but people who use VRRP transition scripts and the like on VyOS already reported that they ran into it.

Third, the library is aware of the state machine of VyOS sessions, and will not let you accidentally do wrong things such as trying to enter set/delete commands before entering the conf mode. By default it also doesn't let you exit configure sessions if there are uncommited or unsaved changes, though you can override it. If a timeout occursm an exception will be raised too (while pexpect returns False in this case).

Right now it only supports set, delete, and commit, of all high level methods. This should be enough for the start, but if you want something else, there are generic methods for running op and conf mode commands (run_op_mode_command() and run_conf_mode_command() respectively). We are not sure what people want most, so what we implement depends on your requests ans suggestions (and pull requests of course!). Other things that are planned but that aren't there yet are SSH public key auth and top level words other than set and delete (rename, copy etc.). We are not sure if commit-confirm is really friendly to programmatic access, but if you have any ideas how to handle it, share with us.

On an unrelated note, syncer and his graphics designer friend made a design for VyOS t-shirts. If anyone buys that stuff, the funds will be used for the project needs. The base cost is around 20 eur, but you can get them with 15% discount by using VYOSMGTLIB promo code: https://teespring.com/stores/vyos?source=blog&pr=VYOSMGTLIB