
AndrewCz


Using liberty-minded opensource tools, and using them well


Network in a Bottle




I've always imagined a scenario in which a man sits in a chair in front of blinkenlights and computer screens. He presses a button and slowly raises his hands, as a conductor would in front of a grand symphony orchestra. The network springs to life like a bat outta hell. Disks whirring, fans spinning. It is up and ready, waiting to bend to his every whim. This is what that button press does.

TODO: Break into:

  • Initial Setup
    • New Virtualization Host
    • First Core Router
    • First SHR/ADM router
    • First jumphost
  • Additional components
    • New network router
    • New server

Virtualizing services is fine. Virtualizing stacks is better. Virtualizing entire networks? Well, let’s see what we can come up with. Driven by upvm, powered by KVM with libvirt, and enabled by Ansible, I’m creating a malleable, reproducible, and customizable network that is virtualized and portable to anywhere that uses the above technologies.

This is a marriage of classical networking and new virtualization technologies. Will this evolve over time? Of course. I have no doubt that at some point this will be containerized to cut down on resource requirements and allow faster reproduction. And once SDN comes into its own, there will be more changes to make. After that, who knows? Only time will tell. However, I can say that it is only going to get easier from here.

Philosophy

This may double as a mission statement, but my goal here is to use relatively mature technologies to create reproducible results using packaged configuration rather than pre-baked solutions. In a sense, I could use Proxmox and clone a template or templates over and over, but in order to achieve both my vision of portability and ease of maintenance in the long run, I need to be able to use official, universally available resources. Specifically: VMs available from their respective companies that publish on public box lists, variable-based configuration, cross-OS scripts, etc. Not just that, but I want an entire network’s worth of services to be brought up by a single command. Can we do that? I don’t know, but I’m sure as hell gunna try.

Host

Any virtual environment needs a host. In the future, this host may be a Proxmox box, as it’s a maturing technology with a lot of potential, and momentum, at least at the moment. However, if I’ve learned anything in the limited time that I’ve served in a Sysadmin role, it is to KISS. (hello lllllllladies…) So with that in mind, I’ll be using a vanilla Fedora box (version 26 at the time of writing) as the host.

After the initial minimal install and a yum update it’s time to take a look at several technologies to install.

Ansible

Some might choose to install ansible in a virtualenv, and to them I say you’re a better admin than I. For now, I will be using a system-wide install of everything (KISS, remember?) and installing it along with the others. This will handle our post-creation configuration of the servers, as well as the maintenance of the VMs.

$ sudo dnf install ansible

UpVM

This is a wrapper script around virt-builder and virt-install which brings up a virtual machine in libvirt as a minimal install without having to intervene (except for the password prompt at the beginning). Known as an “unattended install”, this is crucial to be able to spin up a network with minimal input.

command -v dnf && dnf=dnf || dnf=yum
sudo $dnf install http://people.redhat.com/rsawhill/rpms/latest-rsawaroha-release.rpm
sudo $dnf install upvm
sudo /usr/share/upvm/initial-setup
upvm -h

libvirt

Libvirt is necessary for the virtual networking that is going to be leveraged to create our network. But more so, it is our virtualization engine, capable of giving our machines life in the first place.

$ sudo dnf group install --with-optional virtualization

Remote connection

Since I plan on administering this as a non-root user, I will have to set up administration with libvirt that allows non-root users. First, add your admin user to the kvm and libvirt groups:

$ sudo usermod -a -G kvm <admin_user>
$ sudo usermod -a -G libvirt <admin_user>

Then, allow libvirt to be managed by a non-root account via PolicyKit:

$ sudo tee /etc/polkit-1/localauthority/50-local.d/50-libvirt-remote-access.pkla > /dev/null << EOF
[libvirt Management Access]
# For allowing access to specific user only:
#Identity=unix-user:<admin_user>
# For allowing access to a group:
Identity=unix-group:libvirt
Action=org.libvirt.unix.manage
ResultAny=yes
ResultInactive=yes
ResultActive=yes
EOF

Lastly, make sure that /dev/kvm is owned by group kvm:

$ sudo chown root:kvm /dev/kvm

Virtual Network

This is going to be the most difficult and most essential portion of the entire setup. First, let’s get libvirt up and running.

$ sudo systemctl enable libvirtd && sudo systemctl start libvirtd

In order to visualize what we’re looking to create, we can refer to libvirt’s logical diagram of a network architecture. Of course, it’s not going to look exactly like this, but similar.

For one, the initial passthrough is only going to our virtual pfSense router. From there, we start to architect our concentric rings of security, each separated by a firewall and other security mechanisms:

Internet –> (Host –>) DMZ –> Business Logic/Data –> Shared Services

Let’s create the DMZ first:

$ sudo tee /etc/libvirt/qemu/networks/dmz.xml > /dev/null << EOF
<network>
  <name>dmz</name>
  <uuid>ff82a960-6a77-4372-b563-9694622643f0</uuid>
  <bridge name='virbr1' stp='on' delay='0'/>
  <mac address='52:54:00:24:48:02'/>
  <domain name='dmz'/>
</network>
EOF
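Files under /etc/libvirt/qemu/networks are normally managed by libvirt itself, so a hand-written file there is typically only picked up once libvirtd restarts. As a hedged alternative, the sketch below registers the same kind of isolated network through virsh. The helper names render_network_xml and define_network are made up for illustration, and the uuid and mac elements are left out on the assumption that libvirt generates them when the network is defined.

```shell
# render_network_xml NAME BRIDGE -> prints an isolated network definition.
# No <uuid> or <mac> here; libvirt fills those in at definition time.
render_network_xml() {
  local name="$1" bridge="$2"
  cat <<EOF
<network>
  <name>${name}</name>
  <bridge name='${bridge}' stp='on' delay='0'/>
  <domain name='${name}'/>
</network>
EOF
}

# define_network NAME BRIDGE -> define, start and autostart the network.
define_network() {
  local name="$1" bridge="$2" xml rc
  xml="$(mktemp)"
  render_network_xml "$name" "$bridge" > "$xml"
  virsh -c qemu:///system net-define "$xml" &&
    virsh -c qemu:///system net-start "$name" &&
    virsh -c qemu:///system net-autostart "$name"
  rc=$?
  rm -f "$xml"
  return "$rc"
}
```

With this in place, define_network dmz virbr1 would create the DMZ bridge, and define_network shr virbr3 would match the bridge the jumphost is attached to later on.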

Since the DMZ will house multiple servers, we have to configure a LAN there. Same with the BL/D and the Shared Services layers. And not to get ahead of ourselves, but each service within those LANs needs to be segmented off from all of the others, which we will accomplish using VLANs. But first things first, let’s get a bridge from our Internet (host ethernet port) to our first router/firewall that sits in front of our DMZ.

So to go from a physical ethernet port to a VM allowing direct access (even giving it its own IP address from the external router) add a network interface definition like the following in your domain XML file. Make sure to change ETHERNET_DEVICE to the ethernet port that you wish to bind it to.

<devices>
  <interface type='direct'>
    <mac address='d0:0f:d0:0f:00:01'/>
    <source dev='<ETHERNET_DEVICE>' mode='bridge'/>
  </interface>
</devices>

You’ll also want to create a second interface that can be attached to the bridge to connect to the rest of the VMs in the DMZ.
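One way to add that second interface, sketched under assumptions: the helper below prints a bridge-type interface stanza that can sit next to the direct interface inside the devices element. The function name render_lan_interface and the virtio model are assumptions made for illustration, not something the setup above prescribes.

```shell
# render_lan_interface BRIDGE -> prints an <interface> stanza for a VM NIC
# that plugs into an existing libvirt bridge (for example virbr1 for the DMZ).
render_lan_interface() {
  local bridge="$1"
  cat <<EOF
<interface type='bridge'>
  <source bridge='${bridge}'/>
  <model type='virtio'/>
</interface>
EOF
}
```

The output can be pasted into the domain XML, or saved to a file and attached with virsh attach-device <domain> <file> --config.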

This first device should be a router/firewall. I recommend pfSense, as that is what I will be using.

Initial DMZ router/firewall setup

Defaults:

  • Network
    • WAN: virtbr0
    • LAN: re0
  • IP Address
    • virtbr0: DHCP
    • re0: 10.0.0.1/24

Building out the rest of the subnets

Since we’re connecting the rest of the networks in the same way, the config can be duplicated all the way down, with a router in between the DMZ and BL/D layers, the BL/D and Shared Services layers, and the Shared Services and Hub layers (cubicle layer).

At that point, we need to work our way back to access the web interfaces of the various routers. The initial one is easy from a hub network because it’s connected to that same LAN.

The router’s interface on the Hub layer is actually the WAN interface, as we want to consider Shared Services a lower layer of security than our general-purpose internet access layers.

We can use the console to access the pfSense VM and disable pf with pfctl -d. This allows us to access the web interface in order to configure the firewall.

  • username: admin
  • password: pfsense

After that, it will walk you through the setup wizard. You might have to reconnect via the console and shut off pf again after the reload. After that, several things have to happen. First, turn off DHCP by going to Services -> DHCP Server and unchecking “Enable DHCP server on LAN interface”.

Next, several firewall rules should be set up so that we can re-enable pf. The first one is to allow ICMP packets across the router. You can do this via Firewall -> Rules -> Floating -> Add:

  • Edit Firewall Rule:
    • Action: Pass
    • Interface: WAN, LAN
    • Protocol: ICMP
    • ICMP Subtypes: any
  • Source: Any
  • Destination: Any
  • Description: Allow all pings anywhere

Instead of Any/Any for the Src/Dest, on the DMZ, BLD or SHR-only routers, you can set Source and Dest to: [Network, 10.0.0.0/16], and make sure to not select “Quick Apply”, in order to define application-specific firewall functions later on.

Then we add the ability to manage the router over https on Firewall -> Rules -> WAN -> Add:

  • Edit Firewall Rule:
    • Action: Pass
    • Interface: WAN
    • Protocol: TCP
  • Source:
    • Network
    • 192.168.2.0/24
  • Destination:
    • Single host or alias
    • 192.168.2.10
  • Description: Allow *.hub to administrate router

Lastly, we add the ability to ssh into the hosts in our Shared Services LAN with Firewall -> Rules -> WAN -> Add:

  • Edit Firewall Rule:
    • Action: Pass
    • Interface: WAN
    • Protocol: TCP
  • Source:
    • Network
    • 192.168.2.0/24
  • Destination:
    • Network
    • 10.0.3.0/24
  • Description: SSH from *.hub to *.shr

Getting to Jumphost

Now we have a good chance at getting to our jumphost. This jumphost was created with the following:

upvm fedora-26 --loglevel info --hostname jmphostsy01sfho.shr.andrewcz.com --os-variant fedora26 --img-size 20G --img-format qcow2 -m 2048 -w bridge=virbr3

So it’s connected to the shr bridge that houses the default gateway router as well as the jump router that we just created between shr and hub. In that case, we should be able to access it if we give it a static IP address by adding some information to the appropriate ifcfg- entry in /etc/sysconfig/network-scripts:

DNS1=10.0.3.20
DNS2=10.0.3.40
IPADDR0=10.0.3.100
PREFIX0=24
GATEWAY0=10.0.0.1

Once we do that, we should be able to ping it from a machine in the hub network.

$ sudo traceroute -I 10.0.3.100
traceroute to 10.0.3.100 (10.0.3.19), 30 hops max, 60 byte packets
router.hub (192.168.2.1)  0.898 ms  1.216 ms  2.352 ms
pfsjmpsy01sbho.hub (192.168.2.10)  3.323 ms  3.382 ms  3.446 ms
10.0.3.100 (10.0.3.19)  4.336 ms  4.386 ms  4.388 ms

If it doesn’t ping right away, try stopping the firewalld service on the host.

If it still doesn’t ping, check the routes on the remote server and make sure it has a route back to the 192.168.2.0/24 subnet.

If it still doesn’t ping, disable NAT on the router, and try to see if it works then. If not, I’m out of ideas.

Now we should be able to ssh into the box.

Setting up SHR Router/Gateway/Firewall

So in order to connect to the GUI for the SHR router/gateway/firewall, we are going to set up a SOCKS proxy on the jumphost, seeing as we’ve already set up firewall rules to allow ssh connections through into that network, and not to redirect port 443 traffic into the network. Waste not, want not!

So we establish a SOCKS proxy to our jumphost on port 8080 with:

ssh -D 8080 -C -N admin_shr@10.0.3.100

Then we configure a separate browser session (or a Private Browsing or Containers session) to use the SOCKS proxy we’ve set up on the localhost. In about:preferences, in the first ‘General’ tab, click the ‘Settings’ button under ‘Network Proxy’ and configure:

  • SOCKS Host: localhost Port: 8080

Navigating to the SHR Router/Gateway/Firewall’s internal IP address will then be able to display the login screen for that router. You can then do an initial wizard configuration.
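Before fiddling with browser settings, it can help to confirm from a terminal that the proxy actually reaches the router. The following is a minimal sketch, assuming curl is installed on the machine running the ssh -D session. The function name check_webgui and the CURL override variable are hypothetical, and the URL is whatever address the router's web interface listens on.

```shell
# check_webgui URL [PROXY_PORT]
# Prints the HTTP status code returned by URL when it is requested through
# the local SOCKS proxy (default port 8080, as in the ssh -D command above).
# --insecure is there because a fresh router presents a self-signed
# certificate. CURL can be overridden, e.g. CURL=echo for a dry run.
check_webgui() {
  local url="$1" port="${2:-8080}"
  "${CURL:-curl}" --socks5-hostname "localhost:${port}" \
    --insecure --silent --output /dev/null \
    --max-time 10 --write-out '%{http_code}\n' \
    "$url"
}
```

Something like check_webgui https://<router_ip>/ should print a status code if the tunnel and the firewall rules are in place, while a timeout suggests the traffic is still being dropped somewhere along the way.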

If we set up the router correctly, the upstream gateway should be the BLD subnet’s router, which we are going to configure next. Since this means that the jumphost should have access to this as well, we will want to use the same SOCKS proxy to connect to the web interface of that router using the same setup.

All pfSense boxes need to turn off packet checksum offloading in order to route packets between two virtual bridges. This took me a week to figure out, and lots of tcpdumping.

Setting up BLD and DMZ Router/Gateway/Firewall

Given that we can establish connections to the other side of the SHR router, then we can use our SOCKS proxy to access the webGUIs for the BLD and DMZ subnets. However, two things need to happen on the routers to get connectivity.

  1. Set route back to SHR subnet
  2. Disable Firewall

Since we’re using virt-manager, we can open up the pfSense box’s terminal, like we would with a crash cart on a local physical server in a datacenter. First, we need to turn off the firewall like we did before:

# pfctl -d

And then we need to add the route. For instance, the BLD router’s LAN IP address is 10.0.2.11, whereas the router that connects to the 10.0.3.0/24 subnet is located at 10.0.2.12. We now add a route to the pfSense box that will route packets correctly back to the SHR subnet.

# route add 10.0.3.0/24 10.0.2.12

This will not survive a reboot, so this needs to be set in the GUI as shown below.

After this, the router’s GUI should be accessible through the SOCKS proxy. Set the appropriate settings through the setup wizard after logging in for the first time (admin/pfsense) and change the password. Note that the changes above will have to be redone after the wizard, as it resets the routes and the firewall.

Configure Routing

Under Advanced -> Routing, add the downstream gateway, and then add a static route through that gateway for the appropriate subnets. For instance, the DMZ router can have a route to 10.0.2.0/23, which encompasses 10.0.2.0 through 10.0.3.255.
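The claim that a /23 starting at 10.0.2.0 covers both the BLD and SHR subnets can be checked with a little shell arithmetic. This is a sketch only; ip_to_int and in_cidr are hypothetical helper names, and the code assumes well-formed dotted-quad input without leading zeros.

```shell
# ip_to_int A.B.C.D -> the address as a 32-bit integer
ip_to_int() {
  local IFS=.
  # Unquoted on purpose: split the dotted quad on the dots.
  set -- $1
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
}

# in_cidr IP NETWORK/PREFIX -> exit status 0 when IP lies inside the network
in_cidr() {
  local ip="$1" net="${2%/*}" prefix="${2#*/}" mask
  # Keep the top PREFIX bits, clear the rest.
  mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
  [ $(( $(ip_to_int "$ip") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}
```

For example, in_cidr 10.0.3.255 10.0.2.0/23 succeeds while in_cidr 10.0.4.0 10.0.2.0/23 fails. The same helper is handy for checking that a gateway address sits on the same subnet as the interface it is configured on.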

We don’t actually want to route back to the hub, as we want to force jumphost usage to interact with the hosting environments.

upvm

To use upvm, we should know where we want to put the VM that we’re creating, and what hostname it should have, etc.

upvm fedora-26 --loglevel info --hostname jmphostsy01dfho -n dmz-jmphostsy01dfho --os-variant fedora26 --img-size 20G --img-format qcow2 -m 2048 -w bridge=virbr1
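Since the goal is to bring up many machines with little input, the command above can be wrapped so that only the layer, hostname and bridge change per VM. This is a rough sketch that reuses exactly the options shown above. The function names, the LAYER:HOSTNAME:BRIDGE spec format and the UPVM override variable are assumptions made for illustration.

```shell
# create_vm LAYER HOSTNAME BRIDGE
# Runs upvm with the same options as the command above; only the hostname,
# the libvirt domain name (LAYER-HOSTNAME) and the bridge vary.
# UPVM can be set to "echo" for a dry run that just prints the arguments.
create_vm() {
  local layer="$1" host="$2" bridge="$3"
  "${UPVM:-upvm}" fedora-26 --loglevel info \
    --hostname "$host" -n "${layer}-${host}" \
    --os-variant fedora26 --img-size 20G --img-format qcow2 \
    -m 2048 -w "bridge=${bridge}"
}

# create_all LAYER:HOSTNAME:BRIDGE...
# Creates one VM per argument and stops at the first failure.
create_all() {
  local spec layer rest host bridge
  for spec in "$@"; do
    layer="${spec%%:*}"; rest="${spec#*:}"
    host="${rest%%:*}";  bridge="${rest#*:}"
    create_vm "$layer" "$host" "$bridge" || return 1
  done
}
```

Setting UPVM=echo first and then running create_all dmz:jmphostsy01dfho:virbr1 prints the arguments upvm would receive, which is a cheap way to review a batch before building it.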

Once we get dropped to a login shell, we can work on the box as root. However, there is more networking to do, go figure. Let’s look at the initial setup that upvm gets us:

# cat /etc/sysconfig/network-scripts/ifcfg-ens2
# Generated by dracut initrd
NAME="ens2"
DEVICE="ens2"
ONBOOT="yes"
NETBOOT="yes"
UUID="25ba4fee-de25-410d-b58c-dbd3103e0d94"
IPV6INIT="yes"
BOOTPROTO="dhcp"
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"

We need to change a couple of things: BOOTPROTO="none" needs to be set for static addressing. Then we also need to configure the static IP address of the box. We will need to register this in DNS on the appropriate pfSense box when we add it. But for the time being, let’s just get the box up and running with an IP address. Make sure that the HWADDR setting is set to whatever the link/ether value is when issuing # ip a.

IPADDR0=10.0.2.100
PREFIX0=24
GATEWAY0=10.0.2.11
HWADDR=00:00:00:00:00:00
DNS1=10.0.2.11
DOMAIN=<subdomain>.opensource.osu.edu

For Debian/Ubuntu, edit /etc/network/interfaces to include:

auto eth0
iface eth0 inet static
  address 192.168.11.100
  netmask 255.255.255.0
  gateway 192.168.11.1
  dns-domain example.com
  dns-nameservers 192.168.11.1

Then a full reboot is needed. Theoretically, a reload of NetworkManager should be sufficient, but I’ve never found that to be the case.

Firewall config

When we’re talking about networking, the firewall rules have to be considered. We’ll want to work from the outside in. First of all, we’ll work on the DMZ router.

Default LAN rules

Since we have no services yet, we’ll just close off all inbound ports from the WAN in the DMZ. Also, we don’t want any connection from the internal network to the WAN yet, however, we want to allow connectivity within the internal network. So we’ll create a rule that allows any LAN address to any other LAN address as long as it comes in on the LAN interface. We’d want to default deny from the WAN. So we’ll add a /16 to the firewall rules, for source and destination, on the LAN interface.

Firewall Administration

Then we want to make sure we only administrate the firewalls from the Shared Services subnet. So we’ll create a rule similar to the anti-lockout rule, allowing everything from the interfaces on the SHR subnet to port 443 on the firewall, and then switch the web configurator to HTTPS (SSL). Finally, we disable the anti-lockout rule that allows all networks to access the web GUI.

Disable NAT

Next, we want to disable NATting. This is done by going to Firewall -> NAT -> Outbound and selecting “Disable Outbound NAT rule generation”.

DNS

Routing

We have to set the static routes on the boxes, otherwise we end up with asymmetrical routing.

The SHR network is the only one that should be accessing the WebGUIs of the routers. With the Cisco hierarchical model having the Access layer comprised of switches, if we imagine one switch connecting the LAN of the core router to the WAN of all other routers, then routing from SHR internal would suffer asymmetric routing.

This is because return traffic to a 10.0.3.0/24 source (the TCP ACK and all other return packets) would be routed through the core gateway. However, the initial route there would be direct from the SHR router, since its interface is on a shared network with the other routers’ interfaces. As the routers’ interfaces are typically addressed like 172.16.0.13/24, we need to make sure that the initial packet also gets routed through the default gateway. This can be done by putting a firewall rule at the top of the LAN interface of the SHR router dictating that all 172.16.0.0/24 traffic have a gateway of 172.16.0.10 (the upstream gateway).

Luckily since the routes are going to be both to and from the internal networks, we don’t have to worry about this on any other routers.

Make sure it’s for all types of connections, and not just TCPv4.

Also, for some reason, it doesn’t like traceroute’s UDP packets, but if you traceroute with the -T flag, you’ll be just fine. In fact, if you try to curl -L from the jumphost to the router, it will route successfully. Most likely this is because the core router is only set up to allow TCPv4 and ICMP through from the LAN subnets (come to think of it). You might consider changing this to IPv4*.

pfsense setup

HTTPS:

  1. Allow http and https in the firewall if connecting from WAN
  2. System -> Advanced: Change to https

ICMP

add floating rule for all interfaces to allow icmp

Core

Firewall

only let jumphosts initiate to outside internet

BLD

Firewall

Only let jumphosts into LAN Net all. Only allow specific hosts from DMZ to other specific hosts in BLD.

Pre-Ansible provisioning

IP

  1. Get the interface hardware address to put in
  2. Put the following into /etc/sysconfig/network-scripts/ifcfg-ensX:
    • HWADDR
    • IPADDR0
    • PREFIX0
    • GATEWAY0
    • DNS1
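The two steps above can be sketched as a small helper. It flips BOOTPROTO to "none" in an existing ifcfg file and appends the static settings. The function names are hypothetical, the MAC address is read from /sys/class/net, and the helper is not idempotent, so it is meant for a freshly generated DHCP config.

```shell
# iface_mac IFACE -> MAC address of the interface as reported by the kernel
iface_mac() {
  cat "/sys/class/net/$1/address"
}

# apply_static_ifcfg FILE MAC IP PREFIX GATEWAY DNS
# Rewrites BOOTPROTO in FILE and appends the static addressing keys.
apply_static_ifcfg() {
  local file="$1" mac="$2" ip="$3" prefix="$4" gw="$5" dns="$6" tmp
  tmp="$(mktemp)"
  sed 's/^BOOTPROTO=.*/BOOTPROTO="none"/' "$file" > "$tmp"
  # Write back in place so the file keeps its owner, mode and SELinux label.
  cat "$tmp" > "$file"
  rm -f "$tmp"
  cat >> "$file" <<EOF
HWADDR=${mac}
IPADDR0=${ip}
PREFIX0=${prefix}
GATEWAY0=${gw}
DNS1=${dns}
EOF
}
```

Run as root, a call such as apply_static_ifcfg /etc/sysconfig/network-scripts/ifcfg-ens2 "$(iface_mac ens2)" 10.0.2.100 24 10.0.2.11 10.0.2.11 would reproduce the DMZ example from earlier.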

Admin account

Create the local admin account to be used in a fallback scenario, and disable root.

$ sudo useradd -G wheel <localadmin>
$ sudo passwd <localadmin>

Python

In Fedora boxes, symlink /usr/bin/python to /usr/bin/python3 for ansible.

ln -sT /usr/bin/python3 /usr/bin/python

Least Privilege

There are going to be three groups of users, as follows:

  1. Application
    • Each app would have its own admin
    • App admin would be the same in both envs
    • App admin would have no sudo ability, or only limited sudo ability (for example, only to restart services)
  2. Per-environment admins
    • Local account in case of failure
    • Complicated password
    • no keys
  3. Global admin
    • SHR admin would have keys to production layer
    • Global admin would be pseudo-root admin to all servers
    • Only user to access servers in SHR besides jumphosts

But any running services are done with service accounts - not regular users/admins.
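As an illustration of the limited sudo ability for application admins, the sketch below renders a sudoers rule that only permits restarting and checking a single systemd unit. The account name, the unit name and both function names are hypothetical, and the /usr/bin/systemctl path is an assumption that holds on Fedora.

```shell
# render_app_sudoers USER UNIT -> one sudoers line on stdout
render_app_sudoers() {
  local user="$1" unit="$2"
  printf '%s ALL=(root) /usr/bin/systemctl restart %s, /usr/bin/systemctl status %s\n' \
    "$user" "$unit" "$unit"
}

# install_app_sudoers USER UNIT
# Writes the rule to /etc/sudoers.d/USER and lets visudo check the syntax.
install_app_sudoers() {
  local user="$1" unit="$2" file="/etc/sudoers.d/$1"
  render_app_sudoers "$user" "$unit" | sudo tee "$file" > /dev/null &&
    sudo chmod 0440 "$file" &&
    sudo visudo -cf "$file"
}
```

For example, install_app_sudoers wikiadmin httpd.service would let a wikiadmin account restart httpd and nothing else.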

Router firewall

LAN net will have to resolve DNS queries, so in a default LAN net to ANY rule, you’ll have to allow UDP over port 53 (DNSSEC will eventually need TCP as well)

Router needs to disable DHCP on LAN - which can be done when setting the IP address in the console or in the Services –> DHCP Server tab.

Disable HTTP_REFERER_CHECK:

Enter an option: 12
pfSense shell: global $config;
pfSense shell: $config = parse_config(true);
pfSense shell: $config['system']['webgui']['nohttpreferercheck'] = true;
pfSense shell: echo "Disabling HTTP referer check...";
pfSense shell: write_config("PHP shell disabled HTTP referer check");
pfSense shell: echo "done.\n";
pfSense shell: exec

WAN router

  1. Needs to have the static routes for all the subnets
  2. Needs to allow traffic from subnets, not just “LAN Network”

SOCKS proxy if the core router is NAT’d

You’ll want to maintain a persistent connection that forwards a local port on the server to the jumphost, so you can ssh to that port and connect your own socks proxy to it. Yikes.

Basically, you’re socks proxying to a socks proxy. Needless to say, you’ll want socksv5+

Host (in a tmux session)

$ ssh -D 8080 -C -N oscadmin@192.168.122.158 -p 13022

Localhost

$ ssh -L 9999:localhost:8080 oscadmin@opensource.osu.edu -N

New server

take out NATting

Disable anti-lockout rule and add block src LANnet dest this firewall(self)

Disable Hardware Checksum offloading (must reboot after)

Password protect the console(?)

remove first WAN contact and port forward

allow admin to get to core firewall

vtnet0 should always be core (just the way I built them in virt-manager)

Allow jumphost servers into networks over port 22 from WAN

SSH causes out-of-state

reset the ssh socks proxy if you’re getting TCP:PA and TCP:FPA packets blocked.

Containers

virt-bootstrap (github)

$ sudo dnf install -y libvirt-daemon-lxc python2-libguestfs
$ git clone https://github.com/virt-manager/virt-bootstrap.git && cd virt-bootstrap
$ sudo pip install passlib
$ ./setup.py build
$ sudo ./setup.py install
$ sudo mkdir /var/lib/libvirt/filesystems/container1
$ sudo virt-bootstrap virt-builder://fedora-26 /var/lib/libvirt/filesystems/container1

virt-manager

  1. New Machine
  2. Operating system container
  3. Browse to /var/lib/libvirt/filesystems/container1
  4. Set RAM (256MB) and CPUs (1)
  5. Network selection (DMZ/BLD/ADM/VIP)

LDAP Auth

RHEL/CentOS/Fedora

# dnf install -y sssd openldap-clients
# # get /usr/local/etc/openldap/certs/cacert.pem from ldap server to mkdir /etc/pki/dev.andrewcz.com and chown root:root and chmod 600
# cat << EOF > /etc/sssd/sssd.conf
[sssd]
config_file_version = 2
domains = dev.andrewcz.com
services = nss,pam
debug_level=9
ldap_tls_cacert = /srv/tls/cacert.pem

[nss]
debug_level=9

[pam]
debug_level=9

[domain/dev.andrewcz.com]
# used for testing/troubleshooting
enumerate = true
debug_level=9

auth_provider = ldap
id_provider = ldap
chpass_provider = ldap

ldap_uri = ldap://ldapmstin01bsdh.shr.dev.andrewcz.com
ldap_search_base = dc=dev,dc=andrewcz,dc=com
ldap_user_search_base = ou=people,dc=dev,dc=andrewcz,dc=com

ldap_user_uid_number = uidNumber
ldap_user_gid_number = gidNumber
ldap_user_fullname = gecos
ldap_user_home_directory = homeDirectory

ldap_group_search_base = ou=groups,dc=dev,dc=andrewcz,dc=com
ldap_group_name = cn
ldap_group_member = memberUid

access_provider = simple
simple_allow_users = smacz
simple_allow_groups = jumphostAdmins

ldap_schema = rfc2307
ldap_use_start_tls = true
cache_credentials = false
EOF
# chmod 600 /etc/sssd/sssd.conf

# diff /etc/pam.d/password-auth-ac /etc/pam.d/system-auth-ac 
# cat /etc/pam.d/system-auth-ac 
#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        required      pam_faildelay.so delay=2000000
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 1000 quiet_success
auth        sufficient    pam_sss.so forward_pass
auth        required      pam_deny.so

account     required      pam_unix.so
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 1000 quiet
account [default=bad success=ok user_unknown=ignore]    pam_sss.so
account     required      pam_permit.so

password    requisite     pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type=
password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password    sufficient    pam_sss.so use_authtok
password    required      pam_deny.so

session     required      pam_oddjob_mkhomedir.so skel=/etc/skel umask=0077
session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
-session     optional      pam_systemd.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so
session     optional      pam_sss.so
# cat /etc/nsswitch.conf
passwd:      sss files systemd
shadow:     files sss
group:       sss files systemd

hosts:      files dns myhostname

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files sss

netgroup:   nisplus sss

publickey:  nisplus

automount:  files nisplus
aliases:    files nisplus
# dnf install -y oddjob-mkhomedir
# systemctl start oddjobd
# systemctl start sssd
# cat << EOF > /etc/openldap/ldap.conf
#
# LDAP Defaults
#

# See ldap.conf(5) for details
# This file should be world readable but not world writable

BASE    dc=dev,dc=andrewcz,dc=com
URI     ldap://ldapmstin01bsdh.shr.dev.andrewcz.com:389

#SIZELIMIT      12
#TIMELIMIT      15
#DEREF          never

#TLS_CACERTDIR  /etc/openldap/certs
#TLS_REQCERT    demand
TLS_CACERT      /srv/tls/cacert.pem

# Turning this off breaks GSSAPI used with krb5 when rdns = false
SASL_NOCANON    on
EOF
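Once both files are written, a quick sanity check can save some debugging. sssd is strict about the permissions on sssd.conf and will not start when the file is readable by anyone but its owner, so the sketch below checks the mode before trying a lookup. The function names are hypothetical, and stat -c assumes GNU coreutils.

```shell
# sssd_conf_mode_ok FILE -> exit status 0 when FILE has mode 600
sssd_conf_mode_ok() {
  [ "$(stat -c '%a' "$1")" = "600" ]
}

# verify_ldap_user USER
# Checks the config file mode, then resolves USER through NSS (and so sssd).
verify_ldap_user() {
  if ! sssd_conf_mode_ok /etc/sssd/sssd.conf; then
    echo "sssd.conf should be mode 600" >&2
    return 1
  fi
  getent passwd "$1" && id "$1"
}
```

With the simple access provider configured above, verify_ldap_user smacz should print the LDAP account's passwd entry and group memberships once sssd is running.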
