TODO: Break into:
- Initial Setup
- New Virtualization Host
- First Core Router
- First SHR/ADM router
- First jumphost
- Additional components
- New network router
- New server
Virtualizing services is fine. Virtualizing stacks is better. Virtualizing entire networks? Well, let’s see what we can come up with. Driven by upvm, powered by KVM with libvirt, and enabled by Ansible, I’m creating a malleable, reproducible, and customizable network that is virtualized and portable anywhere that uses the above technologies.
This is a marriage of classical networking and new virtualization technologies. Will this evolve over time? Of course. I have no doubt that at some point this will be containerized to cut down on resource requirements and allow faster reproduction. And once SDN comes into its own, there will be more changes to make. After that, who knows? Only time will tell. However, I can say that it is only going to get easier from here.
This may double as a mission statement, but my goal here is to use relatively mature technologies to create reproducible results using packaged configuration rather than pre-baked solutions. In a sense, I could use Proxmox and clone a template or templates over and over, but in order to achieve both my vision of portability as well as ease of maintenance in the long run, I need to be able to use official, universally available resources: specifically, VMs available from their respective companies that publish on public box lists, variable-based configuration, cross-OS scripts, etc. Not just that, but I want an entire network’s worth of services to be brought up by just a single command. Can we do that? I don’t know, but I’m sure as hell gunna try.
Any virtual environment needs a host. In the future, this host may be a Proxmox box, as it’s a maturing technology with a lot of potential, and momentum, at least at the moment. However, if I’ve learned anything in the limited time that I’ve served in a Sysadmin role, it is to KISS. (hello lllllllladies…) So with that in mind, I’ll be using a vanilla Fedora box (version 26 at the time of writing) as the host.
After the initial minimal install and a yum update, it’s time to take a look at several technologies to install.
Some might choose to install ansible in a virtualenv, and to them I say you’re a better admin than I. For now, I will be using a system-wide install of everything (KISS, remember?) and installing it along with the others. This will handle our post-creation configuration of the servers, as well as the maintenance of the VMs.
$ sudo dnf install ansible
upvm is a wrapper script around virt-install that brings up a virtual machine in libvirt as a minimal install without having to intervene (except for the password prompt at the beginning). Known as an “unattended install”, this is crucial to being able to spin up a network with minimal input.
command -v dnf && dnf=dnf || dnf=yum
sudo $dnf install http://people.redhat.com/rsawhill/rpms/latest-rsawaroha-release.rpm
sudo $dnf install upvm
sudo /usr/share/upvm/initial-setup
upvm -h
Libvirt is necessary for the virtual networking that is going to be leveraged to create our network. But moreso, it is our virtualization engine, capable of giving our machines life in the first place.
$ sudo dnf group install --with-optional virtualization
Since I plan on administering this as a non-root user, I will have to set up libvirt so that it allows administration by non-root users. First, add your admin user to the kvm and libvirt groups:
$ sudo usermod -a -G kvm <admin_user>
$ sudo usermod -a -G libvirt <admin_user>
Then, allow libvirt to be managed by a non-root account via polkit. The file is written with tee because a plain shell redirect would run as the unprivileged user rather than under sudo:
$ sudo tee /etc/polkit-1/localauthority/50-local.d/50-libvirt-remote-access.pkla << EOF
[libvirt Management Access]
# For allowing access to specific user only:
#Identity=unix-user:<admin_user>
# For allowing access to a group:
Identity=unix-group:libvirt
Action=org.libvirt.unix.manage
ResultAny=yes
ResultInactive=yes
ResultActive=yes
EOF
Lastly, make sure that /dev/kvm is owned by group kvm:
$ sudo chown root:kvm /dev/kvm
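To confirm that the group changes above actually took effect, a small check along these lines can help. This is only a sketch: `list_has_word` and `check_virt_groups` are hypothetical helper names, not part of libvirt or upvm.

```shell
# Hypothetical helper: succeeds if the word $2 appears in the
# space-separated list $1.
list_has_word() (
    for w in $1; do
        [ "$w" = "$2" ] && exit 0
    done
    exit 1
)

# Hypothetical helper: report whether a user is in the kvm and libvirt groups.
# Returns 0 only if the user is a member of both.
check_virt_groups() (
    user=$1
    groups=$(id -nG "$user" 2>/dev/null) || { echo "no such user: $user"; exit 2; }
    rc=0
    for g in kvm libvirt; do
        if list_has_word "$groups" "$g"; then
            echo "$user is in $g"
        else
            echo "$user is NOT in $g"
            rc=1
        fi
    done
    exit $rc
)
```

Running `check_virt_groups <admin_user>` prints one line per group, and `stat -c %G /dev/kvm` should print kvm. New group memberships only apply to sessions started after the usermod calls, so log out and back in before testing.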
This is going to be the most difficult and most essential portion of the entire setup. First, let’s get libvirt up and running.
$ sudo systemctl enable libvirtd && sudo systemctl start libvirtd
In order to visualize what we’re looking to create, we can refer to libvirt’s logical diagram of a network architecture. Of course, it’s not going to look exactly like this, but similar.
For one, the initial passthrough is only going to be going to our virtual pfSense router. From there, we start to architect our concentric rings of security, each separated by a firewall and other security mechanisms:
Internet –> (Host –>) DMZ –> Business Logic/Data –> Shared Services
Let’s create the DMZ first:
$ sudo tee /etc/libvirt/qemu/networks/dmz.xml << EOF
<network>
  <name>dmz</name>
  <uuid>ff82a960-6a77-4372-b563-9694622643f0</uuid>
  <bridge name='virbr1' stp='on' delay='0'/>
  <mac address='52:54:00:24:48:02'/>
  <domain name='dmz'/>
</network>
EOF
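Since the same kind of definition is needed for every layer, it may be easier to generate the XML and register it through virsh, which is the route libvirt generally expects for network definitions. The sketch below assumes virsh is installed; `render_isolated_net` is a hypothetical helper, and it leaves out the uuid and mac elements on the assumption that libvirt generates them when they are missing.

```shell
# Hypothetical helper: print a minimal isolated-network definition.
# $1 = network name, $2 = bridge device name
render_isolated_net() {
    cat <<EOF
<network>
  <name>$1</name>
  <bridge name='$2' stp='on' delay='0'/>
  <domain name='$1'/>
</network>
EOF
}

# Possible usage (not executed here):
#   render_isolated_net dmz virbr1 > /tmp/dmz.xml
#   sudo virsh net-define /tmp/dmz.xml
#   sudo virsh net-start dmz
#   sudo virsh net-autostart dmz
```

The same function can then be reused for the other layers by changing the name and bridge arguments.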
Since the DMZ will house multiple servers, we have to configure a LAN there. Same with the BL/D and the Shared Services layers. And not to get ahead of ourselves, but each service within those LANs needs to be segmented off from all of the others, which we will accomplish using VLANs. But first things first, let’s get a bridge from our Internet (host ethernet port) to our first router/firewall that sits in front of our DMZ.
So to go from a physical ethernet port to a VM allowing direct access (even giving it its own IP address from the external router) add a network interface definition like the following in your domain XML file. Make sure to change ETHERNET_DEVICE to the ethernet port that you wish to bind it to.
<devices>
  <interface type='direct'>
    <mac address='d0:0f:d0:0f:00:01'/>
    <source dev='<ETHERNET_DEVICE>' mode='bridge'/>
  </interface>
</devices>
You’ll also want to create a second interface that can be attached to the bridge to connect to the rest of the VMs in the DMZ.
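That second interface could look something like the following. This is a sketch that assumes the DMZ bridge is virbr1 (as in the network definition earlier) and a virtio NIC model; `render_bridge_iface` is a hypothetical helper and `<domain>` is a placeholder for the router VM's name.

```shell
# Hypothetical helper: print an interface definition that plugs a VM
# into an existing bridge. $1 = bridge device name
render_bridge_iface() {
    cat <<EOF
<interface type='bridge'>
  <source bridge='$1'/>
  <model type='virtio'/>
</interface>
EOF
}

# Possible usage (not executed here):
#   render_bridge_iface virbr1 > /tmp/dmz-lan-iface.xml
#   sudo virsh attach-device <domain> /tmp/dmz-lan-iface.xml --config
```

With `--config` the interface is added to the persistent definition, so it shows up the next time the VM is started.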
This first device should be a router/firewall. I recommend pfSense, as that is what I will be using.
Initial DMZ router/firewall setup
- WAN: virtbr0
- LAN: re0
- IP Address
- virtbr0: DHCP
- re0: 10.0.0.1/24
Building out the rest of the subnets
Since we’re connecting the rest of the networks in the same way, the config can be duplicated all the way down, with a router in between the DMZ and BL/D layers, the BL/D and Shared Services layers, and the Shared Services and Hub layers (cubicle layer).
At that point, we need to work our way back to access the web interfaces of the various routers. The initial one is easy from a hub network b/c it’s connected to that same LAN.
The router’s interface on the Hub layer is actually the WAN interface, as we want to consider Shared Services a lower layer of security than our general-purpose internet access layers.
We can use the console to access the pfSense VM and disable pf with pfctl -d. This allows us to access the web interface in order to configure the firewall.
After that, it will walk you through the setup wizard. You might have to re-connect via the console and shut off pf again after the reload. After that, several things have to happen. First, turn off DHCP by going to Services -> DHCP Server and unchecking “Enable DHCP server on LAN interface”.
Next, several firewall rules should be set up so that we can re-enable pf. The first one is to allow ICMP packets across the router. You can do this via Firewall -> Rules -> Floating -> Add:
- Edit Firewall Rule: Action: Pass Interface: WAN, LAN Protocol: ICMP ICMP Subtypes: any
- Source: Any
- Destination: Any
- Description: Allow all pings anywhere
Instead of Any/Any for the Src/Dest, on the DMZ, BLD or SHR-only routers, you can set Source and Dest to: [Network, 10.0.0.0/16], and make sure to not select “Quick Apply”, in order to define application-specific firewall functions later on.
Then we add the ability to manage the router over https on Firewall -> Rules -> WAN -> Add:
- Edit Firewall Rule: Action: Pass Interface: WAN Protocol: TCP
- Single host or alias
- Description: Allow *.hub to administrate router
Lastly, we add the ability to ssh into the hosts in our Shared Services LAN with Firewall -> Rules -> WAN -> Add:
- Edit Firewall Rule: Action: Pass Interface: WAN Protocol: TCP
- Description: SSH from *.hub to *.shr
Getting to Jumphost
Now we have a good chance at getting to our jumphost. This jumphost was created with the following:
upvm fedora-26 --loglevel info --hostname jmphostsy01sfho.shr.andrewcz.com --os-variant fedora26 --img-size 20G --img-format qcow2 -m 2048 -w bridge=virbr3
So it’s connected to the shr bridge that houses the default gateway router as well as the jump router that we just created between
hub. So, in that case, we should be able to access it if we give it a static IP address by adding some information to the appropriate
ifcfg- entry in
DNS1=10.0.3.20 DNS2=10.0.3.40 IPADDR0=10.0.3.100 PREFIX0=24 GATEWAY0=10.0.0.1
Once we do that, we should be able to ping it from a machine in the hub network.
$ sudo traceroute -I 10.0.3.100
traceroute to 10.0.3.100 (10.0.3.19), 30 hops max, 60 byte packets
router.hub (192.168.2.1) 0.898 ms 1.216 ms 2.352 ms
pfsjmpsy01sbho.hub (192.168.2.10) 3.323 ms 3.382 ms 3.446 ms
10.0.3.100 (10.0.3.19) 4.336 ms 4.386 ms 4.388 ms
If it doesn’t ping right away, try stopping the firewalld service on the host.
If it still doesn’t ping, check the routes on the remote server and make sure it has a route back to the 192.168.2.0/24 subnet.
If it still doesn’t ping, disable NAT on the router, and try to see if it works then. If not, I’m out of ideas.
Now we should be able to ssh into the box.
Setting up SHR Router/Gateway/Firewall
So in order to connect to the GUI for the SHR router/gateway/firewall, we are going to set up a SOCKS proxy on the jumphost, seeing as we’ve already set up firewall rules to allow ssh connections through into that network, and not to redirect port 443 traffic into the network. Waste not, want not!
So we establish a SOCKS proxy to our jumphost on port 8080:
ssh -D 8080 -C -N firstname.lastname@example.org
Then we configure a separate (or Private Browsing or Containers) session to use the SOCKS proxy we’ve set up on the localhost. In about:preferences, in the first ‘General’ tab, click the ‘Settings’ button under ‘Network Proxy’ and configure:
- SOCKS Host: localhost Port: 8080
Navigating to the SHR Router/Gateway/Firewall’s internal IP address will then be able to display the login screen for that router. You can then do an initial wizard configuration.
If we set up the router correctly, the upstream gateway should be the BLD subnet’s router, which we are going to configure next. Since this means that the jumphost should have access to this as well, we will want to use the same SOCKS proxy to connect to the web interface of that router using the same setup.
All pfSense boxes need to turn off packet checksum offloading in order to route packets between two virtual bridges. This took me a week to figure out, and lots of
Setting up BLD and DMZ Router/Gateway/Firewall
Given that we can establish connections to the other side of the SHR router, then we can use our SOCKS proxy to access the webGUIs for the BLD and DMZ subnets. However, two things need to happen on the routers to get connectivity.
- Set route back to SHR subnet
- Disable Firewall
Since we’re using virt-manager, we can open up the pfSense box’s terminal, like we would with a crash cart on a local physical server in a datacenter. First, we need to turn off the firewall like we did before:
# pfctl -d
And then we need to add the route. For instance, the BLD router’s LAN IP address is 10.0.2.11, whereas the router that connects to the 10.0.3.0/24 subnet is located at 10.0.2.12. We now add a route to the pfSense box that will route packets correctly back to the SHR subnet.
# route add 10.0.3.0/24 10.0.2.12
This will not survive a reboot, so this needs to be set in the GUI as shown below.
After this, the router’s GUI should be able to be accessed through the SOCKs proxy. Set the appropriate settings through the setup wizard after logging in for the first time (admin/pfsense) and change the passwd. Note that after the wizard, the changes above will have to be redone, as the routes and firewall are reset after the wizard.
Routing: to add the static routes, add the downstream gateway and then add a static route through that gateway for the appropriate subnets. For instance, the DMZ router can have a route to 10.0.2.0/23 which would encompass
#### We don’t actually want to route back to the hub, as we want to force jumphost usage to interact with the hosting environments.
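When aggregating static routes into a wider prefix such as the 10.0.2.0/23 mentioned above, it is worth checking exactly which addresses the prefix covers before putting it into a router. The following is a small sketch in plain shell arithmetic; the function names are hypothetical helpers, and it assumes well-formed dotted-quad input without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
)

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the first and last address covered by a CIDR block.
cidr_range() (
    addr=${1%/*}
    prefix=${1#*/}
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    base=$(( $(ip_to_int "$addr") & mask ))
    last=$(( base | (~mask & 0xFFFFFFFF) ))
    echo "$(int_to_ip "$base") - $(int_to_ip "$last")"
)

cidr_range 10.0.2.0/23   # prints: 10.0.2.0 - 10.0.3.255
```

The same check confirms that a gateway such as 10.0.2.12 sits inside 10.0.2.0/24, which it has to for the route add command shown earlier to be accepted.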
upvm, we should know where we want to put the VM that we’re creating, and what hostname it should have, etc.
upvm fedora-26 --loglevel info --hostname jmphostsy01dfho -n dmz-jmphostsy01dfho --os-variant fedora26 --img-size 20G --img-format qcow2 -m 2048 -w bridge=virbr1
Once we get dropped to a login shell, we can work on the box as root. However, there is more networking to do, go figure. Let’s look at the initial setup that upvm gets us:
# cat /etc/sysconfig/network-scripts/ifcfg-ens2
# Generated by dracut initrd
NAME="ens2"
DEVICE="ens2"
ONBOOT="yes"
NETBOOT="yes"
UUID="25ba4fee-de25-410d-b58c-dbd3103e0d94"
IPV6INIT="yes"
BOOTPROTO="dhcp"
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
We need to change a couple things:
BOOTPROTO="none" needs to be set for static addressing. Then we also need to configure the static IP address of the box. We will need to configure this in DNS in the appropriate pfSense box when we add it. But for the time being, let’s just get the box up and running with an IP address. Make sure that the HWADDR setting is set to whatever the link/ether value is when issuing # ip a.
IPADDR0=10.0.2.100
PREFIX0=24
GATEWAY0=10.0.2.11
HWADDR=00:00:00:00:00:00
DNS1=10.0.2.11
DOMAIN=<subdomain>.opensource.osu.edu
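Since these lines differ only by a few values per host, they can be generated rather than typed. This is a sketch under the assumption that the interface is ens2, as in the upvm-generated file above; `render_static_ifcfg` and `iface_mac` are hypothetical helpers.

```shell
# Hypothetical helper: print the static-addressing lines for an ifcfg file.
# $1 = IP address, $2 = prefix length, $3 = gateway, $4 = MAC address, $5 = DNS server
render_static_ifcfg() {
    cat <<EOF
IPADDR0=$1
PREFIX0=$2
GATEWAY0=$3
HWADDR=$4
DNS1=$5
EOF
}

# Read an interface's MAC address from sysfs; this is the same value
# that `ip a` reports as link/ether.
iface_mac() {
    cat "/sys/class/net/$1/address"
}

# Possible usage as root (not executed here):
#   render_static_ifcfg 10.0.2.100 24 10.0.2.11 "$(iface_mac ens2)" 10.0.2.11 \
#       >> /etc/sysconfig/network-scripts/ifcfg-ens2
```

The existing BOOTPROTO line still has to be changed to "none" by hand (or with sed), since appending a second BOOTPROTO entry would leave the file ambiguous.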
For Debian/Ubuntu, edit /etc/network/interfaces to include:
auto eth0
iface eth0 inet static
    address 192.168.11.100
    netmask 255.255.255.0
    gateway 192.168.11.1
    dns-domain example.com
    dns-nameservers 192.168.11.1
Then a full reboot is needed. Theoretically, a reload of NetworkManager should be sufficient, but I’ve never found that to be the case.
When we’re talking about networking, the firewall rules have to be considered. We’ll want to work from the outside in. First of all, we’ll work on the DMZ router.
Default LAN rules
Since we have no services yet, we’ll just close off all inbound ports from the WAN in the DMZ. Also, we don’t want any connection from the internal network to the WAN yet, however, we want to allow connectivity within the internal network. So we’ll create a rule that allows any LAN address to any other LAN address as long as it comes in on the LAN interface. We’d want to default deny from the WAN. So we’ll add a
/16 to the firewall rules, for source and destination, on the LAN interface.
Then we want to make sure we only administrate the firewalls from the Shared Services subnet. So we’ll create a similar anti-lockout rule, allowing all from the interfaces on the SHR subnet to port 443 on the firewall, and then change the webconfiguration to SSL. Finally, we disable the anti-lockout rule that allows all networks to access the web GUI.
Next, we want to disable NATting. This is done by going to Firewall -> NAT -> Outbound and selecting “Disable Outbound NAT rule generation”.
We have to set the static routes on the boxes, otherwise we end up with asymmetrical routing.
The SHR network is the only one that should be accessing the WebGUIs of the routers. With the Cisco hierarchical model having the Access layer comprised of switches, if we imagine one switch connecting the LAN of the core router to the WAN of all other routers, then routing from SHR internal would suffer asymmetric routing.
This is because traffic with a source in 10.0.3.0/24 would be routed to the core gateway during the TCP ACK and all other return routes. However, the initial route there would be direct from the SHR router, since its interface is on a shared network with the other routers’ interfaces. As the routers’ interfaces typically have addresses like 172.16.0.13/24, we need to make sure that the initial packet gets routed through the default gateway. This can be done by putting a firewall rule at the top of the LAN interface of the SHR router dictating that all 172.16.0.0/24 traffic have a gateway of 172.16.0.10 (the upstream gateway).
Luckily since the routes are going to be both to and from the internal networks, we don’t have to worry about this on any other routers.
Make sure it’s for all types of connections, and not just TCPv4.
Also, for some reason, it doesn’t like traceroute’s UDP packets, but if you traceroute with the -T flag, you’ll be just fine. In fact, if you try to curl -L from the jmphost to the router, it will route successfully. Most likely this is because the core router is only set up to allow TCPv4 and ICMP through from the LAN subnets (come to think about it). You might consider changing this to IPv4*.
- Allow http and https in the firewall if connecting from WAN
- System -> Advanced: Change to https
add floating rule for all interfaces to allow icmp
only let jumphosts initiate to outside internet
Only let jumphosts into LAN Net all. Only allow specific hosts from DMZ to other specific hosts in BLD.
- Get the interface hardware address to put in
- Put the following into
Create the local admin account to be used in a fallback scenario, and disable root.
$ sudo useradd -G wheel <localadmin>
$ sudo passwd <localadmin>
On Fedora boxes, symlink /usr/bin/python3 to /usr/bin/python for Ansible:
ln -sT /usr/bin/python3 /usr/bin/python
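If this step ends up in a provisioning script that may run more than once, an idempotent version is safer than a bare ln. The following is a sketch; `ensure_symlink` is a hypothetical helper that creates the link only if nothing is there and refuses to overwrite anything else.

```shell
# Hypothetical helper: make sure $2 is a symlink pointing at $1.
# Succeeds if the link already exists and is correct, creates it if the
# path is free, and fails without touching anything otherwise.
ensure_symlink() (
    target=$1
    link=$2
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        echo "ok: $link -> $target"
    elif [ -e "$link" ] || [ -L "$link" ]; then
        echo "refusing to replace existing $link" >&2
        exit 1
    else
        ln -s "$target" "$link" && echo "created: $link -> $target"
    fi
)

# Possible usage as root (not executed here):
#   ensure_symlink /usr/bin/python3 /usr/bin/python
```

An alternative that avoids the symlink altogether is to set ansible_python_interpreter=/usr/bin/python3 for these hosts in the Ansible inventory.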
There are going to be three groups of users, as follows:
- Each app would have its own admin
- App admin would be the same in both envs
- App admin would not have, or have limited sudo ability (only when it would be to restart services, etc.)
- Per-environment admins
- Local account in case of failure
- Complicated password
- no keys
- Global admin
- SHR admin would have keys to production layer
- Global admin would be pseudo-root admin to all servers
- Only user to access servers in SHR besides jumphosts
But any running services are done with service accounts - not regular users/admins.
LAN net will have to resolve DNS queries, so in a default LAN net to ANY rule, you’ll have to allow UDP over port 53 (DNSSEC will eventually need TCP as well)
Router needs to disable DHCP on LAN - which can be done when setting the IP address in the console or in the Services –> DHCP Server tab.
Enter an option: 12
pfSense shell: global $config;
pfSense shell: $config = parse_config(true);
pfSense shell: $config['system']['webgui']['nohttpreferercheck'] = true;
pfSense shell: echo "Disabling HTTP referer check...";
pfSense shell: write_config("PHP shell disabled HTTP referer check");
pfSense shell: echo "done.\n";
pfSense shell: exec
- Needs to have the static routes for all the subnets
- Needs to allow traffic from subnets, not just “LAN Network”
SOCKs proxy if the core router is NAT’d
You’ll want to maintain a persistent connection that forwards a local port on the server to the jumphost, so you can ssh to that port and connect your own socks proxy to it. Yikes.
Basically, you’re socks proxying to a socks proxy. Needless to say, you’ll want socksv5+
Host (in a tmux session)
$ ssh -D 8080 -C -N email@example.com -p 13022
$ ssh -L 9999:localhost:8080 firstname.lastname@example.org -N
take out NATting
Disable anti-lockout rule and add block src LANnet dest this firewall(self)
Disable Hardware Checksum offloading (must reboot after)
Password protect the console(?)
remove first WAN contact and port forward
allow admin to get to core firewall
vtnet0 should always be core (just the way I built them in virt-manager)
Allow jumphost servers into networks over port 22 from WAN
SSH causes out-of-state
reset the ssh socks proxy if you’re getting TCP:PA and TCP:FPA packets blocked.
$ sudo dnf install -y libvirt-daemon-lxc python2-libguestfs
$ git clone https://github.com/virt-manager/virt-bootstrap.git && cd virt-bootstrap
$ sudo pip install passlib
$ ./setup.py build
$ sudo ./setup.py install
$ sudo mkdir /var/lib/libvirt/filesystems/container1
$ sudo virt-bootstrap virt-builder://fedora-26 /var/lib/libvirt/filesystems/container1
- New Machine
- Operating system container
- Browse to
- Set RAM (256MB) and CPUs (1)
- Network selection (DMZ/BLD/ADM/VIP)
# sudo dnf install -y sssd openldap-clients
# # get /usr/local/etc/openldap/certs/cacert.pem from ldap server to mkdir /etc/pki/dev.andrewcz.com and chown root:root and chmod 600
# cat << EOF > /etc/sssd/sssd.conf
[sssd]
config_file_version = 2
domains = dev.andrewcz.com
services = nss,pam
debug_level=9
ldap_tls_cacert = /srv/tls/cacert.pem

[nss]
debug_level=9

[pam]
debug_level=9

[domain/dev.andrewcz.com]
# used for testing/troubleshooting
enumerate = true
debug_level=9
auth_provider = ldap
id_provider = ldap
chpass_provider = ldap
ldap_uri = ldap://ldapmstin01bsdh.shr.dev.andrewcz.com
ldap_search_base = dc=dev,dc=andrewcz,dc=com
ldap_user_search_base = ou=people,dc=dev,dc=andrewcz,dc=com
ldap_user_uid_number = uidNumber
ldap_user_gid_number = gidNumber
ldap_user_fullname = gecos
ldap_user_home_directory = homeDirectory
ldap_group_search_base = ou=groups,dc=dev,dc=andrewcz,dc=com
ldap_group_name = cn
ldap_group_member = memberUid
access_provider = simple
simple_allow_users = smacz
simple_allow_groups = jumphostAdmins
ldap_schema = rfc2307
ldap_use_start_tls = true
cache_credentials = false
EOF
# diff /etc/pam.d/password-auth-ac /etc/pam.d/system-auth-ac
# cat /etc/pam.d/system-auth-ac
#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth required pam_env.so
auth required pam_faildelay.so delay=2000000
auth sufficient pam_unix.so nullok try_first_pass
auth requisite pam_succeed_if.so uid >= 1000 quiet_success
auth sufficient pam_sss.so forward_pass
auth required pam_deny.so
account required pam_unix.so
account sufficient pam_localuser.so
account sufficient pam_succeed_if.so uid < 1000 quiet
account [default=bad success=ok user_unknown=ignore] pam_sss.so
account required pam_permit.so
password requisite pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type=
password sufficient pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password sufficient pam_sss.so use_authtok
password required pam_deny.so
session required pam_oddjob_mkhomedir.so skel=/etc/skel umask=0077
session optional pam_keyinit.so revoke
session required pam_limits.so
-session optional pam_systemd.so
session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session required pam_unix.so
session optional pam_sss.so
# cat /etc/nsswitch.conf
passwd: sss files systemd
shadow: files sss
group: sss files systemd
hosts: files dns myhostname
bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files sss
netgroup: nisplus sss
publickey: nisplus
automount: files nisplus
aliases: files nisplus
# dnf install -y oddjob-mkhomedir
# systemctl start oddjobd
# systemctl start sssd
# cat << EOF > /etc/openldap/ldap.conf
#
# LDAP Defaults
#
# See ldap.conf(5) for details
# This file should be world readable but not world writable
BASE dc=dev,dc=andrewcz,dc=com
URI ldap://ldapmstin01bsdh.shr.dev.andrewcz.com:389
#SIZELIMIT 12
#TIMELIMIT 15
#DEREF never
#TLS_CACERTDIR /etc/openldap/certs
#TLS_REQCERT demand
TLS_CACERT /srv/tls/cacert.pem
# Turning this off breaks GSSAPI used with krb5 when rdns = false
SASL_NOCANON on
EOF