
Sunday, November 22, 2015

A wide-open Iptables firewall with NAT

I recently had to set up an Iptables firewall to enable IP forwarding / NAT for the compute nodes in an isolated cluster network.  The customer was using a hardware monitoring system that needed to pass messages to a server outside the cluster.

Security and access to the cluster were being handled by the network department's switches and routers, so the Iptables firewall was not really necessary for security purposes.

The software that was to be run on this cluster had a ton of ports that needed to be opened to the outside, and typically the software company recommended keeping firewalls disabled for simplicity.  (My requests for a list of ports to open were met with a lot of hemming and hawing, so I just dropped it.)

I experimented for a bit, and found some suggestions on the interwebs.  The most helpful find was from Alex Atkinson on StackExchange: http://superuser.com/a/634471

This solution was implemented on RHEL 6.6: a wide-open firewall that only does masquerading of IPs.

1. vi /etc/sysctl.conf
2. change or add net.ipv4.ip_forward = 1
3. sysctl -p /etc/sysctl.conf
4. service iptables start
5. service iptables save
6. cp /etc/sysconfig/iptables   /root/iptables.20151125.1308-dah
7. Run a bunch of commands to build up a new rule set:
iptables -F
iptables -t nat -F
iptables -A INPUT -i lo -j ACCEPT -m comment --comment "Allow all loopback traffic"
iptables -A INPUT ! -i lo -d 127.0.0.0/8 -j REJECT -m comment --comment "Drop all traffic to 127 that does not use lo"
iptables -A OUTPUT -j ACCEPT -m comment --comment "Accept all outgoing"
iptables -A INPUT -j ACCEPT -m comment --comment "Accept all incoming"
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT -m comment --comment "Allow all incoming on established connections"
iptables -t nat -A POSTROUTING -o bond5 -j MASQUERADE -m comment --comment "Masquerade traffic headed out bond5"
iptables -A FORWARD -j ACCEPT -m comment --comment "Accept all forwarding"

8. service iptables save
9. The resulting iptables file
# Generated by iptables-save v1.X.X on Wed Nov 25 13:09:31 2015
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A POSTROUTING -o bond1 -m comment --comment "Masquerade traffic headed out bond1" -j MASQUERADE
COMMIT
# Completed on Wed Nov 25 13:09:31 2015
# Generated by iptables-save v1.X.X on Wed Nov 25 13:09:31 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -m comment --comment "Allow all loopback traffic" -j ACCEPT 
-A INPUT -d 127.0.0.0/8 ! -i lo -m comment --comment "Drop all traffic to 127 that does not use lo" -j REJECT --reject-with icmp-port-unreachable
-A INPUT -m comment --comment "Accept all incoming" -j ACCEPT 
-A INPUT -m state --state RELATED,ESTABLISHED -m comment --comment "Allow all incoming on established connections" -j ACCEPT 
-A FORWARD -m comment --comment "Accept all forwarding" -j ACCEPT 
-A OUTPUT -m comment --comment "Accept all outgoing" -j ACCEPT 
COMMIT
# Completed on Wed Nov 25 13:09:31 2015

10. Set default route on all the compute nodes to the bond5 interface on the head node.
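
For reference, here is a rough sketch of step 10 plus a quick NAT check.  The addresses below are made up: assume the head node's bond5 IP is 192.168.1.254 and 10.10.10.10 is a host outside the cluster.

# On each compute node (RHEL 6), point the default route at the head node:
ip route replace default via 192.168.1.254
# or make it persistent by adding GATEWAY=192.168.1.254 to /etc/sysconfig/network

# From a compute node, test a destination outside the cluster:
ping -c 3 10.10.10.10

# Back on the head node, the MASQUERADE packet counters should climb:
iptables -t nat -L POSTROUTING -n -v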

DONE!

Monday, February 23, 2015

HP INSIGHT CMU MONITORING and ALERTS of HP SMART ARRAY HARD DRIVES

Setting up monitoring, alerts, and alert_reactions for hard drives attached to an HP Smart Array Controller.

By David Holton, HP

Background
A customer wanted HP Insight CMU to warn them of disk problems in their new Hadoop cluster. The request was for HP Insight CMU to send an email with a warning of possible disk issues. During the initial integration of the cluster, other challenges consumed any time that could have been dedicated to this request. During a subsequent visit, there was enough time to work out a solution to meet this need.

Requirements

  1. Use the iLO4's AMS capability to get the data from sensors.
  2. Show the status in the GUI for each node.
  3. Show an alert if the status changes.
  4. Send an email when an alert is raised.


I decided that the solution should rely on the CPU and the OS as little as possible for two reasons:
  1. Some instances of disk problems cause disk-based OS processes to appear to be hung.
  2. Minimize any impact to Hadoop processes.

Limitations:
  1. This solution does not monitor or alert on disk drives controlled by the AHCI controller or the AHCI driver in the OS.
  2. While setting up the alert I realized that the CPU and OS on each node would have to be involved to some degree. I was not able to find a way around this in the short time I had to work out a solution. See the Alerts section of this paper for details.
  3. I was not able to find out what other possible values SNMP would report back for a non-OK disk. Therefore this solution only reports that disk status has changed. There is no attempt to indicate in what way it has changed. The SysAdmin will have to check on the system to determine if action needs to be taken.

MONITORING
Setting up the monitoring of disks controlled by HP Smart Array controllers was a relatively straightforward process following the instructions and guidelines found in the HP Insight CMU User's Guide for version 7.2, section 6.5.9.1. I don't think it necessary to reproduce all the steps from the manual in this paper, but an overview of the tasks is:

  1. Enable iLO4 AMS extended metric support in HP Insight CMU's GUI interface.
  2. Configure the iLO4's SNMP port using HP Insight CMU's AMS menu made available in step 1.
  3. Check that SNMP access is working using snmpwalk and snmpget. This also verifies that the system has the correct SNMP-related RPMs installed. Use the example commands listed in the User Guide; a sketch of such a check is shown below.
  4. Use the “Get/Refresh SNMP data” menu item to gather initial data from all the configured iLOs in the cluster.
  5. Configure HP Insight CMU to pick out the required metrics using the data gathered in step 4.
  6. Configure the ActionsandAlertsFile.txt to show the metrics from step 4.
  7. Restart the Monitoring Engine.
  8. Restart the HP Insight CMU GUI.

FYI: In step 2, the requirements for configuring the iLO are very simple. It sets the SNMP port number, which is normally already set. It also changes the community setting located in the iLO4's /map1/snmp1 to “public.”
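
As a quick sanity check for step 3, something like the following can be run from the head node (the iLO hostname here is made up); the OIDs are the same ones used later in this paper, and each drive status should come back as INTEGER: 2 when healthy:

# Walk the temperature sensors on one iLO:
snmpwalk -v1 -c public node01-ilo SNMPv2-SMI::enterprises.232.6.2.6.8.1.4

# Query the logical status of drive 1:
snmpget -v1 -c public node01-ilo SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.1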

Step 5 is accomplished by listing the needed metrics in the cmu_ams_metrics file located in /opt/cmu/etc. I tested this by listing only two of the disk metrics until I had the format settled. Here is a listing of the version of the file I came up with:

#
# This file is part of the CMU AMS support.
# This file maps SNMP OIDs to CMU metric names.
#
# First column is the SNMP OID.
# Second column is the CMU metric name.
# The optional 'SUM' keyword in the third column
# is used to add the values of multiple SNMP OIDs
# into a single CMU metric.
#
SNMPv2-SMI::enterprises.232.6.2.6.8.1.4.0.1 amb1_temp
SNMPv2-SMI::enterprises.232.6.2.6.8.1.4.0.2 cpu1_temp
SNMPv2-SMI::enterprises.232.6.2.6.8.1.4.0.3 cpu2_temp
SNMPv2-SMI::enterprises.232.6.2.9.3.1.7.0.1 power1 SUM power
SNMPv2-SMI::enterprises.232.6.2.9.3.1.7.0.2 power2 SUM power
SNMPv2-SMI::enterprises.232.6.2.9.3.1.7.0.3 power3 SUM power
SNMPv2-SMI::enterprises.232.6.2.9.3.1.7.0.4 power4 SUM power
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.1 sata_drv1_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.2 sata_drv2_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.3 sata_drv3_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.4 sata_drv4_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.5 sata_drv5_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.6 sata_drv6_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.7 sata_drv7_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.8 sata_drv8_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.9 sata_drv9_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.10 sata_drv10_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.11 sata_drv11_log_status
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.12 sata_drv12_log_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.11 sata_drv1_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.12 sata_drv2_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.13 sata_drv3_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.14 sata_drv4_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.15 sata_drv5_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.16 sata_drv6_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.17 sata_drv7_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.18 sata_drv8_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.19 sata_drv9_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.20 sata_drv10_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.21 sata_drv11_phy_status
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.1.22 sata_drv12_phy_status

The entries in cmu_ams_metrics are used by the /opt/cmu/bin/cmu_get_ams_metrics script to query the iLO and submit the results to HP Insight CMU monitoring.

Next are the entries added to the ActionsandAlertsFile.txt to show the results gathered by cmu_get_ams_metrics. Again, while setting this up, I only used a couple of disk entries at first.

#-------------HP iLO4 AMS------------------------------------#
#
amb1_temp "ambient temp" 4 numerical Instantaneous 60 Celsius EXTENDED /opt/cmu/bin/cmu_get_ams_metrics -a
cpu1_temp "CPU 1 temp" 4 numerical Instantaneous 60 Celsius EXTENDED
cpu2_temp "CPU 2 temp" 4 numerical Instantaneous 60 Celsius EXTENDED
power "Power Usage" 4 numerical Instantaneous 100 watts EXTENDED
sata_drv1_log_status "Drive 1 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv2_log_status "Drive 2 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv3_log_status "Drive 3 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv4_log_status "Drive 4 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv5_log_status "Drive 5 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv6_log_status "Drive 6 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv7_log_status "Drive 7 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv8_log_status "Drive 8 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv9_log_status "Drive 9 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv10_log_status "Drive 10 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv11_log_status "Drive 11 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv12_log_status "Drive 12 Logical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv1_phy_status "Drive 1 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv2_phy_status "Drive 2 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv3_phy_status "Drive 3 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv4_phy_status "Drive 4 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv5_phy_status "Drive 5 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv6_phy_status "Drive 6 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv7_phy_status "Drive 7 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv8_phy_status "Drive 8 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv9_phy_status "Drive 9 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv10_phy_status "Drive 10 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv11_phy_status "Drive 11 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED
sata_drv12_phy_status "Drive 12 Physical Status" 6 string Instantaneous 2 2=OK EXTENDED

The “OK” SNMP result returned for a disk's logical and physical status is 2. Since this status should not change, I did not think it needed to be part of the graph displays, so I set the metric type to “string.” I also set the action-to-perform entry to 2=OK, since it displays whatever string I put in that position.

After restarting the Monitoring Engine and restarting the GUI, I was able to see these changes.

ALERT

Setting up the alert was a bit less straightforward for me. I had ideas and scripting fragments worked out for having the head node query each individual node's iLO. I failed to realize that the script triggered by the ActionsandAlertsFile.txt would have to be executed on each node individually. I had wanted to avoid using the OS & CPU on each node.

This caused another problem. Because of how the networking was designed for this cluster, the Compute Nodes cannot communicate directly with their own iLOs. Only the Head Node has the ability to communicate with iLOs in this cluster.

The solution I came up with was to have a small script on each node that queries its disks' status by executing another script located on the head node. If I had had more time, I would have looked for a better solution.

The alert line in the ActionsandAlertsFile.txt for monitoring the disks is:

check_diskdrives "Drive status changed" 4 1 0 > status “/opt/cmu/tools/cmu_hw_monitoring.sh"

Only the indication that the drive status has changed is reported. The line evaluates the returned value to determine if it is greater than 0.

The alert line in the ActionsandAlertsFile.txt executes the cmu_hw_monitoring.sh script found in /opt/cmu/tools on each node. The script checks the Logical and Physical status of the drives. I left it with a generic name because it could be expanded to monitor more than disks.

cmu_hw_monitoring.sh
#!/bin/bash
# name: cmu_hw_monitoring.sh
# Runs on each compute node.  It asks the head node to query this node's iLO
# via SNMP and returns 1 if any drive's logical or physical status is not 2 (OK).

HEADNODE=headnode01-cmu
MY_HOSTNAME=$(hostname)-cmu
MY_ILONAME=$(hostname)-ilo
CMUTOP=/opt/cmu
CMUCONTRIB=${CMUTOP}/contrib

RETURN=0

# Logical drive status: OID indices 1-12 under enterprises.232.3.2.3.1.1.11.1
for NUM1 in $(seq 1 12)
do
    DRVSTATUS=$(ssh ${HEADNODE} "${CMUCONTRIB}/snmp-hw-alert ${MY_ILONAME} 3 11 ${NUM1}" | awk -F: '{print $4}')

    if [ ${DRVSTATUS} -ne 2 ]
    then
        RETURN=1
        break
    fi
done

# Physical drive status: OID indices 11-22 under enterprises.232.3.2.5.1.1.37.1
for NUM2 in $(seq 11 22)
do
    DRVSTATUS=$(ssh ${HEADNODE} "${CMUCONTRIB}/snmp-hw-alert ${MY_ILONAME} 5 37 ${NUM2}" | awk -F: '{print $4}')

    if [ ${DRVSTATUS} -ne 2 ]
    then
        RETURN=1
        break
    fi
done

echo ${RETURN}
exit ${RETURN}


This script, obviously, passes arguments to the snmp-hw-alert script in the /opt/cmu/contrib directory on the Head Node. The snmp-hw-alert script runs the SNMP command that communicates with the node's iLO to query AMS data.
#!/bin/bash
# name: snmp-hw-alert (lives on the Head Node in /opt/cmu/contrib)
# $1 = iLO hostname, $2 and $3 = OID subtree selectors, $4 = drive number

NODE_ILO=$1
SUBSYS1=$2
SUBSYS2=$3
DRVNUM=$4

snmpget -v1 -cpublic ${NODE_ILO} \
SNMPv2-SMI::enterprises.232.3.2.${SUBSYS1}.1.1.${SUBSYS2}.1.${DRVNUM}
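
For illustration, invoking it by hand from the head node (the iLO hostname is made up) to check drive 1's logical status would look roughly like this; the value after the last colon is what the awk in cmu_hw_monitoring.sh extracts:

/opt/cmu/contrib/snmp-hw-alert node01-ilo 3 11 1
SNMPv2-SMI::enterprises.232.3.2.3.1.1.11.1.1 = INTEGER: 2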

The result of this script is a digit which goes back to the cmu_hw_monitoring.sh script, where it is evaluated to see if it is not equal to 2.

If the result does not equal 2, the value of $RETURN is changed from 0 to 1, and that loop stops processing further disks. The Alert that the disk status has changed will be raised.

ALERT_REACTION
Once the alert is raised that the disk status has changed, an email will be sent to the appropriate email addresses with this line from the ActionsandAlertsFile.txt:

check_diskdrives "Sending mail to root" ReactOnRaise echo -e "Alert 'CMU_ALERT_NAME' raised on node(s) CMU_ALERT_NODES. \n\nDetails:\n`/opt/cmu/bin/pdsh -w CMU_ALERT_NODES 'parted -l | grep Disk'`" | mailx -s "CMU: Alert 'CMU_ALERT_NAME' raised." root

The alert_reaction will attempt to get a list of disks from the OS to include as the body of the message.
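
Before depending on the reaction, it is worth confirming that mail actually leaves the head node at all; a trivial check is something like:

echo "CMU mail test" | mailx -s "CMU mail test" root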

TESTING
I tested the solution by changing the evaluation value in the cmu_hw_monitoring.sh script from 2 to 1 or 3, and alerts were raised as expected. I also had the alert_reaction send email to the local root user account, and all emails were received.

CONCLUSION

Had I more time in this environment, I would have experimented with the scripting to reduce the amount of time the script on the node has to run. One possible improvement would be to use variable arrays instead of for loops.

Wednesday, September 17, 2014

Udev Incorrectly Renaming NICs in an HP CMU Cluster Running SLES 11.3


Documenting the Udev System Solution
for _______'s Cloudera Cluster.

By David Holton

Introduction:

The problem and solution described here is for a Cloudera Hadoop cluster using HP hardware running SLES 11.3.  This cluster utilizes HP's Insight CMU software to manage provisioning, monitoring, and alerting on worker/compute nodes.

NOTE: Names of systems, devices, networks, vlans etc., have been changed from the system where this challenge was encountered.  The device naming disorder was replicated on a small scale, which makes this problem a bit more interesting and gives a bit more clarity to how the solution was implemented.

Definitions:

A typical, reference-architecture Cloudera cluster from HP has the same base components as a High Performance Computing cluster from HP.  Because of that, some basic HP definitions are needed to keep track of which components perform which function.  I find that a lot of confusion can be avoided if everyone uses the same terminology in the same way. Consistency is the key.

There are typically three or four networks involved:

Admin network: this is just a basic Ethernet network, internal to the cluster, over which basic system admin commands and functions are performed.  This is the network that CMU uses to communicate with the OS running on each node.  AKA: CMU network, Management network.

Console network: this is another internal cluster network to which all node iLOs are connected.  The Virtual Serial Port of each node's iLO is very important for troubleshooting, and it is made available on this network.  This network also gives CMU control of power on each node.  Additionally, the Admin and Console networks are usually connected together; many times they share the same IP subnet.  AKA: iLO network, BMC network.

Enterprise network: this is the corporate network that connects to the cluster and allows remote access.  There are many ways this network is attached to clusters.  AKA: User network, Data network, Company network, Campus network.

Sometimes there is a separate High-Speed network.  The names and uses of this network vary greatly. It is typically used to load data quickly and efficiently onto the cluster, and to allow message passing between applications running on compute nodes.  Normally, no user or administrative functions are allowed on this network. In an HPC system this is normally an InfiniBand network.  In a Cloudera cluster it is usually a 10Gb Ethernet network.  AKA: HSI, MPI network, IB network, 10G network.

There are various types of nodes found in clusters.

Head Node: this is typically the main server from which all CMU and administrative functions originate.  AKA: management server, CMU server.

Compute Nodes: on these nodes the actual work of a cluster (HPC or Hadoop) takes place.  AKA: worker nodes, computes.

Utility Nodes: These can perform many different functions: Database, Application, raw data store, Job Scheduler/Control, Resource Manager, Hadoop Name Nodes, Login nodes, Edge nodes

The Problem:

In the particular cluster where this problem was encountered, the internal networks were a bit unusual.

Admin network - is a separate, flat, internal, CMU-only network.

Cloudera network - is a high-speed, routed, internal, Cloudera-only network.

These two networks had to be kept separate.  Data traffic typically wants to default to the flat admin network, but we want default traffic to use the Cloudera-only network.

Console network - is an externally connected network. One interface on the Head Node had to be configured for this network for CMU to have iLO access to the nodes.

Enterprise Network - is a high-speed, external network that provides access to data as well as remote access for users and administrators.

When the ProLiant servers boot SLES 11.3, the device naming of the Ethernet cards will not necessarily be the same as on previous boots of the system.

Device naming of network interfaces is controlled by use of the 70-persistent-net.rules file in the /etc/udev/rules.d directory.  It is a trivial thing to edit this file, on individual systems, to assign the ethX device names as desired. Every system boot after that should maintain consistent device names.

In a CMU environment we are going to capture an image of a Golden Node and clone any number of systems in the same CMU Logical Group with that image.  Once the cloned nodes come up, the interface device names will most likely come up differently.  Whether or not CMU cleans the persistent rules file, new rules will be generated on each freshly cloned node, because Udev will encounter missing MAC addresses or MAC addresses that are different from what it finds in the rules file.  So if eth5 & eth6 are connected to separate networks and the rules reverse them, they may show as UP in the output of 'ifconfig' or 'ip addr show', but they will not have the proper IP addresses.  Therefore, they will not be able to communicate on the networks to which they are physically attached.

A solution is needed to configure network rules in udev, on the fly, that name the network interfaces in a consistent manner, to match the device names on the Golden Node.  This is the only way that bonded and tagged vlan interfaces will remain consistent.

(NOTE: each Linux distribution may interpret Udev rules in slightly different ways, so the same solution for one distribution, may not work for another. Some rule syntax experimentation may be needed.)

The Desired Configuration:

To summarize the names of NICs and bus locations we want to see on the systems with the most active Ethernet interfaces:

Bus Pos.   EQ5    EQ6    EQ7
--------   ----   ----   ----
03.00.0    eth0   eth0   eth0
03.00.1    eth1   eth1   eth1
03.00.2    eth3   eth3   eth3
03.00.3    eth4   eth4   eth4
-
04.00.0    eth2   eth2   eth2
04.00.1    eth5   eth5   eth5
-
24.00.0    eth6   eth6   eth6
24.00.1    eth7   eth7   eth7
-
27.00.0    eth8   eth8   eth8
27.00.1    eth9   eth9   eth9

This is the desired IP configuration:

eth0 configured to the 192.168.11.0/24 network.
bond0 is eth2 & eth5 configured for the 192.168.20.0/22 network.
bond1 is eth6, eth7, eth8, & eth9 configured for 3 VLANs.
VLAN1 on 192.168.23.0/19 network.
VLAN2 on 192.168.64.0/19 network.
VLAN3 on 192.168.96.0/19 network.
Unused: eth1, eth3, & eth4

Needed Solution:

Using CMU's post-cloning script, reconf.sh, capture the MAC addresses of the NICs while the system is still net booted.  Debian seems to initialize and name the NICs based on the order they are found on the system bus.  This gives us consistent locations of the NICs on other systems configured exactly like the Golden Node.  Once the MAC addresses are captured, they are inserted into a rule set, one rule per NIC, in a newly constructed persistent rules file.

When the system boots, as long as the rule is properly constructed and contains the proper elements, udev will name the NICs as instructed.

Detailed Example:

First I want to show how three of the systems booted up without any rules predefined in the 70-persistent-net.rules file.  All three systems have identical hardware, are running SLES 11.3, and have the same BIOS firmware.

 Manufacturer: HP
        Product Name: ProLiant DL380p Gen8

EQ5-Node-diskboot-NOrules-Dmidecode.out.txt-        Version: P70
EQ5-Node-diskboot-NOrules-Dmidecode.out.txt-        Release Date: 02/10/2014
--
EQ6-Node-diskboot-NOrules-Dmidecode.out.txt-        Version: P70
EQ6-Node-diskboot-NOrules-Dmidecode.out.txt-        Release Date: 02/10/2014
--
EQ7-Node-diskboot-NOrules-Dmidecode.out.txt-        Version: P70
EQ7-Node-diskboot-NOrules-Dmidecode.out.txt-        Release Date: 02/10/2014

All three nodes initialized the Broadcom and Intel NICs in a different order.  This made some interfaces unusable, and some bonded interfaces were either running at reduced capacity or were completely non-functional.

Nodes Booted Without Rules:

From EQ5:
From the Console messages during boot we can see:
Setting up (localfs) network interfaces:
    lo
    lo        IP address: 127.0.0.1/8
               IP address: 127.0.0.2/8                                done
    eth0      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
    eth0      IP address: 192.168.11.15/24                           done

    eth1      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
              No configuration found for eth1                        unused

    eth2      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth3      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
              No configuration found for eth3                        unused

    eth4      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
              No configuration found for eth4                        unused

    eth5      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth6      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth7      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth8      device: Broadcom Corporation NetXtreme BCM5719 Gigabi  done

    eth9      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    bond0
    bond0     enslaved interface: eth2
    bond0     enslaved interface: eth5
    bond0     IP address: 192.168.20.15/22                           done

    bond1
    bond1     enslaved interface: eth8
    bond1     enslaved interface: eth9
    bond1     enslaved interface: eth6
    bond1     enslaved interface: eth7                               done

    vlan1
    vlan1   IP address: 192.168.23.30/19                            done

    vlan2
    vlan2   IP address: 192.168.64.35/19                           done

    vlan3
    vlan3   IP address: 192.168.93.12/19                            done
Setting up service (localfs) network  .  .  .  .  .  .  .  .  .  .   done

The NICs came up in the following order:
Broadcom
Broadcom
Intel
Intel
Broadcom
Intel
Intel
Intel
Broadcom
Intel

From the booted system we can see the bus paths are not in order when we
sort by NIC name.

ls -l /sys/class/net/eth*
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.0/net/eth0
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth1 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.1/net/eth1
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth2 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.0/net/eth2
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth3 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.1/net/eth3
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth4 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.2/net/eth4
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth5 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.0/net/eth5
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth6 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.1/net/eth6
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth7 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.0/net/eth7
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth8 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.3/net/eth8
lrwxrwxrwx 1 root root 0 Sep 11 08:46 /sys/class/net/eth9 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.1/net/eth9

From EQ6
From the Console messages during boot we can see:
Setting up (localfs) network interfaces:
    lo      
    lo        IP address: 127.0.0.1/8
              IP address: 127.0.0.2/8                                done

    eth0      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
    eth0      IP address: 192.168.11.16/24                           done

    eth1      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
              No configuration found for eth1                        unused

    eth2      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth3      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
              No configuration found for eth3                        unused

    eth4      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
              No configuration found for eth4                        unused

    eth5      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth6      device: Broadcom Corporation NetXtreme BCM5719 Gigabi  done

    eth7      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth8      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth9      device: Broadcom Corporation NetXtreme BCM5719 Gigabi  done

    bond0  
    bond0     enslaved interface: eth2
    bond0     enslaved interface: eth5
    bond0     IP address: 192.168.20.16/22                           done

    bond1  
    bond1     enslaved interface: eth8
    bond1     enslaved interface: eth9
    bond1     enslaved interface: eth6
    bond1     enslaved interface: eth7                               done

    vlan1
    vlan1   IP address: 192.168.23.31/19                            done

    vlan2
    vlan2   IP address: 192.168.64.36/19                           done

    vlan3
    vlan3   IP address: 192.168.93.13/19                            done
Setting up service (localfs) network  .  .  .  .  .  .  .  .  .  .   done

On this system the initialization order (and naming) is:
Broadcom
Intel
Intel
Broadcom
Intel
Intel
Broadcom
Intel
Intel
Broadcom

...and sorted by NIC name we have:
ls -l /sys/class/net/eth*
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.0/net/eth0
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth1 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.0/net/eth1
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth2 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.1/net/eth2
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth3 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.1/net/eth3
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth4 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.0/net/eth4
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth5 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.1/net/eth5
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth6 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.2/net/eth6
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth7 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.0/net/eth7
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth8 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.1/net/eth8
lrwxrwxrwx 1 root root 0 Sep 11 08:57 /sys/class/net/eth9 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.3/net/eth9

From EQ7
Again the NICs are initialized in a different order:
Setting up (localfs) network interfaces:
    lo      
    lo        IP address: 127.0.0.1/8
              IP address: 127.0.0.2/8                                done

    eth0      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
    eth0      IP address: 192.168.11.17/24                           done

    eth1      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
              No configuration found for eth1                        unused

    eth2      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth3      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
              No configuration found for eth3                        unused

    eth4      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+
              No configuration found for eth4                        unused

    eth5      device: Broadcom Corporation NetXtreme BCM5719 Gigabi  done

    eth6      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth7      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth8      device: Broadcom Corporation NetXtreme BCM5719 Gigabi  done

    eth9      device: Broadcom Corporation NetXtreme BCM5719 Gigabi  done

    bond0  
    bond0     enslaved interface: eth2
    bond0     enslaved interface: eth5
    bond0     IP address: 192.168.20.17/22                           done

    bond1  
    bond1     enslaved interface: eth8
    bond1     enslaved interface: eth9
    bond1     enslaved interface: eth6
    bond1     enslaved interface: eth7                               done

    vlan1
    vlan1   IP address: 192.168.23.32/19                            done

    vlan2
    vlan2   IP address: 192.168.64.37/19                           done

    vlan3
    vlan3   IP address: 192.168.93.14/19                            done
Setting up service (localfs) network  .  .  .  .  .  .  .  .  .  .   done

The order is:
Broadcom
Intel
Intel
Intel
Intel
Broadcom
Intel
Intel
Broadcom
Broadcom

ls -l /sys/class/net/eth*
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.0/net/eth0
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth1 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.0/net/eth1
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth2 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.1/net/eth2
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth3 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.0/net/eth3
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth4 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.1/net/eth4
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth5 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.1/net/eth5
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth6 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.0/net/eth6
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth7 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.1/net/eth7
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth8 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.2/net/eth8
lrwxrwxrwx 1 root root 0 Sep 11 05:27 /sys/class/net/eth9 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.3/net/eth9

Summary of Random NICs

To summarize the names given by each of these three systems and the system bus location:

Bus Pos.   EQ5    EQ6    EQ7
--------   ----   ----   ----
03.00.0    eth0   eth0   eth0
03.00.1    eth1   eth3   eth5
03.00.2    eth4   eth6   eth8
03.00.3    eth8   eth9   eth9
04.00.0    eth2   eth1   eth1
04.00.1    eth3   eth2   eth2
24.00.0    eth5   eth4   eth3
24.00.1    eth6   eth5   eth4
27.00.0    eth7   eth7   eth6
27.00.1    eth9   eth8   eth7

Now from a CMU Net Booted Node:

Now we take EQ7 and boot it via CMU's net boot image, and observe how the NICs are named.

From a Net Booted Node we can get reliable MAC addresses from all of the systems.  The net boot image is Debian running in a RAMdisk, which names the NICs in the same order on each system by default.

EQ7
From dmesg: We can see that the NICs are named in order of appearance on the system bus while the system is being initialized.

[   32.308843] tg3 0000:03:00.0: irq 286 for MSI/MSI-X
[   32.308853] tg3 0000:03:00.0: irq 287 for MSI/MSI-X
[   32.308863] tg3 0000:03:00.0: irq 288 for MSI/MSI-X
[   32.308873] tg3 0000:03:00.0: irq 289 for MSI/MSI-X
[   32.308883] tg3 0000:03:00.0: irq 290 for MSI/MSI-X
[   33.518953] tg3 0000:03:00.1: irq 291 for MSI/MSI-X
[   33.518963] tg3 0000:03:00.1: irq 292 for MSI/MSI-X
[   33.518974] tg3 0000:03:00.1: irq 293 for MSI/MSI-X
[   33.518984] tg3 0000:03:00.1: irq 294 for MSI/MSI-X
[   33.519001] tg3 0000:03:00.1: irq 295 for MSI/MSI-X
[   33.648691] tg3 0000:03:00.2: irq 296 for MSI/MSI-X
[   33.648701] tg3 0000:03:00.2: irq 297 for MSI/MSI-X
[   33.648712] tg3 0000:03:00.2: irq 298 for MSI/MSI-X
[   33.648729] tg3 0000:03:00.2: irq 299 for MSI/MSI-X
[   33.648739] tg3 0000:03:00.2: irq 300 for MSI/MSI-X
[   34.007534] tg3 0000:03:00.3: irq 301 for MSI/MSI-X
[   34.007545] tg3 0000:03:00.3: irq 302 for MSI/MSI-X
[   34.007556] tg3 0000:03:00.3: irq 303 for MSI/MSI-X
[   34.007566] tg3 0000:03:00.3: irq 304 for MSI/MSI-X
[   34.007576] tg3 0000:03:00.3: irq 305 for MSI/MSI-X
[   34.500756] ixgbe 0000:04:00.0: registered PHC device on eth4
[   34.751358] ixgbe 0000:04:00.0 eth4: detected SFP+: 4
[   34.816928] ixgbe 0000:04:00.1: registered PHC device on eth5
[   34.991322] ixgbe 0000:04:00.1 eth5: detected SFP+: 3
[   35.059786] ixgbe 0000:24:00.0: registered PHC device on eth6
[   35.311264] ixgbe 0000:24:00.0 eth6: detected SFP+: 6
[   35.382667] ixgbe 0000:24:00.1: registered PHC device on eth7
[   35.475273] ixgbe 0000:04:00.0 eth4: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   35.635214] ixgbe 0000:24:00.1 eth7: detected SFP+: 5
[   35.655225] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   35.698550] ixgbe 0000:27:00.0: registered PHC device on eth8
[   35.871132] ixgbe 0000:27:00.0 eth8: detected SFP+: 6
[   35.936563] ixgbe 0000:27:00.1: registered PHC device on eth9
[   36.139133] ixgbe 0000:27:00.1 eth9: detected SFP+: 5
[   36.411973] tg3 0000:03:00.0 eth0: Link is up at 1000 Mbps, full duplex
[   36.499187] tg3 0000:03:00.0 eth0: Flow control is off for TX and off for RX
[   36.592109] tg3 0000:03:00.0 eth0: EEE is disabled
[   36.724664] ixgbe 0000:24:00.0 eth6: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   37.081889] ixgbe 0000:24:00.1 eth7: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   37.316612] ixgbe 0000:27:00.0 eth8: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   37.517785] ixgbe 0000:27:00.1 eth9: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   36.055141] Sending BOOTP requests .. OK
[   43.165986] IP-Config: Got BOOTP answer from 192.168.44.11, my address is 192.168.44.90
[   44.382179] ixgbe 0000:04:00.0: removed PHC on eth4
[   44.524657] ixgbe 0000:04:00.1: removed PHC on eth5
[   44.666746] ixgbe 0000:24:00.0: removed PHC on eth6
[   44.811539] ixgbe 0000:24:00.1: removed PHC on eth7
[   44.955456] ixgbe 0000:27:00.0: removed PHC on eth8
[   45.099432] ixgbe 0000:27:00.1: removed PHC on eth9
[   45.242954] IP-Config: Complete:
[   45.285428]      device=eth0, hwaddr=a0:d3:c1:fa:2a:fc, ipaddr=192.168.44.90,\ mask=255.255.255.128, gw=255.255.255.255
[   45.426422]      host=adappsrku007-cmu, domain=, nis-domain=(none)
[   45.507822]      bootserver=192.168.44.11, rootserver=192.168.44.11,\ rootpath=/opt/cmu/ntbt/rp/x86_64
[   45.627036]      nameserver0=192.168.44.11

Once the system is up, we can observe again that the NIC names appear in system bus order.

PCI addresses of NICs:
CMU netboot adappsrku007-cmu:/tmp# cat EQ-EdgeNode-Netbooted-clioutput.txt
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.0/net/eth0
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth1 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.1/net/eth1
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth2 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.2/net/eth2
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth3 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.3/net/eth3
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth4 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.0/net/eth4
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth5 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.1/net/eth5
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth6 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.0/net/eth6
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth7 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.1/net/eth7
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth8 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.0/net/eth8
lrwxrwxrwx 1 root root 0 Sep 11 04:13 /sys/class/net/eth9 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.1/net/eth9

Hardware (MAC) Addresses of NICs:
for NIC in $(seq 0 9); do printf "Device eth${NIC} hardware address "; cat /sys/class/net/eth${NIC}/address; done
Device eth0 hardware address de:ad:be:ef:2a:fc
Device eth1 hardware address de:ad:be:ef:2a:fd
Device eth2 hardware address de:ad:be:ef:2a:fe
Device eth3 hardware address de:ad:be:ef:2a:ff
Device eth4 hardware address be:ef:de:ad:db:d4
Device eth5 hardware address be:ef:de:ad:db:d5
Device eth6 hardware address be:ef:de:ad:d7:fc
Device eth7 hardware address be:ef:de:ad:d7:fd
Device eth8 hardware address be:ef:de:ad:db:cc
Device eth9 hardware address be:ef:de:ad:db:cd

We can use this consistent naming order to capture the MAC address and write a udev persistent net rules file for use when the system boots from its own hard drive.

CMU reconf.sh script.


We need to write code into the reconf.sh script to write unique, MAC-based rules for the
70-persistent-net.rules file on each node.  First I put in the section that CMU adds automagically to the rules file, which I started out using as an example.  I leave it in here as a reference.  However, on the system this was developed for, the suggested elements and rule structure did not work.

#--custom code starts here --
#
########## Setup of the /etc/udev/rules.d/70-persistent-net.rules file correctly.
# CMU UDEV rule added at cloning time
#
# see CMU_ADD_NETBOOT_NIC_UDEV_RULE environment variable
# into /opt/cmu/etc/cmuserver.conf on the CMU management node
#
#ACTION=="add",SUBSYSTEM=="net",ATTR{address}=="a0:d3:c1:fa:2b:58",NAME="eth0"
#
########## END CMU Section
#Capture the current, net booted MACs which Debian lists in order.
ETH0_MAC=$(cat /sys/class/net/eth0/address)
ETH1_MAC=$(cat /sys/class/net/eth1/address)
ETH2_MAC=$(cat /sys/class/net/eth2/address)
ETH3_MAC=$(cat /sys/class/net/eth3/address)
ETH4_MAC=$(cat /sys/class/net/eth4/address)
ETH5_MAC=$(cat /sys/class/net/eth5/address)
ETH6_MAC=$(cat /sys/class/net/eth6/address)
ETH7_MAC=$(cat /sys/class/net/eth7/address)
ETH8_MAC=$(cat /sys/class/net/eth8/address)
ETH9_MAC=$(cat /sys/class/net/eth9/address)

# In SLES 11.3, SuSE's parsing of Udev rules seemed to need more rule
# elements than other distributions, so the system generated rule
# structure was followed.
RULES_FILE=${CMU_RCFG_PATH}/etc/udev/rules.d/70-persistent-net.rules
RULE_START='SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="'
RULE_MIDDLE='", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", '

echo "" > ${RULES_FILE}
echo '# system board NICs' >> ${RULES_FILE}
echo ${RULE_START}${ETH0_MAC}${RULE_MIDDLE}'NAME="eth0"' >> ${RULES_FILE}
echo ${RULE_START}${ETH1_MAC}${RULE_MIDDLE}'NAME="eth1"' >> ${RULES_FILE}
echo ${RULE_START}${ETH3_MAC}${RULE_MIDDLE}'NAME="eth3"' >> ${RULES_FILE}
echo ${RULE_START}${ETH4_MAC}${RULE_MIDDLE}'NAME="eth4"' >> ${RULES_FILE}
echo '# bond0 NICs'  >> ${RULES_FILE}
echo ${RULE_START}${ETH2_MAC}${RULE_MIDDLE}'NAME="eth2"' >> ${RULES_FILE}
echo ${RULE_START}${ETH5_MAC}${RULE_MIDDLE}'NAME="eth5"' >> ${RULES_FILE}
echo '# bond1 NICs'  >> ${RULES_FILE}
echo ${RULE_START}${ETH6_MAC}${RULE_MIDDLE}'NAME="eth6"' >> ${RULES_FILE}
echo ${RULE_START}${ETH7_MAC}${RULE_MIDDLE}'NAME="eth7"' >> ${RULES_FILE}
echo ${RULE_START}${ETH8_MAC}${RULE_MIDDLE}'NAME="eth8"' >> ${RULES_FILE}
echo ${RULE_START}${ETH9_MAC}${RULE_MIDDLE}'NAME="eth9"' >> ${RULES_FILE}

# Notice in the above rules that I grouped them by bonding
# so eth2 and eth5 are listed together.  The order of the rules does not
# matter.  What matters is the correct MAC is assigned the correct name.
# In this case the third device was named eth2 when I began work on the
# cluster.  I did not want to change it from what was working in the
# factory.

# I am including in this report, how I set up the bonded interfaces and
# vlans.

# Set up of the bond0 interface for the Cloudera network.
#
##Variables
IFCFG_BOND0=${CMU_RCFG_PATH}/etc/sysconfig/network/ifcfg-bond0
IFCFG_BOND1=${CMU_RCFG_PATH}/etc/sysconfig/network/ifcfg-bond1
IFCFG_VLAN1=${CMU_RCFG_PATH}/etc/sysconfig/network/ifcfg-vlan1
IFCFG_VLAN2=${CMU_RCFG_PATH}/etc/sysconfig/network/ifcfg-vlan2
IFCFG_VLAN3=${CMU_RCFG_PATH}/etc/sysconfig/network/ifcfg-vlan3
#
## I like separate temp files.
TMPFILE_B0=/tmp/cmu-tmpB0
TMPFILE_B1=/tmp/cmu-tmpB1
TMPFILE_V1=/tmp/cmu-tmpV1
TMPFILE_V2=/tmp/cmu-tmpV2
TMPFILE_V3=/tmp/cmu-tmpV3
#
## IP variables
IPSUFFIX=`echo ${CMU_RCFG_IP} | awk -F. '{print $4}'`
BOND0_IP_BASE=192.168.20
BOND1_IP_BASE=192.168
#   The variable CMU_RCFG_IP is a "built-in" variable supplied by CMU.
#   There are several CMU "built-in" variables available in reconf.sh.
#
## The last octet of the vlan IPs do not match the iLO, eth0, or bond0 IP,
## so they must be adjusted.
### Do the math
VLAN1_IP=$((IPSUFFIX + 15))
VLAN2_IP=$((IPSUFFIX + 20))
VLAN3_IP=$((IPSUFFIX - 5))
#
#  This is one reason to have IPs numbering run consistently, or you will
#  have to do a LOT more scripting to give each node's interfaces the
#  correct IP address(es) on any interface except the main CMU interface
#  (the one in the CMU database).  Read the manual if you are lost here.

## Make bond0 config file.
grep -v IPADDR ${IFCFG_BOND0} > ${TMPFILE_B0}
echo IPADDR=${BOND0_IP_BASE}.${IPSUFFIX} >> ${TMPFILE_B0}
mv ${TMPFILE_B0} ${IFCFG_BOND0}

## Make basic bond1 config file. In this case
### bond1 contains no IP address or NETMASK.
grep -v -e IPADDR -e NETMASK ${IFCFG_BOND1} > ${TMPFILE_B1}
mv ${TMPFILE_B1} ${IFCFG_BOND1}

## Set up the vlan1 interface to the external network
grep -v IPADDR ${IFCFG_VLAN1} > ${TMPFILE_V1}
echo IPADDR=${VLAN1_IP_BASE}.23.${VLAN1_IP} >> ${TMPFILE_V1}
mv ${TMPFILE_V1} ${IFCFG_VLAN1} >> ${TMPFILE_V1}

## Set up the vlan2 interface to the external network
grep -v IPADDR ${IFCFG_VLAN2} > ${TMPFILE_V2}
echo IPADDR=${VLAN2_IP_BASE}.64.${VLAN2_IP} >> ${TMPFILE_V2}
mv ${TMPFILE_V2} ${IFCFG_VLAN2} >> ${TMPFILE_V2}

## Set up the vlan3 interface to the external network
grep -v IPADDR ${IFCFG_VLAN3} > ${TMPFILE_V3}
echo IPADDR=${VLAN3_IP_BASE}.96.${VLAN3_IP} >> ${TMPFILE_V3}
mv ${TMPFILE_V3} ${IFCFG_VLAN3} >> ${TMPFILE_V3}

exit 0

# End of reconf.sh

Result of reconf.sh on disk-booted systems.

From EQ7 after booting from its disks.

The reconf.sh script creates the following rules file on each system, but with MACs that are unique to each interface assigned the correct names.

(NOTE: each rule begins with SUBSYSTEM and ends with the NAME element. Each rule is on a line by itself, and CANNOT be on two lines.  Due to formatting here you may see rules on two lines.)

# system board NICs
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="de:ad:be:ef:2a:fc", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="de:ad:be:ef:2a:fd", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="de:ad:be:ef:2a:fe", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="de:ad:be:ef:2a:ff", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
# bond0
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="be:ef:de:ad:db:d4", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="be:ef:de:ad:db:d5", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth5"
# bond1
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="be:ef:de:ad:d7:fc", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth6"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="be:ef:de:ad:d7:fd", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth7"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="be:ef:de:ad:db:cc", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth8"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="be:ef:de:ad:db:cd", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth9"


From the Console messages during boot we can see:
Setting up (localfs) network interfaces:
    lo
    lo        IP address: 127.0.0.1/8
              IP address: 127.0.0.2/8                                done

    eth0      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
    eth0      IP address: 192.168.44.90/25                           done

    eth1      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
              No configuration found for eth1                        unused

    eth2      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth3      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
              No configuration found for eth3                        unused

    eth4      device: Broadcom Corporation NetXtreme BCM5719 Gigabi
              No configuration found for eth4                        unused

    eth5      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth6      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth7      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth8      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    eth9      device: Intel Corporation 82599EB 10-Gigabit SFI/SFP+  done

    bond0
    bond0     enslaved interface: eth2
    bond0     enslaved interface: eth5
    bond0     IP address: 192.168.43.18/26                           done

    bond1
    bond1     enslaved interface: eth8
    bond1     enslaved interface: eth9
    bond1     enslaved interface: eth6
    bond1     enslaved interface: eth7                               done

    vlan1
    vlan1   IP address: 172.21.8.109/22                            done

    vlan2
    vlan2   IP address: 172.21.13.121/22                           done

    vlan3
    vlan3   IP address: 172.21.80.16/23                            done
Setting up service (localfs) network  .  .  .  .  .  .  .  .  .  .   done

The NICs came up in the desired order:
Broadcom
Broadcom
Intel
Broadcom
Broadcom
Intel
Intel
Intel
Intel
Intel


From the booted system we can see the bus paths are in the order expected from the rules file.

ls -l /sys/class/net/eth*
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.0/net/eth0
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth1 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.1/net/eth1
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth2 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.0/net/eth2
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth3 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.2/net/eth3
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth4 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.3/net/eth4
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth5 -> ../../devices/pci0000:00/0000:00:03.0/0000:04:00.1/net/eth5
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth6 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.0/net/eth6
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth7 -> ../../devices/pci0000:20/0000:20:02.2/0000:24:00.1/net/eth7
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth8 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.0/net/eth8
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth9 -> ../../devices/pci0000:20/0000:20:02.0/0000:27:00.1/net/eth9

Example 2:

In a different set of nodes, SL4540s, a Mellanox Ethernet card was encountered that threw a nice curve ball into this solution.  These nodes have 4 Ethernet interfaces: eth0, eth1, eth2 & eth3.  Interfaces 2 & 3 are on the Mellanox NIC, BUT, when you probe the system's hardware via normal utilities, the Mellanox NIC only reports one of the interfaces, not both.

Interfaces eth0 & eth1 are built into the system board.

There was not much time to spend digging into this variation of the problem.  A solution similar to the DL380p Gen8 solution was put in place for interfaces eth0, eth1, and eth2.  Eth0 was used for the Admin network, and eth2 & eth3 were bonded together.  (No vlan tagging was used on these nodes' interfaces.)

Since I was not able to find a quick programmatic way to identify eth3 from a CMU Net Booted system, I decided that the reconf.sh script should NOT write a rule for that interface.  Rules are written for NICs eth0, eth1, and eth2.

On boot up from disk, the system will notice that eth3 is not defined and will generate a rule for that interface automagically.  As long as there are no UNcommented rules in the file that name eth3, udev will name the interface eth3.

When the network comes up, it finds both eth2 and eth3 defined, and the bonded interface is up at full capacity.

If there is an UNcommented eth3 in the rules file, then the system will generate an interface eth4 for that NIC.  The bonded interface will only contain eth2, and will be running at half capacity.

#Example entries for reconf.sh
#Capture the current, net booted MACs which Debian lists in order.
##Built in NICs
ETH0_MAC=$(cat /sys/class/net/eth0/address)
ETH1_MAC=$(cat /sys/class/net/eth1/address)
##Mellanox Ethernet NIC with two ports.
ETH2_MAC=$(cat /sys/class/net/eth2/address)
ETH3_MAC=$(cat /sys/class/net/eth3/address)

RULES_FILE=${CMU_RCFG_PATH}/etc/udev/rules.d/70-persistent-net.rules
RULE_START='SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="'
RULE_MIDDLE='", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", '

echo "" > ${RULES_FILE}
echo '# system board NICs' >> ${RULES_FILE}
echo ${RULE_START}${ETH0_MAC}${RULE_MIDDLE}'NAME="eth0"' >> ${RULES_FILE}
echo ${RULE_START}${ETH1_MAC}${RULE_MIDDLE}'NAME="eth1"' >> ${RULES_FILE}
echo '# bond0 NICs'  >> ${RULES_FILE}
echo ${RULE_START}${ETH2_MAC}${RULE_MIDDLE}'NAME="eth2"' >> ${RULES_FILE}
####### Keep the rule for eth3 commented out so the udev system generates it.
####### echo ${RULE_START}${ETH3_MAC}${RULE_MIDDLE}'NAME="eth3"' >> ${RULES_FILE}

When the system comes up, it can be observed that another interface was generated by udev.  It should be named eth3.
ls -l /sys/class/net/eth*
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth0 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.0/net/eth0
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth1 -> ../../devices/pci0000:00/0000:00:02.0/0000:03:00.1/net/eth1
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth2 -> ../../devices/pci0000:00/0000:00:04.0/0000:04:00.0/net/eth2
lrwxrwxrwx 1 root root 0 Sep 11 05:41 /sys/class/net/eth3 -> ../../devices/pci0000:00/0000:00:04.0/0000:04:00.1/net/eth3

Summary:

The udev system is very picky, and can be more so on certain Linux distributions.  I know we encounter more udev problems with SuSE than we do with RHEL or derivatives of it.

In this example, we had to have a generic solution that would fit all of the nodes, real or potential, that could exist in the cluster.  A few variations on this solution had to be employed due to system and functional differences between the various types of nodes operating in this cluster.

If a different distribution version of Debian were to be used, the solution would have to be double checked again to make sure that Debian still names the devices in bus order.

One major lesson learned is that if Udev appears to be ignoring the rules written for it to follow, that typically means there is something wrong with the rule: either it does not contain enough elements to be effective, or there is a bad key and/or value present in the rule.

When there are bad keys in the rule, error messages can sometimes be observed scrolling on the boot screen, and may be available in dmesg.

I also found that reboots were necessary to fully realize the changes made manually while experimenting with rule syntax.  The udevadm tool was not sufficient to have interface name changes realized in an already running system.
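
For anyone experimenting with rules on a running system, these are the sorts of udevadm commands worth trying before falling back to a reboot (they were not enough to rename in-use interfaces in this SLES 11.3 case, but the test mode is handy for spotting rule errors):

# Re-read the rules files
udevadm control --reload-rules
# Replay "add" events for network devices
udevadm trigger --subsystem-match=net --action=add
# Dry-run one device against the rules to see which rule matches
udevadm test /sys/class/net/eth0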

In the end a few different groups of nodes came up with multiple NICs in multiple bond/vlan configurations, connected to various networks.

Thursday, August 21, 2014

Enable and Disable a port on HP Procurve 5800 & 5900 Switches

Not being a network-savvy admin, I often forget the simplest steps on switches and routers.

As of the posting date, the command is the same for both switch series.

To disable a port, use the following sequence:

<SwitchName> system-view
[SwitchName]  interface gigabitethernet  1/0/10
[SwitchName-GigabitEthernet1/0/10]  shutdown



To enable a port:

<SwitchName> system-view
[SwitchName]  interface gigabitethernet  1/0/10
[SwitchName-GigabitEthernet1/0/10]  undo shutdown
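
To double-check the result afterwards (from memory, so verify the exact syntax on your switch), the port state can be displayed from user view with something like:

<SwitchName> display interface gigabitethernet 1/0/10 brief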