
vSphere Bitfusion 2.0.0 Installation Guide

vSphere Bitfusion Guides

WHITE PAPER – JULY 2020


Table of Contents

Introduction

Prerequisites

Encryption

Licensing & Certification

Installation Overview

1. Install the First VMware vSphere Bitfusion Appliance

2. Manually Installing the NVIDIA Driver

3. Verify vSphere Bitfusion Plug-in in vCenter

4. Install Additional vSphere Bitfusion Appliances

5. Install and Enable vSphere Bitfusion Clients

6. Selected Settings

7. PVRDMA


This document describes how to install VMware vSphere® Bitfusion™ 2.0.0.

Introduction

VMware vSphere Bitfusion allows multiple client machines running AI/ML applications to share access to remote GPUs on machines running Bitfusion server software.

Bitfusion uses a client-server architecture. It runs a GPU service on VMware appliances (VMs with prepackaged software and services). These GPU servers require access to local GPUs, usually through VMware vSphere® DirectPath I/O™. The host must run vSphere 7.

Bitfusion client software runs on the virtual machines where the applications run. Client VMs may run on vSphere 7 or 6.7.

The Bitfusion servers register a Bitfusion plug-in with VMware vCenter®. The plug-in provides monitoring and management of the clients and servers.

Figure 1 is an example of a small Bitfusion “cluster” (set of Bitfusion client/server machines and vCenter) on a switched network. A minimal cluster would be one client, one server, and one vCenter. Larger clusters may have multiple clients and multiple servers. Servers may have many GPUs.

Figure 1. A vSphere Bitfusion cluster

[Figure 1 diagram. Labels recovered from the image: vCenter; three vSphere Bitfusion Appliances on ESXi 7 hosts with GPUs; a vSphere Bitfusion Client VM on ESXi 6.7 or higher; 1. Register vSphere Bitfusion Plug-In; 2. vSphere Bitfusion Enable; 3. Authorized Access; Guest variables to make registration possible; Authorization, Configuration; Distributed Database; Communication, Synchronization.]

Prerequisites

• Client OS: Ubuntu 18.04, Ubuntu 16.04, CentOS 7, RHEL 7.4+

• NVIDIA driver (different installation options given below): NVIDIA-Linux-x86_64-440.64.00.run

• Recommended GPUs are Tesla V100 and T4 (these are the most tested and on the support list)

• Networking supporting TCP/IP or RoCE (e.g. PVRDMA adapters)

• 10 Gbps bandwidth minimum. More is recommended for any machine accessing or housing more than two GPUs.

• Node-to-node latency of 50 microseconds or less. This is recommended for good performance but is not a strict requirement.

• Servers listen on many ports: 56001, 55001-55100, 54000, 45201-46225, 7000, 7001, 7199, and 9042. The network should not block them. Clients connect to the servers on ports 56001, 55001-55100, and 45201-46225. For vCenter access, ports 443 and 80 should also be open.

• Server memory – the minimum is the aggregate total of GPU memory on all GPU cards passed through, multiplied by a factor of 1.5

• Server compute – the minimum number of cores is the number of GPU cards multiplied by 4

• Client memory – total all the GPU memory that client applications will consume at one time and multiply by a factor of 1.5

• Client compute requirements are the same as they would be for running applications with dedicated, local GPUs
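The port requirements above can be spot-checked from a client VM before installing anything. The bash sketch below is an illustration, not a Bitfusion tool. It assumes all listed ports are TCP and relies on bash's /dev/tcp and GNU timeout. It reports a port as filtered only when a connection attempt times out, because a refused connection merely means nothing is listening on that port at that moment.

```bash
#!/usr/bin/env bash
# Sketch: spot-check Bitfusion client-to-server ports from a client VM.
# Assumptions: all ports are TCP; bash has /dev/tcp support; GNU timeout is installed.

# Client-to-server ports from the prerequisites list.
CLIENT_PORTS=(56001 55001-55100 45201-46225)

# Expand port specs such as "55001-55100" into one port number per line.
expand_ports() {
  local spec
  for spec in "$@"; do
    if [[ $spec == *-* ]]; then
      seq "${spec%-*}" "${spec#*-}"
    else
      echo "$spec"
    fi
  done
}

# Classify one port: open, refused (nothing listening, or name lookup failed),
# or filtered (no answer within 1 second, which suggests blocking).
port_status() {          # $1 = host, $2 = port
  timeout 1 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" </dev/null 2>/dev/null
  case $? in
    0)   echo open ;;
    124) echo filtered ;;
    *)   echo refused ;;
  esac
}

# Print every filtered port; return non-zero if any were found.
check_ports() {          # $1 = host, remaining args = port specs
  local host=$1 port blocked=0
  shift
  while read -r port; do
    if [[ $(port_status "$host" "$port") == filtered ]]; then
      echo "filtered: $host:$port"
      blocked=1
    fi
  done < <(expand_ports "$@")
  return "$blocked"
}

# Usage: check_bf_ports.sh <host> [port specs...]  (defaults to CLIENT_PORTS)
if [[ $# -gt 0 ]]; then
  host=$1
  shift
  if [[ $# -eq 0 ]]; then
    set -- "${CLIENT_PORTS[@]}"
  fi
  check_ports "$host" "$@"
fi
```

For example, ./check_bf_ports.sh <server_ip> checks the client-to-server ports, and ./check_bf_ports.sh <vcenter_host> 443 80 checks vCenter access. The script name is hypothetical. Probing the full ranges can take several minutes when many ports are filtered.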

Encryption

Bitfusion encrypts control traffic, but does not encrypt data traffic. A client allocating GPUs would be an example of control traffic. A client sending a model to the GPU would be an example of data traffic. There are no installation steps, controls, or configuration associated with the encryption.

Licensing and Certification

The Bitfusion server software requires Bitfusion add-on licensing on top of a VMware vSphere® Enterprise Plus Edition™ license (at the time of this writing). In addition, the VMware ESX™ hosts used to run Bitfusion server VMs must be certified per the VMware Compatibility Guide (VCG) and must support the “VM DirectPath I/O” feature to enable pass-through of the GPU devices. Verify that your hosts appear on the VCG and support VM DirectPath I/O by examining the VCG at https://www.vmware.com/resources/compatibility/search.php.

After confirming that the physical servers meet these requirements, apply the license as described in the licensing documentation at https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vcenterhost.doc/GUID-487AACBF-4E49-43E0-A852-FC23734C0774.html.

Installation Overview

There are four major steps to follow to install vSphere Bitfusion in a cluster.

1. Install first Bitfusion appliance (physical host with GPUs and ESXi 7).

2. Verify Bitfusion plug-in in vCenter.

3. Install additional Bitfusion appliances (same host requirements; has an additional step in vCenter). Install and boot serially.

4. Install and enable Bitfusion clients.

1. Install the First VMware vSphere Bitfusion Appliance

NOTE: THERE ARE IMPORTANT DIFFERENCES BETWEEN THE FIRST AND SUBSEQUENT APPLIANCE INSTALLATIONS

Prerequisites:

• The host must have one or more commercial NVIDIA GPUs.

• The host must use ESXi 7 as its hypervisor.

• The host must have an Enterprise Plus license and a Bitfusion add-on.

(In vCenter, right-click the host → “Assign License” → associate or add licenses.)

A. Download the Bitfusion OVA from your account on my.vmware.com, as with other VMware software. The download page will look similar to Figure 2.


Figure 2. Representative appearance of download site for Bitfusion OVA

B. Create VMware appliance from the Bitfusion OVA.

• Select the host to deploy on.

• Right click the host and select Deploy OVF Template.

• Enter the URL of the OVA file or upload the OVA file.

• Give the Bitfusion server a name


• Select the compute resource

• Review details. Ignore the warning from vCenter: it reports that the VM sets advanced configuration values, which the VM requires in order to work. The 64bitMMIOSizeGB value can be modified later (step F); a large value is used by default for safety. The values are:

o pciPassthru.64bitMMIOSizeGB = 256

o pciPassthru.use64bitMMIO = true


• Select storage

• Select the network for Network Adapter 1. This network is used for management, though it can carry data traffic at the same time. Additional adapters can be added later.

• On the Customize Template screen:

o Create a hostname for the server (this will be entered in /etc/hostname).

o Enter the vCenter GUID and URL as found in the navigation bar of the browser. For example, if the navigation bar showed https://sc2wvc03.vslab.local/ui/app/vm;nav=h/urn:vmomi:VirtualMachine:vm-4450:612d27ff-d297-4573-bdc0-2c0dac8589a5/summary, the GUID is 612d27ff-d297-4573-bdc0-2c0dac8589a5 and the URL is https://sc2wvc03.vslab.local.

o Enter the vCenter credentials: log-on name and password.

• The “vCenter TLS Certificate Thumbprint” is the SHA-1 fingerprint of the vCenter TLS certificate. To find it:

o In another browser tab, access vCenter

o List the certificates of vCenter

- Chrome: click on the “lock” icon or “not secure” icon to the left of the URL bar in the browser → certificate → details → thumbprint

- Firefox: click on the “lock” icon to the left of the URL bar in the browser → expand Connection (“secure” or “not secure”) → More Information → View Certificate → scroll to fingerprints, SHA-1

o Copy the SHA-1 thumbprint (note: although it is hexadecimal, it is case-sensitive)

• To log in to the VM as user “customer” via the console or SSH, enter a password in the “Credentials” section

• To download and install the NVIDIA driver on first boot, check the checkbox to accept the NVIDIA license; the start-up script will do the rest.


o Use of the NVIDIA driver implies acceptance of the NVIDIA Software License Agreement: https://bit.ly/2yTOhi5

o If operating in a firewalled or air-gapped environment with no access to the internet, do not check the box. Instead, install the driver manually from the command line later.

• Enter networking configuration fields. As indicated, leave most items blank for DHCP.

• The default MTU is 1500. For performance, we recommend an MTU of 4096 or higher; if possible, set a higher value here. If setting the MTU higher than 1500, ensure that jumbo frames are enabled on the data center switches.


• If the DHCP server does not support sending NTP server information, specify the NTP server explicitly.

• Network Adapter 1 is mandatory. Even if most of its fields are left blank, make sure every value, and every blank, is intentional.

• Optionally add up to 3 more Network Adapters.

• Click Finish to deploy the template

It will take a few minutes for vCenter to complete the deployment.

Do NOT power on the VM at this point. There are additional steps.
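The three vCenter values on the Customize Template screen can also be gathered from any Linux shell. The following bash sketch is an illustration under assumptions. It takes the vCenter GUID to be the first UUID in the browser URL, as in the example above, and it uses openssl to read the certificate presented on port 443. Compare its output with the thumbprint shown in the browser before relying on it.

```bash
#!/usr/bin/env bash
# Sketch: gather the Customize Template vCenter fields (GUID, URL, TLS thumbprint).
# Assumption: the vCenter GUID is the first UUID in the browser URL.

UUID_RE='[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'

# Print the scheme and host of a vCenter UI URL, e.g. https://vc.example.local
vcenter_base_url() {
  local re='^(https://[^/]+)'
  [[ $1 =~ $re ]] && echo "${BASH_REMATCH[1]}"
}

# Print the first UUID found in the URL.
vcenter_guid() {
  [[ $1 =~ $UUID_RE ]] && echo "${BASH_REMATCH[0]}"
}

# Drop the "... Fingerprint=" label printed by openssl, keeping only the hex
# value; cutting at "=" means the exact label text does not matter.
strip_fingerprint_label() {
  echo "${1#*=}"
}

# Print the SHA-1 fingerprint of the TLS certificate served on host:443.
vcenter_thumbprint() {
  local line
  line=$(openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha1) || return 1
  strip_fingerprint_label "$line"
}

# Usage: vcenter_fields.sh '<URL copied from the browser navigation bar>'
if [[ $# -gt 0 ]]; then
  base=$(vcenter_base_url "$1")
  host=${base#https://}
  echo "vCenter GUID:   $(vcenter_guid "$1")"
  echo "vCenter URL:    $base"
  echo "TLS thumbprint: $(vcenter_thumbprint "$host")"
fi
```

Quote the URL when passing it, because it contains a semicolon. The script name vcenter_fields.sh is hypothetical. openssl prints the fingerprint as colon-separated uppercase hex. This guide does not say whether the OVF field requires exactly that form, so check it against the value shown in the browser.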


C. Pass the GPUs through to the VM, as briefly described here.

(This may require enabling the pass-through capability and rebooting the host. For this and other information on passing through GPUs, this blog article provides more detail: https://blogs.vmware.com/apps/2018/09/using-gpus-with-virtual-machines-on-vsphere-part-2-vmdirectpath-i-o.html)

• Select the VM

• Click Actions > Edit Settings

• Attach the pass-through GPUs to the VM by clicking Add New Device > PCI Device for each GPU to attach

o Adjust the PCI device numbers. By default, each newly added device is set to the first available device. To adjust, use the expansion arrow at the end of the line showing the PCI address and device description, e.g. “0000:3d:00.0 …Tesla V100”.

D. Add any additional network adapters.

• Click Add New Device > Network Adapter for each additional adapter

o To change the network adapter type, e.g. to use PVRDMA instead of VMXNET3, do it now

E. Configure VM CPU and memory settings.

• Select the VM

• Click Actions > Edit Settings


• Set CPU and memory to maximum if the host is a dedicated Bitfusion server. Otherwise, use at least these minimums:

o Minimum CPU count: the number of GPU devices times 4

o Minimum memory: the aggregate total of GPU memory on all GPU cards passed through, multiplied by a factor of 1.5

• Set the “Reserve all guest memory” checkbox
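The minimums in this step are simple arithmetic. The bash sketch below computes them. It assumes every passed-through GPU has the same amount of memory, and it rounds 1.5 × total GPU memory up to a whole GB. That rounding is a choice made here, not a documented rule.

```bash
#!/usr/bin/env bash
# Sketch: minimum Bitfusion server VM sizing from the rules above.
#   vCPUs  = 4 x number of GPUs
#   memory = 1.5 x total GPU memory, rounded up to a whole GB (rounding is an assumption)
# Assumes every passed-through GPU has the same memory size.

min_vcpus() {            # $1 = number of GPUs
  echo $(( $1 * 4 ))
}

min_memory_gb() {        # $1 = number of GPUs, $2 = GB of memory per GPU
  local total=$(( $1 * $2 ))
  # 1.5x in integer arithmetic, rounding up: ceil(3 * total / 2)
  echo $(( (total * 3 + 1) / 2 ))
}

# Usage: bf_sizing.sh <num_gpus> <gb_per_gpu>
if [[ $# -eq 2 ]]; then
  echo "GPUs: $1 x $2 GB -> min vCPUs: $(min_vcpus "$1"), min memory: $(min_memory_gb "$1" "$2") GB"
fi
```

For example, ./bf_sizing.sh 4 16 (hypothetical script name) gives 16 vCPUs and 96 GB for four 16 GB GPUs. The same 1.5 factor can be applied to a client's total GPU memory demand.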

F. Adjust the passthrough MMIO size (the installer uses a default of 256 GB, which may be more than needed)

• Right-click on VM → Edit Settings → VM Options → Advanced → Edit Configuration → Click “Add Configuration Params”

• Set pciPassthru.64bitMMIOSizeGB=<n>, where n equals (number of cards * size of each card in GB) rounded up to the NEXT power of 2:

o Example A: two 16 GB cards => 2 * 16 = 32 => rounded up to the next power of 2 = 64

o Example B: three 16 GB cards => 3 * 16 = 48 => rounded up to the next power of 2 = 64
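The rounding rule in step F can be scripted. This bash sketch follows the guide's examples literally: a product that is already a power of 2 (32 in example A) still moves up to the next one (64).

```bash
#!/usr/bin/env bash
# Sketch: pciPassthru.64bitMMIOSizeGB per step F:
# (cards x GB per card), rounded up to the NEXT power of 2 (strictly greater).

mmio_size_gb() {         # $1 = number of GPU cards, $2 = GB of memory per card
  local total=$(( $1 * $2 )) size=1
  # Double until strictly greater than the total, so 32 -> 64 and 48 -> 64.
  while (( size <= total )); do
    size=$(( size * 2 ))
  done
  echo "$size"
}

# Usage: mmio_size.sh <num_cards> <gb_per_card>
if [[ $# -eq 2 ]]; then
  echo "pciPassthru.64bitMMIOSizeGB = $(mmio_size_gb "$1" "$2")"
fi
```

./mmio_size.sh 3 16 (hypothetical script name) prints pciPassthru.64bitMMIOSizeGB = 64, matching example B.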

G. Best Practice: take a snapshot of the VM at this point, in case something goes wrong during power-on.

H. Power on the VM.

After powering on, let the VM run for about 2 to 10 minutes (sometimes longer) before doing anything else. It needs time to install the driver, register with vCenter, and set itself up as the first GPU server in the Bitfusion “cluster.” Cluster is in quotes here because a Bitfusion cluster has no particular vCenter definition; it is simply the collection of VMs using Bitfusion.

2. Manually Installing the NVIDIA Driver

Skip this section if the installer was configured to install the driver. Otherwise, log in to the new Bitfusion Server VM and install it manually.

The currently tested and certified NVIDIA driver is NVIDIA-Linux-x86_64-440.64.00.run, available from http://us.download.nvidia.com/tesla/440.64.00/NVIDIA-Linux-x86_64-440.64.00.run. Use of the NVIDIA driver implies acceptance of the NVIDIA Software License Agreement: https://bit.ly/2yTOhi5.

Choose one of the three methods below as best suits the environment.

A. Manual Method 1 – Install direct from the internet

# Log into the bitfusion server and install the driver direct from NVIDIA’s web server

ssh customer@$bitfusion_server_ip

sudo install-nvidia-driver http://us.download.nvidia.com/tesla/440.64.00/NVIDIA-Linux-x86_64-440.64.00.run

B. Manual Method 2 – Install in air-gapped environment with local web server

# Download the driver to a laptop
wget http://us.download.nvidia.com/tesla/440.64.00/NVIDIA-Linux-x86_64-440.64.00.run

# Copy the driver to a directory that the local web server serves, and make it readable
scp NVIDIA-Linux-x86_64-440.64.00.run mylogin@mylocalwebserver:/var/www/html/
ssh mylogin@mylocalwebserver chmod +r /var/www/html/NVIDIA-Linux-x86_64-440.64.00.run

# Log into the bitfusion server and install the driver from the local web server
ssh customer@$bitfusion_server_ip
sudo install-nvidia-driver http://mylocalwebserver/NVIDIA-Linux-x86_64-440.64.00.run


C. Manual Method 3 – Install in air-gapped environment without local web server

# Download the driver to a laptop
wget http://us.download.nvidia.com/tesla/440.64.00/NVIDIA-Linux-x86_64-440.64.00.run

# Copy the driver to the bitfusion server
scp NVIDIA-Linux-x86_64-440.64.00.run customer@$bitfusion_server_ip:~/

# Log into the bitfusion server and install the driver from the local file
ssh customer@$bitfusion_server_ip
sudo install-nvidia-driver ~customer/NVIDIA-Linux-x86_64-440.64.00.run

After any of the three methods, reboot the VM. It will take a few minutes to register the Bitfusion plug-in with vCenter.
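Once the VM is back up, one way to confirm the driver is to query nvidia-smi, which the NVIDIA .run installer normally installs alongside the driver. The bash sketch below assumes that, and it checks that every GPU reports the certified version 440.64.00. It is not part of the Bitfusion appliance.

```bash
#!/usr/bin/env bash
# Sketch: after the reboot, confirm each GPU reports the expected driver version.
# Run on the Bitfusion server VM. Assumes nvidia-smi was installed with the driver.

EXPECTED_DRIVER=440.64.00

# Read "name, driver_version" CSV lines on stdin; fail if any version differs
# or if no GPU lines were seen at all.
check_driver_lines() {
  local want=$1 name version seen=0
  while IFS=, read -r name version; do
    version=${version# }          # nvidia-smi puts a space after the comma
    if [[ $version != "$want" ]]; then
      echo "mismatch: $name reports driver $version" >&2
      return 1
    fi
    seen=$(( seen + 1 ))
  done
  (( seen > 0 ))
}

# Usage: check_driver.sh run
if [[ ${1:-} == run ]]; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
    | check_driver_lines "$EXPECTED_DRIVER" && echo "driver $EXPECTED_DRIVER OK"
fi
```

Run it on the server VM as ./check_driver.sh run (hypothetical name). It fails if no GPUs are listed, which can point to a problem with the pass-through in step C or with the driver installation.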

3. Verify vSphere Bitfusion Plug-in in vCenter

As mentioned at the end of the previous sections, wait 2 to 10 minutes (possibly more) for the Bitfusion appliance to come up and register itself with vCenter before proceeding.

• A blue alert bar along the top of vCenter (version 7) will appear, announcing the Bitfusion plug-in; click the refresh button.

• Select “Menu” → “Bitfusion” and wait for the Bitfusion plug-in GUI to load

4. Install Additional vSphere Bitfusion Appliances

Installing the second, third, and subsequent vSphere Bitfusion appliances is almost the same as installing the first, but with one critical difference.

NOTE: THE DIFFERENCE BETWEEN THE FIRST AND SUBSEQUENT INSTALLATIONS IS CRITICAL

Hopefully, we’ve made the point strongly enough that those who just skim instructions are now aware of it. Second, third, and subsequent server installations need to find previously installed servers and to avoid re-registering the Bitfusion plug-in.

Follow the steps of section 1 through step G. Then, before powering on in step H, do the following:

• In vCenter, go to the “Hosts and Clusters” view, right-click the new VM(s) → Bitfusion → select “Enable Bitfusion”

In the dialog box, select “server” and press “Enable”

This adds guest variables informing the server it is not the first GPU server in the Bitfusion cluster.


• Proceed with step H and power on the new VM(s). If not using the scripted driver installation, install the NVIDIA driver manually. Wait a couple of minutes for the new server to appear in the Bitfusion plug-in GUI.

5. Install and Enable vSphere Bitfusion Clients

Prerequisites:

• Client machines must be vSphere VMs managed by vCenter. The client machines may run on vSphere 6.7 or 7, but vCenter, itself, must be version 7.

• The Bitfusion Remote Plugin must have been registered in vCenter. In practice, bring up all the Bitfusion servers first and make sure the plug-in appears in vCenter.

• Client machines must run one of the following OSes: CentOS 7, RHEL 7.4+, Ubuntu 16.04, or Ubuntu 18.04

• All four VMware Tools scripts should be enabled for the VM. This is the default when creating new VMs. To check, right-click the VM → Edit Settings → VM Options → VMware Tools → Run VMware Tools Scripts

A. Install the Bitfusion software and dependencies on the client VM. Obtain the DEB or RPM file appropriate for your OS. They can be found at https://packages.vmware.com/bitfusion/centos/ or https://packages.vmware.com/bitfusion/ubuntu/.
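Choosing between the RPM and DEB packages can be automated from /etc/os-release. This bash sketch encodes the supported-OS list from the prerequisites above. It maps CentOS/RHEL to RPM and Ubuntu to DEB, following the two package directories, and reports anything else as unsupported.

```bash
#!/usr/bin/env bash
# Sketch: pick the client package format for this VM from the supported-OS
# list (CentOS 7, RHEL 7.4+, Ubuntu 16.04, Ubuntu 18.04).

# $1 = ID, $2 = VERSION_ID (as in /etc/os-release); prints rpm or deb,
# or returns non-zero for an unsupported OS or version.
bf_pkg_type() {
  local id=$1 ver=$2 major minor=0
  major=${ver%%.*}
  if [[ $ver == *.* ]]; then
    minor=${ver#*.}
    minor=${minor%%.*}
  fi
  case $id in
    centos) [[ $major == 7 ]] && echo rpm ;;
    rhel)   [[ $major == 7 ]] && (( 10#$minor >= 4 )) && echo rpm ;;
    ubuntu) [[ $ver == 16.04 || $ver == 18.04 ]] && echo deb ;;
    *)      return 1 ;;
  esac
}

# Usage: bf_pkg_type.sh detect
if [[ ${1:-} == detect ]]; then
  . /etc/os-release      # defines ID and VERSION_ID
  bf_pkg_type "$ID" "$VERSION_ID" || { echo "unsupported OS: $ID $VERSION_ID" >&2; exit 1; }
fi
```

./bf_pkg_type.sh detect (hypothetical name) prints rpm or deb for the current VM.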

A-1. To install the RPM on a CentOS system


# For RHEL 7, skip the epel-release installation
sudo yum install -y epel-release

sudo rpm --import https://packages.vmware.com/bitfusion/vmware.bitfusion.key

# The fingerprint for the key is

# 09A8D39499E550D1

# Go to https://packages.vmware.com/bitfusion/centos/ to find the client .rpm package for your version of CentOS.

# The VMware Bitfusion client package version must match the version of the VMware Bitfusion Server OVA. If you installed version 2.0.0-11 of the VMware Bitfusion Server OVA, you will need to install version 2.0.0-11 of the VMware Bitfusion client package.

# Download the package.

# Install the package using:
sudo yum install -y ./bitfusion-client*.rpm
......

A-2. To install the DEB on an Ubuntu 18.04 system

# Go to https://packages.vmware.com/bitfusion/ubuntu/ to find the client .deb package for your distro.

# The VMware Bitfusion client package version must match the version of the VMware Bitfusion Server OVA. If you installed version 2.0.0-11 of the VMware Bitfusion Server OVA, you will need to install version 2.0.0-11 of the VMware Bitfusion client package.

# Download the package.

# Install the package using:
sudo apt-get update
sudo apt-get install -y ./bitfusion-client*.deb
......

bitfusion@bf_ubuntu_1804:~$ bitfusion version
Bitfusion version 2.0.0

Ubuntu 16.04 systems are similar enough not to need additional explanation.
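Because the client package must match the server OVA version, a quick check after installation can catch a mismatch early. This bash sketch assumes the installed package is named bitfusion-client. That name is suggested by the bitfusion-client*.rpm and *.deb file names but is not confirmed here. The check also allows a distribution suffix after the release number.

```bash
#!/usr/bin/env bash
# Sketch: compare the installed client package version with the server OVA version.
# Assumption: the package is named "bitfusion-client".

# Succeed if $2 equals $1, or is $1 followed by a non-digit suffix
# (2.0.0-11 matches 2.0.0-11.el7 but not 2.0.0-110).
bf_versions_match() {
  local want=$1 have=$2
  [[ $have == "$want" || $have == "$want"[!0-9]* ]]
}

# Print VERSION-RELEASE of the installed client package (rpm or dpkg systems).
installed_client_version() {
  if command -v rpm >/dev/null && rpm -q bitfusion-client >/dev/null 2>&1; then
    rpm -q --qf '%{VERSION}-%{RELEASE}\n' bitfusion-client
  else
    dpkg-query -W -f='${Version}\n' bitfusion-client
  fi
}

# Usage: bf_client_version.sh <server_ova_version>
if [[ $# -eq 1 ]]; then
  have=$(installed_client_version) || exit 1
  if bf_versions_match "$1" "$have"; then
    echo "client $have matches server $1"
  else
    echo "client $have does NOT match server $1" >&2
    exit 1
  fi
fi
```

Pass the server OVA version, e.g. ./bf_client_version.sh 2.0.0-11 (hypothetical script name).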

B. Enable Bitfusion on the client VM. It is generally best to enable a client while it is powered off. This way, it remains enabled across reboots; otherwise, it must be re-enabled after every power cycle. Select a client VM that is powered off.

• On the selected client VM, right-click → Bitfusion → select “Enable Bitfusion”


In the dialog box, select “client” and press “Enable.” You should see a success message as the dialog closes.

C. Add all desired VM users to the “bitfusion” group. In the example below, the username “testuser” is used.

# Example: add “testuser” to the bitfusion group
$ sudo usermod -aG bitfusion testuser

# Find out what group(s) the user belongs to
# You may have to log out and back in
testuser@bf_ubuntu_1804:~$ groups
testuser bitfusion
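The group check can also be scripted. id -nG with a user name reads the group database, so it shows the new membership right after usermod. Sessions the user already has open keep their old groups until the user logs out and back in. The bash sketch below is illustrative.

```bash
#!/usr/bin/env bash
# Sketch: check that a user is in the "bitfusion" group.
# `id -nG <user>` reads the group database, so it reflects usermod immediately;
# already-open sessions of that user keep their old group list.

# $1 = space-separated group list, $2 = group name; succeed if present.
list_has_group() {
  local g
  for g in $1; do
    [[ $g == "$2" ]] && return 0
  done
  return 1
}

# Usage: bf_group_check.sh <username>
if [[ $# -eq 1 ]]; then
  if list_has_group "$(id -nG "$1")" bitfusion; then
    echo "$1 is in the bitfusion group"
  else
    echo "$1 is NOT in the bitfusion group; run: sudo usermod -aG bitfusion $1" >&2
    exit 1
  fi
fi
```

For example: ./bf_group_check.sh testuser (hypothetical script name).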

D. Now, perform a quick test on the CLI of the client VM to see if Bitfusion is working. This command will report on all the GPUs in the cluster:


# See if “testuser” can run the remote workload

testuser@bf_ubuntu_1804:/home/bitfusion$ bitfusion list_gpus
 - server 0 [10.202.8.185:56001]: running 0 tasks
   |- GPU 0: free memory 16160 MiB / 16160 MiB
   |- GPU 1: free memory 16160 MiB / 16160 MiB
   |- GPU 2: free memory 16160 MiB / 16160 MiB
   |- GPU 3: free memory 16160 MiB / 16160 MiB
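For scripted smoke tests, the list_gpus output can be summarized with awk. This sketch assumes the line format shown above ("|- GPU n: free memory X MiB / Y MiB"). That format is not a documented interface and may change between releases.

```bash
#!/usr/bin/env bash
# Sketch: summarize `bitfusion list_gpus` output (GPU count, free/total MiB).
# Assumes lines of the form "|- GPU n: free memory X MiB / Y MiB".

summarize_gpus() {
  awk '
    /GPU [0-9]+: free memory/ {
      gpus++
      free  += $6          # number after "free memory"
      total += $9          # number after "/"
    }
    END { printf "%d GPUs, %d of %d MiB free\n", gpus, free, total }
  '
}

# Usage: bf_gpus.sh run
if [[ ${1:-} == run ]]; then
  bitfusion list_gpus | summarize_gpus
fi
```

./bf_gpus.sh run (hypothetical script name) pipes bitfusion list_gpus through summarize_gpus. For the output above it would print 4 GPUs, 64640 of 64640 MiB free.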

6. Selected Settings

Performance documents, user guides, and perhaps other documents, in addition to the knowledge base, will document settings and recommendations for good performance and proper function with vSphere Bitfusion. But we include a few basic recommendations here.

MTU

For best performance of applications running under Bitfusion, we recommend setting the MTU to 4096 or higher. Set clients to match the MTU size used when deploying the servers. If the MTU is above 1500, take care to enable jumbo frames in the network switches as well, or frames will be dropped, and dropped silently. Here is one way of setting the MTU to 4096. It assumes that the ifconfig tool is available and that the network interface is named ens192.

ifconfig ens192 mtu 4096
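On VMs without ifconfig, iproute2's ip command does the same job. Neither command makes the change persistent across reboots; persistence is distribution-specific (e.g. netplan or ifcfg files). Mismatched switch settings drop jumbo frames silently, so the bash sketch below also sends non-fragmentable pings of full MTU size to a peer. It assumes Linux iputils ping, the interface name ens192, an MTU of 4096, and a peer already configured with the same MTU.

```bash
#!/usr/bin/env bash
# Sketch: set the MTU with iproute2, then verify that full-size frames reach a peer.
# Assumptions: interface ens192, MTU 4096, Linux iputils ping, peer uses the same MTU.

IFACE=ens192
MTU=4096

# Largest ICMP echo payload that fits in one frame: MTU - 20 (IPv4) - 8 (ICMP).
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Read the MTU from `ip link show` output on stdin.
parse_mtu() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# Usage: bf_mtu.sh <peer_ip>
if [[ $# -eq 1 ]]; then
  peer=$1
  sudo ip link set dev "$IFACE" mtu "$MTU"
  echo "$IFACE mtu: $(ip link show dev "$IFACE" | parse_mtu)"
  # -M do forbids fragmentation, so a frame the path cannot carry fails instead of being split.
  if ping -c 3 -M do -s "$(icmp_payload_for_mtu "$MTU")" "$peer"; then
    echo "full-size frames reach $peer"
  else
    echo "full-size frames do not reach $peer; check switch jumbo-frame settings" >&2
    exit 1
  fi
fi
```

For example: ./bf_mtu.sh <peer_ip> (hypothetical script name). A payload of MTU minus 28 bytes (a 20-byte IPv4 header plus an 8-byte ICMP header) fills exactly one frame.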

Open files

If the open files resource limit is too low, Bitfusion may fail with the error “Error establishing connection: Cannot allocate memory”. Set the “open files” limit to 4096 or higher.

ulimit -n 4096
# or a higher value, up to the hard limit reported by: ulimit -Hn
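ulimit -n changes the limit only for the current shell and the processes it starts. The bash sketch below checks the current soft limit. If the limit is too low, it prints limits.conf entries that would raise it at the next login. It assumes the VM applies /etc/security/limits.conf through pam_limits, which is common but distribution-dependent.

```bash
#!/usr/bin/env bash
# Sketch: check the open-files soft limit; suggest persistent limits.conf entries.
# Assumption: pam_limits applies /etc/security/limits.conf at login.

MIN_NOFILE=4096

# Succeed if a limit value (a number or "unlimited") is at least $2.
nofile_sufficient() {
  [[ $1 == unlimited ]] || (( $1 >= $2 ))
}

# Usage: bf_nofile.sh check
if [[ ${1:-} == check ]]; then
  current=$(ulimit -Sn)
  if nofile_sufficient "$current" "$MIN_NOFILE"; then
    echo "open files limit $current is sufficient"
  else
    echo "open files limit $current is below $MIN_NOFILE; to persist, add to /etc/security/limits.conf:" >&2
    printf '*    soft    nofile    %d\n*    hard    nofile    %d\n' "$MIN_NOFILE" "$MIN_NOFILE" >&2
    exit 1
  fi
fi
```

For example: ./bf_nofile.sh check (hypothetical name). Wildcard (*) entries in limits.conf are not applied to root, so add explicit root lines if Bitfusion applications run as root.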

7. PVRDMA

Here we will share the minimal set of details needed to use PVRDMA. We assume the PVRDMA networking is already set up in the environment.

Bitfusion servers and clients should be set up with two network adapters. Use the first for a management interface with a default adapter type such as VMXNET3. Use the second for PVRDMA.

On the GPU servers:

• Select the checkbox for “Configure Network Adapter 2?” in OVF deployment step 7 (Customize Template)

• Enter other network configuration values if not using DHCP


On both clients and GPU servers:

• Prior to booting the VM, open Edit Settings (add a second adapter if necessary):

o Virtual Hardware tab → Network adapter 2 → Select a PVRDMA network

o Virtual Hardware tab → Network adapter 2 → Select Connect At Power On

o Virtual Hardware tab → Network adapter 2 → Adapter Type → PVRDMA

On clients, after booting, install the RDMA drivers. These steps are specific to the OS and the physical network card:

• For Ubuntu (we also install tools and diagnostic packages for convenience)

sudo apt-get install -y rdma-core libmlx4-1 infiniband-diags ibutils ibverbs-utils rdmacm-utils perftest

• For CentOS and RHEL (we also install tools and diagnostic packages for convenience)

sudo yum install -y open-vm-tools rdma-core libibverbs libibverbs-utils infiniband-diags

Test

With two systems, you can test the PVRDMA connection between them. Assume IP addresses of 192.168.10.10 and 192.168.10.11.

# I am 192.168.10.10; setting up server

ib_send_bw

# I am 192.168.10.11; client – connect to server

ib_send_bw 192.168.10.10

The client side will print a bandwidth report to stdout.
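The bandwidth report can be compared with the 10 Gbps minimum from the prerequisites. This bash sketch runs the client side with perftest's --report_gbits option and reads the BW average column. It assumes perftest's usual five-column result row (#bytes, #iterations, BW peak, BW average, MsgRate), which may differ between perftest versions.

```bash
#!/usr/bin/env bash
# Sketch: run the ib_send_bw client and compare BW average with a 10 Gb/sec minimum.
# Assumption: the result row has 5 numeric columns with BW average in column 4.

MIN_GBPS=10

# Print the BW average from the last 5-column numeric result row on stdin.
bw_average() {
  awk 'NF == 5 && $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9.]+$/ { avg = $4 }
       END { if (avg != "") print avg; else exit 1 }'
}

# Succeed if $1 (Gb/sec) is at least $2.
meets_minimum() {
  awk -v bw="$1" -v min="$2" 'BEGIN { exit !(bw + 0 >= min + 0) }'
}

# Usage (client side): bf_rdma_bw.sh <server_ip>
if [[ $# -eq 1 ]]; then
  avg=$(ib_send_bw --report_gbits "$1" | bw_average) || { echo "no bandwidth report" >&2; exit 1; }
  if meets_minimum "$avg" "$MIN_GBPS"; then
    echo "PVRDMA average $avg Gb/sec meets $MIN_GBPS Gb/sec"
  else
    echo "PVRDMA average $avg Gb/sec is below $MIN_GBPS Gb/sec" >&2
    exit 1
  fi
fi
```

On 192.168.10.11, while ib_send_bw is waiting on 192.168.10.10, run ./bf_rdma_bw.sh 192.168.10.10 (hypothetical script name).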


VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 vmware.com Copyright © 2019 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at vmware.com/go/patents. VMware is a registered trademark or trademark of VMware, Inc. and its subsidiaries in the United States and other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies. Item No: VMW-0520-2213_VMware vSphere Bitfusion_Installation Guide_3.1_DM 8/20