virtio-net-controller

liaocj 2024-12-03 09:02:56

Nvidia VirtIO Net Controller Daemon

VirtIO net controller enables users to create virtio-net emulated PCIe devices
in the system where the NVIDIA® BlueField® DPU is connected. This is done by the
virtio-net-controller software module running on the DPU. Virtio-net emulated
devices allow users to hot plug virtio-net PCIe PF and VF Ethernet NIC devices
into the host system where the DPU is plugged in. With this solution, the
virtio-net device offloads traffic handling from the host CPU to the NIC of the
DPU.

System Preparation

The examples below use BlueField-2; adjust the mst device name accordingly for
other DPUs.

Prepare the DPU for hotplug devices:

$ mst start
$ mlxconfig -d /dev/mst/mt41686_pciconf0 reset
$ mlxconfig -d /dev/mst/mt41686_pciconf0 s \
                PF_BAR2_ENABLE=0 \
                PER_PF_NUM_SF=1
$ mlxconfig -d /dev/mst/mt41686_pciconf0 s \
                PCI_SWITCH_EMULATION_ENABLE=1 \
                PCI_SWITCH_EMULATION_NUM_PORT=16 \
                VIRTIO_NET_EMULATION_ENABLE=1 \
                VIRTIO_NET_EMULATION_NUM_VF=0 \
                VIRTIO_NET_EMULATION_NUM_PF=0 \
                VIRTIO_NET_EMULATION_NUM_MSIX=16 \
                ECPF_ESWITCH_MANAGER=1 \
                ECPF_PAGE_SUPPLIER=1 \
                SRIOV_EN=0 \
                PF_SF_BAR_SIZE=8 \
                PF_TOTAL_SF=64
$ mlxconfig -d /dev/mst/mt41686_pciconf0.1 s \
                PF_SF_BAR_SIZE=10 \
                PF_TOTAL_SF=64

For a transitional (virtio spec 0.95) device, the following options are also needed:

VIRTIO_NET_EMULATION_PF_PCI_LAYOUT=1
VIRTIO_EMULATION_HOTPLUG_TRANS=1

Cold reboot the host.

Prepare the DPU for static devices with SR-IOV (504 VFs) support:

$ mst start
$ mlxconfig -d /dev/mst/mt41686_pciconf0 reset
$ mlxconfig -d /dev/mst/mt41686_pciconf0 s \
                PF_BAR2_ENABLE=0 \
                PER_PF_NUM_SF=1
$ mlxconfig -d /dev/mst/mt41686_pciconf0 s \
                PCI_SWITCH_EMULATION_ENABLE=0 \
                PCI_SWITCH_EMULATION_NUM_PORT=0 \
                VIRTIO_NET_EMULATION_ENABLE=1 \
                VIRTIO_NET_EMULATION_NUM_VF=126 \
                VIRTIO_NET_EMULATION_NUM_PF=4 \
                VIRTIO_NET_EMULATION_NUM_MSIX=4 \
                ECPF_ESWITCH_MANAGER=1 \
                ECPF_PAGE_SUPPLIER=1 \
                SRIOV_EN=1 \
                PF_SF_BAR_SIZE=8 \
                PF_TOTAL_SF=508 \
                NUM_OF_VFS=0
$ mlxconfig -d /dev/mst/mt41686_pciconf0.1 s \
                PF_TOTAL_SF=1 \
                PF_SF_BAR_SIZE=8

For a transitional (virtio spec 0.95) device, the following option is also needed:

VIRTIO_NET_EMULATION_VF_PCI_LAYOUT=1

Cold reboot the host.

NOTE:

  1. To use static devices, it is recommended to blacklist the kernel modules
    virtio_pci and virtio_net in the guest OS before proceeding. If those modules
    are built into the kernel image, they cannot be blacklisted. In that case, it
    is recommended that users have a way to start virtio-net-controller on the
    DPU over SSH or a serial/OOB interface. This can be achieved by connecting
    the RShim/serial cable to another host, or by setting up the OOB interface.

    Normally this is not needed, as the controller daemon starts automatically
    when the

  2. For systems whose BIOS does not allocate address space for unpopulated
    downstream ports of the PCIe switch, "pci=realloc" is needed on the boot
    command line.

    It is also recommended to add "pci=assign-busses" to the boot command line
    when creating more than 127 VFs. Without this option, errors like the ones
    below may appear on the host, and the virtio driver won't probe those
    devices because the type is incorrect.

[  617.382854] pci 0000:84:00.0: [1af4:1041] type 7f class 0xffffff
[  617.382883] pci 0000:84:00.0: unknown header type 7f, ignoring device

  3. To support more than 127 VFs, hotplug capability must be disabled by
    changing the mlxconfig options below:

    PCI_SWITCH_EMULATION_ENABLE=0
    PCI_SWITCH_EMULATION_NUM_PORT=0
  4. The max number of VFs supported by each mst device is calculated from the
    mlxconfig options by the formula below.

    max_num_vf = VIRTIO_NET_EMULATION_NUM_VF * VIRTIO_NET_EMULATION_NUM_PF

    VIRTIO_NET_EMULATION_NUM_VF VFs are reserved for each of the
    VIRTIO_NET_EMULATION_NUM_PF PFs. As of now, max_num_vf is 504 for
    BlueField-2. Note that, to support the maximal number of virtio-net VFs,
    NUM_OF_VFS, VIRTIO_BLK_EMULATION_NUM_VF and VIRTIO_BLK_EMULATION_NUM_PF
    should all be 0.
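
Plugging in the values from the static-device example above makes the
calculation concrete (plain shell arithmetic, shown only for illustration):

```shell
# Values from the mlxconfig static-device example above
VIRTIO_NET_EMULATION_NUM_VF=126
VIRTIO_NET_EMULATION_NUM_PF=4

# Each of the 4 PFs reserves 126 VFs
max_num_vf=$((VIRTIO_NET_EMULATION_NUM_VF * VIRTIO_NET_EMULATION_NUM_PF))
echo "$max_num_vf"   # 504, the BlueField-2 maximum
```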

Building

Dependencies:
mlnx-ofed, mlnx-libsnap, libev, libev-devel, libmnl-devel

The steps to build and install this daemon differ depending on the package
type.

Release Tarball

If you received the release tarball, untar it and then follow the steps below:

$ ./build.sh -g [-d]       # configure [debug]
$ ./build.sh -b [-c] [-v]  # make [clean] [verbose] and make install

Or run all of the steps above in one shot:

$ ./build.sh

Git Repository

Follow the steps below if you're developing on top of the git repository.

$ ./build.sh -a            # autogen and apply patches
$ ./build.sh -g [-d]       # configure [debug]
$ ./build.sh -b [-c] [-v]  # make [clean] [verbose] and make install

Or run all of the steps above in one shot:

$ ./build.sh

Typically the autogen and configure steps only need to be done the first time
unless configure.ac or Makefile.am changes.

To build rpm/deb:

$ scripts/build_rpm.sh
$ tar -xvf *.gz
$ cd virtio-net-controller-<ver>
$ dpkg-buildpackage

Usage

The controller runs as a systemd service and ships a user interface tool to
communicate with the service.

Controller Service

If the controller is installed on a DPU, the service is enabled by default and
starts automatically. Run the command below to check its status:

$ systemctl status virtio-net-controller.service

If the daemon is not running, start the controller with the command below, and
check the status again after it starts:

$ systemctl start virtio-net-controller.service

To stop the daemon, first unload virtio-net/virtio-pci from the host, then run
the command below from the DPU:

$ systemctl stop virtio-net-controller.service

To enable the daemon to start automatically, run:

$ systemctl enable virtio-net-controller.service

Controller Recovery

It is possible to recover the control and data planes if communications are
interrupted so the original traffic can resume.

Recovery depends on the JSON files stored in /opt/mellanox/mlnx_virtnet/recovery,
where each recovery file corresponds to one device (either PF or VF).

Following is an example of the file:

{
  "port_ib_dev": "mlx5_0",
  "pf_id": 0,
  "function_type": "pf",
  "bdf_raw": 26624,
  "device_type": "hotplug",
  "mac": "0c:c4:7a:ff:22:93",
  "pf_num": 0,
  "sf_num": 2000,
  "mq": 1
}

These files should not be modified/deleted under normal circumstances.
However, if necessary, advanced users may tune settings or delete the file
to meet their requirements.

Note: users are responsible for the validity of the recovery files and should
only modify/delete them when the controller is not running.
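
If a recovery file must be edited, a quick syntax check before restarting the
controller avoids surprises. A minimal sketch (check_json is a hypothetical
helper, not part of the controller; python3 is assumed to be available on the
DPU):

```shell
# check_json FILE: print "ok: FILE" if FILE parses as JSON, "invalid: FILE" otherwise
check_json() {
    if python3 -m json.tool "$1" > /dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "invalid: $1"
    fi
}
```

For example: for f in /opt/mellanox/mlnx_virtnet/recovery/*; do check_json "$f"; done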

Controller Configuration Files

Main Configuration

A config file, /opt/mellanox/mlnx_virtnet/virtnet.conf, can be used to set
parameters for the service. If the config file is changed while the controller
service is running, the new config won't take effect until the service is
reloaded.

The currently supported options appear in the examples below.

Here is an example for non-LAG mode (JSON format):

{
  "ib_dev_p0": "mlx5_0",
  "ib_dev_p1": "mlx5_1",
  "ib_dev_for_static_pf": "mlx5_0",
  "is_lag": 0,
  "recovery": 1,
  "sf_pool_percent": 0,
  "sf_pool_force_destroy": 0,
  "static_pf": {
    "mac_base": "11:22:33:44:55:66",
    "features": "0x230047082b"
  },
  "vf": {
    "mac_base": "CC:48:15:FF:00:00",
    "features": "0x230047082b",
    "vfs_per_pf": 100
  }
}

Here is an example for LAG mode (JSON format):

{
  "ib_dev_lag": "mlx5_bond_0",
  "ib_dev_for_static_pf": "mlx5_bond_0",
  "is_lag": 1,
  "recovery": 1,
  "sf_pool_percent": 0,
  "sf_pool_force_destroy": 0
}

Here is an example for single_port (JSON format):

{
  "ib_dev_p0": "mlx5_2",
  "ib_dev_for_static_pf": "mlx5_2",
  "single_port": 1
}
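
The non-LAG example above sets mac_base for static PFs and VFs. Purely as an
illustration of how a per-device MAC can be derived by offsetting a base
(mac_add is a hypothetical helper; the controller's actual allocation policy is
internal to it):

```shell
# mac_add BASE OFFSET: print the MAC address BASE incremented by OFFSET
mac_add() {
    hex=$(echo "$1" | tr -d ':')   # strip colons -> 12 hex digits
    val=$(( 0x$hex + $2 ))         # treat the MAC as a 48-bit integer
    printf '%012x\n' "$val" | sed 's/../&:/g; s/:$//'
}

mac_add "CC:48:15:FF:00:00" 2   # two devices past the example's vf mac_base
```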

Provider Configuration

To support a provider other than the default, ACE, configuration files located
in /opt/mellanox/mlnx_virtnet/providers can be used.

Sample provider config files:

ace.provider
# ace.so is one provider
Provider=ace
Score=100

dpa.provider
# dpa.so is another provider
Provider=dpa
Score=200
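
The Score fields in the samples above suggest a highest-score-wins choice.
Purely as an illustration (pick_provider is hypothetical; the controller's real
selection logic is internal), such a selection could be sketched as:

```shell
# pick_provider DIR: print the Provider= name with the highest Score= among
# the *.provider files in DIR
pick_provider() {
    best=""; best_score=-1
    for f in "$1"/*.provider; do
        name=$(sed -n 's/^Provider=//p' "$f")
        score=$(sed -n 's/^Score=//p' "$f")
        if [ "$score" -gt "$best_score" ]; then
            best="$name"; best_score="$score"
        fi
    done
    echo "$best"
}
```

With the two sample files above, pick_provider would print dpa (Score 200 beats 100).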

Controller User Interface

Each command has its own help manual, e.g., virtnet list -h.

  1. List current devices:
$ virtnet list
  2. Query detailed info of all current devices (-a) or a specific device (-p, -v or -u).
    --brief (or -b) can be used to omit queue info:
$ virtnet query -a [-b]
$ virtnet query -p pf_id [-v vf_id] [-b]
$ virtnet query -u "VUID string" [-b]
  3. Hotplug a virtio-net device with 1500 MTU and 3 virtio queues of depth 1024
    entries, with VLAN control support; the data path relies on mlx5_0:
$ virtnet hotplug -i mlx5_0 -f 0x80000 -m 0C:C4:7A:FF:22:93 -t 1500 -n 3 -s 1024

For a transitional (virtio spec 0.95) device, "-l" or "--legacy" can be used.

$ virtnet hotplug -i mlx5_0 ... -l

Note: if the Linux kernel version is < 4.0 (e.g., on CentOS 6 or Ubuntu 12.04),
--legacy is mandatory. Otherwise, even with --legacy, the device will still be
driven as a modern one by default, unless the force_legacy flag is set for the
virtio_pci driver on the host, as below:

$ modprobe -v virtio_pci force_legacy=1

or

Append virtio_pci.force_legacy=1 to the kernel cmdline.
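
A quick way to check whether the cmdline mechanism is in effect
(has_force_legacy is a hypothetical helper; on a real host, feed it
"$(cat /proc/cmdline)"):

```shell
# has_force_legacy CMDLINE: print "yes" if virtio_pci.force_legacy=1 is present
has_force_legacy() {
    case " $1 " in
        *" virtio_pci.force_legacy=1 "*) echo yes ;;
        *) echo no ;;
    esac
}

has_force_legacy "root=/dev/sda1 virtio_pci.force_legacy=1 quiet"   # yes
```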

The max number of virtio queues is bounded by the minimum of the numbers below.

If the hotplug succeeds, device info similar to the below is returned:

{
  "bdf": "84:0.0",
  "id": 1,
  "transitional": 0,
  "sf_rep_net_device": "en3f0pf0sf1001",
  "mac": "0C:C4:7A:FF:22:93"
}

Add the SF interface to the corresponding OVS bridge. Also, make sure the MTU
of the SF representor is configured to be the same as or greater than that of
the virtio-net device.

$ ovs-vsctl add-port <bridge> en3f0pf0sf1001
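
When scripting this flow, the SF representor name can be extracted from the
hotplug reply before handing it to OVS (python3 is assumed to be available; the
payload below is the example reply above, inlined for illustration):

```shell
# Example hotplug reply, inlined for illustration
reply='{"bdf": "84:0.0", "id": 1, "transitional": 0,
        "sf_rep_net_device": "en3f0pf0sf1001", "mac": "0C:C4:7A:FF:22:93"}'

# Pull out the SF representor name
sf_rep=$(echo "$reply" | python3 -c \
    'import json, sys; print(json.load(sys.stdin)["sf_rep_net_device"])')
echo "$sf_rep"   # en3f0pf0sf1001
```

The result can then be used as: ovs-vsctl add-port <bridge> "$sf_rep"
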
  4. Modify the MAC address of virtio physical device ID 0 (or by its "VUID string",
    which can be obtained through virtnet list/query):
$ virtnet modify -p 0 device -m 0C:C4:7A:FF:22:93
$ virtnet modify -u "VUID string" device -m 0C:C4:7A:FF:22:93
  5. Modify the max queue size of a device:
$ virtnet modify -p 0 -v 0 device -q 2048
$ virtnet modify -u "VUID string" device -q 2048
  6. Modify the MSI-X number of a VF device:
$ virtnet modify -p 0 -v 0 device -n 8
$ virtnet modify -u "VUID string" device -n 8
  7. Modify the queue options of virtio physical device ID 0:
$ virtnet modify -p 0 queue -e event -n 10 -c 30
$ virtnet modify -u "VUID string" queue -e event -n 10 -c 30

Note: to modify the MAC, MTU, features, msix_num, or max_queue_size, a device
unbind and bind on the guest OS is necessary, as follows:

$ cd /sys/bus/pci/drivers/virtio-pci/
$ echo "bdf of device" > unbind

perform "virtnet modify ..." on controller side

$ echo "bdf of device" > bind
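
The unbind/modify/bind sequence can be wrapped in a small helper for
convenience (rebind is hypothetical, not part of the controller; run it as root
on the guest with the real sysfs path):

```shell
# rebind DRIVER_DIR BDF: unbind the device, wait for the DPU-side
# "virtnet modify ..." to be done, then bind the device again
rebind() {
    echo "$2" > "$1/unbind"
    printf '%s unbound; run "virtnet modify ..." on the DPU, then press Enter\n' "$2"
    read -r _
    echo "$2" > "$1/bind"
}
```

Usage: rebind /sys/bus/pci/drivers/virtio-pci 0000:84:00.0
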
  8. Unplug virtio physical device ID 0 (only applicable to hotplugged devices):
$ virtnet unplug -p 0
$ virtnet unplug -u "VUID string"
  9. Live update the controller.

Install the rpm/deb package:

$ rpm -Uvh virtio-net-controller-x.y.z-1.mlnx.aarch64.rpm --force
$ dpkg --force-all -i virtio-net-controller-x.y.z-1.mlnx.aarch64.deb

Then run the following commands:

$ virtnet version        # check destination controller version
$ virtnet update -s      # start live update
$ virtnet update -t      # check live update status

Limitations

  1. When the port MTU (p0/p1 on the DPU) is changed (e.g., from 1500 to 9000)
    after the controller starts, the controller service must be restarted.