## Technical Sessions

Session 1

## Video Streaming

Conference
10:30 AM — 11:40 AM JST
Local
Jun 24 Thu, 9:30 PM — 10:40 PM EDT

### Optimizing Quality of Experience for Long-Range UAS Video Streaming

Russell Shirey (Purdue University & US Air Force, USA); Sanjay Rao and Shreyas Sundaram (Purdue University, USA)

There is much emerging interest in operating Unmanned Aerial Systems (UAS) at long-range distances. Unfortunately, it is unclear whether network connectivity at these distances is sufficient to enable applications with stringent performance needs. In this paper, we consider this question in the context of video streaming, an important UAS use-case. We make three contributions. First, we characterize network data collected from real-world UAS flight tests. Our results show that while dropouts (i.e., extended periods of poor performance) present challenges, there is potential to enable video streaming with modest delays, and correlation of throughput with flight path (both distance and orientation) provides new opportunities. Second, we present Proteus, the first system for video streaming targeted at long-range UAS settings. Proteus is distinguished from Adaptive Bit Rate (ABR) algorithms developed for Internet settings by explicitly accounting for dropouts, and leveraging flight path information. Third, through experiments with real-world flight traces on an emulation test-bed, we show that at distances of around 4 miles, Proteus reduces the fraction of a viewing session encountering rebuffering from 14.33% to 1.57%, while also significantly improving well-accepted composite video delivery metrics. Overall, our results show promise for enabling video streaming with dynamic UAS networks at long-range distances.
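The rebuffering fraction reported above can be illustrated with a minimal sketch. This is not Proteus: it assumes a fixed bitrate, one-second time slots, and a hypothetical throughput trace, and simply counts the slots in which the playback buffer holds less than one second of video.

```python
def rebuffer_fraction(throughput_kbps, bitrate_kbps):
    """Fraction of one-second slots in which playback stalls.

    throughput_kbps: per-second throughput samples (hypothetical trace).
    bitrate_kbps: fixed video bitrate; a real ABR algorithm would adapt it.
    """
    buffer_s = 0.0   # seconds of video currently buffered
    stalled = 0      # number of slots with nothing to play
    for tput in throughput_kbps:
        buffer_s += tput / bitrate_kbps   # video seconds downloaded this slot
        if buffer_s >= 1.0:
            buffer_s -= 1.0               # play one second of video
        else:
            stalled += 1                  # buffer too low: rebuffering
    return stalled / len(throughput_kbps)


if __name__ == "__main__":
    # Illustrative trace with a three-second dropout in the middle.
    trace = [2000, 2000, 0, 0, 0, 2000]
    print(rebuffer_fraction(trace, bitrate_kbps=1000))
    print(rebuffer_fraction(trace, bitrate_kbps=500))
```

In this toy trace, a lower bitrate builds enough buffer before the dropout to ride it out, which is the kind of trade-off a dropout-aware streaming scheme has to manage.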

### QoS-Aware Network Energy Optimization for Danmu Video Streaming in WiFi Networks

Nan Jiang and Mehmet Can Vuran (University of Nebraska-Lincoln, USA); Sheng Wei (Rutgers University, USA); Lisong Xu (University of Nebraska-Lincoln, USA)

Danmu (a.k.a. barrage videos or bullet comments) is a novel type of interactive video streaming that displays instantaneous user comments flying across the screen during video playback to better engage users. However, this richer experience places a considerable burden on the limited-capacity batteries of mobile user devices. For example, WiFi testbed experiments show a 15% to 35% increase in WiFi network energy consumption because of the large amount of additional network traffic for user comments. On the other hand, current network energy minimization methods adversely impact the Quality of Service (QoS) of Danmu users, because they postpone transmission and thus delay the display of user comments that should match the timeline of the corresponding videos. In this paper, for the first time, a heuristic QoS-aware network energy optimization algorithm is proposed to reduce WiFi network energy consumption while still maintaining the desired QoS of Danmu users. Comprehensive testbed experiments using an open-source Danmu streaming system and real Danmu user traces indicate up to 28% WiFi network energy savings, depending on system, network, and user settings.

### Understanding and Improving User Engagement in Adaptive Video Streaming

Chunyu Qiao and Jiliang Wang (Tsinghua University, China); Yanan Wang (IQIYI Science & Technology Co., Ltd., China); Yunhao Liu (Tsinghua University & The Hong Kong University of Science and Technology, China); Hu Tuo (IQIYI Science & Technology Co., Ltd., China)

Today's video service providers all seek a deep understanding of the ever-changing factors that affect user QoE in order to attract more users. In this paper, we study user engagement with respect to video quality metrics and improve user engagement in adaptive video streaming systems. We conduct a comprehensive study of real data from iQIYI, covering 700K users and 150K videos. We find that bitrate switching has replaced rebuffering events as the dominant factor in user engagement. We also observe that the rate of rebuffering has a more dominant impact than rebuffering time. We examine novel interdependencies between quality metrics in the system, e.g., the positive correlation between bitrate switching and average bitrate, which is due to the system's strategy in this context, i.e., the conservative bitrate-enhancing strategy adopted by iQIYI. To improve user engagement, we propose a new engagement-centric QoE function based on real data and design a server-side ABR algorithm that leverages it. We evaluate our method in an online test at iQIYI, with 490K real users viewing 666K streams. The results show that our approach outperforms existing approaches by significantly improving viewing time, i.e., 2.8 minutes longer viewing time per user.

### Soudain: Online Adaptive Profile Configuration for Real-time Video Analytics

Kun Wu and Yibo Jin (Nanjing University, China); Weiwei Miao (Communication Bureau, State Grid Jiangsu Electric Power Company, China); Zeng Zeng (State Grid Jiangsu Electric Power CO., LTD., China); Zhuzhong Qian, Jingmian Wang, Mingxian Zhou and Tuo Cao (Nanjing University, China)

Because real-time video analytics with high accuracy requirements is resource-consuming, profiles that capture the resource-accuracy trade-off are needed before the analytics run, to enable better resource allocation at resource-constrained edges. As video contents change over time, outdated profiles fail to capture the trade-off, so the profiles must be updated periodically, which incurs an overwhelming resource overhead. We therefore present Soudain, which dynamically adjusts the configurations in profiles and the corresponding profiling intervals to capture the inner changes of multiple video streams at edges. Based on these fine-grained profile decisions, we formulate an integer program to maximize the long-term accuracy of video analytics under a resource constraint, and then design an algorithm to adjust the profiles in an online manner. We implement Soudain on a GPU server. Our testbed evaluations, which use live video streams from real-world traffic cameras, confirm that Soudain meets the real-time requirement and achieves up to 25% improvement in detection accuracy compared with multiple state-of-the-art alternatives.

###### Session Chair

Vijay Gopalakrishnan, AT&T Labs - Research, USA

Session 2

## Transport & Security

Conference
12:30 PM — 1:40 PM JST
Local
Jun 24 Thu, 11:30 PM — 12:40 AM EDT

### ASAP: Anti-Spoofing Aphorism using Path-analysis

Eric Muhati and Danda B. Rawat (Howard University, USA)

The immense vulnerabilities within network protocols have led to prevalent spoofing attacks. Security measures such as cryptographic authentication, router-level filtering, and other anti-spoofing methods are promising yet inadequate. In fact, research shows almost 50% of existing autonomous systems can be "hijacked" and used as spurious network nodes. Spoofing is the falsified identification of a node as a trusted node and the subsequent delivery of imposter data packets. Correctly identifying a packet's travel path has been widely used to ascertain node positions and flag unexpected locations as spoofing agents. Challenges and inadequacies arise in analyzing the many dynamic possible packet paths. We propose a practical anti-spoofing technique for asymmetric or multi-path routing that contrasts network connections from trusted nodes, after considering all possible routing alternatives, with the network path under scrutiny. We subtly perform obscured intrusion detection through hop-count clustering via the IP header's time-to-live field and special representative nearest captain nodes that differentiate features of valid paths. While many proposed anti-spoofing solutions only analyze static traffic to give ≈ 90% efficacy, we consider all alternatives that deviate from accurate features to achieve an improved efficacy of 94.2%.
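The hop-count idea mentioned in the abstract can be sketched as follows. This is a generic hop-count check, not the ASAP clustering method: the list of common initial TTL values, the function names, and the tolerance parameter are all assumptions made for illustration.

```python
# Assumed set of initial TTL values commonly used by operating systems.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def infer_hop_count(observed_ttl):
    """Estimate the hops travelled from the TTL seen at the receiver.

    Assumes the sender started from the smallest common initial TTL
    that is not below the observed value.
    """
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be in 1..255")
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


def looks_spoofed(observed_ttl, expected_hops, tolerance=2):
    """Flag a packet whose inferred hop count is far from the expected one.

    `tolerance` (hypothetical) leaves room for multi-path or asymmetric routes.
    """
    return abs(infer_hop_count(observed_ttl) - expected_hops) > tolerance


if __name__ == "__main__":
    print(infer_hop_count(52))                   # inferred hops for TTL 52
    print(looks_spoofed(52, expected_hops=12))   # consistent with the known path
    print(looks_spoofed(40, expected_hops=12))   # far from the known path
```

A single expected hop count per source is the simplest possible model; the abstract's point is that several valid paths may exist, which is why a clustering of hop counts is used there instead.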

### Lightning: A Practical Building Block for RDMA Transport Control

Qingkai Meng and Fengyuan Ren (Tsinghua University, China)

RoCEv2 (RDMA over Converged Ethernet version 2) is the canonical method for deploying RDMA in Ethernet-based datacenters. Traditionally, RoCEv2 runs over a lossless network, which is in turn achieved by enabling Priority Flow Control (PFC) within the network. However, as the scale of datacenters increases, PFC's side effects, such as head-of-line blocking, congestion spreading, and PFC storms, are amplified. Datacenter operators can no longer tolerate these problems and are seeking PFC alternatives for RDMA networks. Rather than aiming for a lossless RDMA network, we instead handle packet loss effectively to support RDMA over Ethernet.

In this paper, we propose Lightning, a switch building block to enhance RoCE's simple loss recovery. Lightning enhances the switches to send loss notifications directly to the sources with high priority, thus informing sources as quickly as possible. Sources can then retransmit packets sooner. By addressing challenges such as the unavailability of shared buffer status at ingress in modern switches, Lightning generates a loss notification only when the expected packet is dropped and filters other unexpected packets at ingress, so as to avoid timeouts and prevent unnecessary congestion from unexpected packets. We implement Lightning on commodity programmable switches. In our evaluation, Lightning achieves up to a 16.08× reduction in 99.9th percentile flow completion time compared to PFC, IRN, and other alternatives.

### Helm: Credit-based Data Center Congestion Control to Achieve Near Global-Optimal SRTF

Jiao Zhang and Jiaming Shi (Beijing University of Posts and Telecommunications, China); Yuxuan Gao (BUPT, China); Yunjie Liu (Beijing University of Posts and Telecommunications, China)

To satisfy the ultra-low latency requirement of cloud services, many congestion control mechanisms have been proposed to reduce the Flow Completion Time (FCT) in data center networks. Theoretically, the Shortest Remaining Time First (SRTF) scheduling policy can achieve the minimum FCT. However, existing data center congestion control mechanisms either do not achieve global-optimal SRTF or are difficult to deploy. In this paper, we analyze the challenges of approximating global-optimal SRTF scheduling in a distributed transport control protocol. We then propose Helm, a credit-based distributed data center congestion control mechanism. Helm addresses these challenges by combining the proposed dynamic priority assignment and flow-size-based rate control algorithms at receivers, and thus achieves near global-optimal SRTF without modifying commodity switches. Theoretical analysis shows that the performance of Helm is close to SRTF. In addition, extensive simulations show that Helm reduces the mean and tail FCT by up to 62% and 75%, respectively, compared with Homa in an oversubscribed data center network.
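To make the SRTF baseline concrete, the sketch below simulates idealized SRTF on a single bottleneck link and compares its mean FCT with first-in-first-out service. It is not Helm and models no credits, priorities, or switches; flow sizes are expressed in time units at link rate, and the example flows are hypothetical.

```python
import heapq


def srtf_mean_fct(flows):
    """Mean flow completion time under preemptive SRTF on one link.

    flows: non-empty list of (arrival_time, size), size in time units at link rate.
    """
    flows = sorted(flows)
    n = len(flows)
    t = 0.0
    i = 0
    active = []        # min-heap of (remaining_size, arrival_time)
    total_fct = 0.0
    while i < n or active:
        if not active:
            t = max(t, flows[i][0])          # link idle: jump to next arrival
        while i < n and flows[i][0] <= t:    # admit everything that has arrived
            heapq.heappush(active, (flows[i][1], flows[i][0]))
            i += 1
        remaining, arrival = heapq.heappop(active)   # shortest remaining first
        next_arrival = flows[i][0] if i < n else float("inf")
        run = min(remaining, next_arrival - t)       # run until done or preempted
        t += run
        remaining -= run
        if remaining > 0:
            heapq.heappush(active, (remaining, arrival))
        else:
            total_fct += t - arrival
    return total_fct / n


def fifo_mean_fct(flows):
    """Mean flow completion time when flows are served in arrival order."""
    t = 0.0
    total_fct = 0.0
    for arrival, size in sorted(flows):
        t = max(t, arrival) + size
        total_fct += t - arrival
    return total_fct / len(flows)


if __name__ == "__main__":
    flows = [(0, 10), (1, 2), (2, 1)]   # one long flow, then two short ones
    print(srtf_mean_fct(flows), fifo_mean_fct(flows))
```

The short flows finish much earlier under SRTF because they preempt the long one, which is why SRTF serves as the reference point for FCT.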

### MASK: Practical Source and Path Verification based on Multi-AS-Key

Songtao Fu, Ke Xu and Qi Li (Tsinghua University, China); Xiaoliang Wang (Capital Normal University, China); Su Yao, Yangfei Guo and Xinle Du (Tsinghua University, China)


###### Session Chair

Danda Rawat, Howard University, USA

Session 3

## Federated Learning

Conference
1:50 PM — 3:00 PM JST
Local
Jun 25 Fri, 12:50 AM — 2:00 AM EDT

### FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models

Gaoyang Liu and Xiaoqiang Ma (Huazhong University of Science and Technology, China); Yang Yang (Hubei University, China); Chen Wang (Huazhong University of Science and Technology, China); Jiangchuan Liu (Simon Fraser University, Canada)

Federated learning (FL) has recently emerged as a promising distributed machine learning (ML) paradigm. Practical needs such as the "right to be forgotten" and countering data poisoning attacks call for efficient techniques that can remove, or unlearn, specific training data from the trained FL model. Existing unlearning techniques in the context of ML, however, do not carry over to FL, mainly due to the inherent distinction in how FL and ML learn from data. Therefore, how to enable efficient data removal from FL models remains largely under-explored. In this paper, we take the first step to fill this gap by presenting FedEraser, the first federated unlearning methodology that can eliminate the influence of a federated client's data on the global FL model while significantly reducing the time used for constructing the unlearned FL model. The basic idea of FedEraser is to trade the central server's storage for the unlearned model's construction time: FedEraser reconstructs the unlearned model by leveraging the historical parameter updates of federated clients that have been retained at the central server during the FL training process. A novel calibration method is further developed to calibrate the retained updates, which are then used to promptly construct the unlearned model, yielding a significant speed-up in reconstruction while maintaining model efficacy. Experiments on four realistic datasets demonstrate the effectiveness of FedEraser, with an expected speed-up of 4 times compared with retraining from scratch. We envision our work as an early step in FL towards compliance with legal and ethical criteria in a fair and transparent manner.

### BatFL: Backdoor Detection on Federated Learning in e-Health

Binhan Xi, Shaofeng Li, Jiachun Li and Hui Liu (Shanghai Jiao Tong University, China); Hong Liu (East China Normal University, China); Haojin Zhu (Shanghai Jiao Tong University, China)

Federated Learning (FL) has received significant interest from both research and industry. One of the most promising cross-silo applications of FL is electronic health records mining, which trains a model on siloed data. In this application, clients can be different hospitals or health centers that are located in geo-distributed data centers. A central orchestration server (superior health center) organizes the training, while never seeing patients' raw data. In this paper, we demonstrate that any local hospital in such an FL training framework can introduce hidden backdoor functionality into the joint global model. The backdoored joint global model will produce an adversary-expected output when a predefined trigger is attached to its input, but it will behave normally for clean inputs. This vulnerability is exacerbated by the distributed nature of FL, making the detection of backdoor attacks on FL a challenging task. Based on the coalitional game and Shapley value, we propose an effective and real-time backdoor detection system for FL. Extensive experiments over two machine learning tasks show that our techniques achieve high accuracy and are robust in multi-attacker settings.
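The Shapley value mentioned above can be illustrated with a small exact computation. The abstract does not say how BatFL defines its coalition game, so the value function below (a toy accuracy score that drops whenever a hypothetical malicious client joins) is purely an assumption.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    players: list of hashable player identifiers.
    value: function mapping a frozenset of players to a number.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(coalition):
    """Hypothetical model quality for a coalition of clients."""
    if not coalition:
        return 0.0
    return 0.5 if "mal" in coalition else 0.9   # a backdoored update hurts quality


if __name__ == "__main__":
    print(shapley_values(["h1", "h2", "mal"], toy_value))
```

In this toy game the malicious client receives a negative Shapley value while the honest clients receive positive ones, which hints at why contribution scores can serve as a detection signal. Exact enumeration grows factorially with the number of clients, so a practical system would need an approximation.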

### Optimizing Federated Learning on Device Heterogeneity with A Sampling Strategy

Xiaohui Xu, Sijing Duan, Jinrui Zhang, Yunzhen Luo and Deyu Zhang (Central South University, China)

Federated learning (FL) is a novel machine learning paradigm that performs distributed training locally on devices and aggregates the local models into a global one. The limited network bandwidth and the tremendous amount of model data that must be transported incur a high communication cost. Meanwhile, heterogeneity in the devices' local datasets and computation exerts a huge influence on the performance of FL. To address these issues, we provide an empirical and mathematical analysis of the effect of device heterogeneity on model convergence and quality, and then propose a holistic design to efficiently sample devices. Furthermore, we design a dynamic strategy to further speed up convergence and propose the FedAgg algorithm to alleviate the deviation caused by device heterogeneity. With extensive experiments performed in PyTorch, we show that the number of communication rounds required in FL can be reduced by up to 52% on the MNIST dataset, 32% on CIFAR-10, and 30% on Fashion-MNIST compared with the Federated Averaging algorithm.
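For reference, the aggregation step of the Federated Averaging baseline can be sketched in a few lines. This shows only the standard sample-count-weighted average; it is not FedAgg or the paper's sampling strategy, and the flat-list model representation is an assumption made to keep the example dependency-free.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of client models, weighted by local dataset size.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes: number of local training samples per client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical clients; the second holds three times as much data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Because the average is weighted by data volume, clients with skewed or small datasets pull the global model unevenly, which is one way device heterogeneity shows up in convergence.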

### Glint: Decentralized Federated Graph Learning with Traffic Throttling and Flow Scheduling

Tao Liu and Peng Li (The University of Aizu, Japan); Yu Gu (Hefei University of Technology, China)

Federated learning has been proposed as a promising distributed machine learning paradigm with strong privacy protection for training data. Existing work mainly focuses on training convolutional neural network (CNN) models, which are good at learning from image and voice data. However, many applications generate graph data, and graph learning cannot be efficiently supported by existing federated learning techniques. In this paper, we study federated graph learning (FGL) under the cross-silo setting, where several servers are connected by a wide-area network, with the objective of improving the Quality-of-Service (QoS) of graph learning tasks. We find that communication becomes the main system bottleneck because of frequent information exchanges among federated servers and limited network bandwidth. To overcome this challenge, we design Glint, a decentralized federated graph learning system with two novel designs: network traffic throttling and priority-based flow scheduling. To evaluate the effectiveness of Glint, we conduct both testbed experiments and trace-driven simulations. The results show that Glint can significantly outperform existing federated learning solutions.

###### Session Chair

Baochun Li, University of Toronto, Canada

Session 4

## NFV

Conference
3:10 PM — 4:20 PM JST
Local
Jun 25 Fri, 2:10 AM — 3:20 AM EDT

### Gost: Enabling Efficient Spatio-Temporal GPU Sharing for Network Function Virtualization

Andong Zhu and Deze Zeng (China University of Geosciences, China); Lin Gu (Huazhong University of Science and Technology, China); Peng Li (The University of Aizu, Japan); Quan Chen (Shanghai Jiao Tong University, China)

Network Function Virtualization (NFV) enables network functions to run on general-purpose servers, thus alleviating the reliance on dedicated hardware and significantly improving the scalability and flexibility of networking service provisioning. Meanwhile, it has been recognized that Virtualized Network Functions (VNFs) suffer from serious performance problems. The Graphics Processing Unit (GPU), with its massive number of processing cores, has been advocated as a potential accelerator for improving the performance efficiency of VNFs. However, the distinctive architecture of the GPU makes existing CPU-oriented task scheduling strategies inapplicable, limiting the acceleration potential of GPUs. To this end, we propose Gost, a GPU-oriented spatio-temporal sharing framework, to improve the performance of GPU-accelerated VNFs. We also study how to minimize the end-to-end latency of VNF flows via careful scheduling of the execution order and the GPU resource allocation (i.e., the number of threads). We first formally describe the problem as a non-linear integer programming problem, which is then equivalently transformed into an integer linear programming (ILP) form. Considering the high computational complexity of solving the ILP, we further propose a customized list-scheduling-based spatio-temporal GPU sharing strategy (LSSTG). We have implemented a prototype of Gost, on which we verify the high efficiency of LSSTG through extensive experiments.

### A-DDPG: Attention Mechanism-based Deep Reinforcement Learning for NFV

Nan He, Song Yang and Fan Li (Beijing Institute of Technology, China); Stojan Trajanovski (Microsoft, United Kingdom (Great Britain)); Fernando A. Kuipers (Delft University of Technology, The Netherlands); Xiaoming Fu (University of Goettingen, Germany)

The efficacy of Network Function Virtualization (NFV) depends critically on (1) where the virtual network functions (VNFs) are placed and (2) how the traffic is routed. Unfortunately, these aspects are not easily optimized, especially under time-varying network states with different quality of service (QoS) requirements. Given the importance of NFV, many approaches have been proposed to solve the VNF placement and traffic routing problem. However, those prior approaches mainly assume that the state of the network is static and known, disregarding real-time network variations. To bridge that gap, in this paper, we formulate the VNF placement and traffic routing problem as a Markov Decision Process model to capture the dynamic network state transitions. In order to jointly minimize the delay and cost of NFV providers and maximize the revenue, we devise a customized Deep Reinforcement Learning (DRL) algorithm, called A-DDPG, for VNF placement and traffic routing in a real-time network. A-DDPG uses the attention mechanism to ascertain smooth network behavior within the general framework of network utility maximization (NUM). The simulation results show that A-DDPG outperforms the state-of-the-art in terms of network utility, delay, and cost.

### Towards Chain-Aware Scaling Detection in NFV with Reinforcement Learning

Lin He, Lishan Li and Ying Liu (Tsinghua University, China)

Elastic scaling enables dynamic and efficient resource provisioning in Network Function Virtualization (NFV) to serve fluctuating network traffic. Scaling detection determines the appropriate time when a virtual network function (VNF) needs to be scaled, and its precision and agility profoundly affect system performance. Previous heuristics define fixed control rules based on a simplified or inaccurate understanding of deployment environments and workloads. Therefore, they fail to achieve optimal performance across a broad set of network conditions.

In this paper, we propose a chain-aware scaling detection mechanism, namely CASD, which learns policies directly from experience using reinforcement learning (RL) techniques. Furthermore, CASD incorporates chain information into control policies to efficiently plan the scaling sequence of VNFs within a service function chain. This paper makes the following two key technical contributions. Firstly, we develop chain-aware representations, which embed global chains of arbitrary sizes and shapes into a set of embedding vectors based on graph embedding techniques. Secondly, we design an RL-based neural network model to make scaling decisions based on chain-aware representations. We implement a prototype of CASD, and its evaluation results demonstrate that CASD reduces the overall system cost and improves system performance over other baseline algorithms across different workloads and chains.

Xiang Chen (Peking University, Pengcheng Lab, and Fuzhou University, China); Qun Huang (Peking University, China); Wang Peiqiao (Fuzhou University, China); Zili Meng (Tsinghua University, China); Hongyan Liu (Zhejiang University, China); Yuxin Chen (University of Science and Technology of China, China); Dong Zhang (Fuzhou University, China); Haifeng Zhou (Zhejiang University, China); Boyang Zhou (Zhejiang Lab, China); Chunming Wu (College of Computer Science, Zhejiang University, China)

In network function virtualization (NFV), network functions (NFs) are chained as a service function chain (SFC) to enhance NF management with high flexibility. Recent solutions indicate that the processing performance of SFCs can be significantly improved by offloading NFs to programmable switches. However, such offloading requires a deep understanding of NF properties to achieve the maximum SFC performance, which brings non-trivial burdens to network administrators. In this paper, we propose LightNF, a novel system that simplifies NF offloading in programmable networks. LightNF automatically dissects comprehensive NF properties (e.g., NF performance behaviors) via code analysis and performance profiling while eliminating manual efforts. It then leverages the analyzed NF properties in its SFC placement so as to produce the performance-optimal offloading. We have implemented a LightNF prototype. Our experiments show that LightNF outperforms state-of-the-art solutions with orders-of-magnitude reduction in per-packet processing latency and 9.5× improvement in SFC throughput.

###### Session Chair

Zehua Guo, Beijing Institute of Technology, China

Session 5

## Blockchain & Security

Conference
4:30 PM — 5:40 PM JST
Local
Jun 25 Fri, 3:30 AM — 4:40 AM EDT

### Secure and Scalable QoS for Critical Applications

Marc Wyss, Giacomo Giuliari and Markus Legner (ETH Zürich, Switzerland); Adrian Perrig (ETH Zürich, Switzerland & Carnegie Mellon University, USA)

With the proliferation of online payment systems, the emergence of globally distributed consensus algorithms, and the increase of remotely managed critical IoT infrastructure, the need for critical-yet-frugal communication (high-availability and low-rate) is becoming increasingly pressing. For many of these applications, the use of leased lines or SD-WAN solutions is impractical due to their inflexibility and high costs, while standard Internet communication lacks the necessary reliability and attack resilience.

To address this rising demand for strong quality-of-service (QoS) guarantees, we develop the GMA-based light-weight communication protocol (GLWP), building on a recent theoretical result, the GMA algorithm. GLWP is a capability-based protocol that is able to bootstrap network-wide bandwidth allocations within a single round-trip time and achieves high availability even under active attacks. Due to its clever use of cryptographic mechanisms, GLWP introduces minimal state in the network and causes low computation and communication overhead. We implement a GLWP prototype using Intel DPDK and show that it achieves line rate on a 40 Gbps link running on commodity hardware, thus showing that GLWP is a viable solution to provide strong QoS guarantees for critical-yet-frugal communications.

### A Novel Proof-of-Reputation Consensus for Storage Allocation in Edge Blockchain Systems

Jiarui Zhang, Yaodong Huang, Fan Ye and Yuanyuan Yang (Stony Brook University, USA)

Edge computing guides the collaborative work of widely distributed nodes with different sensing, storage, and computing resources. For example, sensor nodes collect data and then store it in storage nodes so that computing nodes can access the data when needed. In this paper, we focus on the quality of service (QoS) of storage allocation in edge networks. We design a reputation mechanism for nodes in edge networks, which enables interacting nodes to evaluate each other's quality of service for reference. Each node publicly broadcasts a personal reputation list that evaluates all other nodes, and each node can calculate the global reputation of all nodes by aggregating personal reputations. We then propose a storage allocation algorithm that stores data at appropriate locations. The algorithm considers fairness, efficiency, and reliability, the last of which is derived from reputations. We build a novel Proof-of-Reputation (PoR) blockchain to support consensus on the reputation mechanism and storage allocation. The PoR blockchain ensures safety, saves computing resources, and avoids centralization. Extensive simulation results show that our proposed algorithm is fair, efficient, and reliable. The results also show that in the presence of attackers, the success rate of honest nodes accessing data can reach 99.9%.
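The step of aggregating personal reputation lists into a global reputation can be sketched as below. The abstract does not specify the aggregation rule, so the plain average of other nodes' ratings, the score range, and the node names are assumptions for illustration only.

```python
def global_reputation(personal):
    """Aggregate personal reputation lists into one global score per node.

    personal: dict mapping each node to its broadcast list, itself a dict
              {other_node: score}. Scores in [0, 1] are an assumption.
    Returns {node: mean of the scores other nodes gave it}.
    """
    nodes = list(personal)
    result = {}
    for target in nodes:
        ratings = [
            personal[rater][target]
            for rater in nodes
            if rater != target and target in personal[rater]
        ]
        # A node nobody has rated gets a neutral default of 0.0 (assumption).
        result[target] = sum(ratings) / len(ratings) if ratings else 0.0
    return result


if __name__ == "__main__":
    lists = {
        "a": {"b": 0.8, "c": 0.2},
        "b": {"a": 1.0, "c": 0.4},
        "c": {"a": 0.5, "b": 0.6},
    }
    print(global_reputation(lists))
```

Because every node applies the same rule to the same broadcast lists, all nodes arrive at the same global scores, which is the property a consensus layer would then need to protect against dishonest raters.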

### Accelerating Transactions Relay in Blockchain Networks via Reputation

Mengqian Zhang (Shanghai Jiao Tong University, China); Yukun Cheng (Suzhou University of Science and Technology, China); Xiaotie Deng (Peking University, China); Bo Wang (Nervina Labs Ltd., China); Jan Xie (Cryptape Technology Co., Ltd., China); Yuanyuan Yang and Jiarui Zhang (Stony Brook University, USA)

For a blockchain system, the network layer is of great importance for scalability and security. The critical task of blockchain networks is to provide fast delivery of data. Rapid propagation lets transactions be included in blocks and confirmed sooner. Existing blockchain systems, especially cryptocurrencies like Bitcoin, take a simple strategy that requires relay nodes to verify all received transactions and then forward valid ones to all outbound neighbors. Unfortunately, this design is inefficient and slows down the transmission of transactions.

In this paper, we introduce the concept of reputation and propose a novel relay protocol, RepuLay, to accelerate the transmission of transactions across the network. First, we design a reputation mechanism to help each node identify unreliable and inactive neighbors. In this mechanism, two values are used to define a node's reputation. Each node keeps a local list of the reputations of all its neighbors. Based on the reputation mechanism, RepuLay adopts probabilistic strategies to process transactions. More specifically, after receiving a transaction, the relay node verifies it with a certain probability, which is deduced from the first value of the sender's reputation. Next, the valid and unverified transactions are forwarded to some neighbors. Each neighbor has some probability of being chosen as a receiver, and this probability is determined by the second value of its reputation. Theoretically, we prove that our design can guarantee the quality of relayed transactions. Further simulation results confirm that RepuLay effectively accelerates the spread of transactions and optimizes the usage of nodes' bandwidth.
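The probabilistic verify-then-forward step described above can be sketched as follows. The abstract does not give the formulas that map the two reputation values to probabilities, so this sketch takes the probabilities as inputs; the function and parameter names are hypothetical.

```python
import random


def relay(tx_is_valid, verify_prob, forward_probs, rng=random):
    """Decide which neighbors receive a transaction.

    tx_is_valid: whether the transaction would pass verification.
    verify_prob: probability of verifying, assumed to be derived from the
                 first value of the sender's reputation.
    forward_probs: {neighbor: probability}, assumed to be derived from the
                   second value of each neighbor's reputation.
    rng: any object with a random() method returning a float in [0, 1).
    """
    if rng.random() < verify_prob and not tx_is_valid:
        return []   # verified and found invalid: drop it
    # Valid or unverified transactions are forwarded probabilistically.
    return [n for n, p in forward_probs.items() if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(0)   # seeded only to make the demo repeatable
    neighbors = {"n1": 0.9, "n2": 0.5, "n3": 0.1}
    print(relay(True, verify_prob=0.3, forward_probs=neighbors, rng=rng))
```

Skipping verification for trusted senders saves work at each hop, while the per-neighbor probabilities steer bandwidth away from neighbors considered inactive.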