Session 1-D

## Network Intelligence I

Jul 7 Tue, 2:00 PM — 3:30 PM EDT

### Camel: Smart, Adaptive Energy Optimization for Mobile Web Interactions

Jie Ren (Shaanxi Normal University, China); Lu Yuan (Northwest University, China); Petteri Nurmi (University of Helsinki, Finland); Xiaoming Wang and Miao Ma (Shaanxi Normal University, China); Ling Gao, Zhanyong Tang and Jie Zheng (Northwest University, China); Zheng Wang (University of Leeds, United Kingdom (Great Britain))

Web technology underpins many interactive mobile applications. However, achieving energy-efficient mobile web interactions remains an outstanding challenge. Given the increasing diversity and complexity of mobile hardware, any practical optimization scheme must work for a wide range of users, mobile platforms and web workloads. This paper presents CAMEL, a novel energy optimization system for mobile web interactions. CAMEL leverages machine learning techniques to develop a smart, adaptive scheme that judiciously trades performance for reduced power consumption. Unlike prior work, CAMEL directly models how given web content affects user expectations and uses this to guide energy optimization. It goes further by employing transfer learning and conformal predictions to tune a previously learned model in the end-user environment and improve it over time. We apply CAMEL to Chromium and evaluate it on four distinct mobile systems involving 1,000 testing webpages and 30 users. Compared to four state-of-the-art web-event optimizers, CAMEL delivers 22% more energy savings with 49% fewer violations of the quality of user experience, and exhibits orders of magnitude less overhead when targeting a new computing environment.

### COSE: Configuring Serverless Functions using Statistical Learning

Nabeel Akhtar (Boston University & Akamai, USA); Ali Raza (Boston University, USA); Vatche Ishakian (Bentley University, USA); Ibrahim Matta (Boston University, USA)

Serverless computing has emerged as a compelling new paradigm for the deployment of applications and services. It represents an evolution of cloud computing with a simplified programming model that aims to abstract away most operational concerns. Running serverless functions requires users to configure multiple parameters, such as memory, CPU, and cloud provider. Although this is simpler than managing servers directly, configuring such parameters correctly while minimizing cost and meeting delay constraints is not trivial. In this paper, we present COSE, a framework that uses Bayesian Optimization to find the optimal configuration for serverless functions. COSE uses statistical learning techniques to intelligently collect samples and predict the cost and execution time of a serverless function across unseen configuration values. Our framework uses the predicted cost and execution time to select the "best" configuration parameters for running a single function or a chain of functions, while satisfying customer objectives. In addition, COSE can adapt to changes in the execution time of a serverless function. We evaluate COSE not only on a commercial cloud provider, where we successfully found optimal or near-optimal configurations in as few as five samples, but also over a wide range of simulated distributed cloud environments that confirm the efficacy of our approach.

### Machine Learning on Volatile Instances

Xiaoxi Zhang, Jianyu Wang, Gauri Joshi and Carlee Joe-Wong (Carnegie Mellon University, USA)

Due to the massive size of the neural network models and training datasets used in machine learning today, it is imperative to distribute stochastic gradient descent (SGD) by splitting up tasks such as gradient evaluation across multiple worker nodes. However, running distributed SGD can be prohibitively expensive because it may require specialized computing resources such as GPUs for extended periods of time. We propose cost-effective strategies that exploit volatile cloud instances, which are cheaper than standard instances but may be interrupted by higher-priority workloads. To the best of our knowledge, this work is the first to quantify how variations in the number of active worker nodes (as a result of preemption) affect SGD convergence and the time to train the model. By understanding the trade-offs among the preemption probability of the instances, accuracy, and training time, we are able to derive practical strategies for configuring distributed SGD jobs on volatile instances such as Amazon EC2 spot instances and other preemptible cloud instances. Experimental results show that our strategies achieve good training performance at substantially lower cost.

### Optimizing Mixture Importance Sampling Via Online Learning: Algorithms and Applications

Tingwei Liu (The Chinese University of Hong Kong, Hong Kong); Hong Xie (Chongqing University, China); John Chi Shing Lui (Chinese University of Hong Kong, Hong Kong)

Importance sampling (IS) is widely used in rare event simulation, but it is costly to deal with *many rare events* simultaneously. For example, a rare event can be the failure to provide the quality-of-service guarantee for a critical network flow. Since network providers often need to deal with many critical flows (i.e., rare events) simultaneously, providers that use IS have to simulate each rare event individually with its own customized importance distribution. To reduce this cost, we propose an efficient mixture importance distribution for multiple rare events and formulate the mixture importance sampling optimization problem (MISOP) to select the optimal mixture. We first show that the "*search direction*" of the mixture is computationally expensive to evaluate, making it challenging to locate the optimal mixture. We then formulate a "*zero learning cost*" online learning framework to estimate the "*search direction*" and learn the optimal mixture from simulation samples of events. We develop two multi-armed bandit online learning algorithms to (1) minimize the sum of estimation variances with regret of $$(\ln{T})^2/T$$; and (2) minimize the simulation cost with regret of $$\sqrt{\ln{T}/T}$$. We demonstrate our method on a realistic network and show that it can reduce cost measures by 61.6% compared with the uniform mixture IS.
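As a rough illustration of the idea of sharing one mixture importance distribution across several rare events, the sketch below estimates two Gaussian tail probabilities from a single set of samples. It is not the MISOP algorithm from the abstract: the mixture weights are fixed rather than learned, the rare events are toy events of the form X > a for X ~ N(0, 1), and the names `normal_pdf` and `mixture_is` are hypothetical.

```python
import math
import random

def normal_pdf(x, mean=0.0):
    """Density of a unit-variance normal distribution centred at `mean`."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def mixture_is(thresholds, weights, n_samples, seed=0):
    """Estimate P(X > a) for every threshold a, with X ~ N(0, 1).

    The proposal is a mixture of N(a, 1) components, one per threshold.
    `weights` are the mixture weights and are assumed to sum to 1.
    """
    rng = random.Random(seed)
    sums = [0.0] * len(thresholds)
    for _ in range(n_samples):
        # Pick a mixture component, then sample from N(threshold, 1).
        mean = rng.choices(thresholds, weights=weights)[0]
        x = rng.gauss(mean, 1.0)
        # Likelihood ratio of the target N(0, 1) to the full mixture density.
        q = sum(w * normal_pdf(x, m) for w, m in zip(weights, thresholds))
        ratio = normal_pdf(x) / q
        # The same sample contributes to every event it falls into.
        for j, a in enumerate(thresholds):
            if x > a:
                sums[j] += ratio
    return [s / n_samples for s in sums]

thresholds = [3.0, 4.0]
estimates = mixture_is(thresholds, [0.5, 0.5], n_samples=20000)
# Closed-form tail probabilities, for comparison only.
exact = [0.5 * math.erfc(a / math.sqrt(2.0)) for a in thresholds]
```

Changing the weights shifts simulation effort between the two events, which is the quantity an optimizer over mixtures would tune.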

###### Session Chair

Christopher G. Brinton (Purdue University)

Session 2-D

## Network Intelligence II

Jul 7 Tue, 4:00 PM — 5:30 PM EDT

### Autonomous Unknown-Application Filtering and Labeling for DL-based Traffic Classifier Update

Jielun Zhang, Fuhao Li, Feng Ye and Hongyu Wu (University of Dayton, USA)

Network traffic classification has been widely studied to fundamentally advance network measurement and management. Machine learning is one of the effective approaches for network traffic classification. Specifically, Deep Learning (DL) has attracted much attention from researchers due to its effectiveness even on encrypted network traffic, without compromising user privacy or security. However, most existing models are learned only from a closed-world dataset, so they can classify only the classes sampled in that limited dataset. One drawback is that unknown classes, which emerge frequently, will not be classified correctly. To tackle this issue, we propose an autonomous learning framework to effectively update DL-based traffic classification models during active operations. The core of the proposed framework consists of a DL-based classifier, a self-learned discriminator, and autonomous self-labeling. The discriminator and self-labeling process can generate new datasets during active operations to support updates of the classifiers. The proposed framework is evaluated on an open dataset, i.e., ISCX VPN-nonVPN, and on independently collected data packets. The results demonstrate that the proposed autonomous learning framework can filter packets from unknown classes and provide accurate labels. Thus, the DL-based classification models can be updated successfully with the autonomously generated dataset.

### Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs

Shaohuai Shi, Qiang Wang and Xiaowen Chu (Hong Kong Baptist University, Hong Kong); Bo Li (Hong Kong University of Science and Technology, Hong Kong); Yang Qin (Harbin Institute of Technology (Shenzhen), China); Ruihao Liu and Xinxiao Zhao (ShenZhen District Block Technology Co., Ltd., China)

Distributed synchronous stochastic gradient descent (SGD) algorithms are widely used in large-scale deep learning applications, but the communication bottleneck is known to limit the scalability of the distributed system. Gradient sparsification is a promising technique to significantly reduce communication traffic, while pipelining can further overlap communications with computations. However, gradient sparsification introduces extra computation time, and pipelining requires many layer-wise communications, which introduce significant communication startup overheads. Merging gradients from neighboring layers could reduce the startup overheads, but it would also increase the computation time of sparsification and the waiting time for the gradient computation. In this paper, we formulate the trade-off between communications and computations (including backward computation and gradient sparsification) as an optimization problem, and derive an optimal solution to the problem. We further develop the optimal merged gradient sparsification algorithm with SGD (OMGS-SGD) for distributed training of deep learning. We conduct extensive experiments to verify the convergence properties and scaling performance of OMGS-SGD. Experimental results show that, on a 16-GPU cluster connected with 1 Gbps Ethernet, OMGS-SGD achieves up to 31% end-to-end time efficiency improvement over the state-of-the-art sparsified SGD while preserving convergence performance nearly consistent with that of the original SGD without sparsification.
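For readers unfamiliar with gradient sparsification, a minimal sketch follows, assuming the common top-k variant in which only the k largest-magnitude gradient entries are communicated and the rest are kept locally as a residual. The abstract does not state which sparsifier OMGS-SGD uses, and this sketch does not include the layer-merging decision that is the paper's contribution; `topk_sparsify` is a hypothetical name.

```python
def topk_sparsify(grad, k):
    """Split `grad` into a sparse part (top-k by magnitude) and a residual.

    The sparse part is what a worker would communicate; the residual is
    typically accumulated locally and added to the next gradient.
    """
    if k <= 0:
        return [0.0] * len(grad), list(grad)
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    sparse = [g if i in keep else 0.0 for i, g in enumerate(grad)]
    residual = [0.0 if i in keep else g for i, g in enumerate(grad)]
    return sparse, residual

grad = [0.1, -2.0, 0.05, 1.5, -0.3]
sparse, residual = topk_sparsify(grad, k=2)
# sparse   -> [0.0, -2.0, 0.0, 1.5, 0.0]
# residual -> [0.1, 0.0, 0.05, 0.0, -0.3]
```

The selection step is the extra computation the abstract refers to: applying it to one merged buffer saves communication startups but delays the point at which selection can begin.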

### Tracking the State of Large Dynamic Networks via Reinforcement Learning

Matthew Andrews (Nokia Bell Labs, USA); Sem Borst (Eindhoven University of Technology & Nokia Bell Labs, USA); Jeongran Lee (Nokia Bell Labs, USA); Enrique Martín-López and Karina Palyutina (Nokia Bell Labs, United Kingdom (Great Britain))

A Network Inventory Manager (NIM) is a software solution that scans, processes and records data about all devices in a network. We consider the problem faced by a NIM that can send out a limited number of probes to track changes in a large, dynamic network. The underlying change rate for the Network Elements (NEs) is unknown and may be highly non-uniform. The NIM should concentrate its probe budget on the NEs that change most frequently, with the ultimate goal of minimizing the weighted Fraction of Stale Time (wFOST) of the inventory. However, the NIM cannot discover the change rate of an NE unless the NE is repeatedly probed. We develop and analyze two algorithms based on Reinforcement Learning to solve this exploration-vs-exploitation problem. The first is motivated by the Thompson Sampling method and the second is derived from the Robbins-Monro stochastic learning paradigm. We show that for a fixed probe budget, both of these algorithms produce a potentially unbounded improvement in terms of wFOST compared to the baseline algorithm that divides the probe budget equally among all NEs. Our simulations of practical scenarios show optimal performance in minimizing wFOST while discovering the change rate of the NEs.
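The exploration-vs-exploitation tension described above can be illustrated with a generic Beta-Bernoulli Thompson Sampling loop. This is only a sketch under simplifying assumptions, not the authors' algorithm: each NE is assumed to change independently in each round with a fixed probability, a probe reveals whether the NE changed in that round, and the learner simply probes the NEs with the highest sampled change rates instead of optimizing wFOST directly. The function name `thompson_probe` and all parameter values are hypothetical.

```python
import random

def thompson_probe(change_probs, budget, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling for probe allocation.

    change_probs: per-round change probability of each NE (unknown to the
                  learner; used here only to simulate probe outcomes).
    budget:       number of NEs that can be probed in each round.
    Returns the number of probes sent to each NE.
    """
    rng = random.Random(seed)
    n = len(change_probs)
    changed = [1] * n    # Beta posterior alpha, starting from a uniform prior
    unchanged = [1] * n  # Beta posterior beta, starting from a uniform prior
    probes = [0] * n
    for _ in range(rounds):
        # Draw one plausible change rate per NE from its posterior.
        draws = [rng.betavariate(changed[i], unchanged[i]) for i in range(n)]
        # Probe the `budget` NEs whose sampled rates are highest.
        chosen = sorted(range(n), key=lambda i: draws[i], reverse=True)[:budget]
        for i in chosen:
            probes[i] += 1
            # Update the posterior with the simulated probe outcome.
            if rng.random() < change_probs[i]:
                changed[i] += 1
            else:
                unchanged[i] += 1
    return probes

probes = thompson_probe([0.9, 0.5, 0.05, 0.01], budget=1, rounds=2000)
```

Early rounds spread probes across all NEs because the posteriors are wide; as evidence accumulates, probes concentrate on the NE that changes most often, in contrast to the equal-split baseline mentioned in the abstract.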

### Unsupervised and Network-Aware Diagnostics for Latent Issues in Network Information Databases

Hua Shao (Tsinghua University, China); Li Chen (Huawei, Hong Kong); Youjian Zhao (Tsinghua University, China)

A network information database (NID) is essential in modern large-scale networks. Operators rely on the NID to provide accurate and up-to-date data; however, an NID, like any other database, can suffer from latent issues such as inconsistent, incorrect, and missing data. In this work, we first reveal latent data issues in NIDs using real traces from a large cloud provider, Tencent. Then we design and implement a diagnostic system, NAuditor, for unsupervised identification of latent issues in NIDs. In the process, we design a compact, graph-based data structure to efficiently encode the complete NID as a Knowledge Graph, and model the diagnostic problems as unsupervised Knowledge Graph Refinement (KGR) problems. We show that the new encoding achieves better performance than alternatives and can facilitate the adoption of state-of-the-art KGR algorithms. We have also used NAuditor in a production NID and found 71 real latent issues, all of which have been confirmed by operators.

###### Session Chair

Wenye Wang (NC State University)