Session H1

Workshop — Big Data Systems

Conference: 3:00 PM — 4:20 PM HKT
Local: Dec 2 Wed, 11:00 PM — 12:20 AM PST

How Fast Can We Insert? An Empirical Performance Evaluation of Apache Kafka

Guenter Hesse, Christoph Matthies and Matthias Uflacker

Message brokers see widespread adoption in modern IT landscapes, with Apache Kafka being one of the most widely employed platforms. These systems feature well-defined APIs for use and configuration and present flexible solutions for various data storage scenarios. Their ability to scale horizontally enables users to adapt to growing data volumes and changing environments. However, one of the main challenges concerning message brokers is the danger of them becoming a bottleneck within an IT architecture. To prevent this, knowledge about the amount of data a message broker using a specific configuration can handle needs to be available. In this paper, we propose a monitoring architecture for message brokers and similar Java Virtual Machine-based systems. We present a comprehensive performance analysis of the popular Apache Kafka platform using our approach. As part of the benchmark, we study selected data ingestion scenarios with respect to their maximum data ingestion rates. The results show that we can achieve an ingestion rate of about 420,000 messages/second on the commodity hardware used, with the data sender tool we developed.
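
A rough illustration of the kind of measurement the paper performs: the sketch below times bulk inserts with a Kafka producer and reports messages per second. It assumes a broker on localhost:9092, a pre-created topic named "ingest-test", and the kafka-python client; these details are assumptions for illustration, not the authors' data sender tool.

```python
# Minimal producer-side ingestion-rate measurement (sketch, not the paper's tool).
# Assumes: kafka-python installed, a broker on localhost:9092, topic "ingest-test".
import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         acks=1,               # acknowledgement level affects throughput
                         linger_ms=5,          # small batching delay
                         batch_size=64 * 1024)

payload = b"x" * 100          # 100-byte messages
n_messages = 1_000_000

start = time.time()
for _ in range(n_messages):
    producer.send("ingest-test", payload)   # asynchronous send, batched internally
producer.flush()              # wait until all batches are acknowledged
elapsed = time.time() - start

print(f"{n_messages / elapsed:,.0f} messages/second")
```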

Performance Modeling and Tuning for DFT Calculations on Heterogeneous Architectures

Hadia Ahmed, David Williams-Young, Khaled Z. Ibrahim and Chao Yang

Tuning scientific code for heterogeneous computing architectures is a growing challenge. Not only do we need to tune the code for multiple architectures, but we also need to select or schedule computations on the most efficient compute variant. In this paper, we explore the tuning and performance modeling of one of the most time-consuming kernels in density functional theory calculations on systems with a multicore host CPU accelerated by GPUs. We show that the problem configuration dictates the choice of the most efficient compute engine. This choice can alternate between the host and the accelerator, especially at scale. As such, a performance model that predicts execution time on the host CPU and the GPU is essential for selecting the compute environment and achieving optimal performance. We present a simple model that empirically carries out this task and can accurately steer the scheduling of computation.
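
A hedged sketch of the kind of empirical model the abstract describes: fit simple runtime models for the CPU and GPU variants from a few measured problem sizes, then pick the predicted-faster device for a new configuration. The measurement numbers and the log-log model form are illustrative assumptions, not taken from the paper.

```python
# Empirical CPU-vs-GPU runtime model (illustrative sketch).
import numpy as np

# Hypothetical measured runtimes (seconds) for a few problem sizes N.
sizes = np.array([500, 1000, 2000, 4000, 8000])
t_cpu = np.array([0.02, 0.09, 0.40, 1.70, 7.10])
t_gpu = np.array([0.05, 0.07, 0.15, 0.45, 1.60])   # higher fixed cost, better scaling

# Fit log-log models: log t = a*log N + b  (i.e. t ~ exp(b) * N^a).
cpu_fit = np.polyfit(np.log(sizes), np.log(t_cpu), 1)
gpu_fit = np.polyfit(np.log(sizes), np.log(t_gpu), 1)

def predict(fit, n):
    a, b = fit
    return np.exp(b) * n ** a

def choose_device(n):
    """Return the device with the lower predicted runtime for size n."""
    return "gpu" if predict(gpu_fit, n) < predict(cpu_fit, n) else "cpu"

for n in (600, 3000, 16000):
    print(n, choose_device(n),
          f"cpu={predict(cpu_fit, n):.3f}s gpu={predict(gpu_fit, n):.3f}s")
```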

Graph-based Approaches for the Interval Scheduling Problem

Panagiotis Oikonomou, Nikos Tziritas, Georgios Theodoropoulos, Maria Koziri, Thanasis Loukopoulos and Samee U. Khan

One of the fundamental problems encountered by large-scale computing systems, such as clusters and clouds, is scheduling a set of jobs submitted by the users. Each job is characterized by resource demands as well as start and completion times. Each job must be scheduled to execute on a machine having the required capacity between the start and completion time (referred to as the interval) of the job. Each machine is defined by a parallelism parameter g that indicates the maximum number of jobs the machine can process in parallel. The above problem is referred to as the interval scheduling problem with bounded parallelism. The objective is to minimize the total busy time of all machines. The majority of solutions proposed in the literature consider a homogeneous set of jobs and machines, a simplifying assumption, since in practice heterogeneous jobs and machines are frequently encountered. In this article, we tackle the aforesaid problem with a set of heterogeneous jobs and machines. A major contribution of our work is that the problem is addressed in a novel way by combining a graph-based approach with a dynamic programming approach based on a variation of the bin packing problem. A greedy algorithm employing only the graph-based approach is also proposed, with the aim of reducing the computational complexity. Experimental results show that the proposed algorithms can significantly reduce the cumulative busy interval over all machines compared with state-of-the-art algorithms proposed in the literature.
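
To make the objective concrete, here is a small greedy baseline (not the paper's graph-based or dynamic programming algorithms): jobs are sorted by start time and each is placed on the open machine whose busy interval would grow the least, using a conservative check of the parallelism bound g; a new machine is opened if none fits. All names and numbers are illustrative.

```python
# Greedy baseline for interval scheduling with bounded parallelism (illustrative only).
# A job is (start, end); each machine runs at most g jobs at any instant.
# Objective: total busy time = sum over machines of the union length of its jobs.

def overlaps(job, others):
    """Count already-assigned jobs overlapping `job` (conservative capacity check)."""
    s, e = job
    return sum(1 for (s2, e2) in others if s < e2 and s2 < e)

def busy_time(jobs):
    """Length of the union of the intervals assigned to one machine."""
    total, cur_s, cur_e = 0, None, None
    for s, e in sorted(jobs):
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    return total + (cur_e - cur_s if cur_e is not None else 0)

def greedy_schedule(jobs, g):
    machines = []                          # each machine is a list of assigned jobs
    for job in sorted(jobs):               # process jobs by start time
        best, best_growth = None, None
        for m in machines:
            if overlaps(job, m) >= g:      # would risk exceeding the parallelism bound
                continue
            growth = busy_time(m + [job]) - busy_time(m)
            if best is None or growth < best_growth:
                best, best_growth = m, growth
        if best is None:
            machines.append([job])         # open a new machine
        else:
            best.append(job)
    return machines, sum(busy_time(m) for m in machines)

machines, total = greedy_schedule([(0, 4), (1, 3), (2, 6), (5, 8), (6, 9)], g=2)
print(total, machines)
```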

MPI parallelization of NEUROiD models using docker swarm

Raghu Sesha Iyengar and Mohan Raghavan

NEURON, along with other systems simulators, is increasingly being used to simulate neural systems whose complexity demands massively parallel implementations. NEURON's ParallelContext allows parallelizing models using MPI. However, when running NEURON models in a docker container, this parallelization does not work out of the box. We propose an architecture for MPI parallelization of NEURON models using docker swarm. We integrate this into our NEUROiD platform and obtain an almost 16x improvement in simulation time on our cluster.
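
For readers unfamiliar with NEURON's ParallelContext, the snippet below is a minimal MPI "hello world" of the kind that must work inside each container before a full model can be parallelized. The launch command and the round-robin work split are assumptions for illustration, not the paper's setup.

```python
# Minimal NEURON ParallelContext check (sketch).
# Typically launched as:  mpiexec -n 4 nrniv -mpi -python this_script.py
# In a docker swarm deployment, MPI ranks must be able to reach each other over
# the swarm's overlay network (an assumption about the setup, not shown here).
from neuron import h

pc = h.ParallelContext()
rank, nhost = int(pc.id()), int(pc.nhost())
print(f"I am rank {rank} of {nhost}")

# Distribute independent simulation work round-robin across ranks.
for cell_gid in range(rank, 100, nhost):
    pass  # build and simulate the cell with this gid

pc.barrier()   # wait for all ranks
pc.done()      # shut down ParallelContext cleanly
h.quit()
```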

Session Chair

Yuanyuan Xu (Hohai University)

Session H2

Workshop — Edge Intelligence for Smart IoT Applications

Conference: 3:00 PM — 5:20 PM HKT
Local: Dec 2 Wed, 11:00 PM — 1:20 AM PST

S-GAT: Accelerating Graph Attention Networks Inference on FPGA Platform with Shift Operation

Weian Yan, Weiqin Tong, and Xiaoli Zhi

Deep learning has been successful in many fields such as acoustics, image processing, and natural language processing. However, due to the unique characteristics of graphs, applying deep learning to general graph data is not easy. Graph Attention Networks (GATs) show the best performance in multiple authoritative node classification benchmarks (both transductive and inductive). The purpose of this research is to design and implement an FPGA-based accelerator for graph attention networks, called S-GAT, that achieves excellent acceleration and energy efficiency without losing accuracy and does not rely on DSPs or large amounts of on-chip memory. We design S-GAT with software and hardware co-optimization. Specifically, we use model compression and feature quantization to reduce the model size, and use shift addition units (SAUs) to convert multiplication into shift operations to further reduce the computation requirements. We integrate the above optimizations into a universal hardware pipeline for various GAT structures. Finally, we evaluate our design on an Inspur F10A board with an Intel Arria 10 GX1150 and 16 GB DDR3 memory. Experimental results show that S-GAT can achieve a 7.34x speedup over an Nvidia Tesla V100 and 593x over a Xeon Gold 5115 CPU while maintaining accuracy, with 48x and 2400x improvements in energy efficiency, respectively.
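
The core software-side idea, replacing multiplications with shifts after quantizing weights to signed powers of two, can be shown in a few lines. This is a generic power-of-two quantization demo, not the authors' SAU hardware design; all values are made up.

```python
# Power-of-two weight quantization: multiply-by-weight becomes a bit shift (sketch).
import numpy as np

def quantize_pow2(w):
    """Approximate each weight by sign(w) * 2**e with an integer exponent e."""
    sign = np.sign(w).astype(int)
    e = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    return sign, e

def shift_dot(x_int, sign, e):
    """Dot product of integer activations with power-of-two weights, using shifts only."""
    out = 0
    for j in range(len(e)):
        term = x_int[j] << e[j] if e[j] >= 0 else x_int[j] >> -e[j]
        out += sign[j] * term          # shift + add replaces multiply + add
    return out

x_int = np.array([12, 7, 3], dtype=np.int64)   # quantized activations
w = np.array([0.52, -0.24, 2.1])               # float weights
sign, e = quantize_pow2(w)
print(shift_dot(x_int, sign, e))               # shift-add approximation
print(float(x_int @ w))                        # float reference
```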

Explainable Congestion Attack Prediction and Software-level Reinforcement in Intelligent Traffic Signal System

Xiaojin Wang, Yingxiao Xiang, Wenjia Niu, Endong Tong, and Jiqiang Liu

With connected vehicle (CV) technology, the next-generation transportation system is stepping into its implementation phase via the deployment of the Intelligent Traffic Signal System (I-SIG). Since the congestion attack was first discovered in the USDOT (U.S. Department of Transportation) sponsored I-SIG, deployed in three cities including New York, this realistic threat has opened a new security issue. In this work, from a machine learning perspective, we perform a systematic feature analysis of the congestion attack and its variations launched from the last vehicle under different traffic flow patterns. We first adopt the Tree-regularized Gated Recurrent Unit (TGRU) to make explainable congestion attack predictions, in which 32-dimensional features are defined to characterize the traffic of an 8-phase intersection. We then develop corresponding software-level security reinforcement suggestions, which can be further expanded in future work. In massive experiments based on real-world intersection settings, we distill 384 samples of congestion attacks to train a TGRU-based attack prediction model and achieve an average precision of 80%. We further discuss possible reinforcement defense methods based on our prediction model.
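
As a rough illustration of the prediction side only (the paper's tree regularization is deliberately omitted here), a GRU classifier over sequences of 32-dimensional intersection features might look like the sketch below; layer sizes and names are assumptions.

```python
# GRU-based attack/benign classifier over 32-dim traffic features (sketch;
# the paper's tree-regularization term is NOT implemented here).
import torch
import torch.nn as nn

class GRUAttackPredictor(nn.Module):
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)         # attack vs. benign

    def forward(self, x):                         # x: (batch, time, 32)
        _, h_n = self.gru(x)                      # h_n: (1, batch, hidden)
        return self.head(h_n.squeeze(0))          # logits: (batch, 2)

model = GRUAttackPredictor()
x = torch.randn(8, 20, 32)                        # 8 sequences of 20 timesteps
labels = torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(model(x), labels)    # a tree-regularization penalty
loss.backward()                                   # would be added to this loss
print(loss.item())
```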

Dynamic-Static-based Spatiotemporal Multi-Graph Neural Networks for Passenger Flow Prediction

Jingyan Ma, Jingjing Gu, Qiang Zhou, Qiuhong Wang and Ming Sun

Various sensing and computing technologies have gradually outlined the future of the intelligent city. Passenger flow prediction for public transport has become an important task in Intelligent Transportation Systems (ITS), and it is a prerequisite for traffic management and urban planning. Many deep learning methods exist for learning spatiotemporal features from highly non-linear and complex traffic flows. However, they only utilize temporal correlation and static spatial correlation, such as geographical distance, which is insufficient for mining dynamic spatial correlations. In this paper, we propose the Dynamic-Static-based Spatiotemporal Multi-Graph Neural Network model (DSSTMG) for predicting traffic passenger flows, which can concurrently incorporate temporal correlation and multiple static and dynamic spatial correlations. Firstly, we exploit multiple static spatial correlations with a multi-graph fusion convolution operator, covering adjacency relations, station functional zone similarity, and geographical distance. Secondly, we exploit dynamic spatial correlations by calculating the similarity between the flow patterns of stations over a period of time and building dynamic spatial attention. Moreover, we use temporal attention and an encoder-decoder architecture to capture temporal correlation. Experimental results on two real-world datasets show that the proposed DSSTMG outperforms state-of-the-art methods.
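
A minimal sketch of the multi-graph fusion idea: several normalized adjacency matrices (adjacency, functional similarity, distance) are combined with learnable weights before a graph convolution. The shapes and fusion form are assumptions for illustration, not the paper's exact operator.

```python
# Multi-graph fusion graph convolution (illustrative sketch).
import torch
import torch.nn as nn

class MultiGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, num_graphs):
        super().__init__()
        self.fusion = nn.Parameter(torch.ones(num_graphs))   # learnable graph weights
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adjs):
        # x: (num_nodes, in_dim); adjs: list of normalized (num_nodes, num_nodes) matrices
        w = torch.softmax(self.fusion, dim=0)
        a_fused = sum(w[k] * adjs[k] for k in range(len(adjs)))  # fuse static relations
        return torch.relu(self.lin(a_fused @ x))

n = 5                                          # number of stations
adjs = [torch.rand(n, n) for _ in range(3)]    # adjacency / function / distance graphs
x = torch.rand(n, 16)                          # per-station flow features
layer = MultiGraphConv(16, 32, num_graphs=3)
print(layer(x, adjs).shape)                    # -> torch.Size([5, 32])
```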

Incentive-driven Data Offloading and Caching Replacement Scheme in Opportunistic Mobile Networks

Tong Wu, Xuxun Liu, Deze Zeng, Huan Zhou and Shouzhi Xu

Offloading cellular traffic through Opportunistic Mobile Networks (OMNs) is an effective way to relieve the burden on cellular networks. Providing data offloading services requires substantial resources, and nodes in OMNs are selfish and rational: they are not willing to provide data offloading services for others without compensation. Therefore, it is urgent to design an incentive mechanism to stimulate mobile nodes to participate in the data offloading process. In this paper, we propose a Reverse Auction-based Incentive Mechanism to stimulate mobile nodes in OMNs to provide data offloading services, and we take cache management into consideration. We model the incentive-driven data offloading process as a non-linear integer programming problem, and then propose a Greedy Helper Selection Method (GHSM) and a Caching Replacement Scheme (CRS) to solve it. In addition, we propose an innovative payment rule based on the Vickrey-Clarke-Groves (VCG) model to ensure the individual rationality and truthfulness of the proposed algorithm. Trace-driven simulation results show that the proposed algorithm can significantly reduce the cost of the Content Service Provider (CSP) in different scenarios.
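
To give a flavour of reverse-auction-based selection (this is not the paper's GHSM/CRS and not an exact VCG implementation), the sketch below greedily picks the cheapest helpers per unit of offloading capacity until the demand is covered and pays each winner its bid plus the cost increase its absence would cause. All numbers and names are hypothetical.

```python
# Greedy reverse auction for offloading demand with VCG-style payments (sketch).

def greedy_cover(helpers, demand):
    """Pick helpers (name, capacity, bid) by cheapest price-per-unit until demand is met.
    Returns (selected names, total bid cost); cost is infinite if demand cannot be met."""
    chosen, cost, covered = [], 0.0, 0
    for name, cap, bid in sorted(helpers, key=lambda h: h[2] / h[1]):
        if covered >= demand:
            break
        chosen.append(name)
        cost += bid
        covered += cap
    return (chosen, cost) if covered >= demand else ([], float("inf"))

def vcg_style_payments(helpers, demand):
    """Pay each winner its bid plus the cost increase its absence would cause."""
    winners, base_cost = greedy_cover(helpers, demand)
    payments = {}
    for name, cap, bid in helpers:
        if name not in winners:
            continue
        others = [h for h in helpers if h[0] != name]
        _, cost_without = greedy_cover(others, demand)
        payments[name] = bid + (cost_without - base_cost)   # marginal-contribution bonus
    return winners, payments

helpers = [("h1", 4, 2.0), ("h2", 3, 3.0), ("h3", 5, 6.0)]  # (name, capacity, bid)
print(vcg_style_payments(helpers, demand=6))
```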

Adaptive DNN Partition in Edge Computing Environments

Weiwei Miao, Zeng Zeng, Lei Wei, Shihao Li, Chengling Jiang and Zhen Zhang

Deep Neural Networks (DNNs) are applied widely nowadays, achieving remarkable results in a wide variety of research fields. As accuracy requirements for inference results rise, DNN topologies tend to become more and more complex, evolving from chain topologies to directed acyclic graph (DAG) topologies, which leads to a huge amount of computation. For end devices with limited computing resources, the delay of running DNN models independently may be intolerable. As a solution, edge computing can comprehensively make use of all available devices in the edge environment to run DNN inference tasks and thereby accelerate them. In this case, how to split a DNN inference task into several small tasks and assign them to different edge devices is the central issue. This paper proposes a load-balancing algorithm to split DNNs with DAG topology adaptively according to the environment. Extensive experimental results show that the proposed adaptive algorithm can effectively accelerate inference.
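
The scheduling step can be pictured with a simple list-scheduling heuristic over the DAG: process layers in topological order and assign each to the device with the smallest resulting load. This is a generic load-balancing sketch that ignores communication cost, not the paper's algorithm; the DAG, costs, and device names are made up.

```python
# Greedy load-balancing partition of a DAG of DNN layers over edge devices (sketch).
from collections import deque

# Toy DAG: layer -> successors, plus per-layer compute cost (arbitrary units).
succ = {"conv1": ["conv2a", "conv2b"], "conv2a": ["concat"], "conv2b": ["concat"], "concat": []}
cost = {"conv1": 4.0, "conv2a": 3.0, "conv2b": 5.0, "concat": 1.0}
device_speed = {"edge0": 1.0, "edge1": 2.0}        # relative compute speeds

def topo_order(succ):
    indeg = {v: 0 for v in succ}
    for v in succ:
        for u in succ[v]:
            indeg[u] += 1
    q = deque(v for v in succ if indeg[v] == 0)
    order = []
    while q:
        v = q.popleft()
        order.append(v)
        for u in succ[v]:
            indeg[u] -= 1
            if indeg[u] == 0:
                q.append(u)
    return order

def partition(succ, cost, device_speed):
    load = {d: 0.0 for d in device_speed}          # accumulated time per device
    placement = {}
    for layer in topo_order(succ):                 # respect data dependencies
        # pick the device that would finish this layer's work the earliest
        best = min(device_speed, key=lambda d: load[d] + cost[layer] / device_speed[d])
        load[best] += cost[layer] / device_speed[best]
        placement[layer] = best
    return placement, load

print(partition(succ, cost, device_speed))
```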

Efficient Edge Service Migration in Mobile Edge Computing

Zeng Zeng, Shihao Li, Weiwei Miao, Lei Wei, Chengling Jiang, Chuanjun Wang and Mingxuan Zhang

Edge computing is one of the emerging technologies aiming to enable timely computation at the network edge. With virtualization technologies, the role of the traditional edge provider is separated into two: edge infrastructure providers (EIPs), who manage the physical edge infrastructure, and edge service providers (ESPs), who purchase slices of physical resources (e.g., CPU, bandwidth, memory space, disk storage) from EIPs and then cache service entities to offer their own value-added services to end users. These value-added services are also called virtual network functions (VNFs). Edge computing environments are dynamic, and the resource requirements of edge services usually fluctuate over time. Thus, when the demand of a VNF cannot be satisfied, we need to design strategies for migrating the VNF so as to meet its demand and retain network performance. In this paper, we concentrate on migrating VNFs efficiently (MV), such that the migration can meet the bandwidth requirement for data transmission. We prove that MV is NP-complete. We present several exact and heuristic solutions to tackle it. Extensive simulations demonstrate that the proposed heuristics are efficient and effective.
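
The following is a toy greedy migration heuristic in the spirit of the problem statement, not one of the paper's solutions: it moves a VNF whose demand exceeds its host's spare capacity to a node that has enough CPU and is reachable over links with sufficient spare bandwidth. The topology and numbers are invented for illustration.

```python
# Greedy VNF migration sketch (illustrative only).
from collections import deque

nodes_free_cpu = {"e1": 1.0, "e2": 3.0, "e3": 6.0}     # spare CPU on each edge node
links_free_bw  = {("e1", "e2"): 2.0, ("e2", "e3"): 5.0, ("e1", "e3"): 1.0}

def bw(a, b):
    return links_free_bw.get((a, b), links_free_bw.get((b, a), 0.0))

def reachable(src, dst, need_bw):
    """BFS over links with at least `need_bw` spare bandwidth for state transfer."""
    seen, q = {src}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            return True
        for v in nodes_free_cpu:
            if v not in seen and bw(u, v) >= need_bw:
                seen.add(v)
                q.append(v)
    return False

def migrate(vnf_host, cpu_demand, transfer_bw):
    """Pick the least-loaded node with enough CPU that is bandwidth-reachable."""
    candidates = [n for n in nodes_free_cpu
                  if n != vnf_host and nodes_free_cpu[n] >= cpu_demand
                  and reachable(vnf_host, n, transfer_bw)]
    return max(candidates, key=lambda n: nodes_free_cpu[n], default=None)

# A VNF on e1 now needs 2.5 CPU units and a 2.0-bandwidth path for migration.
print(migrate("e1", cpu_demand=2.5, transfer_bw=2.0))   # -> "e3" (reached via e2)
```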

A protocol-independent container network observability analysis system based on eBPF

Chang Liu, Zhengong Cai, Bingshen Wang, Zhimin Tang, and Jiaxu Liu

Technologies such as microservices, containerization, and Kubernetes in cloud-native environments make large-scale application delivery increasingly easy, but troubleshooting and fault localization across massive numbers of applications are becoming increasingly complex. The data collected by mainstream sampling-based monitoring technologies can hardly cover all anomalies, and the kernel's lack of observability makes it difficult to collect more detailed data in container environments such as the Kubernetes platform. In addition, most current solutions rely on tracing and application performance monitoring (APM) tools, but these restrict the languages an application can use and require invasive changes to application code; many scenarios call for more general network performance diagnostic methods that do not intrude into the user application. In this paper, we introduce network monitoring at the kernel level, below the application, for Kubernetes clusters in the Alibaba container service. By non-intrusively collecting L7/L4 network protocol interaction information of user applications with eBPF, we achieve a data collection throughput of more than 10M events per second without modifying any kernel or application code, while the impact on the application is less than 1%. The system also uses machine learning methods to analyze and diagnose application network performance problems, identify network performance bottlenecks, locate specific instance information for different applications, and realize protocol-independent network performance problem localization and analysis.
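
Independent of the paper's system, the general pattern of kernel-level, non-intrusive collection with eBPF can be shown with a small BCC program that counts bytes handed to tcp_sendmsg per process. It requires root, a BCC installation, and a kernel with eBPF support, and is only a sketch of the approach, not the authors' collector.

```python
# Per-process TCP send-byte accounting via an eBPF kprobe (BCC sketch, run as root).
from bcc import BPF
from time import sleep

prog = r"""
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

BPF_HASH(bytes_by_pid, u32, u64);

int kprobe__tcp_sendmsg(struct pt_regs *ctx, struct sock *sk,
                        struct msghdr *msg, size_t size) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *val = bytes_by_pid.lookup_or_try_init(&pid, &zero);
    if (val) {
        (*val) += size;          // accumulate bytes handed to the TCP stack
    }
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing tcp_sendmsg... Ctrl-C to stop")
try:
    while True:
        sleep(5)
        for pid, nbytes in sorted(b["bytes_by_pid"].items(),
                                  key=lambda kv: kv[1].value, reverse=True)[:10]:
            print(f"pid={pid.value:<8} sent={nbytes.value} bytes")
        b["bytes_by_pid"].clear()
except KeyboardInterrupt:
    pass
```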

Session Chair

Yanchao Zhao (Nanjing University of Aeronautics and Astronautics) and Sheng Zhang (Nanjing University)

Session H3

Workshop — Heterogeneous Multi-access Mobile Edge Computing and Applications

Conference: 3:00 PM — 4:40 PM HKT
Local: Dec 2 Wed, 11:00 PM — 12:40 AM PST

Performance Guaranteed Single Link Failure Recovery in SDN Overlay Networks

Lilei Zheng, Hongli Xu, Suo Chen and Liusheng Huang

An SDN overlay network is a legacy network improved through SDN and overlay technology. It offers cheap upgrade costs, flexible network management, and sharing of physical network resources, which has brought huge benefits to multi-tenant cloud platforms. Link failure is an important issue that should be addressed in any large network. In SDN overlay networks, link failure recovery brings new challenges not present in legacy networks, such as how to maintain the performance of the overlay networks in the post-recovery network. Thus, for the case of a single link failure, we devise a recovery approach that guarantees the performance of overlay networks through coordination between SDN switches and traditional switches. We formulate the link failure recovery (LFR) problem as an integer linear program and prove its NP-hardness. A rounding-based algorithm with bounded approximation factors is devised to solve the LFR problem. Simulation results show that the devised scheme can guarantee the performance of the overlay network after restoration. The results also show that, compared with SPR and IPFRR, the designed method can reduce the maximum link load rate by approximately 41.5% and 51.6%, respectively.
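
The "LP relaxation plus rounding" pattern the abstract mentions can be sketched on a toy instance: affected flows each choose one backup path, the LP minimizes the maximum link load rate, and each flow is then rounded to its fractionally heaviest path. The topology, flows, and rounding rule are assumptions for illustration, not the paper's formulation.

```python
# LP relaxation + rounding for rerouting flows after a single link failure (sketch).
import numpy as np
from scipy.optimize import linprog

# Hypothetical flows affected by the failed link: (rate, candidate backup paths),
# where each path is the set of remaining links it traverses.
flows = [
    (4.0, [{"s1-s2", "s2-s4"}, {"s1-s3", "s3-s4"}]),
    (3.0, [{"s1-s3", "s3-s4"}, {"s1-s2", "s2-s4"}]),
]
capacity = {"s1-s2": 10.0, "s2-s4": 10.0, "s1-s3": 5.0, "s3-s4": 5.0}
links = sorted(capacity)

# Variables: one x_{f,p} per (flow, path) pair, plus lambda (max link load rate).
pairs = [(f, p) for f, (_, paths) in enumerate(flows) for p in range(len(paths))]
n = len(pairs) + 1
c = np.zeros(n); c[-1] = 1.0                       # minimize lambda

A_eq = np.zeros((len(flows), n)); b_eq = np.ones(len(flows))
for j, (f, _) in enumerate(pairs):
    A_eq[f, j] = 1.0                               # each flow picks exactly one path

A_ub = np.zeros((len(links), n)); b_ub = np.zeros(len(links))
for i, e in enumerate(links):
    for j, (f, p) in enumerate(pairs):
        if e in flows[f][1][p]:
            A_ub[i, j] = flows[f][0]               # traffic this choice puts on link e
    A_ub[i, -1] = -capacity[e]                     # ... must stay below lambda * capacity

bounds = [(0, 1)] * len(pairs) + [(0, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")

# Round: each flow takes its fractionally heaviest path.
choice = {}
for f in range(len(flows)):
    xs = [(res.x[j], p) for j, (ff, p) in enumerate(pairs) if ff == f]
    choice[f] = max(xs)[1]
print("lambda (LP) =", round(res.x[-1], 3), "chosen backup paths:", choice)
```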

A Personal Distributed Real-time Collaborative System

Michalis Konstantopoulos, Nikos Chondros and Mema Roussopoulos

In this paper, we present O3REAL, a privacy-preserving distributed middleware for real-time collaborative editing of documents. O3REAL introduces a novel approach to building peer-to-peer real-time collaborative applications, using a reliable broadcast channel mechanism for network communication while providing persistent storage management of collaborative documents through the interface of a POSIX-compliant filesystem. This approach enables real-time, completely decentralized collaboration among users without the need for a third party to intervene, and significantly simplifies the creation of peer-to-peer collaborative applications. We demonstrate that O3REAL scales well for real-time collaboration use cases. For example, with 33 users simultaneously collaborating on a document in real time over a WAN with a 50 ms link delay, the average perceived latency is approximately 54 ms, which is very close to the optimal baseline. In comparison, Etherpad exhibits nearly twice the perceived latency.

Cooperative Resource Sharing Strategy With eMBB Cellular and C-V2X Slices

Yan Liang, Xin Chen, Shuang Chen and Ying Chen

The emerging fifth generation (5G) wireless technologies support services with hugely heterogeneous requirements. Network slicing technology can compose multiple logical networks and allocate wireless resources according to the needs of each user, which reduces the cost of hardware and network resources. Nevertheless, how systems containing different types of users can reduce resource costs remains challenging. In this paper, we study the system cost of two types of user groups requesting resource blocks (RBs) at the radio access network (RAN): the enhanced mobile broadband (eMBB) cellular user group and the cellular vehicle-to-everything (C-V2X) user group. To improve resource utilization, we apply dynamic resource pricing according to the needs of users. Then, we propose a Cooperative Resource Sharing (CRS) algorithm, which lets the two user groups jointly purchase and share resources. Simulation results show that the strategy used in this algorithm can effectively reduce the unit price of RBs and minimize the total cost of the system.

CP-BZD Repair Codes Design for Distributed Edge Computing

Shuangshuang Lu, Chanting Zhang and Mingjun Dai

In edge computing applications, data is distributed across several nodes. Failed nodes mean losing part of the data, which may hamper edge computing. Node repair is needed to handle frequent node failures in edge computing systems. Codes with both the combination property (CP) and the binary zigzag-decodable (BZD) property are referred to as CP-BZD codes. In this paper, without adding extra check bits, new constructions of CP-BZD codes are proposed to repair failed nodes in distributed storage systems. All constructed codes can be decoded by the zigzag-decoding algorithm. Numerical analysis shows that, compared with the original CP-BZD codes, our proposed schemes achieve better repair efficiency.
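
The zigzag-decoding idea behind BZD codes can be demonstrated on two source packets: one parity is a plain XOR and the other XORs a bit-shifted copy, so decoding alternates between the two parities bit by bit. This is a generic illustration of zigzag decoding, not the paper's CP-BZD construction.

```python
# Zigzag decoding of two source packets from two shift-XOR parities (illustration).

def encode(a, b):
    """a, b: equal-length bit lists. p1 = a XOR b; p2 = a XOR (b shifted by one bit)."""
    L = len(a)
    p1 = [a[i] ^ b[i] for i in range(L)]
    p2 = [a[i] ^ (b[i - 1] if i >= 1 else 0) for i in range(L)] + [b[L - 1]]
    return p1, p2

def zigzag_decode(p1, p2):
    """Recover a and b bit by bit, alternating ("zigzagging") between the parities."""
    L = len(p1)
    a, b = [0] * L, [0] * L
    for i in range(L):
        a[i] = p2[i] ^ (b[i - 1] if i >= 1 else 0)   # the shifted b bit is already known
        b[i] = p1[i] ^ a[i]                          # then peel b[i] from the plain XOR
    return a, b

a = [1, 0, 1, 1, 0, 1]
b = [0, 1, 1, 0, 0, 1]
p1, p2 = encode(a, b)
assert zigzag_decode(p1, p2) == (a, b)
print("decoded:", zigzag_decode(p1, p2))
```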

Computation Task Scheduling and Offloading Optimization for Collaborative Mobile Edge Computing

Bin Lin, Xiaohui Lin, Shengli Zhang, Hui Wang and Suzhi Bi

A mobile edge computing (MEC) platform allows its subscribers to utilize computational resources in close proximity to reduce computation latency. In this paper, we consider two users, each with a set of computation tasks to execute. In particular, one user is a registered subscriber that can access the computation service of the MEC platform, while the other, unregistered user cannot directly access the MEC service. In this case, we allow the registered user to receive computation offloading from the unregistered user, compute the received task(s) locally or further offload them to the MEC platform, and charge a fee proportional to the computation workload. We study the problem from the registered user's perspective, maximizing its total utility, which balances monetary income against the cost of execution delay and energy consumption. We formulate a mixed integer non-linear programming (MINLP) problem that jointly decides the execution scheduling of the computation tasks (i.e., the device where each task is executed) and the computation/communication resource allocation. To tackle the problem, we first derive the closed-form solution of the optimal resource allocation given the integer task scheduling decisions. We then propose a reduced-complexity approximate algorithm to optimize the combinatorial computation scheduling decisions. Simulation results show that the proposed collaborative computation scheme effectively improves the utility of the helper user compared with other benchmark methods, and the proposed solution approaches the optimal solution within a 0.1% average performance gap with significantly reduced complexity.
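
The structure of the optimization can be conveyed with a tiny brute-force version: enumerate where each task runs (unregistered device, registered helper, or MEC) and evaluate a utility combining income, delay, and energy. The cost model below is entirely hypothetical and far simpler than the paper's MINLP; it only illustrates the joint scheduling decision.

```python
# Brute-force task placement for a toy two-user offloading model (illustrative only).
from itertools import product

tasks = [2.0, 3.5, 1.0]                 # workloads of the unregistered user's tasks
PLACES = ("unregistered", "registered", "mec")
speed = {"unregistered": 1.0, "registered": 2.0, "mec": 8.0}           # CPU speeds
energy_per_unit = {"unregistered": 0.0, "registered": 0.5, "mec": 0.1}  # helper's energy
price_per_unit = 0.8                    # fee charged per unit of offloaded workload
delay_weight, energy_weight = 1.0, 0.5

def utility(placement):
    """Registered (helper) user's utility: income minus weighted delay and energy."""
    income = sum(w for w, p in zip(tasks, placement) if p != "unregistered") * price_per_unit
    load = {p: 0.0 for p in PLACES}
    for w, p in zip(tasks, placement):
        load[p] += w / speed[p]         # simple serial-execution delay per device
    delay = max(load.values())          # makespan over the three devices
    energy = sum(w * energy_per_unit[p] for w, p in zip(tasks, placement))
    return income - delay_weight * delay - energy_weight * energy

best = max(product(PLACES, repeat=len(tasks)), key=utility)
print(best, round(utility(best), 3))
```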

Session Chair

Yuan Wu (University of Macau)
