Summary
Software orchestrators such as Kubernetes are growing in popularity among engineers for deploying and running complex software systems. New technical standards are now being proposed so that the telecom industry can use software orchestration for the software that runs inside cellular networks. The telecom industry is currently transitioning from 4G to 5G technology, and one of the central pieces of this development work is implementing a software orchestrator for 5G networks. This raises questions about how and why the telecom industry will use software orchestration in its cellular networks. Software orchestration is a complex technology, and developing an implementation of a software orchestrator is challenging.
Some important questions that this thesis addresses are: What do network operators need from this technology? Telecom vendors such as Ericsson and Huawei have developed their own versions of a 5G software orchestrator, so which orchestrator should network operators choose? Furthermore, we investigate what 5G is and why the telecom industry is developing software orchestrators for the 5G roll-out, and, importantly, we determine the design requirements that the telecom industry has for these “5G orchestration systems”. We interpret and break down technical whitepapers from the industry, and we build a picture of the IT stack of upcoming 5G networks.
In our research, we find that software orchestration is being used to deploy and maintain complex software stacks, such as the software-defined networking (SDN) systems that are central to 5G networks. We uncover some of the specializations needed in a software orchestrator for the telecom industry, such as modularity, high availability, and specialized system integration. With this information, we make feature and design recommendations for 5G orchestrators, and we compile a list of criteria that network operators can use to assess and compare different 5G orchestrators.
Introduction
In Cloud 3.0 we will complement Swedish industry and research with that available in Finland and Portugal. This includes computer science competence already being applied in ongoing projects, together with a mobile phone operator and a rail authority/operator. 5G and Edge computing demand new software components in the telecommunications infrastructure that go to the heart of distributed systems. We have therefore identified partners able to bring in this knowledge to ensure a focussed, competent consortium and, hopefully, successful project execution.
The problem statement
Cloud services have been transformed from bare metal to virtual machines, to containers, and most recently to serverless systems. Each of these execution abstractions has been accompanied by a new layer of centralized management, which has led to a complex environment that requires many additional operations and infrastructure programming tasks to manage these layers and their respective execution contexts.
Figure 1: Cloud functionality and operations: Infrastructure, Containers, Platform, and Functions as cloud services.
In 2019, we are in the third generation of cloud management platforms (CMPs). The first generation consisted of virtual machines, such as VirtualBox and VMware, with limited or highly proprietary management support; the second introduced centralised resource managers such as OpenStack and Kubernetes; and the third offers serverless computing, such as OpenFaaS and Amazon Lambda [1-5]. The different abstraction levels (bare-metal server, virtual machine, container environment, function call) are convenient for the user, but each adds system complexity, specifically a management service per abstraction. Existing resource managers are a poor fit for mobile edge computing (MEC), an important 5G component [6]. Implementing MEC using existing managers is possible, but it can introduce performance, reliability and security issues [7,16].
Fundamental issues
| | Centralised systems | Decentralised systems |
| Pros | 1. Faster 2. Easier to manage 3. Systems already exist | 1. Adding resources is easier 2. Robustness 3. Scalability 4. Allows heterogeneity, e.g. GPUs for AI apps |
| Cons | 1. Single point of failure 2. Edge nodes are remote (signalling latency) 3. Updates are tricky | 1. Potentially slower 2. Synchronisation overhead 3. Control overhead (messages versus signals) 4. Harder to design |
Table 1: Pros and cons of centralised and decentralised system design. Does a middle ground exist?
Table 1 indicates the tussle between two computing paradigms, particularly within distributed systems philosophies. A full CELTIC-NEXT proposal, involving a major European telecommunications company, will explore this question.
The mobile edge: A challenge to operating systems
Despite the promise of edge computing [21-22], challenges exist [7-8]. Cloudlets and Kubernetes Edge/Light rely on containerisation, virtualized switches and network functions, and even recent simplifications, e.g. universal nodes [23], may not be sufficiently fine grained. Synchronisation at large scale remains unsolved for the Edge, and at least two projects [17-18] are exploring conflict-free replicated data types (CRDTs) to resolve synchronisation issues whilst maintaining data consistency in distributed systems [24]. In essence, some solutions are available in existing systems, but the most intrinsic problems lie in the field of distributed systems.
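To make the idea of conflict-free replicated data types concrete, the following is a minimal sketch of a grow-only counter, one of the simplest CRDTs. It is an illustration only: the class name GCounter and the replica names edge-a and edge-b are hypothetical, and nothing here is taken from the projects cited in [17-18]. Each replica increments only its own slot, and a merge takes the element-wise maximum, so replicas converge regardless of the order or number of merges.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    replica_id: str  # hypothetical identifier of the local edge node
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter value is the sum over all known replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so merges can arrive in any order and be repeated safely.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two edge nodes count events independently, then exchange state.
edge_a = GCounter("edge-a")
edge_b = GCounter("edge-b")
edge_a.increment(3)
edge_b.increment(2)

edge_a.merge(edge_b)
edge_b.merge(edge_a)
edge_a.merge(edge_b)  # repeating a merge changes nothing

print(edge_a.value(), edge_b.value())
```

The design choice worth noting is that no coordination or locking is needed between the two nodes: consistency follows from the merge function alone, which is why this family of data types is attractive when edge nodes are remote and signalling latency is high.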
Fig. 2: A combined Edge and Cloud view for both operators and developers would be beneficial. Cloud 3.0 focuses on the developer needs (illustration by Nicolae Paladi, CanaryBit AB).
Mobile operators have experience in telecom systems, which differ considerably from Cloud systems: they deal with handover times, paging requests, dimensioning, and network failures. Just as the growth of mobile data in availability, bandwidth, and responsiveness helped enable the smartphone revolution, many actors in the telecom industry hope that the deployment of 5G systems will enable new technology by improving mobile data even further and adding new ways to run software on mobile networks. Another major benefit of deploying 5G systems is the modernization it provides to network operators. 5G technologies bring innovations to the telecom industry, improving the quality of cellular networks and making them easier and cheaper to run and maintain. 5G orchestration is one of these innovations, making it easier for cellular network operators to run and maintain the software that provides their network.
NFV and MEC
Network functions virtualisation (NFV) is a new technology associated with 5G that provides a platform for network operators to use SDN on their cellular network. Functions of a cellular network, such as maintaining network state and routing packets, are run inside software containers on general-purpose computers instead of on dedicated networking appliances. Developing complex NFV systems and maintaining them on a production network requires software orchestration to automate their maintenance needs. A readable introduction to NFV can be found in [XX]. MEC is another new technology associated with 5G that allows edge computing applications to run on cellular network infrastructure. Software from network operators and customers can take advantage of edge computing by running directly on the cellular network that client devices are connected to. MEC enables new use cases by providing low-latency, high-bandwidth compute resources. Because the software comes from multiple sources and is deployed in a distributed way, MEC is also a technology that requires software orchestration [etsi-mec].
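The kind of automation that orchestration provides for NFV and MEC workloads can be illustrated with a desired-state reconciliation step, the pattern used by orchestrators such as Kubernetes. The sketch below is an assumption-laden illustration and not the API of any real orchestrator or ETSI specification: the Action type, the reconcile function, and the network function names (packet-router, session-state, legacy-probe) are all hypothetical. The function compares the number of instances an operator wants with the number observed and returns the actions needed to close the gap.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    kind: str      # "start" or "stop"
    function: str  # name of the (hypothetical) network function
    count: int     # number of instances to start or stop


def reconcile(desired: Dict[str, int], observed: Dict[str, int]) -> List[Action]:
    """Return the actions that move the observed state towards the desired state."""
    actions: List[Action] = []
    # Consider every function that is either wanted or currently running.
    for name in sorted(set(desired) | set(observed)):
        want = desired.get(name, 0)
        have = observed.get(name, 0)
        if want > have:
            actions.append(Action("start", name, want - have))
        elif have > want:
            actions.append(Action("stop", name, have - want))
    return actions


# Hypothetical example: one router instance is running but three are wanted,
# and an unwanted probe is still running.
desired = {"packet-router": 3, "session-state": 2}
observed = {"packet-router": 1, "session-state": 2, "legacy-probe": 1}

plan = reconcile(desired, observed)
for action in plan:
    print(action.kind, action.function, action.count)
```

A real orchestrator would run such a step repeatedly, re-reading the observed state each time, so that failures of individual instances are corrected without operator intervention.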
Telecom bases for the Edge: A history
In 1991 the Amoeba distributed operating system presented files, processes and devices as simple system capabilities [12].
One novel way forward is a cloud-native operating system inspired by this earlier idea, known as a single system image [10]. Additional system properties include: i) all cloud services are fully distributed; ii) all resources belong to a single resource manager; iii) there are different communication mechanisms for different network “distances” [11].
From a programmability perspective, WebAssembly is one alternative.
Although inspired by single system images, Nefele differs in some important ways: i) There is no distributed kernel in Nefele; a single-server Linux kernel is used as a component (Fig. 3 “Linux” above) and provides virtual memory and thread management, whereas operating system services such as process creation and resource management are distributed, i.e. fine grained. ii) Processes obtain only as much virtual memory and as many threads as the local Linux OS provides. iii) Applications need to be designed modularly and are not distributed automatically. Performance-critical services, e.g. virtual memory and thread management, will remain local. Cloud resources have dedicated managers per resource type (compute, network and storage), allowing tenants to perform computational or communication tasks. Nefele shares some goals with contemporary container management systems such as Kubernetes, Docker Swarm and Mesos [13-15], in that CPUs in the same data center are dedicated to the same task; Mesos, like Nefele, is a fine-grained resource-sharing approach.
European operating and cloud systems are lagging behind US efforts. Cloud 3.0 assists large European actors in pursuing new research ideas relevant to their business cases and customers.
Scenario I: The ANIARA project
Digital transformation is ongoing in many areas of today’s society and will affect many aspects of people’s lives through smart cities, robotics, transportation, and next-generation industries. At the same time, the current centralized cloud infrastructure is not adequate to serve the transformation’s requirements. We believe that three technologies can come together to shape a new secure service and application platform: 5G, edge-centric compute, and artificial intelligence. In this context, European industry has a good position in 5G networks, transportation and industrial applications, but needs to strengthen its position in secure cloud, data centre and artificial intelligence technologies to be at the front of the development.
The primary objective of the ANIARA project is to provide enablers and solutions for high-performance services deployed and operated at the network edge. To manage complexity, we need to take advantage of artificial intelligence to complement traditional optimisation algorithms. As things stand, deep edge network nodes will be deployed at locations that are not prepared for the power requirements of edge-centric compute. To address this, we need to analyse requirements and develop methods to minimize energy consumption.
Scenario II: Bicycle safety
–
Scenario III: Train journeys
Many train commuters do not receive a stable Internet service. An Edge infrastructure should deliver lower latency, faster handovers and higher capacity at rail stations. People at stations form a typical crowd-like setting that is often high density and time critical. On the move, passengers need dependable long-range coverage and form a sparse environment that is better suited to existing cellular communications. Both modalities are needed for the rollout of Edge systems.
Acknowledgements
Nicolae Paladi and Henrik Abrahamsson at RISE, Justin-Lex Hammerskjöld at KTH / SEB, Pontus Sköldström at Ericsson Research.
References
Commercial:
[1] VMware
[2] VirtualBox
[3] OpenStack
[4] OpenFaaS
[5] Amazon Lambda
Books
Papers
[6] ETSI, Cloud RAN and MEC: A Perfect Pairing, Feb. 2018.
[7] M. Satyanarayanan, Edge Challenges, HotCloud 2017.
[8] W. Shi, Edge computing: Vision and challenges, J. Internet of Things, 2018.
[9] M. Satyanarayanan, The emergence of edge computing, Computer, 2017.
[10] P. Healy, Single system image: A survey, J. of Parallel & Dist. Comp, 2016.
[11] W. John et al., Making Cloud Easy: Design Considerations and First Components of a Distributed Operating System for Cloud, HotCloud, 2018.
[12] S. Mullender, Amoeba: A distributed operating system for the 1990s.
[13] Cloud OS: Kubernetes
[14] Docker Swarm
[15] B. Hindman, Mesos: A platform for fine-grained resource sharing, 2011.
[19] A. Zavodovski, ExEC: Elastic Extensible Edge Cloud, Edge Systems, 2019.
[20] P. Enberg et al., I/O Is Faster Than the CPU, HotOS, 2019.
[21] Y. C. Hu, Mobile edge computing: A key technology towards 5G, 2015.
[22] ETSI, Developing Software for Multi-Access Edge Computing, Feb. 2019.
[23] The UNIFY project, 2013-2016.
[24] M. Shapiro, Conflict-Free Replicated Data Types, Springer, 2011.
[25] The Finnish 5GSafe project, 2015-2018.
EU Projects
[16] Cola
[17] LightKone
[18] SyncFree.
[19] 5GCoral project
Software
[20] Unikernels -Wikipedia red hat, link (note SICS was one of the first with the
[21] Nemesis link
[22] Open Computing
https://blog.cedriccharly.com/post/20191109-the-configuration-complexity-curse/
https://engineeringblog.yelp.com/2019/11/open-source-clusterman.html
Hobbies
https://blog.alexellis.io/raspberry-pi-homelab-with-k3sup/
MicroK8s
Management
https://media.ccc.de/v/ASG2019-134-senpai-automatic-memory-sizing-for-containers