Critical reroute: a practical approach to network flow prioritization using segment routing

Title	Critical reroute: a practical approach to network flow prioritization using segment routing
Publication Type	thesis
School or College	School of Computing
Department	Computing
Author	Redman, Simon
Date	2019
Description	It is widely recognized that reliable communications are a key element of a successful response to a disaster situation. To address this need, local and regional governments in all parts of the world have deployed dedicated communications networks for first responders. These systems are often prohibitively expensive, voice-only, and are needed only on rare occasions. It would be more cost-effective to use already-existing networks or to deploy networks for shared use. However, due to the sudden increase in demand or physical failure caused by the disaster, shared networks may become overloaded. In such situations, it would be desirable to prioritize traffic flows belonging to public safety applications over others. Solutions such as priority queuing and differentiated services provide partial answers to that goal but leave other problems unsolved. This work presents a novel solution using Segment Routing and a Genetic Algorithm optimizer to minimize the impact of network overload on critical traffic flows. The results show that these methods can reroute flows using a single midpoint such that the total network overload is reduced compared to traditional shortest-path rou
Type	Text
Publisher	University of Utah
Subject	network flow prioritization; priority queuing; differentiated services; disaster response
Dissertation Name	Master of Science
Language	eng
Rights Management	© Simon Redman
Format	application/pdf
Format Medium	application/pdf
ARK	ark:/87278/s6qz8c8v
Setname	ir_etd
ID	1714205
OCR Text	CRITICAL REROUTE: A PRACTICAL APPROACH TO NETWORK FLOW PRIORITIZATION USING SEGMENT ROUTING

by Simon Redman

A thesis submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

School of Computing
The University of Utah
August 2019

Copyright © Simon Redman 2019. All Rights Reserved.

The University of Utah Graduate School

STATEMENT OF THESIS APPROVAL

The thesis of Simon Redman has been approved by the following supervisory committee members: Jacobus Van der Merwe, Chair (approved 16 May 2019); Robert Ricci, Member (approved 16 May 2019); Sneha Kumar Kasera, Member (approved 16 May 2019); and by Ross Whitaker, Chair/Dean of the Department/College/School of Computing, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT

It is widely recognized that reliable communications are a key element of a successful response to a disaster situation. To address this need, local and regional governments in all parts of the world have deployed dedicated communications networks for first responders. These systems are often prohibitively expensive, voice-only, and are needed only on rare occasions. It would be more cost-effective to use already-existing networks or to deploy networks for shared use. However, due to the sudden increase in demand or physical failure caused by the disaster, shared networks may become overloaded. In such situations, it would be desirable to prioritize traffic flows belonging to public safety applications over others. Solutions such as priority queuing and differentiated services provide partial answers to that goal but leave other problems unsolved. This work presents a novel solution using Segment Routing and a Genetic Algorithm optimizer to minimize the impact of network overload on critical traffic flows.
The results show that these methods can reroute flows using a single midpoint such that the total network overload is reduced compared to traditional shortest-path routing while avoiding unnecessarily long paths and taking priority of flows into account.

CONTENTS

ABSTRACT
LIST OF FIGURES
ACKNOWLEDGEMENTS

CHAPTERS

1. MOTIVATION
2. BACKGROUND
   2.1 Segment Routing
   2.2 Genetic Optimizer
   2.3 Related Works
3. THE CRITICAL REROUTE DESIGN
   3.1 Critical Reroute Architecture
   3.2 Route Planner
4. THE CRITICAL REROUTE IMPLEMENTATION
   4.1 Route Planner
   4.2 Operator Portal
   4.3 Topology Discovery
   4.4 Live Traffic Detection
   4.5 SDN Implementation
   4.6 SDN Controller
5. EVALUATION
   5.1 Critical Reroute Can Leverage Multiple Paths
   5.2 Critical Reroute Protects Priority Flows
   5.3 Critical Reroute Scales to Realistic Networks
   5.4 Critical Reroute Responds to Network Changes
6. CONCLUSION

REFERENCES

LIST OF FIGURES

3.1 Architecture Diagram
3.2 Critical Reroute Controller Components
3.3 Steps in the Genetic Algorithm
3.4 Genetic Algorithm Components
4.1 Operator Portal Showing Live Information About the Network
4.2 Detailed Implementation Diagram
5.1 A Simple Network Based on the Testbed Being Deployed in Ammon, Idaho
5.2 Overload Factor vs. Iterations for the rf3967 Topology
5.3 Overload Factor vs. Iterations for the rf1239 Topology
5.4 Graph Showing Recovery Time From Network Failures for rf1239
5.5 Graph Showing Recovery Time From Network Failures for rf3967

ACKNOWLEDGEMENTS

Thanks to the Flux Research Group for supporting this work. This work is supported in part by the National Science Foundation under grant number 1647264.

CHAPTER 1

MOTIVATION

It is widely recognized that reliable communications are a key element of a successful response to a disaster situation. In order to address this need, many governments in all parts of the world have deployed dedicated communications networks for first responders [25]. However, these systems are often prohibitively expensive. They are only needed on rare occasions, so there is little motivation to gather the necessary investment.

In the United States, the Government Emergency Telecommunications Service (GETS) [2] and Wireless Priority Service (WPS) [3] enable prioritized use of the commercial landline and cellphone networks when needed, thus avoiding the cost of dedicated infrastructure. Voice-only calls are an important tool, but increasingly, emergency services would like to add reliable data communications to their toolbox, which has led to the creation of the First Responder Network Authority (FirstNet) [5]. However, this requires the installation of dedicated infrastructure and is still under deployment. Public safety’s takeup of FirstNet has been limited but is gaining momentum [44].

Just as with phone networks, developing the technology to reliably share already-deployed data networks would significantly and immediately benefit the ability of public safety to execute their mission.
We present a practical approach to prioritizing data traffic in the presence of network overload using single-midpoint Segment Routing. This solution operates on a single autonomous system with a relatively simple core network which supports segment routing and programmable traffic classifiers at the edges of the network for installing routing rules. Live traffic and topology information is captured from the network and used to optimize the network in real time.

Midpoint selection is an NP-hard problem, as discussed in the paper “A Declarative and Expressive Approach to Control Forwarding Paths in Carrier-Grade Networks” (DEFO) by Hartert et al. [22]. Therefore, it is difficult to establish any guarantees about the computation time of an optimizer or how a solution compares to the optimal. Nevertheless, we have designed a genetic optimizer which can provide useful solutions while running quickly enough to be practical.

Real-world evaluation is ongoing, and initial results from the Emulab testbed environment [45] show that the system meets its design goal of real-time prioritization of network traffic. Calculation of priority-protecting routes happens on a timescale such that there would be minimal disruption to real-world applications.

This work contributes the following:
• Design and implementation of an optimizer which computes routing rules to protect priority flows.
• Sufficient framework to run this optimizer in a real-world scenario.

Thesis Statement: It is possible to prioritize traffic on a packet-routed network such that critical flows are protected from network overload.

CHAPTER 2

BACKGROUND

This chapter provides an introduction to background material for several topics with which the reader might not be familiar.

2.1 Segment Routing

Segment Routing [16] is a modern source-routing paradigm which encodes a list of instructions, called segments or segment identifiers (SIDs), and sends them along with the packet.
The key advantage of this approach is that Segment Routing can define arbitrarily many routes without adding complexity to the network core. All the necessary routing information is carried along with the packet, so custom routes can be defined at the network edge without adding entries to a core router’s memory tables. A segment can be a node, which instructs that the packet should be steered towards a particular node; an adjacency, which instructs that the packet should be forwarded using a particular link or interface; or an action, which instructs that some custom action, specified in advance by the operator, should be taken. This allows Segment Routing to be very scalable and reliable, while being expressive enough to achieve complex traffic engineering goals.

Segment Routing can be applied to an IPv4 network by adding Multiprotocol Label Switching (MPLS) rules. In an IPv6 network, a dedicated Segment Routing Header (SRH) can be applied, which carries the segment identifiers (SIDs) for all three kinds of segments as properly-encoded IPv6 addresses. IPv6 segment routing is often called SRv6. Support for either of these options requires some minimal modification of the core. Alternately, IPv6 segment routing can be implemented without an SRH by simply wrapping IPv6 packets in IPv6 packets, with the destination field set to an IPv6 address which doubles as an encoded segment. In such an implementation, only the targets of adjacency or action SIDs need to support segment routing. All other routers simply perform standard IPv6 forwarding.

2.2 Genetic Optimizer

Genetic Optimizer [13] refers to a class of cost-function minimization algorithms based on the theory of biological evolution. To start, a number of ‘genomes’ are generated, either at random or based on some heuristic, each representing a proposed solution.
Then, at every step, the genomes are tested against the cost function to determine their ‘fitness’, with the fittest solutions being given the best chance to be selected to reproduce. A new generation is created by choosing two genomes and selecting partial solutions from each of them, with a chance of mutation for each solution component. The process is then repeated until a desired fitness is reached or a certain number of iterations has been run. Finally, the best solution in the gene pool is returned.

Genetic Optimizers have the rare advantage among minimization algorithms of working on discrete solution spaces. This is a requirement for this work, since the midpoint for a flow must be some discrete value representing a physical router. The particulars of our Genetic Optimizer are discussed in Section 3.2.3.

2.3 Related Works

Traffic engineering is not a new goal, and the proliferation of works solving different aspects of this problem reflects its importance. We present some recent efforts in this area, and compare each with our contributions.

2.3.1 Optimizing Restoration with Segment Routing

Optimizing Restoration with Segment Routing [21] is similar work in this space which inspired this work. Fundamentally, the problem solved is very similar, as the authors work to eliminate overloaded links by describing the situation as a linear programming problem. However, in order to arrive at a form suitable for a linear programming problem, the authors had to relax some constraints, such as allowing a flow to be divisible across infinitely many midpoints. Such a relaxation is very difficult to implement in the real world. Our work requires that a single midpoint be assigned to each flow. We also do not assume that there is sufficient bandwidth in the network to allow for all flows to travel unimpeded. These constraints more accurately model our application to today’s deployed networks.

2.3.2 Declarative and Expressive Forwarding Optimizer

Declarative and Expressive Forwarding Optimizer [22] is a solution for applying Software-Defined Networking (SDN) to carrier-grade networks. The authors describe a general-purpose optimizer which can be run on individual flows to build segment-routed midpoints to achieve operator-specified goals. They use reasonable assumptions about a network so that, like our work, it could be applied today. Their evaluation shows that their solver is very convincingly able to solve the targeted problem of MinMaxLoad. However, their optimizer appears to work best when providing midpoints for only a handful of flows, while our optimizer is designed to have full control over the network so that it can, for example, move low-priority traffic to already-overloaded links.

2.3.3 FirstNet

First Responder Network Authority (FirstNet) [5] is an initiative by the US Government, authorized by the Middle Class Tax Relief and Job Creation Act of 2012 [12], to build an LTE network to support the unique requirements of public safety. FirstNet offers extremely high reliability and prioritization of public safety traffic [6] to ensure capacity is available when needed, as well as other services uniquely useful to public safety, such as group calling and messaging, location-awareness, and “proximity services”, enabling communications even as the network begins to break down [25, 26]. The FirstNet project naturally has some relation to our work, since it is also focused on network reliability for public safety. Since FirstNet uses a modern Evolved Packet Core (EPC), our work could be applied to its network to further prioritize internal traffic to realize its service requirements.

2.3.4 Adaptive Routing Using OpenFlow

There are several earlier efforts to use OpenFlow to dynamically control traffic flows in response to network conditions.
For instance, [27] attends to the problem of ensuring video playback does not freeze in case of network congestion, and MediFlow [23] works in a medical setting to replace an overprovisioned core router with several less expensive OpenFlow routers, which dynamically reroute traffic around overloaded core links to handle usage spikes. While these works consider a problem space similar to ours, our solution is distinct by leveraging segment routing, reducing the resource burden on core routers by keeping per-flow information at the edge, and by using an automated optimizer to balance the competing resource demands on the network.

2.3.5 Data Center Route Prioritization

Much work has gone into developing optimized networking protocols for datacenter networks to support the ever-increasing need for computation and data storage. PASE [32] and Juggler [18] are two recent examples in this area, both of which make use of flow prioritization and give good performance in their domain. However, low-latency datacenter networks, where traffic might pass through only one or a handful of switches, serve fundamentally different goals from wide-area networks. Therefore, despite leveraging similar ideas and solving similar problems, solutions to datacenter-specific use-cases are not broadly applicable to wide-area networks.

2.3.6 Network Flow Maximization

Conceptually, the problem being solved seems in many ways similar to the much-studied Network Flow Maximization (NFM) problem, in which the goal is to assign routes through the network such that as much traffic as possible is able to travel between a source and a sink. One recent example is [19], which also uses a genetic optimizer to approach this problem. However, in the general form, NFM is only concerned with a single source and single sink, while we seek to simultaneously ensure a specific amount of bandwidth between many such source/sink pairs in the network, all while taking flow priority into consideration.

2.3.7 Segment Routing Research

Software Resolved Networks [30] is a recent work leveraging IPv6 Segment Routing in an enterprise environment to let end-host applications have an active role in selecting their traffic’s path through the network. This contrasts with our work, where we assume the end-hosts have no knowledge of the core network and we assign paths at the edge of the core.

2.3.8 Priority Queue Forwarding

Before SDN paradigms revolutionized network programmability, most routers supported some form of priority queuing. One proposed IETF standard is the Differentiated Services Code Point (DSCP) [36], which allows the operator to encode packet classification in the IP header, which might result in prioritized forwarding in the network. The simplicity of this design retains a large benefit of segment routing by avoiding complexity in the core network, but is not expressive enough to solve all problems. Consider a situation in which a large amount of traffic of equal priority arrives at a chokepoint but alternate paths are available. Without additional mechanisms, priority forwarding cannot avoid packet loss, while our solution would leverage those alternate paths. However, in case no parallel routes exist, priority forwarding is a necessary part of a complete solution to ensure lower-priority flows are dropped first.

CHAPTER 3

THE CRITICAL REROUTE DESIGN

Critical Reroute is an ecosystem which operates on a single autonomous domain to minimize the volume of critical traffic flow (the amount of traffic associated with a high-priority application) on overloaded links. This is achieved using segment routing by assigning a single routing midpoint to every flow. To realize its policy, Critical Reroute assumes that there is some programmable traffic classifier on the ingress routers which adds SIDs to incoming packets. The traffic classifiers receive commands from the controller describing traffic classes and which SIDs to install.
The core of the network need only support segment routing and might otherwise be very simple. By only requiring segment routing support, this low-complexity design allows the core network to remain simple while supporting many use-cases. For instance, consider a 4G LTE/EPC network operator who wants to prioritize the latency-critical traffic required for voice calls over the less sensitive data needed for buffered video streaming. Alternately, consider a municipal network which would like to support its first responders by prioritizing the video stream of an incident back to the operations center over the day-to-day traffic which is carried on the same network. Critical Reroute can assign an SID to any packet which the Traffic Classifier can match. For the implementation discussed in Chapter 4, we assume a traditional source-destination pair, but that is not an inherent requirement of the design.

3.1 Critical Reroute Architecture

At the highest level, Critical Reroute has two components, shown in Figure 3.1. The Critical Reroute Controller is responsible for producing segment-routed paths for all flows, which minimize the amount of bandwidth on overloaded links, while the Critical Reroute Traffic Classifier is given directions by the controller to implement the policy. Figure 3.2 shows a logical view of the internal components of the Controller.

[Figure 3.1: Architecture Diagram]

[Figure 3.2: Critical Reroute Controller Components. Labels in the figure: Route Planner, Operator Input/Output, Optimized Midpoints, Network Information, Traffic Matrix, Network Manager, Live Topology, Flow Information, Routing Directives, Network Dataplane]

In the southbound direction, the Controller’s Network Manager communicates with the network to collect live topology information, useful when the network might be changing due to node or link failure.
Once the Route Planner has computed optimized midpoints, those are processed by the Network Manager into actionable directives and pushed to the network. Flow information, such as actual bandwidth usage, is also collected by the network and delivered to the controller.

To compute optimized midpoints, the Critical Reroute Controller contains a Traffic Matrix of flows annotated with the priority and bandwidth allocation of each flow. Flows are populated by live information, with the option of the operator defining static flows with dedicated bandwidth allocations. This Traffic Matrix and the current topology information are the inputs to the Route Planner.

In the case that human intervention is required, such as defining flow priority match rules or inputting initialization information, the Controller has a northbound Operator Interface. This interface also provides information back to the operator about the state of the network.

3.2 Route Planner

The core component of the Critical Reroute system is the route planner. The route planner takes as inputs a graph of the physical topology and a traffic matrix annotated with flow bandwidth requests and priority, and outputs a list of midpoints, one for each flow, which minimizes the amount of overload on links carrying critical traffic.

While the primary design goal of the Route Planner is to provide optimal routes, an important consideration is that it must run quickly enough that it can adapt to changing circumstances. For instance, if nodes are removed from the topology view or new flows are added to the traffic matrix, the route planner should be able to respond to those changes quickly, such that services using the network are disrupted as little as possible. It should also never return a worse solution than using pure shortest-path routing.
We elect to generate a solution with only a single midpoint per flow for the practical reason that the search space involving one midpoint per flow is vastly reduced compared to any additional midpoints, allowing the solver to run in a realistic amount of time. Simultaneously, one midpoint gives the controller good control over the path the traffic takes, based on our experience as well as the experiences discussed in prior work [21].

After receiving information about the current topology and requested traffic allocations, the route planner begins to compute midpoints. Since the goal of this work is to find the paths which result in the least overload, we are by definition solving some kind of optimization problem. Section 3.2.1 details the design of the cost function and Section 3.2.3 discusses our optimization algorithm.

3.2.1 Cost Function

In order to define “better” or “minimum”, we need to be able to quantify the usefulness of a solution. In optimization problems, this quantifier is called the cost function. Our cost function is three-dimensional, with each dimension acting as a tiebreaker for the one before it. The factors considered are:
• Total amount of critical traffic flow on overloaded edges
• Sum of the latency of each flow multiplied by its priority value
• Total number of extra edges used in a solution compared to shortest-path

The first dimension is essentially a restatement of the problem goal, so it is a natural fit for our solver. It is computed by, for each flow, computing the shortest-path route from the flow’s start to the elected midpoint and the shortest path from the midpoint to the flow’s end. Then, for every link in that segmented path, the amount of bandwidth in use on that link is increased by the amount requested by the flow. If the priority value of a flow is greater than the highest priority value currently recorded for the link, the priority value of the link is set to that of the new flow.
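
The accumulation step just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the graph layout (`graph[u][v]` holding a link weight), the flow dictionary keys (`src`, `dst`, `bandwidth`, `priority`), and all function names are assumptions made for the example, links are treated as directed, and the topology is assumed to be connected.

```python
import heapq
from collections import defaultdict


def shortest_path(graph, src, dst):
    """Dijkstra over graph[u][v] = link weight; returns the node list src..dst.

    Assumes dst is reachable from src (hypothetical helper for this sketch).
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def segmented_links(graph, src, mid, dst):
    """Links of the path src -> mid -> dst, each leg routed by shortest path."""
    first = shortest_path(graph, src, mid)
    second = shortest_path(graph, mid, dst)
    nodes = first + second[1:]  # do not repeat the midpoint
    return list(zip(nodes, nodes[1:]))


def accumulate(graph, flows, midpoints):
    """Per-link requested bandwidth and highest flow priority seen on the link."""
    used = defaultdict(float)
    prio = defaultdict(float)
    for flow, mid in zip(flows, midpoints):
        for link in segmented_links(graph, flow["src"], mid, flow["dst"]):
            used[link] += flow["bandwidth"]
            prio[link] = max(prio[link], flow["priority"])
    return used, prio


# A four-node diamond: A reaches D through either B or C.
graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 1},
    "D": {"B": 1, "C": 1},
}
flows = [
    {"src": "A", "dst": "D", "bandwidth": 5.0, "priority": 2.0},
    {"src": "A", "dst": "D", "bandwidth": 3.0, "priority": 1.0},
]
used, prio = accumulate(graph, flows, midpoints=["B", "C"])
```

In this toy topology, choosing midpoints B and C places the two flows on disjoint paths. Choosing a flow's own source as its midpoint reduces the segmented path to the plain shortest path.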
After the bandwidth usage on every link is computed, the overload-priority factor of the network is calculated by, for every link in the network, checking whether the amount of bandwidth requested exceeds the amount of bandwidth available on the link. If a link has enough bandwidth to support all of its allocations, there is no problem, and it does not contribute to the cost of the solution. If the link does not have enough bandwidth, we calculate the overload as the amount of excess bandwidth requested, then multiply that excess by the priority value recorded for the link. Thus, if highly-critical information is traveling on a heavily overloaded link, the contributed cost will be very high, while if low-criticality information is traveling on a slightly overloaded link, the contributed cost will be lower. The cost of the network is calculated by taking the sum of the cost contributed by every link.

By itself, this single dimension of the cost function can be optimized and delivers solutions which, on paper, meet our design goal. However, in cases where the network was not significantly overloaded, we observed that the solver would sometimes generate long and winding paths. Provided it did not generate solutions with overloaded links, the solver was not penalized for such solutions. However, in the application of this work, latency might also be an issue. Therefore, we add a second dimension to the cost function which directly measures the latency of every flow based on the links it traverses. The total latency is computed for each flow, and then multiplied by that flow’s priority value. The weighted latency factors for each flow are added together, and the resulting latency term is used as a tiebreaker for the first term.
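
As a rough illustration of the two terms described so far, the sketch below computes the overload-priority factor from per-link totals and the priority-weighted latency from per-flow totals, then relies on Python's lexicographic tuple comparison to make the second term a tiebreaker for the first. The dictionary layout and every name here are assumptions for the example rather than the thesis code, and the third (path stretch) term is omitted.

```python
def overload_priority_cost(used, prio, capacity):
    """Sum over links of (requested - available bandwidth) * link priority.

    Links whose requests fit within their capacity contribute nothing.
    """
    total = 0.0
    for link, cap in capacity.items():
        excess = used.get(link, 0.0) - cap
        if excess > 0:
            total += excess * prio.get(link, 0.0)
    return total


def weighted_latency(flow_latency, flow_priority):
    """Sum over flows of path latency multiplied by the flow's priority value."""
    return sum(lat * p for lat, p in zip(flow_latency, flow_priority))


def cost(used, prio, capacity, flow_latency, flow_priority):
    """Two-level cost; tuples compare element by element, left to right."""
    return (
        overload_priority_cost(used, prio, capacity),
        weighted_latency(flow_latency, flow_priority),
    )


# Hypothetical numbers: one link is overloaded by 1 unit at link priority 2.
capacity = {("A", "B"): 4.0, ("B", "D"): 10.0}
used = {("A", "B"): 5.0, ("B", "D"): 5.0}
prio = {("A", "B"): 2.0, ("B", "D"): 2.0}

candidate_a = cost(used, prio, capacity, [10.0, 12.0], [2.0, 1.0])
candidate_b = cost(used, prio, capacity, [10.0, 10.0], [2.0, 1.0])
better = min(candidate_a, candidate_b)  # equal overload, so latency decides
```

Because the overload term comes first in the tuple, a candidate with lower latency can never win against one with strictly lower overload.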
Provided it does not result in a higher cost for the first dimension of the cost function, the optimizer will select a midpoint assignment which reduces the total latency of all flows in the network.

Finally, since in the real world the delay added by each router might be a concern, the third term of the cost function measures the total number of links used by the solution as compared to the shortest path. It is expected that the segmented path will be at least as long as the shortest path, since routing via a midpoint is expected to diverge from the shortest path by adding links. However, in cases where the intra-AS routing protocol has been configured to avoid certain links, it may be the case that the segmented path is shorter. Regardless, the path stretch is computed by subtracting the number of links used in the shortest path from the number of links used in the segmented path, and that result is returned.

With this three-level cost function, the solver generates sane and useful solutions. In cases where it can move a midpoint to reduce the overload of a solution, it does so. Additionally, if it can move a midpoint such that the particular flow travels a faster path, it does so. In practice, while optimizing a solution, the optimizer might move a midpoint which reduces the overload, then in subsequent iterations, it might move that midpoint several times as it finds a shorter path.

3.2.2 Priority Value

The priority value of a flow can be any integer or floating-point value greater than or equal to zero. If a priority of zero is assigned to a flow, the optimizer will treat the flow as totally unimportant, and any congestion it experiences would not be optimized, though it still contributes to the used bandwidth of a link. Priority values should be considered relative to other priority assignments in a solution, where higher priority means the flow is more important.
Operators can access flow information and assign priorities via the Operator Input/Output interface shown in Figure 3.2. If several flows have similar values, the optimizer might sacrifice a slightly-higher priority flow to let a few lower-priority flows travel unimpeded, while if some flows have vastly larger priority values, the optimizer is much less likely to allow those to travel on overloaded links.

3.2.3 Optimizer

Having now defined the cost function, we can address the problem of how to find a “better” or “best” solution. The most important consideration is that the solution space is discrete. The midpoint for any flow can be exactly one core router and flows are not divided into fractional parts, as was done in [21]. Additionally, since [21] uses a very similar cost function which they present as a linear programming problem, we can see that the polytope for this solution would not be integral. Thus, we need an optimizer which works directly on discrete solution spaces. There are precious few such algorithms, but one class which works well in many cases is Genetic Optimizers [13]. Genetic Optimizers are easy to use for nearly any problem, but care must be taken to properly fit the problem, otherwise the resulting solutions will be far from optimal and might take a long time to compute.

As might be guessed from the name, Genetic Optimizers take inspiration and vocabulary from the theory of biological evolution. At all times, we have a “gene pool” of possible solutions, called “genomes” or “individuals”. Each individual is composed of “alleles” which represent the smallest possible unit of a partial solution.
At each iteration of the algorithm, called a “generation”, individuals from the gene pool are selected and evaluated based on their “fitness” as a solution, fit solutions are “mated” using a crossover function, the resulting “children” are potentially mutated, and a new gene pool representing the new generation is constructed. The whole cycle repeats until a specified number of generations has elapsed or a specified fitness threshold is reached.

The challenge with using a genetic optimizer is selecting the parameters which give the best solution with the least computational time. For instance, how should an allele or an individual be defined? How large should the gene pool be? What crossover and mutation operators should be used? How often should a child solution be mutated? Since the parameters of a genetic algorithm could be compared based on some metric, such as final solution fitness or computational speed, it stands to reason that we might need another optimization algorithm to optimize our genetic optimizer! In fact, evolving a genetic algorithm to solve the maximum flow problem is the subject of [19]. In practice, the parameters are usually selected based on human experience and trial-and-error.

Figure 3.3 shows a block diagram of how the steps of a genetic algorithm fit together. These steps are discussed in detail in Section 3.2.5 through Section 3.2.9.

3.2.4 Individual Definition

In this work, an allele is a single midpoint for a particular flow, and an individual is a list of such midpoints, one per flow. A population is a list of individuals. The relationships among these items are shown in Figure 3.4. A different definition of allele, such as one which defines more than one midpoint per flow, would also produce a valid solution, but doing so would vastly increase the dimensionality of the solution space. We found that a single midpoint gives useful control over a network flow, while also being computationally efficient.
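To make the individual definition concrete, the following is a minimal sketch in Python (the language the Route Planner is written in) of how an individual could be represented and expanded into per-flow paths. It is not the thesis code: the type names, the `shortest_path`, `segmented_path`, and `expand_individual` helpers, and the use of a hop-count breadth-first search as a stand-in for the intra-AS shortest path are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Tuple

# All names below are hypothetical and chosen only for this illustration.
Topology = Dict[str, List[str]]   # router -> list of neighboring routers
Flow = Tuple[str, str]            # (ingress router, egress router)
Individual = List[str]            # one midpoint router per flow, in flow order
Population = List[Individual]     # the gene pool


def shortest_path(topo: Topology, src: str, dst: str) -> List[str]:
    """Hop-count shortest path via BFS; a stand-in for the routing protocol's path."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbor in topo[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    if dst not in parent:
        raise ValueError(f"no path from {src} to {dst}")
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def segmented_path(topo: Topology, flow: Flow, midpoint: str) -> List[str]:
    """Path taken when a flow is steered through a single midpoint (one allele)."""
    src, dst = flow
    first_leg = shortest_path(topo, src, midpoint)
    second_leg = shortest_path(topo, midpoint, dst)
    return first_leg + second_leg[1:]  # do not repeat the midpoint itself


def expand_individual(topo: Topology, flows: List[Flow],
                      individual: Individual) -> List[List[str]]:
    """Turn an individual (list of midpoints) into one router path per flow."""
    return [segmented_path(topo, flow, mid) for flow, mid in zip(flows, individual)]
```

With this representation, an individual for two flows is simply a two-element list such as `["C", "A"]`, and a population is a list of such lists. Restricting each flow to a single midpoint keeps the length of an individual equal to the number of flows.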
We also found that a relatively small gene pool of 10 individuals gave good solutions while being quick to compute. A larger gene pool would slow the rate at which genomes converge, potentially finding a more optimal assignment, but would also increase memory usage and require more “mating” to build the next generation.

Figure 3.3: Steps in the Genetic Algorithm

Figure 3.4: Genetic Algorithm Components. Image components from [1]. Used with permission.

3.2.5 Selection Algorithm

The first step in an iteration of the Genetic Optimizer is selecting parents which will be used to produce a new individual of the next generation, as shown by the Select Parents step in Figure 3.3. Parents are selected using tournament selection. In tournament selection, a small number of individuals are selected from the gene pool and evaluated, and the most-fit individual is selected as the first parent; the tournament is then repeated for the second parent. This selection strategy has the advantage of not needing to evaluate every individual in the gene pool, so it works much more quickly, while still statistically being very likely to find highly fit parents when repeated enough times to generate the whole gene pool.

Several interesting selection methods exist which require evaluating all individuals. For instance, it is possible to just select a handful of the most-fit solutions, then use those to produce the next generation. While simple, this approach will tend to miss out on less-fit solutions which nonetheless have a useful mutation. Allowing these less-fit solutions some chance of passing on to the next generation is important to allow the optimizer to find the best solution. In order to allow this, a natural answer is to randomly sample from the set of solutions, giving each a chance to be selected proportional to its fitness. The random sampling can be implemented in more- or less-efficient ways.
Whitley’s Genetic Algorithm tutorial [13] recommends Stochastic Universal Sampling, which can be thought of as spinning a roulette wheel with N evenly-spaced pointers, where N is the number of items being selected.

3.2.6 Crossover

This work uses a two-point crossover operator. Two alleles are randomly selected and a child is initially constructed by copying the region of alleles between the two selected points from one parent, while the region outside is copied from the other parent. The intuition behind any crossover in a genetic optimizer is that the solution might be structured such that nearby alleles somehow influence each other and keeping such groups together will result in a better solution. For this work, the solution is by default unordered, meaning there is not much gain from the crossover operator. However, future work could experiment with changing the ordering of the incoming traffic matrix, thus giving the solution some meaningful structure which benefits from crossover. Intuitively, grouping flows by some property, such as source, destination, bandwidth requirement, or priority, might be useful for crossover so that groups of co-optimized flows stay together.

Once a new individual has been generated, the entire process repeats from selecting new parents to generate another individual until the entire new population has been generated, shown by the inner back-edge in Figure 3.3.

3.2.7 Mutation

When building the next generation, we use a 25% chance that a child will be mutated, and a 10% chance for every allele in the child that the particular allele will be mutated. A too-high value for either of these probabilities would make the solution unstable and non-converging, while a too-low value would make the solution take longer to converge. By experimentation, we found that these values work well. When an allele is selected to be mutated, the midpoint that it represents is moved to a neighboring router uniformly at random.
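The selection, crossover, and mutation steps described in Sections 3.2.5 through 3.2.7 can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the `neighbors` adjacency mapping, and the tournament size of 3 are assumptions (the text does not state the tournament size), and only the 25% and 10% mutation probabilities come from the text. DEAP provides comparable built-in operators such as `tools.selTournament` and `tools.cxTwoPoint`; the text does not say whether those were the ones used.

```python
import random
from typing import Callable, Dict, List

Individual = List[str]  # one midpoint router per flow

CHILD_MUTATION_PROB = 0.25   # chance that a child is mutated at all (from the text)
ALLELE_MUTATION_PROB = 0.10  # chance per allele inside a mutated child (from the text)


def tournament_select(pool: List[Individual],
                      cost: Callable[[Individual], object],
                      rng: random.Random,
                      size: int = 3) -> Individual:
    """Evaluate only `size` random individuals and return the lowest-cost one.

    `size` is an assumed value. `cost` may return a number or a tuple;
    tuples compare level by level, which suits a multi-level cost function.
    """
    contestants = rng.sample(pool, size)
    return min(contestants, key=cost)


def two_point_crossover(inner: Individual, outer: Individual,
                        rng: random.Random) -> Individual:
    """Copy the alleles between two random cut points from `inner`, the rest from `outer`."""
    i, j = sorted(rng.sample(range(len(inner) + 1), 2))
    return outer[:i] + inner[i:j] + outer[j:]


def mutate(child: Individual, neighbors: Dict[str, List[str]],
           rng: random.Random) -> Individual:
    """With 25% probability, move each allele with 10% probability to a random neighbor."""
    if rng.random() >= CHILD_MUTATION_PROB:
        return list(child)  # child passes through unchanged
    mutated = []
    for midpoint in child:
        if rng.random() < ALLELE_MUTATION_PROB and neighbors[midpoint]:
            midpoint = rng.choice(neighbors[midpoint])  # uniform over adjacent routers
        mutated.append(midpoint)
    return mutated
```

One child of the next generation would then be produced by calling `tournament_select` twice, passing both parents to `two_point_crossover`, and passing the result to `mutate`. Taking an explicit `random.Random` instance keeps runs reproducible when a seed is fixed.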
We experimented with selecting a random router from the entire network, but the unstructured nature of such selection meant that it takes a long time to generate useful solutions. The downside to moving to a neighboring router is that the solution search is constrained, so we lose the guarantee of exploring the whole solution space and might get stuck in a local optimum. One compromise which remains for future work is to non-uniformly randomly select a new midpoint from the set of all routers. Instead of limiting selection to only neighbors of the current midpoint, the selection would favor routers which are closer, so that neighbors are chosen more often while far-away midpoints still have some chance of being selected. This avoids being stuck in the case where, for example, a neighboring midpoint is significantly worse than the current midpoint, but the neighbor’s neighbor is a better choice.

3.2.8 Initial Population

The initial population selection is another important parameter to genetic optimizer success. In theory, it does not matter how the population is initially seeded: over time, the genetic optimizer should converge towards the optimum. However, in practice, having a good heuristic for initial population generation makes the optimizer converge much more quickly. For this work, we build an initial individual by selecting, for each flow, the router which is the midpoint of the shortest path, then copy that individual enough times to fill the gene pool. This seems to be a useful heuristic, and the optimizer produces results much more quickly than when midpoints are randomly assigned to the initial individuals. However, we think that this assignment is a local optimum and initializing the optimizer to this point might make it difficult to sample a wide variety of other possible solutions.

3.2.9 Termination

In general, it is not possible to determine whether a particular solution is the global optimum.
Often, this problem is solved by detecting when the optimization algorithm stabilizes and assuming that the found optimum, whether local or global, is “good enough”. Another approach is to find the best solution possible within a limited time budget. Given this work’s goal of attempting to optimize a live network, we opt for a time-limited approach. Since, at any given time, the individuals in the population are valid midpoint assignments, every time the optimizer finds a new best assignment, it can be reported immediately to the Network Manager. This way, it is possible to continuously improve the efficiency of the network without needing to wait for the optimizer to terminate. This step is captured in Figure 3.3 as Report Best Individual, at the end of an iteration, just before starting the next generation of the population.

CHAPTER 4 THE CRITICAL REROUTE IMPLEMENTATION

4.1 Route Planner

All components of the Route Planner are implemented in Python 3. The main work of the Route Planner is the Genetic Optimizer, which is implemented using the DEAP evolutionary computation framework [17]. We use the NetworkX [20] library for handling and manipulating network topology graphs, and we use the NetDiff [7] library for handling the sharing of NetworkX graphs.

4.1.1 Asymptotic Analysis

At its core, the Route Planner consists of a loop which runs for a specified number of iterations or until a specified amount of time has elapsed, the creation of new midpoint assignments, and the calculation of the cost function for each proposed solution. Of these, the most asymptotically-expensive part is the cost function calculation. Let P be the genetic algorithm’s population size, F be the number of flows in the Traffic Matrix, and L be the number of links in the network. For each member p of the population, we look at each of the F midpoint assignments in p.
For every midpoint assignment f among the F assignments, we look at every link used by that flow to determine how much load is on that link; on average, a flow should use about log(L) links. This computes the load on every link at an asymptotic cost of O(F * log(L)). Finally, we take the summation over every link of the overload on that link at a cost of O(L) and the summation for every flow of the latency of that flow at a cost of O(F). This gives us a final cost per individual of O(F * log(L) + L). However, for a normal network, there are likely to be many more flows than links, meaning the first term will dominate. Thus, the complexity of the cost function for a single individual is O(F * log(L)) and the complexity per iteration of cost function updates is O(P * F * log(L)).

It is possible to optimize this cost function update by observing that child solutions are similar to parent solutions, so the child’s cost function result is likely to be similar to the parent’s cost function result. The idea, then, is to save partial computations of the parent’s cost function, then do minimal work to update it for the child’s cost function. In theory, this should reduce the cost by limiting recalculation to the midpoints which actually changed, scaling the work by the probability that a midpoint has changed. The Big-O version of that cost function would be O(P * F * m * log(L)), where m is the probability that some allele would be changed as the result of mutation or crossover. This should result in significant cost savings. To test this, I implemented both styles of cost function in Python and noticed that, for my test cases, the updating cost function was a factor of 6x faster than the cost function which does everything from scratch.
However, overall both of these implementations are a factor of 2-3x slower than the original implementation because both relied heavily on short helper function calls which vastly slowed down the program. The original implementation was written as one long function call, so did not suffer from this. I had hoped Python would inline the small helper functions to avoid this performance overhead, but that apparently was not the case. Therefore, for the purposes of this work, all results presented are run using an implementation of the cost function which computes all costs from scratch every time.

4.2 Operator Portal

Figure 4.1 shows an example of the information the operator can access using Critical Reroute. The portal is designed to show the operator live information about the network as well as upload routing information if required. This application communicates over the SDN controller’s northbound REST API, labeled 1 in Figure 4.2.

4.3 Topology Discovery

In order to do its job, the route planner needs a view of the network topology. This could be provided statically by the network operator. However, manual provisioning does not work well with Critical Reroute’s goal of adapting to fast-moving disaster and failure situations. Therefore, we rely on automatic topology detection based on observing routing protocol packets.

Figure 4.1: Operator Portal Showing Live Information About the Network

Figure 4.2: Detailed Implementation Diagram

In the context of SDN, it would usually make sense to use a built-in discovery protocol, such as those discussed in [37]. However, we make no assumptions that the core of our network is SDN-enabled beyond its ability to correctly forward IPv6 Segment Routing packets. With a stand-alone solution, we need only be able to insert a hardware- or software-based packet sniffer and listen to routing protocol updates. Our sniffer implements OSPF, but any other similar protocol could be used.
Our automatic topology discovery tool takes inspiration from [41], though our implementation varies from their suggestions. Instead of aggregating OSPF Link-State Advertisement (LSA) updates at a centralized point, every sniffer aggregates its own LSAs and locally constructs a graph from those. This is more useful for our work because it allows the centralized controller to only concern itself with the high-level idea of a graph rather than the low-level routing protocol packets. We lose the functionality of aggregating routing updates for future introspection. This was a fundamental goal of [41], but is not necessary for our work.

The network is provisioned with one or more OSPF sniffers which each individually build a local view of the network topology. The local views are sent to the SDN controller via its OSPF collector interface, labeled 2 in Figure 4.2. In case the network experiences a partition, the controller is responsible for merging disparate graphs.

The OSPF sniffer is based on the Python Routeing Toolkit (PyRT) [31] library. This library has support for IPv4 OSPFv2, but since this work focuses on IPv6, we have modified the OSPFv2 module to support “just enough” OSPFv3 for our sniffer. The PyRT library does not work with Python 3, and the cause of this incompatibility has not been investigated, so the OSPF sniffer runs in Python 2.

Although our sniffer does a good job at detecting broken routers and links, it is not able to detect metadata of the network such as how much bandwidth is available on a link or whether there exist invisible nodes such as layer-2 switches. Such data can be provided to the controller by adding a static topology map with extended information. The original implementation of the OSPF sniffer was part of the earlier work in the Flux Research Group [34], though significant modifications were made as part of this work to make the sniffer more reliable.
4.4 Live Traffic Detection

Live network information is a necessity for any traffic engineering endeavor, so there is a significant volume of prior work discussing how it can best be collected [9, 14, 38, 39]. One easy, scalable, and practical solution is to use NetFlow data, as tested by prior work in [15]. NetFlow [11] is a commonly-supported way of collecting network flow information by keeping per-flow statistics, such as flow start time, flow end time, and bandwidth usage, at each NetFlow-supporting router, then exporting them to some centralized collector. For this work, we used the open-source Linux iptables NetFlow module [8] for the NetFlow collectors, and the open-source SiLK collector toolkit [10]. We have implemented a small custom module in SiLK to process the collected NetFlow records into a Traffic Matrix suitable for Critical Reroute’s optimizer based on an operator-configurable rolling time window of samples. The size of this window is dependent on the use case, but 15 seconds seems to give useful measurements while avoiding too much hysteresis in route planning.

In order to avoid duplicate counting of the same packet, a common method used in practice is to configure the NetFlow recorders to only look for traffic on the ingress interfaces of the core network [33]. Since a packet should only enter a particular network once, this avoids the question of which NetFlow collector is responsible for collecting data of a particular packet.

4.5 SDN Implementation

IPv6-native Segment Routing (SRv6) is a very new concept, with the relevant IETF standard still in draft status [43]. As such, there are no publicly-available implementations of common SDN tools which support SRv6. In spite of the standard being only a draft, prior work [29] has already implemented Linux kernel support for SRv6 packet creation and forwarding.
Earlier work at the Flux Research Group [35] produced a set of patches for Open vSwitch (OVS) [4] to extend its OpenFlow support with SRv6 commands, but these patches are based on an older version of OVS which is not compatible with modern Linux kernel versions, and the patched version is not able to perform at line rates above 1Gb/s. While it should be possible to upgrade a modern version of OVS with high-performance SRv6 commands, for the purpose of this work, a simpler SDN implementation called nlsdn was chosen, based on Linux NetLink commands [24]. The resulting network programming interface gives sufficient flexibility to be useful while being able to sustain line rates of at least 10Gb/s.

4.6 SDN Controller

Once the controller has received the results from the optimizer, it needs to instruct the network ingress routers to add the segment routing rules which implement the new policy. Since segment routing is relatively new, there are no standardized protocols for delivering these rules to the network devices. The authors of [28] have implemented handling for IPv6 segment routing in the Linux kernel, as well as command-line tools for adding headers to incoming packets. These are awkward to use from a centralized remote controller. Previous work at the Flux Research Group [34, 35] has defined an extension for OpenFlow which provides an action which pushes segment routing headers. Since that work is open-source, we use their extended OVS and extended Ryu controller.

The central controller elements are implemented in Python using the Ryu SDN controller framework [40], using the extensions from the earlier work in [35]. The controller is able to send OpenFlow or nlsdn commands to the controlled routers in the network, as shown by the connections labeled 3 in Figure 4.2. Ryu provides a built-in REST library which we have used to implement our REST API, which supports both the northbound user-facing applications as well as the OSPF collector.
The Open vSwitch extensions from [35] are used directly with no modifications needed.

CHAPTER 5 EVALUATION

We have evaluated Critical Reroute’s route optimizer on synthetic and inferred topologies in a variety of bandwidth use cases. We evaluate our results as compared to pure, un-optimized OSPF as well as against related works where such comparisons can be meaningfully made.

5.1 Critical Reroute Can Leverage Multiple Paths

One benefit of customizing paths per-flow is that, when multiple paths exist in a network, it should be possible to load balance across them. Critical Reroute can leverage such paths. Consider the network topology shown in Figure 5.1a with three paths and the three flows shown in Table 5.1, any two of which would cause a path to be overloaded. For this example, Critical Reroute assigns one flow to use N7 as a midpoint, another to use N8, and leaves one flow on the default OSPF path.

5.2 Critical Reroute Protects Priority Flows

For this example, we use the same topology, shown in Figure 5.1b, but now consider that one of the paths has failed. We would still like to route the same three flows from Table 5.1 through the network, but now overload is unavoidable. Critical Reroute notices that one flow is annotated with a higher priority than the other two, so it selects that flow to go on its own path, while the other two flows are assigned to use the same, now overloaded, path. Critical Reroute protects the higher-priority traffic, while the lower-priority flows get the best possible service given the situation.

Figure 5.1: A Simple Network Based on the Testbed Being Deployed in Ammon, Idaho. (a) A Simple Network With Three Paths. (b) A Simple Network With Two Paths. Each link has the same latency and can handle 10Gb/s of traffic. The OSPF default route from CB to N10 goes through N2.
Table 5.1: Demands Applied to Figure 5.1

Source  Destination  Bandwidth  Priority
CB      N10          7 Gb/s     1
CB      N10          7 Gb/s     1
CB      N10          7 Gb/s     10

5.3 Critical Reroute Scales to Realistic Networks

5.3.1 Experiment Setup

The topologies and demands used to evaluate this work are summarized in Table 5.2 and come from the collection of test cases published alongside DEFO [22]. From that dataset, we have elected to use only the inferred topologies originally from the Rocketfuel project [42] since they best correspond to our real-world use case. All experiments were run on a Xeon E5-2630 v3 CPU with 64GB RAM.

Because DEFO does not optimize with respect to priority values, the traffic matrices we use have no priority annotation. Due to lack of better information, we have assigned each demand a priority value of either 1 to represent non-critical flows or 10 to represent critical flows, with each flow being considered critical with 20% probability. Each traffic matrix is rerun 5 times to evaluate the variance of the optimizer.

5.3.2 Results

In many cases, the result presented is overload factor, since this is the quantity the optimizer is primarily trying to reduce. Recall that this is computed for a particular solution as the highest priority value on each link multiplied by the bandwidth usage which exceeds that link’s capacity, if any, summed over every edge in the network graph. For instance, if for a particular solution a single link in the entire network carrying traffic with a priority value of 10 were overloaded by 1000 bandwidth units, the overload factor would be 10000. The overload factor of a solution should be interpreted in comparison to other solutions on the same topology, rather than as a global measure. In all cases, an overload factor of 0 indicates that the optimizer managed to find a solution with no overload. The other measure of interest is the latency factor, which is more readily understandable in the real world.
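The overload factor as defined in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the thesis code: the function name `overload_factor`, the link-tuple representation, and the `(links, bandwidth, priority)` flow format are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Link = Tuple[str, str]                        # hypothetical: (router, router)
RoutedFlow = Tuple[List[Link], float, float]  # (links used, bandwidth, priority)


def overload_factor(capacity: Dict[Link, float],
                    routed_flows: Iterable[RoutedFlow]) -> float:
    """Sum over links of (highest priority on the link) * (usage above capacity)."""
    load: Dict[Link, float] = {}
    top_priority: Dict[Link, float] = {}
    for links, bandwidth, priority in routed_flows:
        for link in links:
            load[link] = load.get(link, 0.0) + bandwidth
            top_priority[link] = max(top_priority.get(link, 0.0), priority)

    total = 0.0
    for link, used in load.items():
        excess = max(0.0, used - capacity[link])  # zero when the link is not overloaded
        total += top_priority[link] * excess
    return total


# Worked example from the text: one link carrying priority-10 traffic,
# overloaded by 1000 bandwidth units, gives an overload factor of 10000.
example = overload_factor(
    {("A", "B"): 10000.0},
    [([("A", "B")], 6000.0, 10), ([("A", "B")], 5000.0, 1)],
)
```

Applied to the demands of Table 5.1, two 7 Gb/s flows sharing one 10 Gb/s link exceed it by 4 Gb/s. Under this sketch that link contributes 4 when both flows have priority 1, but 40 when one of them is the priority-10 flow, which is consistent with the optimizer placing the priority-10 flow on its own path in Section 5.2. A flow with priority 0 still adds to a link's load but, if it is the only traffic on an overloaded link, contributes nothing to the sum.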
Latency factor can be roughly understood in the same units as the link delay (often milliseconds), since it is computed from the delay of every link traversed by a flow, multiplied by the flow’s priority value. Once a flow has been optimized such that it is avoiding overloaded links, which have an unbounded negative effect on network performance, it becomes interesting to make sure that it is taking the shortest path possible.

Table 5.2: Summary of Tested Topologies

Topology Name  Nodes  Edges  Demands
rf3967         79     294    6162
rf1239         315    1944   96057

5.3.3 Analysis

DEFO [22] seeks to optimize the most overloaded link such that, ideally, all links are below their theoretical bandwidth limit. For that work, the generated traffic demands always have some solution where bandwidth per link is below the threshold. In this environment, Critical Reroute can also find a solution such that no link is overloaded. However, this may take more time since Critical Reroute has to compute a more complicated cost function.

Figures 5.2 and 5.3 show example runtimes of the tested topologies and show the decrease in cost function over time for each of the five trials on a particular topology using the same traffic matrix and priority value annotations. Figure 5.2 shows the relatively small rf3967 Rocketfuel [42] topology. Because of its size, it does not take long for the optimizer to find an initial solution which eliminates overload. Figure 5.3 shows runtimes of the largest topology we have evaluated against. On our test hardware, Critical Reroute needs roughly 30 minutes to converge to an initial solution which eliminates overload on this topology. This is a larger scale than Critical Reroute was originally designed to handle, but is a good example of the limits of the current approach and the ability of these methods to apply to more general problems.

It may seem that 30 minutes, or even 3 minutes, is too long to wait in a real-world scenario. However, this is just the initial setup time.
In an operational deployment, we would be more concerned with how well Critical Reroute can adapt to changes in the network to keep traffic flowing smoothly. Section 5.4 simulates how the optimizer might respond to the dynamic nature of a live network. Though DEFO [22] is able to optimize for its cost function in all listed topologies in 3 minutes or less, Critical Reroute’s cost function is more complicated in order to take into account the priority and latency of individual flows; thus it is not able to solve larger problems as quickly.

Each test with the same inputs was rerun five times to test a total of 80 traffic matrices with different priority assignments for each topology. In all runs, the optimizer was able to find a solution which brought the total overload to 0. This is an indicator that the optimizer is not getting stuck in local optima or that at least those local optima result in useful solutions. This has the real-world implication that leaving the optimizer to run for the maximum available amount of time would result in the best solution, as opposed to restarting the optimizer to avoid locally-optimal solutions.

Figure 5.2: Overload Factor vs. Iterations for the rf3967 Topology

Figure 5.3: Overload Factor vs. Iterations for the rf1239 Topology

Table 5.3 shows the average factor increase in latency and path length for the tested topologies. This equates to a latency increase of about 10 milliseconds and one additional link. When considering that the initial network was suffering from severe overload, causing routers to drop significant amounts of traffic, network performance would have been terrible. An increase of one extra link is a low price!

5.4 Critical Reroute Responds to Network Changes

Once the optimizer has converged to an acceptable solution, it is expected that the network will only change in small steps.
Therefore, instead of being started from scratch, we can re-use the midpoint assignment from the previous solution as a starting point for a new solution. For this simulation, we assume that Critical Reroute has been given enough time to find a solution which eliminates overload. Then, 1% of the nodes in the network are removed. Even as the network experiences continuing failure, Critical Reroute can respond in a reasonable amount of time to maintain service performance.

Figure 5.4 shows the response time of Critical Reroute to a simulated failure in the larger rf1239 topology. It is not surprising that finding a solution gets harder at every iteration of this process, since the same traffic has to share fewer paths. For the first two iterations, Critical Reroute can find a solution which eliminates overload in all test runs. In the third instance, only two test runs are able to completely eliminate overload.

Figure 5.5 shows a similar test run for the smaller rf3967 topology. Surprisingly, the results from the rf1239 topology do not carry over well to this experiment. At each step, only a single node is removed, and the optimizer is restarted with the midpoint assignment from the earlier run. Notice that, while the overload value starts lower than when seeded with OSPF midpoints, after 3 minutes, the optimizer is not able to make much progress. One potential explanation for this behavior could be that the smaller network is much less densely connected, with about 3.7 edges per node as compared to rf1239’s approximately 6.2 edges per node. This means there are far fewer redundant paths, so a single node failure is likely to do more damage. The conclusion to be drawn from this experiment is that Critical Reroute is best able to recover from node failure in a network where there is plenty of redundancy.

Table 5.3: Average Factor Increase in Latency and Path Length for the Segment-Routed Solutions Compared to OSPF

Topology  Latency  Path Length
rf3967    1.12x    1.12x
rf1239    1.15x    1.20x

Figure 5.4: Graph Showing Recovery Time From Network Failures for rf1239. The dashed vertical line indicates the point where 1% of the remaining nodes are removed from the network. Each different symbol represents a re-run of the same test with the same input parameters.

Figure 5.5: Graph Showing Recovery Time From Network Failures for rf3967. The dashed vertical line indicates the point where 1 node is removed from the network. Each different symbol represents a re-run of the same test with the same input parameters.

Taking the results from Figures 5.4 and 5.5 into consideration with the asymptotic analysis discussed in Section 4.1.1, it seems clear that Critical Reroute is able to respond to long-timescale events like network failure, but a large number of flows in the flow matrix significantly hampers performance. In case better performance is desired, it would make sense to summarize individual network flows, such as by combining all flows which use the same ingress port and egress port on the same router, to reduce the number of flows being optimized by the route planner.

CHAPTER 6 CONCLUSION

Modern, high-performance networks provide significant advantages over last-generation networks but take significant investment. Given the high cost of installation, it is desirable to enable as many uses of these networks as possible. In particular, public safety applications often have stringent performance demands which are often met with dedicated resources. These resources are often so costly as to limit deployment, and are only needed occasionally. A solution which allows modern networks to be shared between high- and low-demand applications would seem to solve both of these problems simultaneously.

This work presents a practical approach to network flow prioritization using single-midpoint segment routing, automated by a genetic algorithm.
Additionally, it describes the design and implementation of sufficient framework to run the system on a real-world network. Our evaluations show that the optimizer can achieve its goals of protecting high-priority traffic from overload in an environment where traditional tools would not be sufficient, on timescales which could be useful to real-world applications.

REFERENCES

[1] Open Clip Art Library. https://openclipart.org/.
[2] Government Emergency Telecommunications Service (GETS). https://www.dhs.gov/government-emergency-telecommunications-service-gets, Apr. 2013.
[3] Wireless Priority Service (WPS). https://www.dhs.gov/wireless-priority-service-wps, Apr. 2013.
[4] Open vSwitch. http://www.openvswitch.org/, 2016.
[5] FirstNet. https://www.firstnet.com, 2018.
[6] FirstNet Top Ten FAQs, Jan. 2018.
[7] NetDiff: Python Library for Parsing Network Topology Data. Ninux.org - Wireless Network Community, Nov. 2018.
[8] Netflow iptables module for Linux kernel, Apr. 2019.
[9] Balestra, G., Luciano, S., Pizzonia, M., and Vissicchio, S. Leveraging Router Programmability for Traffic Matrix Computation. In Proceedings of the Workshop on Programmable Routers for Extensible Services of Tomorrow (New York, NY, USA, 2010), PRESTO ’10, ACM, pp. 11:1–11:6.
[10] CERT/NetSA at Carnegie Mellon University. SiLK (System for Internet-Level Knowledge). https://tools.netsa.cert.org/silk, 2019.
[11] Cisco. Introduction to Cisco IOS NetFlow - A Technical Overview. Cisco (May 2012), 1–16.
[12] Congress, U. S. Middle Class Tax Relief and Job Creation Act of 2012, Feb. 2012.
[13] Darrell, W. A Genetic Algorithm Tutorial. http://bipad.cmh.edu/ga tutorial1994.pdf, 1994.
[14] Davy, A., Botvich, D., and Jennings, B. An Efficient Process for Estimation of Network Demand for Qos-aware IP Network Planning. In Proceedings of the 6th IEEE International Conference on IP Operations and Management (Berlin, Heidelberg, 2006), IPOM’06, Springer-Verlag, pp. 120–131.
[15] F ELDMANN , A., G REENBERG , A., L UND , C., R EINGOLD , N., R EXFORD , J., AND T RUE , F. Deriving Traffic Demands for Operational IP Networks: Methodology and Experience. IEEE/ACM Trans. Netw. 9, 3 (June 2001), 265–280. 37 [16] F ILSFILS , C., N AINAR , N. K., P IGNATARO , C., C ARDONA , J. C., AND F RANCOIS , P. The Segment Routing Architecture. In 2015 IEEE Global Communications Conference (GLOBECOM) (Dec. 2015), pp. 1–6. [17] F ORTIN , F.-A., R AINVILLE , F.-M. D., G ARDNER , M.-A., PARIZEAU , M., AND G AGN É , C. DEAP: Evolutionary Algorithms Made Easy. Journal of Machine Learning Research 13 (July 2012), 2171–2175. [18] G ENG , Y., J EYAKUMAR , V., K ABBANI , A., AND A LIZADEH , M. Juggler: A Practical Reordering Resilient Network Stack for Datacenters. In Proceedings of the Eleventh European Conference on Computer Systems (New York, NY, USA, 2016), EuroSys ’16, ACM, pp. 20:1–20:16. [19] H AFNER , J. H. Evolving a Genetic Algorithm for Network Flow Maximization. PhD thesis, University of Akron, 2012. [20] H AGBERG , A. A., S CHULT, D. A., AND S WART, P. J. Exploring Network Structure, Dynamics, and Function using NetworkX. In Proceedings of the 7th Python in Science Conference (Pasadena, CA USA, 2008). [21] H AO , F., K ODIALAM , M., AND L AKSHMAN , T. V. Optimizing Restoration with Segment Routing. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications (Apr. 2016), pp. 1–9. [22] H ARTERT, R., V ISSICCHIO , S., S CHAUS , P., B ONAVENTURE , O., F ILSFILS , C., T ELKAMP, T., AND F RANCOIS , P. A Declarative and Expressive Approach to Control Forwarding Paths in Carrier-Grade Networks. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (New York, NY, USA, 2015), SIGCOMM ’15, ACM, pp. 15–28. [23] I WASAKI , Y., O NO , S., S ARUWATARI , S., AND WATANABE , T. Design and Implementation of OpenFlow Networks for Medical Information Systems. 
In GLOBECOM 2017 - 2017 IEEE Global Communications Conference (Dec. 2017), pp. 1–7. [24] J OHNSON , D., AND R EDMAN , S. Nlsdn (Linux Netlink SDN-like control) — nlsdn 0.2.1 documentation. https://gitlab.flux.utah.edu/safeedge/nlsdn, May 2019. [25] K UMBHAR , A., AND G UVENC , I. A Comparative Study of Land Mobile Radio and LTE-based Public Safety Communications. ResearchGate. [26] K UMBHAR , A., K OOHIFAR , F., G ÜVENÇ , I., AND M UELLER , B. A Survey on Legacy and Emerging Technologies for Public Safety Communications. IEEE Communications Surveys Tutorials 19, 1 (Firstquarter 2017), 97–124. [27] L AGA , S., C LEEMPUT, T. V., R AEMDONCK , F. V., VANHOUTTE , F., B OUTEN , N., C LAEYS , M., AND T URCK , F. D. Optimizing scalable video delivery through OpenFlow layer-based routing. In 2014 IEEE Network Operations and Management Symposium (NOMS) (May 2014), pp. 1–4. [28] L EBRUN , D. Reaping the Benefits of IPv6 Segment Routing. PhD thesis, Université Catholique de Louvain, Sept. 2017. 38 [29] L EBRUN , D., AND B ONAVENTURE , O. Implementing IPv6 Segment Routing in the Linux Kernel. In Proceedings of the Applied Networking Research Workshop on - ANRW ’17 (Prague, Czech Republic, 2017), ACM Press, pp. 35–41. [30] L EBRUN , D., J ADIN , M., C LAD , F., F ILSFILS , C., AND B ONAVENTURE , O. Software Resolved Networks: Rethinking Enterprise Networks with IPv6 Segment Routing. In Proceedings of the Symposium on SDN Research (New York, NY, USA, Mar. 2018), SOSR ’18, ACM, pp. 6:1–6:14. [31] M ORTIER , R. PyRT: Python Routeing Toolkit, Aug. 2018. [32] M UNIR , A., B AIG , G., I RTEZA , S. M., Q AZI , I. A., L IU , A. X., AND D OGAR , F. R. Friends, Not Foes: Synthesizing Existing Transport Strategies for Data Center Networks. In Proceedings of the 2014 ACM Conference on SIGCOMM (New York, NY, USA, 2014), SIGCOMM ’14, ACM, pp. 491–502. [33] N ETVIZURA. Manual Deduplication. https://confluence.netvizura.com/display/ NVUG/Manual+Deduplication. [34] N GUYEN , B. 
Traffic Engineering using Segment Routing - Demo. http://www.cs.utah.edu/˜binh/archive/segment routing/segment-routingtutorial.html, Nov. 2017. [35] N GUYEN , B INH. Ovs-srv6. University of Utah Flux Research Group, Dec. 2017. [36] N ICHOLS , K., B LAKE , S., B AKER , F., AND B LACK , D. L. Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers. https://tools.ietf.org/html/rfc2474, Dec. 1998. [37] PAKZAD , F., P ORTMANN , M., TAN , W. L., AND I NDULSKA , J. Efficient Topology Discovery in Software Defined Networks. In 2014 8th International Conference on Signal Processing and Communication Systems (ICSPCS) (Dec. 2014), pp. 1–8. [38] PAPAGIANNAKI , K., TAFT, N., AND L AKHINA , A. A Distributed Approach to Measure IP Traffic Matrices. In Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement (New York, NY, USA, 2004), IMC ’04, ACM, pp. 161–174. [39] R OUGHAN , M., T HORUP, M., T HORUP, M., AND Z HANG , Y. Traffic Engineering with Estimated Traffic Matrices. In Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement (New York, NY, USA, 2003), IMC ’03, ACM, pp. 248–258. [40] RYU P ROJECT T EAM. RYU SDN Framework, 1 ed. [41] S HAIKH , A., AND G REENBERG , A. OSPF Monitoring: Architecture, Design and Deployment Experience. 15. [42] S PRING , N., M AHAJAN , R., W ETHERALL , D., AND A NDERSON , T. Measuring ISP Topologies with Rocketfuel. IEEE/ACM Trans. Netw. 12, 1 (Feb. 2004), 2–16. [43] V OYER , D., M ATSUSHIMA , S., P REVIDI , S., F ILSFILS , C., AND L EDDY, J. IPv6 Segment Routing Header (SRH). https://tools.ietf.org/html/draft-ietf-6man-segmentrouting-header, Apr. 2019. 39 [44] WARD , A. FirstNet Momentum: More Than 2,500 Public Safety Agencies Subscribed — First Responder Network Authority. https://firstnet.gov/news/firstnetmomentum-more-2500-public-safety-agencies-subscribed, Aug. 2018. 
[45] W HITE , B., L EPREAU , J., S TOLLER , L., R ICCI , R., G URUPRASAD , S., N EWBOLD , M., H IBLER , M., B ARB , C., AND J OGLEKAR , A. An Integrated Experimental Environment for Distributed Systems and Networks. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (Boston, MA, Dec. 2002), OSDI ’02, USENIX Association, pp. 255–270.
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6qz8c8v