Packet scheduling in the FreeBSD kernel

wfq_descrip.dvi

Implementation of Weighted Fair Queuing including

􏰅
􏰁

Supp orted by Ericsson Telecom AB􏰈

RSVP in a BSD kernel

Ian Marsh􏰅 Octob er 􏰂􏰁􏰀 􏰁􏰄􏰄􏰃

Contents
􏰁 Intro duction 􏰗 􏰂 Basic Architecture 􏰗 􏰗 Example Session 􏰘

􏰘 Scheduler 􏰘 􏰘􏰈􏰁 Platform􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰘 􏰘􏰈􏰂 Duality 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰙 􏰘􏰈􏰗 Userperspective 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰙

􏰘􏰈􏰗􏰈􏰁 Open 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰙 􏰘􏰈􏰗􏰈􏰂 Ioctl􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰙 􏰘􏰈􏰗􏰈􏰗 Read􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰊

􏰙 Kernel p ersp ective 􏰊 􏰙􏰈􏰁 FilesystemInterface􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰊 􏰙􏰈􏰂 Init􏰖Delete 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰃 􏰙􏰈􏰗 AddingFlowInformation 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰚 􏰙􏰈􏰘 DeletingFlowInformation􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰚

􏰊 Int􏰇serv􏰖RSVP parameter handling 􏰄 􏰊􏰈􏰁 ExplanationofInt􏰇servparameters􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰄

􏰃 Reading data

from the

kernel 􏰁􏰉

􏰚 Stand alone mo de

􏰁􏰉

􏰄 Weighted Fair Queuing

􏰁􏰉 􏰄􏰈􏰁 Theory􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰁 􏰄􏰈􏰂 WFQImplementation 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰁

􏰄􏰈􏰗 Futurework􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈

􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰂

􏰁􏰉 Practicalities
􏰁􏰉􏰈􏰁 Installing􏰀 compiling and debugging 􏰁􏰉􏰈􏰂Installation 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰉􏰈􏰗Debugging􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈

􏰁􏰂 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰗 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰗 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰘 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰘 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰘

􏰁􏰉􏰈􏰗􏰈􏰁Daemon􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰉􏰈􏰗􏰈􏰂Kernel􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈

􏰁􏰁 Using RSVP
􏰁􏰁􏰈􏰁Errors 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈

􏰁􏰙 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰙

􏰁􏰂 Internals

􏰁􏰙

􏰁􏰗 Caveats
􏰁􏰗􏰈􏰁KnownBugs 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰗􏰈􏰂Notimplemented􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈

􏰁􏰊 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰊 􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈􏰈 􏰁􏰊

􏰂

􏰁 Intro duction
The main goal of this implementation is to make a networking device that can

schedule IP tra􏰏c
is conducted according to

The device understands the

RSVP proto col 􏰠􏰗􏰡􏰀 interprets it

uler is also evident in Interactions b etween the

two mo dules is

the sched􏰇 the device driver􏰈

on existing network interfaces􏰈 Currently scheduling of packets

a scheme known as Weighted Fair Queuing 􏰠􏰂􏰡􏰈

and uses information In order to make clear which terms refer to the architecture the following diagram

from the proto col to

set parameters in

the WFQ scheduler􏰈

should b e used

as reference
RSVP 􏰒or protocol􏰓

user kernel

􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇 Scheduler 􏰒or Traffic Control Module􏰓

RSVP and proto col interchangeably􏰀 likewise so for scheduler

how to use the system are given followed by how

In this text
and Tra􏰏c control mo dule􏰈 The b oundary b etween the

we use

proto col and
their separation across user and kernel spaces resp ectively􏰈

discussed later􏰈
is to without any mo di􏰍cations to

An imp ortant secondary goal
Other queuing schedulers 􏰠􏰁􏰡 have b een written but require mo di􏰍cations to

the

driver or the IP stack in
This rep ort will start with

􏰂 Basic Architecture The basic architecture of the system is

as shown in

some way􏰈

a basic intro duction to the system demonstrated with a simple 􏰗 entity system􏰀 􏰂 end systems and a single router􏰈 In subsequent

sections the
the proto col􏰈 Some
to extend􏰖change and if need be debug the system􏰈

internals of examples of

HOST Router HOST

the scheduler are explained followed by some details ab out

the illustration b elow􏰀 all com􏰇 the router is resp onsible for making for resources􏰈 The presence of RSVP dae􏰇 mons on the hosts are to enable the communication with the RSVP on the router􏰈 The hosts can also use an application program RTAP supplied with the public

p onents run the rsvp d daemon􏰀

􏰁

􏰧􏰧 vv

however only the Op erating System 􏰒OS􏰓

requests to
domain implementation of RSVP from

􏰁

ISI􏰈

RSVP 􏰜􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰞 RSVP

􏰜􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰞 RSVP Currently the router must run rsvp in order to make reservations on the router

􏰧V

Traffic
Control

􏰗

􏰗 Example Session

what an RSVP session can lo ok like driven from

The following is given as to show a user supplied program 􏰒RTAP􏰓􏰈

Sender
T􏰁􏰞 session udp 􏰁􏰄􏰂􏰈􏰁􏰊􏰚􏰈􏰁􏰙􏰈􏰁􏰘􏰁

T􏰁􏰛 rapi􏰥session 􏰝􏰞
T􏰁􏰞 sender 􏰁􏰄􏰂􏰈􏰁􏰊􏰚􏰈􏰁􏰙􏰈􏰁􏰙􏰙 􏰠t 􏰁􏰁􏰁􏰁􏰁

sid􏰝 􏰁􏰀

fd􏰝 􏰗

􏰂􏰂 􏰡

rapi􏰥sender􏰒􏰓 OK 􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇

T􏰁􏰛 Resv Event 􏰇􏰇 Session􏰝 􏰁􏰄􏰂􏰈􏰁􏰊􏰚􏰈􏰁􏰙􏰈􏰁􏰘􏰁􏰛􏰉

􏰁􏰄􏰂􏰈􏰁􏰊􏰚􏰈􏰁􏰙􏰈􏰁􏰙􏰙􏰛􏰔 􏰠G 􏰠􏰁􏰁􏰁􏰒􏰂􏰉􏰓 p􏰝􏰁K m􏰝􏰂􏰉􏰉 􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇

Receiver
T􏰁􏰞 session udp 􏰁􏰄􏰂􏰈􏰁􏰊􏰚􏰈􏰁􏰙􏰈􏰁􏰘􏰁

M􏰝􏰁􏰈􏰙K􏰡 R􏰝􏰂􏰂 S􏰝􏰁􏰈􏰁􏰁K􏰡

T􏰁􏰛 rapi􏰥session 􏰝􏰞 􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇

sid􏰝 􏰁􏰀
T􏰁􏰛 Path Event 􏰇􏰇 Session􏰝 􏰁􏰄􏰂􏰈􏰁􏰊􏰚􏰈􏰁􏰙􏰈􏰁􏰘􏰁􏰛􏰉

fd􏰝 􏰗
sicsgen􏰛􏰔 􏰠T 􏰠􏰁􏰁􏰈􏰁K􏰒􏰂􏰂􏰓 p􏰝Inf

m􏰝􏰉 M􏰝􏰊􏰙􏰈􏰙K􏰡􏰡

􏰠T 􏰠􏰁􏰁􏰈􏰁K􏰒􏰂􏰂􏰓 p􏰝Inf 􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇

m􏰝􏰉 M􏰝􏰊􏰙􏰈􏰙K􏰡􏰡 G􏰝􏰦􏰉
T􏰁􏰞 reserve ff 􏰁􏰄􏰂􏰈􏰁􏰊􏰚􏰈􏰁􏰙􏰈􏰁􏰙􏰙 􏰠g 􏰂􏰂 􏰁􏰁􏰁􏰁 􏰁􏰁􏰁 􏰂􏰉 􏰁􏰉􏰉􏰉 􏰂􏰉􏰉 􏰁􏰙􏰉􏰉 􏰡

The user input commands are shown pre􏰍xed
by T􏰁􏰛􏰈 The sender and receiver agree on a 􏰋session identi􏰍er􏰆 which is normally the receivers address and a p ort numb er􏰈 Thus b oth enter the session command􏰈 Next

􏰩 tra􏰏c sp ec

the network con􏰍guration􏰓 see

Finally the sender may 􏰒dep ends on

a message as the Integrated Services

b e found in
The tra􏰏c sp eci􏰍cations are explained in

section􏰈

man 􏰚 rtap􏰈

more detail in

􏰘 Scheduler

􏰘􏰈􏰁 Platform

by T􏰁􏰞
sp ec with the sender command and

the sender advertises it􏰑s tra􏰏c
address 􏰒optionally a p ort as well
receiver should see this as a Path Event􏰀 shown b etween in the receivers session􏰈

The receiver make a reservation to 􏰩 command reserve
􏰩 style Fixed Filter 􏰒FF􏰓

the sender sp eci􏰍ed thus􏰛

􏰩 typ e of

service􏰀 􏰋g􏰆 guaranteed
a result of this reserve action􏰈 More details on how to drive an RSVP session can

The co de will as far is as known􏰀
Op enBSD and NetBSD􏰀 little e􏰌ort should b e needed to p ort it to systems which

will work on 􏰘

all BSD systems􏰈 This includes

􏰉 􏰉 􏰉􏰨

and the resp onse by RTAP

it􏰑s interface if the sender is sending multiple sessions􏰓􏰀 The

have TCP􏰖IP implementations based on

or the kernel􏰀 each p ersp ective is describ ed in
familiar with devices will b e able to comprehend the devices use

no change has b een made
so constructing the driver to work with systems that use STREAMS for example

should not b e di􏰏cult􏰈

to the internal

􏰘􏰈􏰂 Duality

a normal character device to

The ob jective to
user space programs and a normal networking device to the device exhibits a 􏰋duality􏰆dep ending on whether the

the kernel and

use such a device is it lo oks like

and likewise for

􏰋ifcon􏰍g􏰆 command􏰈

manual command similar to the one shown

driver develop ers􏰈

􏰘􏰈􏰗 User p
􏰘􏰈􏰗􏰈􏰁 Op en
For the user space programmer the scheduler lo oks like

ersp ective
two ways in which to op en􏰖close the device􏰀 from

a normal device􏰈 the RSVP proto col and

in the Kernel􏰖RSVP interface section􏰈

A 􏰍le descriptor is returned by

system call

read􏰖write􏰂 􏰖close􏰈 Should the op en fail for some reason􏰀

will b e returned􏰀

this should b e

then 􏰇􏰁 checked for and rep orted to the application program􏰈

We use the
the kernel called from

􏰘􏰈􏰗􏰈􏰂 Io ctl
It is intended that

the RSVP proto col􏰀 describ ed next􏰈

be

􏰂

Not needed in this implementation 􏰙

via RSVP􏰀 however there might

BSD systems 􏰒e􏰈g􏰈 SunOS􏰘􏰓􏰈 As stated

ifconfig 􏰧􏰧 􏰧􏰧 VV

structures of the TCP􏰖IP

or the driver

􏰕􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰕
􏰧 open 􏰧 System call

􏰧􏰧 􏰕􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰕

In order for the user to op en the device a
b elow should b e issued􏰀 how to attach a command like this to

RSVP is discussed

wfqfd 􏰝 open􏰒􏰢􏰢􏰖dev􏰖wfq􏰉􏰑􏰑􏰀 RDWR􏰓􏰤

the system as a result of
the device should b e sp eci􏰍ed as a string and

path and name of
how the device should b e op ened􏰈 Write is necessary

the the 􏰎ags indicating 􏰋IO control􏰆 􏰒io ctl􏰓

􏰒man 􏰂 io ctl􏰓􏰈
Thereafter the wfqfd 􏰍le descriptor can b e used

to use the
as with normal 􏰍le op erations􏰀

the the kernel co de􏰈 Therefore view is from user space following sections􏰀 where develop ers

􏰕􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰕
􏰧 wfqopen 􏰧 Kernel

􏰍le descriptor particularly for io ctls allowing parameters to set

in

most communication will b e done
cases where the user would like to control the interface without issuing RSVP

RSVP

issuing the op en call􏰀

There are

commands􏰀 i􏰈e􏰈 to reset the interface for example􏰀 for this see the Stand alone

section􏰈
The following co de shows how

to call
in this case 􏰋WFQ ENABLE􏰆 a 􏰐de􏰍ned value that can b e matched

when the io ctl is executed􏰈 Additionally a p ointer to a structure is contains the data to b e passed to the scheduler􏰀 for example􏰛

in the kernel passed that

􏰘􏰈􏰗􏰈􏰗 Read

struct interface 􏰦
char eth􏰥name􏰠IFNAMSIZ􏰡􏰤

􏰨 u􏰥int eth􏰥len􏰤

bzero􏰒􏰒void 􏰔􏰓interface􏰀 IFNAMSIZ􏰓􏰤 strcpy􏰒interface􏰈eth􏰥nam􏰀 􏰢􏰢ep􏰉􏰑􏰑􏰓􏰤 interface􏰈eth􏰥len 􏰝 strlen􏰒􏰢􏰢ep􏰉􏰑􏰑􏰓􏰤

log􏰒LOG􏰥DEBUG􏰀 􏰢􏰢Error

the scheduler to p erform a

certain function􏰀

if􏰒ioctl􏰒wfqfd􏰀 WFQ􏰋􏰥ENABLE􏰀 􏰣interface􏰓 􏰜 􏰉􏰓 􏰦

in setting up scheduler􏰑􏰑􏰓􏰤 Reading from the device is accomplished through the read system call􏰈 Typically a

􏰨 return TC􏰥ERR􏰤

return TC􏰥OK􏰤

call would lo ok like
bytes􏰥read 􏰝 read􏰒wfqfd􏰀 buf􏰀 sizeof􏰒buf􏰓􏰓􏰤

􏰛
The device reads the requested numb er of bytes 􏰒sizeof􏰒bytes􏰓􏰓 and places them

in an array of characters􏰀 named buf

in this example􏰈 A
The following section describ es how the kernel implements requests from

the numb er of bytes output in

a 􏰎ow􏰈

p ossible use is to return

user for more background see
􏰙 Kernel p ersp ective

calls are done
􏰙􏰈􏰁 File system Interface

the

the following section is

one for handling the proto col and

and the kernel􏰀 proto col is dealt

the user space RSVP

daemon to

􏰠􏰘􏰡􏰈

Describ ed in
rationale b ehind the implementation􏰈 􏰂 􏰎ows of control are p ossible in

􏰍le system interface description􏰈

an explanation of
one for the data stream􏰈 The

with 􏰍rst in the
The skeleton 􏰍le tc test􏰈c in the RSVP distribution provides the main interface

functions from
are found in rsvp var􏰈h􏰈 If the RSVP daemon is compiled with the SCHEDULE 􏰎ag the routines in tc test􏰈c will be called automatically from the daemon 􏰒all the

the kernel􏰈
The co de is lo osely based on a network pseudo device that has a character device

from the rsvp resv􏰈c 􏰍le􏰓􏰈

interface 􏰒cdevsw􏰓􏰗 􏰈 We use the character device interface to pass

and obtain in􏰇 to read and write

the co de implemented

The function prototyp es

􏰗 User space implementations of PPP typically use a similar device device

packets from the kernel

􏰊

formation to and from the kernel􏰈 Packets do not leave the kernel in the 􏰎ow of

data􏰈
The sequence of events from the user to the kernel is shown in the illustration􏰛

RSVP open
ifconfig 􏰇􏰞 read 􏰇􏰞 􏰖dev􏰖wfq􏰉

KERNEL
scheduler 􏰜􏰇

v wfqdelete 􏰜􏰇 wfqioctl

kernel􏰀 essentially the same􏰈
The dispatcher on the kernel side

ioctl
close 􏰧

USER 􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝􏰝

wfqinit

etc􏰈

The ab ove shows the 􏰎ow
command to the scheduler􏰈 Once interpreted by the system call they are􏰀 to the

of control from the RSVP daemon or

a user 􏰋ifcon􏰍g􏰆

is a function called wfqio ctl􏰈 Character device drivers in UNIX are exp ected to provide an interface to user space programs which

implement common UNIX 􏰍le system commands􏰀 read􏰀 write􏰀 select etc􏰈
call is used to control a device and manipulate device parameters of the device and

it is this system call that Once a io ctl is called

The io ctl

interprets the RSVP proto col􏰈
by a user space pro cess the OS will dispatch it to the

correct device 􏰒via the 􏰍le descriptor􏰓 and and hence call the io ctl for the device

sp eci􏰍ed􏰈

􏰙􏰈􏰂 Init􏰖Delete

Within the kernel each 􏰎ag in command call can be matched􏰛

the io ctl system call is replicated so user sp eci􏰍ed

ioctl􏰒wfqfd􏰀 WFQ􏰥ENABLE􏰀 􏰣interface􏰓 User 􏰐define WFQ􏰥ENABLE 􏰥IOW􏰒􏰑t􏰑􏰀 􏰄􏰘􏰀 struct interface􏰓 Kernel

Exact details of how
in the wfqio ctl􏰒􏰓 function each of requests is matched in

this is done is outside the scop e of this rep ort􏰈 However

corresp onding function is called to
Setting the interface up involves the following steps􏰛

􏰩 Check to see if the interface is already up

􏰩 Substitute the output of
􏰩 Globally mark the interface as

􏰩 Initialise a

Best E􏰌ort

up 􏰎ow 􏰒􏰎ow 􏰉􏰓

a switch statement and carry out this request􏰈

a

the given interface for a scheduling one

Similarly setting the interface down􏰀 either by 􏰋ifcon􏰍g down􏰆 or by killing the RSVP daemon with the 􏰋kill􏰆 command will close the device by as follows􏰛

  • 􏰩  Check to see if the interface is
  • 􏰩  Put back the original output handler
  • 􏰩  Globally mark the interface as dow􏰃n

already down

􏰩 Delete the Best E􏰌ort 􏰎ow 􏰒􏰎ow 􏰉􏰓
Note􏰛 the events ab ove apply only to setting

up and down the device􏰈 Adding

and removing 􏰎ows of
well as handling the data is describ ed in

data is the resp onsibility

of the proto col􏰈 This sections􏰈

the next 􏰙􏰈􏰗 Adding Flow Information

pro cess as the reserve command generates

Referring back
􏰂 messages which result in
proto col 􏰒arguments not shown􏰓􏰈 No co de in the TC mo dule is called for or sender messages􏰈

to the example session section 􏰂􏰀
the following functions b eing called within

the RSVP the session

Update􏰥TC􏰒􏰓 Update􏰥TC􏰒􏰓

􏰧􏰧

vv TC􏰥AddFlowspec􏰒􏰓 TC􏰥AddFilter􏰒􏰓

􏰧􏰧

vv kernel kernel

the kernel co de􏰀 a the originally sp eci􏰍ed Sender lower of these are used to avoid over􏰇reservations or unnecessary

The TC AddFlowsp ec function passes two structures to

FlowSp ec 􏰒sp eci􏰍ed by the reserve command􏰓 and

Tra􏰏c spec􏰘 􏰈 The failures taking place􏰈

􏰙
the Int􏰇serv sp ec ob ject

As describ ed ab ove deleting a
example􏰀 will result in 􏰂 functions b eing called to delete the 􏰎owsp ec information and the 􏰍lter sp ec that was instantiated in the Add calls􏰈

􏰚

The Flowsp ec and Filter sp eci􏰍cations are given in

and 􏰁􏰂 resp ectively􏰈
ipants to the kernel􏰈 The TC AddFilter􏰒􏰓 function 􏰍rst passes the Session to the

numb ers 􏰄
The TC AddFilter􏰒􏰓 function is resp onsible for communicating the

the partic􏰇

kernel􏰀 so for example􏰀 mo di􏰍cations to

the FlowSp ec can

the kernel􏰀
Filter sp eci􏰍cations are given in

a Session􏰈

Secondly􏰀 a FiltSp ec is passed to
extra participants join the session􏰀 for example where WF is knows b ecomes aware that new parties have joined􏰈

sp eci􏰍ed􏰀 the

Should router

the RSVP
Filtering based on source address is not used in the current version􏰀 however

The Session and
􏰁 and 􏰁􏰉 resp ectively􏰈

supp ort is
􏰙􏰈􏰘 Deleting Flow Information

implemented􏰈

b e matched to which is the sender address􏰈

􏰊
sp ec ob ject numb ers

􏰘

TC􏰥DelFlowSpec􏰒􏰓 TC􏰥DelFilter􏰒􏰓

􏰋smaller􏰆 of the two passed to

  • 􏰙  draft􏰇ietf􏰇intserv􏰇rsvp􏰇use􏰇􏰉􏰂􏰈txt
  • 􏰊  draft􏰇ietf􏰇rsvp􏰇sp ec􏰇􏰁􏰊􏰈txt

􏰎ow􏰀 triggered by a 􏰋close􏰆 command from RTAP􏰀
sp eci􏰍ed by the Up date TC􏰒􏰓 function to delete the information

for This is ignored in this implementation􏰀 but should b e compared to the FlowSp ec and the

Only a handle is needed in the kernel􏰈

Update􏰥TC􏰒􏰓 Update􏰥TC􏰒􏰓 􏰧􏰧 vv

the kernel

kernel Parameters sp eci􏰍ed from

kernel
statically allo cated in the kernel􏰈

A parameter 􏰒rhandle􏰓 is

RSVP are stored in
used to index into the table to

lo cate appropriate entries

for each session􏰈

􏰧􏰧 vv

􏰊 Int􏰇serv􏰖RSVP parameter handling

A do cument􏰃 describ es these parameters in full detail􏰈

As describ ed the TC mo dule will
one from the sender as a SENDER TSPEC 􏰒ob j class 􏰁􏰂􏰓
In the following the units and meaning as well as some example values􏰈

receive 􏰂 messages with FLOWSPECS in

and one
Some words are needed on the units of the parameters􏰈The Sender should advertise

􏰊􏰈􏰁 Explanation of

Int􏰇serv parameters

they are going to
router compares the 􏰎owsp ecs from b oth the

what kind rate of tra􏰏c
this allows optimal resource reservation in the router􏰀 as the receiver could over

reserve􏰈 The
and selects the lower for

generate 􏰒vic

􏰂􏰈􏰚 for example do es this􏰓

sender and the receiver the Tra􏰏c Control mo dule to reserve􏰈

them􏰀 from the sender􏰈

Tsp ecs 􏰒tra􏰏c sp ecs􏰓􏰛
􏰩 r token bucket rate 􏰒bytes􏰖sec􏰓

􏰩 b bucket depth 􏰒bytes􏰓

􏰩 p p eak rate
􏰩 m minimum p oliced unit 􏰒bytes􏰓 􏰩 M maximum packet size 􏰒bytes􏰓

senders􏰖receivers share use one sp eci􏰍ed the Sender Tsp ec or

Session􏰈 These parameters required􏰀 if

packet size which could􏰀

for example b e set RSp ec 􏰒desired service􏰓􏰛

to the maximum IP packet size 􏰊􏰙􏰙􏰗􏰊

bytes􏰈

􏰩 R Required rate

􏰒bytes􏰖sec􏰓

The r term describ es the Rate at which the sender sends􏰀 it is normally an 􏰋average􏰆 rate􏰈 Any bursts in the data should b e absorb ed by b􏰈 Both these parameters can b e used to set bu􏰌ers in the scheduler􏰀 particularly where multiple

Reservation fails􏰈 The p eak rate known b etween the sender and

rate of the fastest rate
could b e sp eci􏰍ed􏰀 if unknown 􏰒indicated by 􏰉􏰓 in􏰍nity is assumed􏰈 The

minimum can also b e sp eci􏰍ed􏰀 which may b e set as the IP􏰕UDP􏰕RDP packet sizes􏰈 If this is not given 􏰉 is taken􏰈 The M parameter is maximum packet size􏰀

in 􏰒bytes􏰖sec􏰓
the same unit􏰀 except it is user sp eci􏰍ed􏰈 Normally it should match the Sender􏰑s

􏰩 S Slack term 􏰒microseconds􏰓

The R in RSp ec parameters is the same as the Tsp ec m parameter and uses

Tsp ecs􏰈 The to􏰈

R term is essentially the share of a links bandwidth the 􏰎ow is entitled

􏰃 draft􏰇ietf􏰇intserv􏰇rsvp􏰇use􏰇􏰉􏰂􏰈txt

􏰄

they are not can b e set as the the receiver􏰀 an Ethernet rate

The S term􏰀 Slack refers to the delay the reserver is willing to incur and is an

indicator to the router how

delay􏰈 If not sp eci􏰍ed􏰀 it
that a reservation request will b e met􏰈

􏰩 R 􏰁􏰂􏰙Mbytes as

ab ove 􏰒bytes􏰖sec􏰓

much bu􏰌er space needs to reserved to supp ort this is taken to b e 􏰉􏰈 By supplying some Slack it is more likely

Note􏰛 From the R
exp ected to derive bu􏰌er
each scheduling element in
pass downstream􏰀 known as C and D􏰀 this is not done in this implementation􏰈 More on this can be found in 􏰠􏰙􏰡􏰈

and S terms􏰀
requirements􏰈 Additionally􏰀 it is also

􏰩 S 􏰁􏰉􏰉􏰉􏰉
􏰃 Reading data from the kernel

􏰒microseconds􏰓 􏰁􏰉ms

kernel􏰈 As b efore this is

a path

Tsp ecs the router is the resp onsibility of

Typical examples􏰛
􏰩 r 􏰁􏰂􏰙 Mbytes􏰖sec 􏰒􏰁Mbits􏰖sec􏰓 CellB􏰖Matrox card 􏰩 b 􏰂􏰉􏰘􏰚 bytes 􏰒􏰂k􏰖bytes􏰓􏰓
􏰩 p 􏰙􏰉􏰉􏰉􏰉 􏰒bytes􏰖sec􏰓 Video source
􏰩 m 􏰘􏰉 IP􏰕UDP􏰕RTP headers no data
􏰩 M 􏰁􏰙􏰉􏰉 Ethernet frame􏰒bytes􏰓

p as well as the Senders
to calculate parameters ab out it􏰑s capabilities to

Also implemented is a metho d to
implemented as standard 􏰍le 􏰋reads􏰆􏰈 Currently the numb er of bytes p er 􏰎ow is

read data

from the

user space􏰈

The frequency of this op eration is

rep orted on a read request from
determined by the timeval structure passed to the select in the user space program􏰈

An example is included in

bandwidth services􏰈 The

the distribution􏰈

􏰚 Stand alone mo
Should the user sp ecify parameters to

􏰜iface􏰞 􏰜cmd􏰞 command this

a generalization of

shell ifcon􏰍g by the wfqi􏰇

􏰍o ctl􏰒􏰓 function in

the kernel􏰈

running in Weighted Fair Queuing 􏰒WFQ􏰓

correctly when
􏰄 Weighted Fair Queuing

de
is p ossible􏰈 In

stand alone mo de􏰈

scheduling device via the the kernel this is handled

Pro cessor Sharing 􏰒PS􏰓 􏰠􏰊􏰡􏰈 In PS each

the
It is up to the user to sp ecify the int􏰇serv parameters

is a packet scheduling technique allowing guaranteed

purp ose of
an approximation of Generalized Pro cessor Sharing 􏰒GPS􏰓 which􏰀 as

queue􏰀 an ill􏰇b ehaved session 􏰒who are sending a

lot of data􏰓

session has its will only punish

WFQ is to let several sessions share the same

link􏰈 WFQ is
the name suggest􏰀 is
session has a separate FIFO queue􏰈 At any given time the N active sessions 􏰒the ones with non􏰇empty queues􏰓 are serviced simultaneously􏰀 each at a rate of 􏰁􏰖N􏰛th

of the link sp eed􏰈 Contrary to
service shares􏰈 GPS have several nice prop erties􏰈 Since each

PS􏰀 GPS allows di􏰌erent sessions to

have di􏰌erent

own itself and not other sessions􏰈 Further􏰀 GPS allows sessions to have di􏰌erent guaranteed

bandwidths allo cated to them􏰈 In 􏰠􏰃􏰡 Parekh showed that when using a
with GPS switches and a session that is leaky bucket constrained an end􏰇to􏰇end

delay b ound can b e guaranteed􏰈

􏰁􏰉

network

􏰄􏰈􏰁 Theory

GPS is an idealized 􏰎uid mo del not practically realizable􏰈 WFQ􏰀 􏰍rst presented in 􏰠􏰚􏰡􏰀is a packet approximation of GPS􏰈 In WFQ a packet at a time is selected and outputed among the active sessions􏰈 This works as follows􏰈 Each arriving packet is given virtual start and 􏰍nish times􏰈 The virtual start time S􏰒k􏰀i􏰓 and the virtual 􏰍nish time F􏰒k􏰀i􏰓 of the k􏰛th packet in session i are computed as follows􏰛

S􏰒k􏰀i􏰓 􏰝 max􏰒F􏰒k􏰇􏰁􏰀i􏰓􏰀 V􏰒a􏰒k􏰀i􏰓􏰓􏰓 F􏰒k􏰀i􏰓
with F􏰒􏰉􏰀i􏰓 􏰝 􏰉􏰀 a􏰒k􏰀i􏰓 and L􏰒k􏰀i􏰓 are the arrival time and the length of the

packet resp ectively􏰈 V􏰒t􏰓
of virtual time in the simulated GPS mo del and is de􏰍ned as this􏰛

is the virtual time
dV􏰒t􏰓􏰖dt 􏰝 􏰁􏰖􏰒Sum of active sessions shares at time t􏰓

This means that when there are

function representing the progression inactive sessions the virtual time progresses

faster􏰈 In the corresp onding GPS

case this
The packet selected for output is the packet with the smallest virtual 􏰍nish time􏰈

active sessions get

more service􏰈

can b e viewed as

that the remaining

b ehind the GPS system by

more than a maximum sized packet􏰈

Both WFQ and WF􏰂Q
exp ensive􏰈 In 􏰠􏰁􏰉􏰡 an algorithm􏰀 dubb ed WF􏰂Q􏰕

is describ ed􏰈
V􏰒t􏰕T􏰓 􏰝 max􏰒V􏰒t􏰓 􏰕

which is computationally with a new virtual time function

􏰝 S􏰒k􏰀i􏰓

􏰕 L􏰒k􏰀i􏰓􏰖r􏰒i􏰓

Packet GPS 􏰒PGPS􏰓 algorithm which is identical to the

In 􏰠􏰃􏰡 Parekh describ es a
WFQ algorithm describ ed here􏰈 He derives several relationships b etween a 􏰎uid

GPS system and a corresp onding packet WFQ
􏰁􏰈 􏰍nish time of a packet in the WFQ system compared to the GPS system will

not b e later than
􏰂􏰈 The numb er of bits serviced in a session by the WFQ system will

system􏰛

at most the transmission time of a maximum sized

packet􏰈 not fall

In 􏰠􏰄􏰡 an improved algorithm􏰀 called Worst􏰇case Fair
􏰒WF􏰂Q􏰓􏰀 is presented􏰈 In WF􏰂Q only packets who have a virtual start time that

has b een passed are considered for GPS and has lower delay b ounds but

Weighted Fair

Queuing

output􏰈 WF􏰂Q

is a b etter approximation of

increases implementation complexity􏰈

must simulate the virtual time

T􏰀 min􏰒S􏰒h􏰒i􏰀t􏰓􏰀i􏰓􏰓
is the packet numb er of the 􏰍rst packet in sessions i􏰛s queue at

where h􏰒i􏰀t􏰓
t􏰈 Also WF􏰂Q􏰕 can guarantee delay b ounds given a session that is constrained

time three routines􏰛 getSessionIdentity􏰀

packets are
deviation from true WFQ􏰈 Virtual time is up dated every time a packet is dequeued since it is then time is altered and since arrival time of packets are set to these times

with a leaky bucket􏰈

􏰄􏰈􏰂 WFQ
The WFQ implementation primarily consists of

Implementation

dequeue􏰈 The data structures are

form

an sorted array with session identities􏰈 The

to o􏰈 Virtual time
bandwidth􏰀 therefore we need to 􏰍nd out at

one FIFO queue p er session in getSessionIdentity function p erforms a

di􏰌erent sp eeds dep ending on

enqueue and
of a circular bu􏰌er as well as header information for each queue􏰈 There is also

binary search in
Only the virtual 􏰍nish time 􏰒F􏰓 of the 􏰍rst packet in each queue is of interest at

the array􏰈
any given time􏰈 So􏰀 instead of storing F for each packet a F for the 􏰍rst packet in each

queue is
F is up dated accordingly􏰈

stored in
Time is increased in steps each time a packet is dequeued􏰈 The arrival time of

the queue header􏰈

When a packet is

dequeued the

corresp onding

set to the depart time

of the packet currently b eing outputed􏰀 which is

a

progresses at

􏰁􏰁

the amount of active which times sessions b ecome inactive􏰈

The virtual time a session b ecomes inactive is the F of the last packet in the session􏰀 the session virtual 􏰍nish time 􏰒SF􏰓􏰈 The smallest SF among active sessions is found

and the corresp onding time

is calculated􏰈 If

then this session is
This is done iteratively until the corresp onding time is more When this happ ens the session in question is not inactive yet

time that corresp onds to

the current time is calculated􏰈

enqueue􏰒packet􏰀 i􏰓
􏰁 if not active􏰒i􏰓

􏰁􏰁 prev􏰥t 􏰝 t
􏰁􏰂 return packet 􏰁􏰗 prev􏰥t 􏰝 tmp􏰥t 􏰁􏰘 V􏰒t􏰓 􏰝 SF􏰒j􏰓
􏰁􏰙 deactivate􏰒j􏰓
􏰁􏰊 active􏰥r 􏰇􏰝 r􏰒j􏰓

deactivated and

than the current time􏰈 and instead the virtual

􏰂 activate􏰒i􏰓
􏰗 active􏰥r 􏰕􏰝
􏰘 if queue􏰒i􏰓 is empty
􏰙 F􏰒i􏰓 􏰝 SF􏰒i􏰓 􏰝 max􏰒F􏰒i􏰓􏰀 V􏰒t􏰓􏰓 􏰕 L 􏰖 weight 􏰊 else
􏰃 SF􏰒i􏰓 􏰕􏰝 L 􏰖 weight
􏰚 put􏰒packet􏰀 queue􏰒i􏰓􏰓

r􏰒i􏰓
􏰁 i 􏰝 min􏰒active queues F􏰒i􏰓􏰓

dequeue􏰒􏰓

􏰂 packet 􏰝
􏰗 t 􏰕􏰝 L 􏰖 r
􏰘 if active􏰒i􏰓
􏰙 F􏰒i􏰓 􏰕􏰝 Lnext 􏰖 r􏰒i􏰓
􏰊 for ever
􏰃 j 􏰝 min􏰒active queues SF􏰒j􏰓􏰓

get􏰒queue􏰒i􏰓􏰓

􏰚 tmp􏰥t 􏰝 prev􏰥t 􏰕 􏰒SF􏰒j􏰓

􏰇 V􏰒t􏰓􏰓 􏰔 active􏰥r 􏰖 r

􏰄 if tmp􏰥t 􏰞 t
􏰁􏰉 V􏰒t􏰓 􏰕􏰝 􏰒t 􏰇 prev􏰥t􏰓 􏰔 r 􏰖 active􏰥r

􏰄􏰈􏰗 Future work
Future improvements may include􏰛

􏰩 Better data structures and

tables􏰈
􏰩 Better data structure for

is used􏰈
􏰩 An SCFQ 􏰒Self Clo cked Fair

Queuing􏰓 implementation􏰈 SCFQ is

a simple but

not so go o d approximation of 􏰁􏰉 Practicalities

GPS􏰈

The next few supp ort􏰀 also co de􏰈

subsections explain
explained is how to make changes and some hints on debugging the

this time
the active bandwidth is decreased accordingly􏰈

is less than the current time

optimized co de like calendar queues and hash time representation􏰀 right now a 􏰎oating p oint value

how to re􏰇build the kernel with 􏰁􏰂

WFQ􏰖RSVP

􏰁􏰉􏰈􏰁 Installing􏰀 compiling and debugging

The following 􏰍les 􏰩 if wfq􏰈c

are needed in addition to the kernel sources􏰛

􏰩 if wfq􏰈h
􏰩 rsvp􏰈h
􏰩 rsvp typ es􏰈h
􏰩 rsvp intserv􏰈h
􏰩 rapi lib􏰈h
Also the follwing user􏰇space programs are needed􏰛

􏰩 tc test􏰈c
􏰩 􏰎ow read􏰈c

􏰁􏰉􏰈􏰂 Installation
is sysi􏰗􏰚􏰊conf there should b e a kernel build 􏰍le􏰀 originally called GENERIC􏰀 here

a pseudo􏰇device should be added with the name wfq􏰛

pseudo􏰇device wfq 􏰁
Add the numb er of devices you would like

simultaneously􏰈 If you intend to the numb er you need􏰈

the machine

to have running on
have scheduling on multiple interfaces then add

Also the 􏰍les􏰈i􏰗􏰚􏰊 needs to include the 􏰍le􏰛
net􏰖if􏰋􏰥wfq􏰈c optional wfq device􏰇driver

The header 􏰍le is not needed􏰀

this is found with the make dep end command􏰈

Finally a ma

jor device numb er is needed for
wfq Scheduling pseudo device driver

􏰚􏰁

the character device􏰛
as the one sp eci􏰍ed in the co de itself

The ma jor device should b e the same
􏰒if wfq􏰈c􏰓􏰀 in order to build the con􏰍g 􏰍le typ e􏰛

config 􏰇g
this will give you a kernel with symb ols for debugging􏰈 Change dir to

dep end􏰆􏰀 this
want to debug the kernel co de it is recommended that

􏰈􏰈􏰖􏰈􏰈􏰖compile􏰖BUILDNAME􏰈 Typ e 􏰋make

will create a

Should you
Make􏰍le and remove the 􏰇O
the co de􏰈 Trivial variables are optimised away making GDB step o ddly􏰈

you edit the 􏰎ag as this can give some o dd b ehavior on debugging

Having the symb ols available do es not mean
cutable 􏰒taking time to b o ot and link􏰓 as separate symb ol tables can b e generated

the 􏰈dep end 􏰍le􏰈

with gdb􏰀 describ ed in
This should generate a

that there has to b e a large exe􏰇

mo di􏰍ed kernel is ready to

to b e installed with 􏰋make install􏰆􏰈 The 􏰁􏰗

the GDB manual􏰈

kernel 􏰍le
b e run􏰀 sync the system and typ e 􏰋reb o ot􏰆􏰈

􏰁􏰉􏰈􏰗 Debugging

As the software runs

in b oth user space and in the kernel di􏰌erent metho ds of

debugging may

b e needed􏰈

􏰁􏰉􏰈􏰗􏰈􏰁 Daemon
Debugging a program that is running involves 􏰋attaching􏰆 to the

pro cess􏰈 Ensure

the daemon has b een compiled with

the 􏰇g option and

load the symb ols

with the

􏰋􏰍le􏰆 command􏰈
Attach to the pro cess 􏰋attach

􏰜pid􏰞􏰆 from within gdb􏰀
can add breakp oints􏰖watchp oints and then continue􏰈 Asynchronous

this will stop
Some kernel debugging can be done from user space with the 􏰋gdb 􏰇k command􏰆

cess􏰀 where you events will stop

the pro􏰇

the pro cess and

available􏰈
Finally the kernel co de can b e examined remotely by

􏰁􏰉􏰈􏰗􏰈􏰂 Kernel

then use gdb

as normal􏰈

access to
ined􏰖changed 􏰒ifnet for example􏰓􏰈

the 􏰖kernel 􏰍le
A kernel debugger DDB is also included in the FreeBSD distribution 􏰒enable by

and 􏰖dev􏰖mem is needed􏰈 Global variables can b e exam􏰇 This can b e invoked by a 􏰋hot key􏰆 sequence Ctrl􏰇Alt􏰇Esc

DDB in the con􏰍g 􏰍le􏰓􏰈
which will give a ddb􏰞 prompt􏰀 breakp oints can b e set􏰀 however no

a remote host􏰈 Attaching the
The remote machine is then able to b oth instruct the debugged machine to

b eing debugged and GDB running on
with a serial cable is p ossible and then issue commands from the remote terminal􏰈

step is necessary to have a copy of the kernel on the remote machine and the same source 􏰍les 􏰒if

whilst it lo oks up
accurate symb ol lo okup is required􏰓􏰈

Start a

the symb ols on the remote machine􏰈 Therefore it

debugger session on

the debugging machine􏰛

gdb 􏰇k 􏰜KERNEL􏰞
set the target remote 􏰖dev􏰖cuaa􏰉

On the machine being debugged enter the sequence

ctrl􏰇alt􏰇esc 􏰖􏰔 break to
􏰞gdb 􏰖􏰔 give control to remote gdb

􏰞s 􏰖􏰔 break into

debugger 􏰔􏰖

Now the remote machine has control over the

machine b eing debugged􏰈 From is p ossible to use all of GDB􏰑s functionality as if it were lo cal􏰀 set

the prompt it
break p oints􏰀 examine variables etc􏰈

􏰁􏰘

􏰔􏰖 variable

􏰞 break wfqioctl 􏰖􏰔 Make the machine stop here 􏰞 c 􏰖􏰔 start it running 􏰔􏰖

using 􏰈gdbinit to automate multiple breakp oints􏰀 commands􏰀 display of

makes life

easier􏰛

file 􏰖home􏰖ianm􏰖sys􏰖compile􏰖SICSRTR􏰖kernel

directory 􏰖home􏰖ianm􏰖sys􏰖net directory 􏰖home􏰖ianm􏰖sys􏰖netinet directory 􏰖home􏰖ianm􏰖sys􏰖i􏰗􏰚􏰊􏰖isa directory 􏰖home􏰖ianm􏰖sys􏰖kern target remote 􏰖dev􏰖cuaa􏰉

DDB 􏰔􏰖

􏰔􏰖

source listing is DDB running on the host

two machines

􏰁􏰁 Using RSVP Make sure the rsvp d daemon

is running on b oth end hosts and the router􏰈 􏰒It can These hosts do not need to have the 􏰇DSCHEDULE

b e added to

the rc􏰈lo cal 􏰍le􏰓

􏰎agW􏰈henthedaemonstartsitcreates􏰂􏰍les􏰛
􏰩 􏰖var􏰖log􏰖rsvp d􏰈log 􏰖􏰔 logging p ertained to rsvp d 􏰔􏰖

􏰩 􏰖var􏰖run􏰖rsvp d􏰈pid 􏰖􏰔
The log 􏰍le contains information output by the daemon itself􏰀 if extra debug

it􏰑s current PID 􏰔􏰖
info is added 􏰒for example to the tc test􏰈c 􏰍le􏰓 it will app ear in

The rsvp d􏰈pid enables the the daemon to
kill 􏰢cat 􏰖var􏰖run􏰖rsvp d􏰈pid􏰢􏰈 It is also useful for

􏰍le􏰈 example

attaching the debugger to
It is necessary to have the MROUTED option compiled into the kernel as well as

daemon in a 􏰈gdbinit script􏰈

the Path messages b etween di􏰌erent sub􏰇

to have a mrouted running to forward RSVP
nets􏰀 one esp ecially built for RSVP on BSD can b e found at

ftp􏰛ftp􏰈parc􏰈xerox􏰈compub􏰖net􏰇

researchipmulti􏰈 􏰁􏰁􏰈􏰁 Errors

the log
b e manipulated in scripts􏰈 For

􏰁􏰞 session udp 􏰁􏰄􏰂􏰈􏰁􏰊􏰚􏰈􏰁􏰙􏰈􏰁􏰘􏰁 connect􏰛 Connection refused

Could not connect to
RAPI􏰛 rsvp􏰋􏰥session􏰒􏰓 err 􏰄 􏰛

rsvp server

RSVP daemon doesn􏰑t respond Then the daemon is probably not running􏰀 start it with􏰛

rsvpd
if you get􏰛
T􏰁􏰞 􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇

Also you are not allowed to
host􏰈 This confuses RSVP as it do es not know which thread to deliver a particular

T􏰁􏰛 sid􏰝􏰁 Session􏰝 􏰁􏰄􏰂􏰈􏰁􏰊􏰚􏰈􏰁􏰙􏰈􏰁􏰘􏰁􏰛􏰉 􏰇􏰇

RSVP error􏰛

not my interface 􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇􏰇

Code􏰝􏰂􏰉 Val􏰝􏰁􏰘 Node􏰝

Sender addr

􏰒 􏰓 􏰠T 􏰠􏰉􏰒􏰉􏰓 RAPI rapi􏰋􏰥dispatch􏰒􏰓􏰛 err

􏰁􏰗 􏰛 􏰒Unused􏰓

Then the could b e the

sender􏰑s sp eci􏰍cation do es not

sender has
started􏰀 hence rtap ans rsvp d di􏰌er􏰈

interface􏰈 A further situation

message when
p orts are allowed􏰈

have b oth p ort 􏰉 􏰒wildcard􏰓 and p ort X on the same

􏰁􏰂 Internals

􏰉􏰈􏰉􏰈􏰉􏰈􏰉
p􏰝􏰉 m􏰝􏰉 M􏰝􏰉􏰡􏰡

match its switched interfaces􏰀 after the

it􏰑s destined for the named p ort􏰀 􏰒error

numb er is

􏰃􏰓 Two di􏰌erent

Clo ck timer 􏰒ticks􏰓
􏰒or full sp eed􏰓􏰈 I use ping 􏰇f with max􏰈 packet size and when the software

setting can b e determined by sending data as 􏰁􏰙

fast as

is needed interrupt

RSVP daemon has b een

routine is called but there is nothing to output then the routine is b eing uncalled

unnecessarily􏰈 Rationale􏰛 Send

No attempt has b een made
on src address 􏰒􏰍ltering􏰓 WF and SE 􏰍lters

􏰁􏰗 Caveats
􏰁􏰗􏰈􏰁 Known Bugs

Deleting a 􏰎ow 􏰇
􏰁􏰗􏰈􏰂 Not implemented

as fast is as needed and still timer b eing used􏰈

RSVP rhandle sometimes wrong 􏰟

to calculate the C and D 􏰠􏰁􏰡 Kenjiro Cho􏰈 Alternate Queueing for BSD Unix

values for our no de􏰈 Selecting

References

From the altq􏰇􏰉􏰈􏰂 distribution􏰈 http􏰛􏰖􏰖www􏰈csl􏰈sony􏰈co􏰈jp􏰖p erson􏰖kjc􏰖programs􏰈html

Available from

􏰠􏰂􏰡 A􏰈 Demers􏰀 S􏰈 Keshav􏰀 and S􏰈 Shenker􏰀 Analysis and Similation of a Fair Queueing Algorithm􏰈 Pro c􏰈 SIGCOMM 􏰑􏰚􏰄􏰀 􏰁􏰄􏰒􏰘􏰓􏰛􏰁􏰇􏰁􏰂􏰀 Septemb er 􏰁􏰄􏰚􏰄􏰈

􏰠􏰗􏰡 L􏰈 Zhang􏰀 S􏰈 Deering􏰀 S􏰈 Estrin􏰀 S􏰈 Shenker􏰀 and D􏰈 Zappala􏰀 RSVP􏰛 A New

Resource ReSerVation Protocol IEEE Network􏰀 Septemb er 􏰁􏰄􏰄􏰗􏰈
􏰠􏰘􏰡 M􏰈 McKusick et􏰈 al The Design and Implementation of the 􏰘􏰈􏰘 BSD Operating

System􏰈 Addison Wesley 􏰁􏰄􏰄􏰊􏰈

􏰠􏰙􏰡 S􏰈 Shenker􏰀C􏰈 Partridge􏰀R􏰈 Guerin􏰈
IETF Draft􏰀 􏰁􏰄􏰄􏰃􏰈 available from ftp􏰛􏰖􏰖ds􏰈internic􏰈net

draft􏰇ietf􏰇intserv􏰇guaranteed􏰇svc􏰇􏰉􏰚􏰈txt 􏰠􏰊􏰡 L􏰈 Kleinrock􏰈 􏰆Queueing Systems􏰀 Volume 􏰂􏰛 Computer Applications􏰈 Wiley􏰀

􏰁􏰄􏰃􏰊􏰈
􏰠􏰃􏰡 A􏰈 Parekh􏰈 􏰆A Generalized Processor Sharing Approach to Flow Control in In􏰇

tegrated Services Networks􏰈 PhD

dissertation􏰀 Massachusetts Institute of
􏰠􏰚􏰡 A􏰈 Demers􏰀 S􏰈 Keshav􏰀 and S􏰈 Shenker􏰈 􏰆Analysis and simulation of a fair

nology􏰀 February 􏰁􏰄􏰄􏰂􏰈
queueing algorithm􏰈 In Journal of

Internetworking Research and

Tech􏰇 Exp erience􏰀

pages 􏰗􏰇􏰂􏰊􏰀
􏰠􏰄􏰡 J􏰈C􏰈R􏰈 Bennet and H􏰈 Zhang􏰈 􏰆WF􏰂Q􏰛 Worst􏰇case Fair Weighted

Octob er 􏰁􏰄􏰄􏰉􏰈

Also in Pro ceedings of

ACM SIGCOMM􏰚􏰄􏰀 pp

􏰗􏰇􏰁􏰂􏰈

Fair Queueing􏰈 In ftp􏰛􏰖􏰖ftp􏰈cs􏰈cmu􏰈edu􏰖user􏰖hzhang􏰖INFOCOM􏰄􏰊􏰈ps􏰈Z

IEEE INFOCOM􏰄􏰊􏰀 San
􏰠􏰁􏰉􏰡 J􏰈C􏰈R􏰈 Bennet and H􏰈 Zhang􏰈 􏰆Hierarchical packet fair queueing algorithms􏰈 In

Pro ceedings of
􏰁􏰄􏰄􏰊􏰈 ftp􏰛􏰖􏰖ftp􏰈cs􏰈cmu􏰈edu􏰖user􏰖hzhang􏰖TON􏰇􏰄􏰃􏰇Oct􏰈ps􏰈Z

the ACM SIGCOMM􏰄􏰊 pages 􏰁􏰘􏰗􏰇􏰁􏰙􏰊􏰀 Palo Alto􏰀 CA􏰀 August 􏰁􏰊

Fransisco􏰀 March 􏰁􏰄􏰄􏰊􏰈