Contributor: Ghaderi, Majid
Author: Hosseini, Seyed Morteza
Date accessioned: 2024-06-05
Date available: 2024-06-05
Date issued: 2024-06-03
Citation: Hosseini, S. M. (2024). Flow size prediction with short time gaps (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
URI: https://hdl.handle.net/1880/118902
DOI: https://doi.org/10.11575/PRISM/46499

Abstract: Having a priori knowledge about network flow sizes is invaluable in network traffic control. Previous efforts on estimating flow sizes have focused on long flows, where each flow is delimited by a large time gap in the sequence of packets. However, many network control mechanisms, such as load balancing and rate control, perform better when operating over flowlets: short flows separated by small time gaps in the sequence of packets. In this work, using extensive measurements, we investigate the feasibility of predicting the size of short flows, whose duration can be on the order of microseconds. Specifically, we deploy several popular workloads in a public cloud testbed and collect both network and host traces for each workload. The network trace contains standard packet metadata, while the host trace contains high-level host statistics (e.g., memory usage and disk I/O) and low-level function call traces (e.g., malloc(), send()) captured during the execution of each workload via host instrumentation using eBPF. These traces are then used to train machine learning models for flow size prediction with time gaps ranging from microseconds to milliseconds. Our results indicate that (1) it is feasible to predict short flow sizes with high accuracy, i.e., with percentage errors in the 0-12% range, and (2) the low-level traces lead to a 10-20% improvement in prediction accuracy compared to using the network and high-level traces.

Language: English
Rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis.
You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Subjects: Computer Networks; Machine Learning; Flow; Network Optimization; Computer Science
Title: Flow Size Prediction With Short Time Gaps
Type: Master's thesis
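The abstract defines flowlets as short flows separated by small time gaps in the packet sequence, and evaluates time gaps ranging from microseconds to milliseconds. The segmentation step can be illustrated with a minimal Python sketch, assuming one flow's packets are given as (timestamp in seconds, bytes) pairs sorted by time. The function names, the `gap_s` parameter, and the absolute-percentage-error helper are hypothetical illustrations, not the thesis's code or its exact metric definition.

```python
from typing import List, Optional, Tuple


def split_flowlets(packets: List[Tuple[float, int]], gap_s: float) -> List[int]:
    """Split one flow's packets into flowlets and return each flowlet's size in bytes.

    packets: (timestamp_in_seconds, bytes) pairs for a single flow, assumed
             to be sorted by timestamp.
    gap_s:   hypothetical idle-time threshold; an inter-packet gap strictly
             larger than this starts a new flowlet.
    """
    sizes: List[int] = []
    last_ts: Optional[float] = None
    for ts, nbytes in packets:
        if last_ts is None or ts - last_ts > gap_s:
            sizes.append(0)  # gap exceeded (or first packet): open a new flowlet
        sizes[-1] += nbytes  # accumulate bytes into the current flowlet
        last_ts = ts
    return sizes


def percentage_error(predicted: float, actual: float) -> float:
    """Absolute percentage error (an assumed metric); actual must be nonzero."""
    return abs(predicted - actual) / actual * 100.0


if __name__ == "__main__":
    # Three packets 10 us apart, a 500 us pause, then two more packets.
    trace = [(0.0, 1500), (10e-6, 1500), (20e-6, 400), (520e-6, 1500), (530e-6, 100)]
    print(split_flowlets(trace, gap_s=100e-6))  # [3400, 1600]
    print(split_flowlets(trace, gap_s=1e-3))    # [5000]
```

With a 100 µs threshold the 500 µs pause splits the trace into two flowlets, while a 1 ms threshold merges everything into one; sweeping `gap_s` in this way changes the size labels a predictor would be trained on. The thesis's exact rule (for example, strict versus non-strict comparison with the gap) may differ.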