This project establishes a Research Experiences for Undergraduates (REU) site in Computer Science and Engineering at the University of Louisville (UofL), specifically focusing on Computer Systems Research. The site hosts 10 students in a 10-week summer session each year for the three-year project duration, with the ultimate goal of encouraging 30 undergraduate students to pursue graduate study and careers in computer systems. Activities include training in computer systems concepts and tools, an introduction to computer systems research, multi-level mentoring through weekly meetings with the PIs, daily meetings with the mentors, continuous interaction with graduate students, and field trips and social events. Eight faculty from the Computer Science and Engineering Department with diverse expertise in computer systems research mentor the students and involve them in their research labs and ongoing projects. Specialized computer systems training provides the students with theoretical, technical, and analytical skills in general computer systems concepts, including computer architecture fundamentals, operating systems basics, crucial systems programming methods, the Linux command line, shell and Python scripting, and HPC cluster access and usage. In addition, research training introduces the general computer systems research process, including literature review, understanding prior work, formulating new problems, designing and conducting experiments, and preparing presentations and publications of results. At the end of the program, the students are encouraged to publish their research work and present it at a professional conference.
The research focus of this REU Site is Computer Systems Research (CSR), which broadly concentrates on the design and development of the hardware and software necessary to create platforms that meet users’ computing requirements. Within NSF, CSR has its own cluster under CISE’s CNS division, and its scope is broadly defined as embedded and multicore systems and accelerators; mobile and extensible distributed systems; cloud and data-intensive processing systems; and memory, storage, and file systems. Aligned with NSF’s definition, the main research pillars of our CSR focus are Operating Systems; Parallel and Distributed Systems; Embedded Systems; Mobile Systems; and Computer Architecture. Our research projects therefore cover a broad range of topics, spanning from processor architecture to storage systems, and from small-scale mobile and edge devices to exascale data centers. Nevertheless, all the research projects in this REU site share a common computing systems theme, enabling students to understand each other’s work and broaden their perspectives on different computer systems. Within this CSR scope, our site’s general research goals are to advance the energy efficiency, time predictability, performance, security and privacy, scalability, and sustainability of computer systems.
IN SUMMER 2022, ALL ACTIVITIES WILL BE ONLINE DUE TO THE IMPACT OF THE OMICRON VARIANT OF COVID-19!
May 15, 2022
Students check in electronically.
May 16-20, 2022 (Week 1)
May 23 – May 27, 2022 (Week 2)
May 30 – July 15, 2022 (Weeks 3-9)
Same as Week 2, with additional virtual social events.
July 18 – July 22, 2022 (Week 10)
Final Research Poster Competition Week: Same as Week 2, plus a final technical report and a final research presentation by the students. REU students will participate in the virtual poster competition. End-of-program survey for both students and mentors and exit interviews for students.
Sample Project Description:
Energy-Aware Optimization of Operating System Kernels for Ultra-Low Latency Storage Performance
The operating system kernel is a complex piece of software that sits between the applications of a computer and its hardware, with the main goal of providing convenient access to computer hardware in an efficient and fair manner. From personal computers to servers and supercomputers, embedded systems and robots to mobile devices like phones and tablets, operating systems are heavily used to manage a variety of computer systems in today’s world.
Since computer hardware is constantly improving and innovating, operating system kernels need continuous optimization in order to maintain their efficiency. Today’s operating systems urgently require optimization in their storage stack, which manages access to data storage devices. Various innovations have taken place in the storage subsystem within the past few years, including wide adoption of the Non-Volatile Memory Express (NVMe) interface and the emergence of a new generation of Ultra-Low Latency (ULL) SSDs, broadly defined as providing sub-10 μs data access latency. This new level of storage device performance questions the suitability of existing kernel storage stack designs, which are primarily optimized for older storage generations with higher data access latencies, such as flash-based SSDs and HDDs, whose latencies are one and three orders of magnitude higher, respectively.
The efficiency of an operating system is generally measured in terms of its performance, and the energy impact of optimizations is commonly neglected. In other words, energy efficiency is typically treated as a second-class citizen in operating system design. However, today’s data centers consume as much electricity as a city. In addition, the energy efficiency of an operating system managing a battery-operated device such as a mobile phone or tablet can significantly affect the battery life of that device. Therefore, any optimization performed on an operating system kernel should be performed in an energy-aware manner.
In this REU project, the undergraduate student(s) will investigate, analyze, and optimize the Linux kernel’s storage stack for ULL storage performance in an energy-aware manner. Each student will be assigned a specific storage stack section to investigate, such as submission, scheduling, or completion, and will perform empirical experiments using real hardware: an Intel Optane SSD (a type of ULL SSD) in a dedicated server, with an Onset HOBO power meter to measure energy consumption. Using this experimental setup, students will first investigate and understand the working principles of their assigned storage stack layer. Next, they will experimentally analyze the efficiency of existing methodologies within their layer in an energy-aware manner and make observations about their deficiencies. Finally, based on these observations, students are expected to propose and implement optimizations that eliminate the observed deficiencies. Students will be trained in kernel development and will work in collaboration with other lab students experienced in operating system design and development.
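As a loose illustration of the kind of empirical latency measurement involved, the sketch below times small reads in Python against an ordinary scratch file standing in for the Optane device. The block size, sample count, and use of a temporary file are illustrative assumptions; real kernel-level analysis would use dedicated benchmarking tools and the actual hardware.

```python
import os
import statistics
import tempfile
import time

def measure_read_latency(path, block_size=4096, samples=100):
    """Time reads at varying offsets; return latencies in microseconds."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    latencies = []
    try:
        for i in range(samples):
            offset = (i * block_size) % max(size - block_size, 1)
            start = time.perf_counter()
            os.pread(fd, block_size, offset)  # positioned read, no seek
            latencies.append((time.perf_counter() - start) * 1e6)
    finally:
        os.close(fd)
    return latencies

if __name__ == "__main__":
    # Create a scratch file standing in for the ULL storage device.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(1 << 20))  # 1 MiB of random data
        scratch = f.name
    lats = measure_read_latency(scratch)
    print(f"median read latency: {statistics.median(lats):.1f} us")
    os.unlink(scratch)
```

Note that reads here are likely served from the page cache; measuring the device itself would require direct I/O and the dedicated server described above.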
Sample Project Description:
KNOWLEDGE BASE ANALYSIS
Knowledge Bases (KBs) store statements about the world in a format that can be used for reasoning, such as RDF. Moreover, KBs organize all their information into hierarchies, relying on some taxonomy or ontology. YAGO is an example of a modern Knowledge Base, with facts extracted from Wikipedia and organized using the schemas from WordNet; it has recently been expanded to use schema.org. YAGO can be downloaded from http://www.yago-knowledge.org; the YAGO taxonomy is stored in a separate file.
The purpose of this project is to analyze YAGO’s taxonomy. Several steps are involved. First, we want to create a file with a simplified format, i.e., to convert the original RDF statements (in yago-wd-full-types and yago-wd-class) into simplified triples such as
Germany type Country
Country subclass AdministrativeArea
Then, using this simplified file, we want to implement an algorithm that checks, for two classes ‘a’ and ‘b’, whether ‘a subclassOf b’ holds directly or indirectly (this means applying transitivity and reflexivity to the original relation), and that determines (using the previous step), for an entity ‘c’ and a class ‘b’, whether ‘c type b’ holds (for instance, above we can conclude that ‘Germany type AdministrativeArea’).
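The subclass and type checks described above can be sketched as follows; the triples below are hypothetical placeholders in the simplified format, not actual YAGO content.

```python
# Hypothetical triples in the simplified format from the previous step:
#   entity  type      Class
#   Class   subclass  SuperClass
TRIPLES = [
    ("Germany", "type", "Country"),
    ("Country", "subclass", "AdministrativeArea"),
    ("AdministrativeArea", "subclass", "Place"),
]

def build_superclasses(triples):
    """Map each class to the set of its direct superclasses."""
    supers = {}
    for s, rel, o in triples:
        if rel == "subclass":
            supers.setdefault(s, set()).add(o)
    return supers

def is_subclass(a, b, supers):
    """Does 'a subclassOf b' hold under reflexivity and transitivity?"""
    if a == b:                      # reflexivity
        return True
    stack, seen = [a], {a}
    while stack:                    # walk the hierarchy transitively
        for parent in supers.get(stack.pop(), ()):
            if parent == b:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False

def has_type(c, b, triples, supers):
    """Does 'c type b' hold via some declared type of c?"""
    return any(rel == "type" and s == c and is_subclass(o, b, supers)
               for s, rel, o in triples)

supers = build_superclasses(TRIPLES)
print(is_subclass("Country", "Place", supers))                     # True
print(has_type("Germany", "AdministrativeArea", TRIPLES, supers))  # True
```

The `seen` set guards against cycles, which real taxonomies occasionally contain.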
In addition, we want to relate this to the information in the rest of YAGO, where type is sometimes implicit, as in
<1st_Air_Corps_(Germany)> <foundingDate> <1939>
by finding all the cases where the type is included in the tag. Finally, we want to check how the hierarchy deals with events and actions; for instance, YAGO includes an RDF tuple about the 1887 U.S. National Championship, but not the fact that it is a type of ‘tennis tournament’.
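A possible starting point for the tag-extraction step is a regular expression over entity tags; the pattern below is an assumption about the tag layout (a parenthesized qualifier at the end of the name), not a validated rule for all of YAGO.

```python
import re

# Hypothetical sketch: pull the parenthesized qualifier out of a YAGO
# entity tag, e.g. <1st_Air_Corps_(Germany)> -> "Germany".
TAG_HINT = re.compile(r"^<(?P<name>[^<>]*)_\((?P<hint>[^()]+)\)>$")

def implicit_hint(tag):
    """Return the qualifier embedded in the tag, or None if absent."""
    m = TAG_HINT.match(tag)
    return m.group("hint") if m else None

print(implicit_hint("<1st_Air_Corps_(Germany)>"))  # Germany
print(implicit_hint("<foundingDate>"))             # None
```

Scanning a dump file then reduces to applying `implicit_hint` to the subject of each tuple and collecting the non-`None` results.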
SQL ON THE COMMAND LINE
The Unix/Linux command line has many single-purpose operators that can be combined using pipes or input/output redirection to implement complex operations. A student of mine was able to implement basic SQL queries (SELECT … FROM … WHERE … statements) that run directly on files by translating the query into a pipeline of command-line operations. However, the student did not have the time to test the command thoroughly or evaluate its performance. The purpose of this project is to do exactly that. The implementation is in Python. A second, more ambitious part is to prepare the program so that it can be incorporated into any Linux distribution that wishes to include it.
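A minimal sketch of the translation idea (not the student’s actual program) might map a tiny SQL subset onto a shell pipeline; the `$n` column syntax and the awk-based pipeline here are illustrative assumptions.

```python
import re
import shlex

# Hypothetical sketch: translate a tiny subset of SQL,
#   SELECT $<n> FROM <file> [WHERE $<m> = <value>],
# into a pipeline of standard command-line operators. Columns are
# treated as 1-based field numbers in a whitespace-delimited file;
# a real implementation needs a proper SQL parser.
QUERY = re.compile(
    r"SELECT\s+\$(\d+)\s+FROM\s+(\S+)(?:\s+WHERE\s+\$(\d+)\s*=\s*(\S+))?",
    re.IGNORECASE,
)

def sql_to_pipeline(query):
    m = QUERY.match(query.strip())
    if not m:
        raise ValueError("unsupported query")
    sel, path, wcol, wval = m.groups()
    cmd = f"cat {shlex.quote(path)}"
    if wcol:
        cmd += f" | awk '${wcol} == \"{wval}\"'"   # WHERE filter
    cmd += f" | awk '{{print ${sel}}}'"            # SELECT projection
    return cmd

print(sql_to_pipeline("SELECT $2 FROM people.txt WHERE $1 = alice"))
```

Evaluating the project then means checking such pipelines for correctness against a reference SQL engine and timing them on files of various sizes.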
DATA ANALYSIS ON THE COMMAND LINE
The Unix/Linux command line is a great tool for analyzing files; however, it has a few shortcomings that keep it from telling us much about the data inside those files. One is that, in a text file, numbers are treated as strings, so no arithmetic is possible. Hence, we cannot ask for the average of a column of numbers (we can simulate this with some effort). In R, a data file can be loaded and the system automatically recognizes numbers; the ‘summary’ command then produces an analysis of the data according to its type. A student of mine developed a ‘summary’ command for the Unix/Linux command line, but the command needs further testing and refinement. The goal is to have a command that takes a best guess at the type of data in a text file, classifying the different variables (attributes, features) as numerical, dates, or factors (categorical), and provides a summary for each. The command should be as efficient as possible and deal with files of all sizes.
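The type-guessing behavior described above might be sketched as follows; the three supported types and the single ISO date format are simplifying assumptions, not the student’s actual command.

```python
import statistics
from datetime import datetime

def classify(values):
    """Guess whether a column is numeric, a date, or categorical."""
    try:
        return "numeric", [float(v) for v in values]
    except ValueError:
        pass
    try:  # only ISO dates here; a real tool would try several formats
        return "date", [datetime.strptime(v, "%Y-%m-%d") for v in values]
    except ValueError:
        return "factor", values

def summarize(column):
    """Produce an R-style summary appropriate to the guessed type."""
    kind, parsed = classify(column)
    if kind == "numeric":
        return {"type": kind, "min": min(parsed),
                "mean": statistics.mean(parsed), "max": max(parsed)}
    if kind == "date":
        return {"type": kind, "earliest": min(parsed).date(),
                "latest": max(parsed).date()}
    return {"type": kind, "levels": sorted(set(parsed)),
            "count": len(parsed)}

print(summarize(["3", "1", "2"]))
print(summarize(["2022-05-16", "2022-07-22"]))
print(summarize(["red", "blue", "red"]))
```

To deal with files of all sizes, a production version would stream the file and compute running statistics rather than materializing each column.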
Sample Project Description:
Compressed Neural Networks for Multiscale Computing in Resource-constrained Systems
Modern intelligent and mobile autonomous systems are equipped with many sensors whose data are processed on a single-board embedded computing platform. These systems are often battery powered with a finite energy budget, and they communicate with other devices for data/computation sharing over wireless networks. The constraints in computing, communication, and energy have led to the development of multiscale computing, where data or computation can be opportunistically offloaded to an edge/fog server or to a cloud server. However, if communication resources are insufficient, then offloading data or computation in raw form might not work, necessitating compression of the data and/or the computing algorithm. Additionally, with the boom of the Internet of Things (IoT), devices are becoming lightweight, which has led to the development of the TinyML framework to support machine learning algorithms on small embedded boards and even on microcontrollers. However, depending on the application needs, the resource availability in the system, and the constraints of the available machine learning framework, deep learning algorithms need to be compressed to optimize the quality of service (QoS) of the applications.
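As one concrete instance of the compression mentioned above, the sketch below shows naive post-training 8-bit weight quantization in pure Python; real TinyML toolchains use calibration data and per-channel scales, so this is only a first-order illustration.

```python
# Minimal sketch of post-training weight quantization, one common
# model-compression technique: map float weights onto signed integers
# with a single (per-tensor) scale factor.
def quantize(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]     # integers in [-qmax, qmax]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.81, -0.52, 0.07, -1.24]
q, s = quantize(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, f"max reconstruction error {max_err:.4f}")
```

The payoff is a 4x smaller weight footprint (8-bit vs. 32-bit) at the cost of a bounded reconstruction error of at most half a quantization step per weight.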
In this REU project, the undergraduate student will work on several Deep Neural Network-based applications for processing various sensor data and investigate the following:
Sample Project Description:
Using Python and its Libraries (e.g., PyTorch) for Big Data and High-Performance Computing Applications
There are two data sets that can be used for such projects as described below:
CelebA is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations, including:
The dataset can be employed as the training and test sets for the following computer vision tasks: face attribute recognition, face recognition, face detection, landmark (or facial part) localization, and face editing & synthesis.
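For instance, work with CelebA’s attribute-recognition task typically starts from its attribute annotation file; the sketch below parses a CelebA-style attribute listing. The layout (image count, attribute names, then one +1/-1 row per image) is assumed from the published `list_attr_celeba.txt` format and should be verified against your downloaded copy.

```python
# Sketch of parsing CelebA-style attribute annotations:
#   line 1: number of images
#   line 2: the attribute names
#   each following line: image name followed by +1/-1 flags
def parse_attr_lines(lines):
    names = lines[1].split()
    table = {}
    for line in lines[2:]:
        parts = line.split()
        flags = [int(v) > 0 for v in parts[1:]]   # +1 -> True, -1 -> False
        table[parts[0]] = dict(zip(names, flags))
    return table

# Tiny synthetic example in the assumed layout (not real CelebA data).
sample = [
    "2",
    "Eyeglasses Smiling",
    "000001.jpg  1 -1",
    "000002.jpg -1  1",
]
attrs = parse_attr_lines(sample)
print(attrs["000001.jpg"]["Eyeglasses"])  # True
```

From such a table one can build label tensors for training an attribute classifier in PyTorch.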
The data set includes 888 lung CT images along with Annotations and Candidates CSV files describing the nodules present in the CT images. Sample projects using the data set include:
Graph Databases and Graph Analytics
Based on graph theory and algorithms, graph databases and graph analytics are becoming popular and effective tools for analyzing connected data and linking it to machine learning algorithms. A typical graph database and graph analytics project is given a dataset and studies its applications by following this workflow:
There are two specific datasets identified here for this REU project, which are briefly described in the following sections.
MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. This dataset is publicly accessible and can be used to study health care related issues.
“The Yelp dataset is a subset of our businesses, reviews, and user data for use in personal, educational, and academic purposes. Available as JSON files, use it to teach students about databases, to learn NLP, or for sample production data while you learn how to make mobile apps” (https://www.yelp.com/dataset). For example, one prior study examines the Yelp 2018 open dataset using a graph database and graph analytics.
The complete Covid-19 dataset is a collection of Covid-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, testing, and vaccinations, as well as other variables of potential interest. The dataset can be downloaded in three file formats (CSV, XLSX, and JSON) from the owid/covid-19-data repository on GitHub.
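As a small illustration of working with the CSV form of this dataset, the sketch below computes the latest reported total cases per location. The column names (`location`, `date`, `total_cases`) follow the dataset’s documented schema but should be checked against the current codebook; the sample rows are synthetic.

```python
import csv
import io

def latest_total_cases(csv_text):
    """Return {location: most recent non-empty total_cases}."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        loc, date, total = row["location"], row["date"], row["total_cases"]
        # ISO dates sort correctly as strings, so a plain comparison works.
        if total and (loc not in latest or date > latest[loc][0]):
            latest[loc] = (date, float(total))
    return {loc: total for loc, (_, total) in latest.items()}

# Synthetic stand-in for a slice of the real CSV.
sample = (
    "location,date,total_cases\n"
    "Atlantis,2022-05-01,10\n"
    "Atlantis,2022-05-02,12\n"
)
print(latest_total_cases(sample))  # {'Atlantis': 12.0}
```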
Sample Project Description:
Cybersecurity Challenges in Internet of Things (IoT) Applications
The Internet of Things (IoT) has been a revolutionizing technology in the provision of healthcare, smart city, manufacturing, and many other applications/services around the world. Enabled by a host of different technologies, the IoT has spearheaded the development of technologies for sustainable living, increased comfort and productivity for citizens, and efficient operations for urban centers. IoT applications collect data through sensors within an application domain; the collected data is then processed, followed by data mining to extract domain knowledge. However, several issues pertain to how this data is sent and used, including the integrity, protection, and confidentiality of the data. This concern is not unwarranted, as illustrated by the 2015 attack on the Ukrainian power grid, which left 225,000 people without power. Data gathered in IoT applications may be used by bad actors to carry out unlawful acts, putting life and property at risk. To secure IoT data and applications, typical security schemes might not be as effective, and new approaches may need to be developed. To provide a standardized framework and terminology for discussing security attacks, an attack incident taxonomy has been suggested by the Computer Emergency Response Team (CERT), which was established by DARPA. Different types of security and privacy issues exist at each of the three levels of the IoT architecture: the application layer, the network layer, and the perception layer. This research will map the CERT attack taxonomy to IoT applications. In addition, the research will provide holistic security solution approaches at all three layers of IoT applications.
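As a minimal illustration of the data-integrity concern, the sketch below tags sensor readings with an HMAC so that tampering in transit is detectable. This is a generic construction for illustration, not a scheme from the CERT taxonomy, and key distribution for the (hypothetical) pre-shared key is out of scope.

```python
import hashlib
import hmac

KEY = b"shared-device-key"  # hypothetical pre-shared key for illustration

def tag_reading(payload: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over a sensor payload."""
    return hmac.new(KEY, payload, hashlib.sha256).digest()

def verify_reading(payload: bytes, tag: bytes) -> bool:
    """Constant-time check that the payload was not modified."""
    return hmac.compare_digest(tag_reading(payload), tag)

reading = b"temp=21.5;node=42"
tag = tag_reading(reading)
print(verify_reading(reading, tag))               # True
print(verify_reading(b"temp=99.9;node=42", tag))  # False
```

Note that an HMAC provides integrity and authenticity only; confidentiality would additionally require encryption.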
Sample Project Description:
Embedded Systems and Accelerators
Embedded systems and microcontrollers are so ubiquitous that virtually all categories of devices and components seem to carry them. From disposable conveniences to durable goods and control systems, microcontrollers and embedded-class devices are everywhere. Thanks to significant efforts by microcontroller core designers such as Espressif, Atmel/Microchip, and ARM, anyone with a minimal amount of programming skill can implement basic system programming architectures with easy-to-use and friendly Integrated Development Environments (IDEs), some of which can even run in a browser, short-circuiting the traditional frustrations of importing libraries and dependencies. However, not all IDEs and basic implementations are created equal, especially with respect to understanding power consumption and sleep states. As embedded devices are produced by the billions each year, even minute differences can scale to enormous consequences when considering the collective carbon footprint of these devices. With respect to this, the student will:
Additionally, these microcontroller architectures now feature fully integrated communications modules, also easily integrated into an IDE through manufacturer-provided libraries. Examples include Espressif ESP8266- and ESP32-based microcontroller implementations that feature WiFi, Bluetooth, BLE, and LoRa radios [3-5]. It is not uncommon for prototyping boards and even full-fledged devices to feature multiple radio types. It is also increasingly common for device-to-device mesh networks to provide some relief to infrastructure-based networks by distributing communications distances and loads among nearby devices. With reference to carbon footprint and energy consumption, the student will be provided with an Espressif ESP32-based development board with Bluetooth, BLE, WiFi, and LoRa radios to perform the following tasks:
This work will assist in developing metrics and guidelines that can have a large cumulative impact by generating an energy consumption mapping of how microcontroller devices contribute to energy consumption based on configuration and radio usage.
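A back-of-the-envelope model shows why sleep states and duty cycles dominate such an energy mapping; the current draws below are illustrative assumptions, not measured ESP32 figures.

```python
# Toy duty-cycle energy model: average current under a sleep/active
# schedule, and resulting battery life. All numbers are illustrative.
def average_current_ma(active_ma, sleep_ma, duty):
    """duty = fraction of time the CPU/radio is active (0..1)."""
    return active_ma * duty + sleep_ma * (1 - duty)

def battery_life_hours(capacity_mah, avg_ma):
    """Ideal battery life, ignoring conversion losses and self-discharge."""
    return capacity_mah / avg_ma

# Assumed figures: 160 mA active with radio on, 10 uA in deep sleep,
# active 1% of the time, powered by a 2000 mAh battery.
avg = average_current_ma(active_ma=160.0, sleep_ma=0.01, duty=0.01)
print(f"avg draw {avg:.2f} mA, "
      f"life {battery_life_hours(2000, avg):.0f} h on 2000 mAh")
```

Even this crude model makes the point: halving the active duty cycle nearly halves the average draw, which is why per-configuration and per-radio measurements are worth mapping carefully.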
Sample Project Description:
Explainable Machine Learning Algorithms
The most accurate Big Data predictive methods tend to rely on black-box machine learning models, such as Deep Learning, which lack interpretability and do not provide a straightforward explanation for their outputs. Yet explanations can improve the transparency of a predictive system by justifying predictions, and this in turn can enhance the user’s trust in the system. Hence, one main challenge in designing a machine learning model is mitigating the trade-off between an explainable technique with moderate prediction accuracy and a more accurate technique with no explainable predictions.
Explanation mechanisms play a critical role in building trust in human interaction with machine learning algorithms and can therefore contribute to building ‘fair’ machine learning algorithms. Fair algorithms make predictions that are not biased against any group of people. In general, most unfair models are the result of biased data. The rationale for ensuring fairness through explanations is that an unfair or biased prediction is easier to detect when an explanation is provided along with it.
In this project, an REU student will investigate a set of white-box and black-box machine learning predictive models and methods to build explanation ability and fairness into black-box methods. Experimental results should demonstrate that the developed method is effective in generating accurate and explainable predictions.
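As a toy example of a white-box model that explains its own predictions, the linear sketch below reports each feature’s additive contribution alongside the prediction; the weights and feature names are made up for illustration, not learned from data.

```python
# A linear model is inherently explainable: the prediction decomposes
# exactly into per-feature contributions plus a bias term.
def predict_with_explanation(weights, bias, features):
    contributions = {name: weights[name] * x for name, x in features.items()}
    prediction = bias + sum(contributions.values())
    return prediction, contributions

# Hypothetical weights and inputs for illustration only.
weights = {"income": 0.5, "debt": -1.2}
pred, why = predict_with_explanation(weights, bias=2.0,
                                     features={"income": 4.0, "debt": 1.0})
print(f"prediction: {pred}")
for name, c in sorted(why.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {name}: {c:+.2f}")
```

The research challenge described above is to obtain explanations of this quality for black-box models, whose predictions do not decompose so neatly, without giving up their accuracy.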
Sample Project Description:
Large Scale Document Inversion using GPU Parallelization
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. Because the GPU consists of many cores, it can serve as a massively parallel coprocessor; it is also an affordable, attractive, and user-programmable commodity. Nowadays, enormous amounts of information flood into the digital domain around the world: huge volumes of data, such as digital libraries, social networking services, and e-commerce product data and reviews, are produced or collected every moment, with dramatic growth in size. Although the inverted index is a useful data structure for full-text search and document retrieval, indexing a large number of documents takes a tremendous amount of time. The performance of document inversion can be improved by the multi-threaded, multi-core GPU. Here I propose to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. The proposed parallel document inversion system will be implemented and evaluated using different test datasets from PubMed abstracts and e-commerce product reviews. Junior- or senior-level undergraduate students with parallel programming experience would be a good fit for this project.
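The sequential, hash-based baseline that the proposed GPU/CUDA kernel would parallelize across documents can be sketched as:

```python
from collections import defaultdict

# CPU baseline for document inversion: map each term to the sorted list
# of document ids containing it, using a hash table (dict) for the index.
def invert(documents):
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in text.lower().split():   # naive whitespace tokenizer
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = ["GPU computing is fast", "parallel GPU indexing", "text indexing"]
index = invert(docs)
print(index["gpu"])       # [0, 1]
print(index["indexing"])  # [1, 2]
```

In the SPMD formulation, each GPU thread would process one document (or one shard of postings), with the per-term posting lists merged afterwards.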
Sample Project Description:
Enabling Data-Intensive Computing on the Web with NSF XSEDE and NSF MRI
As digital data sources grow in number and size, they present an opportunity for computational investigation by means of data-intensive computing and analytics. Dr. Hui Zhang in NSF #1726532 has partnered with the Texas Advanced Computing Center (TACC) to investigate IDOLS, a web application framework that allows users to customize data-intensive workflows through simple configuration files, as shown in Figure 1.
To enhance the accessibility of large cyberinfrastructure to users from diverse domain fields, the IDOLS framework includes a set of pre-built task modules built at the University of Louisville and TACC to help bridge national researchers with remote hardware and software resources for data-intensive analytics and data-intensive computing. Utilizing this platform, REU students are expected to:
The intellectual merit of the proposed project is embodied in a well-designed and implemented cloud-based virtual laboratory (CVE) to introduce useful software tools for data-intensive computing and high-performance computing, and to lower the access barrier to state-of-the-art computing resources for students in STEM programs and others whose research requires data-intensive analytics. Through this REU project, students will learn how to use data science tools and resources in their own projects. The summer research activity will give students a taste of the most compelling parts of the graduate school experience as well as an immersive experience in the fields of high-performance computing and data-intensive computing. This research project covers a breadth of skill sets in computing science, and the students will gain experience in all aspects of conducting research in a research lab. The summer project will help address national interests by making state-of-the-art computing resources more accessible to students and teachers, supporting their development of critical workforce skills.
Sample Project Description:
Design Space Exploration for Computer Architecture Support for Deep Neural Networks
Deep neural networks (DNNs) have been widely used in many AI applications, including computer vision, speech recognition, self-driving cars, smartphones, and drones. While DNNs deliver high accuracy to enable AI in our daily lives, this comes at the cost of high computational complexity and intensive resource requirements. These make it especially challenging to apply DNNs to edge and mobile devices, which have relatively limited hardware resources with stringent cost and/or battery constraints. While there are general-purpose and specialized computing architectures available for DNNs, such as the TPU and GPU, they need to be customized to be used efficiently in edge and mobile domains to meet stringent performance, energy, and cost goals.
In this project, the student will explore the architectural design space of computing platforms for supporting DNNs. A cycle-accurate architectural simulator and/or analytic models will be used to evaluate the compute and memory bandwidth characteristics of different DNN applications and models. The student will evaluate the impact of different architectural designs, such as the on-chip network and memory hierarchy, on performance, energy, and cost, and explore the most efficient computing architectures for running DNNs in a resource-constrained environment.
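As an example of the analytic-model side of such an exploration, the sketch below estimates first-order multiply-accumulate (MAC) counts and memory traffic for a single convolutional layer; the cost formulas are standard first-order approximations (no cache reuse modeled), and the layer shape is arbitrary.

```python
# First-order analytic model for one conv layer with a k x k kernel,
# same-size h x w output, c_in input and c_out output channels.
def conv_layer_costs(h, w, c_in, c_out, k, bytes_per_elem=1):
    macs = h * w * c_out * c_in * k * k    # one MAC per output tap
    weights = c_out * c_in * k * k         # parameter footprint (elems)
    activations = h * w * (c_in + c_out)   # input + output feature maps
    traffic = (weights + activations) * bytes_per_elem
    return macs, traffic

macs, traffic = conv_layer_costs(h=56, w=56, c_in=64, c_out=64, k=3)
print(f"{macs/1e6:.0f} M MACs, {traffic/1e6:.2f} MB traffic, "
      f"arithmetic intensity {macs/traffic:.0f} MACs/byte")
```

Comparing arithmetic intensity against a platform’s compute-to-bandwidth ratio (a roofline-style argument) indicates whether a layer is compute- or memory-bound, which is exactly the kind of signal that steers design-space exploration.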
| Application Deadline | Friday, March 18, 2022 |
| Notification of Selection | Friday, April 1, 2022 |
| REU Program Starts | Monday, May 16, 2022 |
| Final Day of REU Program | Friday, July 22, 2022 |
Women, underrepresented minorities, and students from institutions with limited research opportunities are especially encouraged to apply!