Research Overview

The Massively Parallel Systems Group (MPS) is dedicated to researching and teaching computer architecture, with a specific focus on massively parallel systems. This involves exploring the design, development, and optimization of computing systems that utilize a large number of processors to perform simultaneous computations. The group's research covers a wide range of topics, including parallel architectures (multi-/many-core processors), high-performance computing, edge computing, and the application of emerging technologies such as machine learning. The group has a special interest in GPU architecture. Additionally, MPS is committed to advancing the state-of-the-art in parallel computing through innovative hardware and software solutions that address the challenges of programmability, scalability, energy efficiency, and reliability.

If you are interested in doing research with our group, feel free to send us an e-mail. If you are already at Hamburg University of Technology, you can also schedule an in-person meeting.

Broad Areas of Research

  • Computer architecture/GPU architecture
  • High-performance computing
  • Applied machine learning
  • Power and performance modeling
  • Memory systems, memory compression, approximate computing
  • Heterogeneous computing (porting/optimizing applications)
  • Harnessing AI for post-silicon validation of SoCs
  • Energy-efficient remote sensing with edge AI
  • AI on edge using embedded GPUs
  • GPU security

Projects

Below is a list of projects we are currently working on and actively seeking funding for.

LEAP: Locality-Driven High-Performance and Energy-Efficient GPUs

Graphics Processing Units (GPUs) were initially designed for graphics applications, but their massive computational power has made them highly effective for general-purpose computing tasks such as scientific simulations and machine learning (ML). The high computational power of GPUs, together with the recent explosion of data, led to the rapid evolution of general-purpose computing on GPUs, making them a key computing device. Today, GPU-accelerated systems are integral to many advancements, including the success of generative artificial intelligence (AI). While GPU-accelerated systems with higher computational power and energy efficiency are desirable, the semiconductor industry is facing significant challenges in scaling performance and energy efficiency due to the end of Dennard scaling and the slowdown of Moore’s Law. Until new technologies like quantum computing become practical, the onus is on system architects and programmers to optimize every aspect of GPU performance for sustainable computing. In this project, we aim to enhance GPU performance and energy efficiency by optimizing the memory hierarchy to better exploit data locality, particularly spatial locality, which is key to accessing data with lower latency, lower energy, and higher bandwidth.
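Why spatial locality matters so much on GPUs can be illustrated with a back-of-the-envelope sketch (not an MPS tool): it counts how many distinct cache lines a warp of threads touches for coalesced versus strided accesses. The cache-line size, element size, and warp size are assumptions chosen to be typical.

```python
# Illustrative sketch: distinct cache lines touched by one GPU warp for
# coalesced vs. strided access patterns. Constants are assumed typical values.

CACHE_LINE_BYTES = 128   # common GPU cache-line granularity (assumption)
ELEM_BYTES = 4           # 32-bit elements
WARP_SIZE = 32

def lines_touched(addresses):
    """Number of distinct cache lines covered by a set of byte addresses."""
    return len({addr // CACHE_LINE_BYTES for addr in addresses})

# Coalesced: thread i reads element i -> consecutive addresses.
coalesced = [i * ELEM_BYTES for i in range(WARP_SIZE)]

# Strided: thread i reads element i * 32 -> one element per cache line.
strided = [i * 32 * ELEM_BYTES for i in range(WARP_SIZE)]

print(lines_touched(coalesced))  # 1: one line serves the whole warp
print(lines_touched(strided))    # 32: 32x the memory traffic for the same data
```

With good spatial locality the warp's 32 loads are served by a single cache-line fetch; with a large stride, the same 128 bytes of useful data cost 32 fetches, which is exactly the latency, energy, and bandwidth waste the project targets.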

AutoGPUBoost: Automatic Optimization of Applications for GPUs

NVIDIA is the market leader in producing massively parallel GPUs, often called AI (artificial intelligence) chips. NVIDIA has introduced many architectural improvements from the first GPU architecture (Tesla, 2006) to the latest (Hopper, 2023). Unfortunately, exploiting several of these improvements requires applications to be significantly rewritten. As a result, upgrading an enterprise HPC (high-performance computing) cluster with new GPUs yields only partial performance/energy gains, which is not ideal for carbon footprint reduction. In this project, the goal is to create initial tools and software libraries for automatic application optimization, leveraging new architectural features for improved performance and energy efficiency.
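The core dispatch idea behind such tools can be sketched as follows: select the most advanced kernel variant the target GPU's compute capability supports, so that code written for older hardware still benefits from newer features. The feature-to-capability table below is illustrative (the capability thresholds are our reading of public documentation, not an official matrix), and the function names are hypothetical.

```python
# Hypothetical sketch of capability-based kernel dispatch. The minimum
# compute capabilities listed are assumptions based on public documentation.

FEATURE_MIN_CC = {
    "fp16_tensor_cores": (7, 0),   # Volta and newer (assumed)
    "async_copy":        (8, 0),   # Ampere and newer (assumed)
    "tma":               (9, 0),   # Hopper and newer (assumed)
}

def best_variant(compute_capability):
    """Return the most advanced kernel variant the device can run."""
    for name in ("tma", "async_copy", "fp16_tensor_cores"):
        if compute_capability >= FEATURE_MIN_CC[name]:
            return name
    return "baseline"

print(best_variant((9, 0)))  # 'tma'
print(best_variant((7, 5)))  # 'fp16_tensor_cores'
```

A real tool would go well beyond dispatch, rewriting kernels to actually use the selected features, but the table-driven selection step conveys why per-architecture variants avoid the "partial gains" problem described above.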

SynergeticPulse: Advancing Programmability, Scalability, and Energy Efficiency of HPC Systems

The project focuses on enhancing the programmability, scalability, and energy efficiency of high-performance computing (HPC) systems, a catalyst for enabling groundbreaking discoveries in many domains, including Artificial Intelligence (AI). While HPC systems provide massive computational power, they also consume a large amount of power (~24 MW), and developing scalable, energy-efficient applications for these systems, especially those consisting of heterogeneous accelerators, remains a challenging task. The problem is exacerbated by the ever-evolving architectures of accelerators, requiring frequent optimization of applications to fully utilize available computational resources. Inefficient use of large-scale HPC systems squanders critical resources, such as computational power and energy. Consequently, operating these powerful computing systems poses significant challenges in terms of energy demands and associated CO2 emissions. Thus, while HPC is part of the solution to many challenges, it is also part of the broader environmental problem. In this context, TUHH and UNISA will establish an international cooperation focusing on advancing the programmability, scalability, and energy efficiency of HPC systems. By leveraging the expertise at TUHH, an expert in massively parallel systems, especially Graphics Processing Units (GPUs), and cooperating with UNISA, an expert in HPC programming models and runtime support, a network will be established that will form the basis for joint research proposals.

GPU4OnboardAI: Onboard AI using Resilient-GPUs

With the increasing demand for onboard AI processing to enable several game-changing applications, researchers are looking to deploy commercial off-the-shelf (COTS) GPUs for processing compute-intensive applications in space. Although space computing platforms are traditionally radiation-hardened (and supplied by only a few vendors), the fault-tolerance requirements of low-Earth orbit (LEO), where Earth observation (EO) satellites are deployed, differ significantly from those of deep space. COTS GPUs have huge potential in LEO for EO: because they are manufactured with newer process technology (3 nm) than current space-grade technology (around 65 nm), they bring immense benefits in terms of processing power, energy efficiency, and hardware availability. However, before we can deploy COTS GPUs, we need to evaluate their fault tolerance and performance for onboard AI applications, and unfortunately no tool exists to evaluate this accurately. Therefore, in this project, we plan to develop a software tool with space-specific fault models that can run on COTS GPUs and report their resilience and performance for AI applications. Although fault mitigation is out of the scope of this project, in the future we plan to develop software methods to mitigate faults as well.
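The core mechanism of such a software fault injector can be sketched in a few lines: flip a single bit of a float32 model parameter to emulate a radiation-induced single-event upset (SEU), then observe how the application's output degrades. The function names and the single-bit-flip fault model here are illustrative, not the project's actual tool or fault models.

```python
# Illustrative SEU-style fault injection on float32 values (hypothetical
# sketch; a real injector would target GPU registers/memory during execution).
import random
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value (bit 0 = LSB, bit 31 = sign bit)."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped

def inject_seu(weights, rng=random):
    """Corrupt a random bit of a random weight; return (index, bit, new list)."""
    idx = rng.randrange(len(weights))
    bit = rng.randrange(32)
    corrupted = list(weights)
    corrupted[idx] = flip_bit(corrupted[idx], bit)
    return idx, bit, corrupted

print(flip_bit(1.0, 31))  # sign-bit flip: -1.0
```

Sweeping the bit position shows why resilience evaluation matters: a flip in a low mantissa bit barely perturbs a weight, while a sign or exponent flip can change it by orders of magnitude.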

EdgeSense: Energy-efficient Remote Sensing with Edge AI

In this project, we aim to optimize hyperspectral image (HSI) classification and anomaly detection using edge GPUs, including the Jetson Nano, Xavier, and TX2 platforms. The datasets utilized in this research encompass Earth Observation images captured by airborne and satellite sensors. By leveraging advanced convolutional neural networks (CNNs) and transfer learning, we address the challenge of limited training datasets while ensuring energy efficiency, which is crucial for real-time satellite imaging, land cover mapping, and environmental monitoring. Our key methodologies include implementing various classification algorithms on edge devices, applying Principal Component Analysis (PCA) for noise reduction, and conducting a comprehensive analysis of hardware efficiency. Additionally, we integrate anomaly detection within HSI processing to identify unusual patterns and materials, thereby enhancing environmental monitoring and disaster response capabilities. This project aims to significantly improve the accuracy and energy efficiency of HSI processing on resource-constrained devices, marking a substantial advancement in remote sensing technology and environmental science.
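The PCA step mentioned above can be sketched as follows: treat every pixel of the hyperspectral cube as a vector of spectral bands and project it onto the top principal components, discarding the noisy low-variance directions before classification. The cube dimensions below are toy values; the real EO scenes have hundreds of bands.

```python
# Minimal PCA sketch for hyperspectral band reduction (illustrative only).
import numpy as np

def pca_reduce(cube: np.ndarray, n_components: int) -> np.ndarray:
    """Project an (H, W, B) hyperspectral cube onto its top principal components."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)
    pixels -= pixels.mean(axis=0)              # center each spectral band
    # Principal directions via SVD of the centered pixel matrix.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    reduced = pixels @ vt[:n_components].T     # scores on top components
    return reduced.reshape(h, w, n_components)

rng = np.random.default_rng(0)
cube = rng.random((8, 8, 64))                  # toy 8x8 scene with 64 bands
out = pca_reduce(cube, 10)
print(out.shape)  # (8, 8, 10)
```

Shrinking 64 bands to 10 components cuts both the CNN's input size and the memory traffic per inference, which is where the energy savings on Jetson-class edge GPUs come from.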

AI4Validation: Harnessing AI for Post-Silicon Validation of SoCs

System-on-Chip (SoC) technology is a driving force behind the growth and advancement of various digital technologies, including smartphones, cyber-physical systems (CPS), and more. The demand for different kinds of SoCs has grown massively over the past decade. To ensure the reliable functionality of SoC devices, post-silicon validation plays a pivotal role. It is one of the most intricate and costly stages of the SoC design cycle, primarily because the post-silicon validation process generates a large amount of data (such as oscilloscope images, log data, and trace data) and there is pressure to reduce validation time amidst fierce market competition. Because so much data needs to be processed, we study AI-powered methods to automatically detect anomalies in post-silicon validation data, which provides several benefits, including reduced validation time, fewer errors, and accelerated time-to-market. This is a collaborative research effort between the Massively Parallel Systems Group, TUHH, the Smart Sensors Group, TUHH, and NXP Semiconductors Hamburg.
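The anomaly-detection idea can be conveyed with a toy statistical baseline: flag measurements in a validation log that deviate strongly from the norm. A production pipeline would use learned models over oscilloscope images and traces; the z-score rule, threshold, and sample values below are all illustrative assumptions.

```python
# Toy anomaly detector for scalar validation-log measurements (illustrative
# baseline, not the project's AI method).
import statistics

def zscore_anomalies(samples, threshold=3.0):
    """Return indices of samples more than `threshold` std-devs from the mean."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    if std == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mean) / std > threshold]

# e.g. signal-settling times (ns) extracted from post-silicon trace logs
latencies = [12.1, 11.9, 12.0, 12.2, 11.8, 12.0, 25.0, 12.1]
print(zscore_anomalies(latencies, threshold=2.0))  # [6]
```

Even this crude rule shows the payoff: instead of an engineer scanning thousands of log entries, only the handful of flagged outliers need manual inspection, which is where the reduction in validation time comes from.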

Past Projects