Optimization of Training Distributed AI Workloads on Multi-node Computing Systems

Description

This project addresses the challenge of managing the immense computational resources required to train Large Language Models (LLMs). The central objective is to develop a framework that identifies the parallelism configuration that maximizes training performance. Using the ASTRA-sim simulator and its synthetic LLM generator, the research explores the search space to recommend optimal degrees of data, tensor, sequence, and pipeline parallelism. To increase simulation fidelity, the system integrates an optimized network model that accounts for congestion effects, giving AI researchers and hardware engineers insight into bottlenecks and design trade-offs.
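As a rough illustration of what exploring this search space involves, the sketch below enumerates candidate configurations and picks the one with the lowest cost. It is not the project's framework. It assumes, for simplicity, that the four degrees must multiply to the total device count, and every name in it (`enumerate_configs`, `recommend`, `cost_fn`) is hypothetical. In a real setup `cost_fn` would stand in for a simulator run that returns an estimated iteration time.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def enumerate_configs(num_devices):
    """List every (dp, tp, sp, pp) tuple whose product equals num_devices.

    Assumption: the data, tensor, sequence, and pipeline degrees
    together use all devices exactly once.
    """
    configs = []
    for dp in divisors(num_devices):
        after_dp = num_devices // dp
        for tp in divisors(after_dp):
            after_tp = after_dp // tp
            for sp in divisors(after_tp):
                pp = after_tp // sp  # the remaining factor goes to pipeline
                configs.append((dp, tp, sp, pp))
    return configs


def recommend(num_devices, cost_fn):
    """Return the configuration with the lowest cost.

    cost_fn is a hypothetical callable mapping a (dp, tp, sp, pp)
    tuple to a number, e.g. a simulated time per training iteration.
    """
    return min(enumerate_configs(num_devices), key=cost_fn)
```

For 8 devices this produces 20 candidate configurations. The count grows quickly with the device count, which is one reason a simulator is a more practical way to evaluate candidates than running each one on real hardware.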

Background

Mohammad earned his Master of Science in Computer Science and Engineering at the Indian Institute of Technology Roorkee (2021–2024), supported by an ICCR scholarship. He specialized in computer architecture, and his thesis developed a dynamic warp scheduler for GPGPUs. Before that, he earned a degree in Computer Engineering from Tishreen University (Syria), where he received the Al Bassel certificate for academic excellence and developed a machine translation application. Professionally, he has worked as a software engineer and web developer for companies based in Syria and Dubai.

Motivation

His research is driven by the hardware limits on compute, memory, and interconnect bandwidth that hinder the development of more powerful LLMs. He is motivated by the challenge of improving infrastructure utilization and aligning model architectures with the underlying hardware. By designing this parallelism recommendation framework, Mohammad seeks to make AI training more resource-efficient, cost-effective, and accessible to the scientific community.

Mohammad Nasser

Degree in Computer Engineering and Master's in Computer Architecture; currently a PhD candidate

Host Organization

Supervisors

Sergi Abadal Cavalle

UPC Supervisor

The content of this website reflects only the views of the Catedra Chip Chair UPC project.

Related projects

Optimizing distributed AI workload training on multinode computing systems

Simulation-Based Recommendation Framework for Scalable Training of Distributed AI

Hierarchical floorplanning optimization algorithms for System-on-Chip architectures

High Predictability Global Routing during Floorplanning of Complex Chips

Mathematical Optimization Techniques for Hierarchical Floorplanning of Complex Chips

Optimization of Training Distributed AI Workloads on Multi-node Computing Systems.

Development of an Interactive Graphical Tool for Optimization and Editing of Floorplanning in Chip Design

Development of Mathematical and Heuristic Optimization Tools for Chip Floorplanning