Artificial Intelligence (AI) and Machine Learning (ML) techniques have become the de facto solution to drive human progress and, more specifically, automation. In recent years, the world’s economy has gravitated towards the AI/ML domain, from both industrial and scientific perspectives, and expectations of growth show no signs of abating. More recently, a new trend towards very large AI models has been gaining traction: Foundation Models. Large Foundation Models with billions or trillions of parameters exhibit astonishing emergent skills, especially in fields like natural language processing, vision, and human-machine interaction. Examples like BERT (110M parameters), GPT-2 (1.5B parameters), GPT-3 (175B parameters), BLOOM (176B parameters), and Wu Dao (1.75T parameters!) indicate that the size of these models will keep growing rapidly and substantially in the foreseeable future. This growth brings all sorts of challenges, especially with respect to provisioning the computing resources needed for training and inference, and calls for thorough co-design across algorithms, models, software, and hardware systems.
In this context, this edition of the CogArch workshop aims to bring together the know-how needed to address co-designed hardware-software architectures from a holistic point of view, tackling design considerations from algorithms to platforms across the many fields that large-scale cognitive systems will soon occupy: from natural language processing and speech to applications like protein folding, drug discovery, computer vision, and even music generation.
The CogArch workshop has already had six successful editions, bringing together experts and knowledge on the most novel design ideas for cognitive systems. The workshop capitalizes on the synergy between industrial and academic efforts to provide a better understanding of cognitive systems and the key concepts of their design.
Hardware and software design considerations are gravitating towards AI applications, as these have proven extremely useful in a wide variety of fields, from edge computing in autonomous cars to cloud-based computing for personalized medicine. Recent years have witnessed the emergence of AI at a very large scale in the form of Foundation Models: billion- to trillion-parameter networks with astonishing emergent skills in fields like natural language processing, vision, content creation, and human-machine interaction. The unprecedented number of parameters in these models, though, generates all sorts of new challenges, especially with respect to provisioning the computing resources needed for training and inference, and calls for thorough co-design across algorithms, models, software, and hardware systems.
The CogArch workshop solicits formative ideas and new product offerings in the general space of AI systems, covering all design aspects of cognitive systems, with a particular focus this year on large-scale Foundation Models. Topics of interest include (but are not limited to):
The workshop shall consist of regular presentations and/or prototype demonstrations by authors of selected submissions. In addition, it will include invited keynotes by eminent researchers from industry and academia as well as interactive panel discussions to kindle further interest in these research topics. Submissions will be reviewed by a workshop Program Committee, in addition to the organizers.
Submitted manuscripts must be in English, up to 2 pages long (with the same formatting guidelines as the main conference), and must indicate the type of submission: regular presentation or prototype demonstration. Submissions should be submitted via the following link by April 21st, 2023 (extended from April 7th).
If you have questions regarding submission, please contact us: email@example.com
CogArch will feature a session where researchers can showcase innovative prototype demonstrations or proof-of-concept designs in the cognitive architecture space. Examples of such demonstrations may include (but are not limited to):
Large Language Models (LLMs) are shifting “what’s possible”, but with traditional hardware they require massive compute and introduce the complexity of distributed training across thousands of accelerators. Cerebras Wafer Scale Clusters make training LLMs faster and easier than on GPUs, thanks to near-perfect linear scaling and a simple data-parallel distribution strategy for models of any size. In this talk we will share our experience and insights from training various LLMs, including the open-source Cerebras-GPT family of models, on Cerebras hardware.
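As a rough illustration of the data-parallel strategy mentioned in the abstract above (a toy sketch, not Cerebras' actual implementation): each worker computes gradients on its own shard of the batch, the gradients are averaged (an all-reduce), and every replica applies the same update.

```python
import numpy as np

# Toy sketch of data-parallel training: each "worker" holds a replica of
# the weights, computes gradients on its own data shard, and the shard
# gradients are averaged (an all-reduce) before one shared weight update.
# Illustrative only; not Cerebras' implementation.

rng = np.random.default_rng(0)
n_workers, n_samples, dim = 4, 64, 8
w = rng.normal(size=dim)                 # weights, replicated on every worker
x = rng.normal(size=(n_samples, dim))    # full batch of inputs
y = x @ rng.normal(size=dim)             # synthetic linear-regression targets

shards = np.array_split(np.arange(n_samples), n_workers)
grads = []
for idx in shards:                       # each worker: local MSE gradient
    xs, ys = x[idx], y[idx]
    grads.append(2 * xs.T @ (xs @ w - ys) / len(idx))

g = np.mean(grads, axis=0)               # all-reduce: average the gradients
w -= 0.01 * g                            # identical update on every replica
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, which is why this scheme keeps replicas bit-identical; the communication cost is the all-reduce of `g` every step.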
The rapid advancement of artificial intelligence (AI) has ushered in an era of unprecedented computational demands, necessitating continuous innovation in computing systems. In this talk, we will highlight how codesign has been a key paradigm in enabling innovative solutions and state-of-the-art performance in Google's AI computing systems, namely Tensor Processing Units (TPUs). We present several codesign case studies across different layers of the stack, spanning hardware, systems, software, algorithms, all the way up to the datacenter. We discuss how TPUs have made judicious, yet opinionated bets in our design choices, and how these design choices have not only kept pace with the blistering rate of change, but also enabled many of the breakthroughs in AI.
Foundation Models represent an emerging inflection point in the creation of powerful, very high dimensional data representations, triggered by advances in AI. Foundation Models are billion-parameter-scale neural networks, powered by novel architectures and trained using a technique called self-supervision. This new paradigm presents unprecedented opportunities and challenges across the full computing stack. Hear how IBM Research is expanding and realizing the value of Foundation Models, from building a cloud-native supercomputing infrastructure and a simplified, cloud-native common stack to train and deploy Foundation Models in a multicloud environment, to applying this full stack to enable advances in the natural language domain and beyond, including time series and code generation.
The unprecedented success of large language models (LLMs) such as OpenAI's GPT-3 and GPT-4, Google's Bard, Meta's LLaMA, and Cerebras-GPT emphasizes the ever-growing demand to train them efficiently. These models comprise billions to trillions of parameters, and this trend continues to grow at an unforeseen rate. The large model size makes it impossible for the parameters to fit within a single accelerator device, whose memory is usually capped at tens of GBs. Furthermore, even if we succeed in fitting the model into a single device, its tremendous compute requirements lead to almost impractical training times. For example, GPT-3 consists of 175B parameters and would take 355 GPU-years to train on a single NVIDIA V100 GPU. This has led to growing interest in distributed training, i.e., sharding model weights and/or data samples across multiple accelerator devices. However, this comes at the expense of communication overhead to exchange gradients and activations, which has already become a key bottleneck for distributed training. We identify that the communication challenge will be exacerbated in future systems, which are expected to leverage multi-dimensional networks with heterogeneous bandwidths due to diverse fabric technologies (e.g., chiplets, rack-scale, and scale-out). We present our recent work on (i) modeling future training platforms to identify such bottlenecks, and (ii) a novel runtime scheduling policy to enhance network bandwidth utilization.
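The scale argument in the abstract above can be made concrete with back-of-the-envelope arithmetic. The numbers below are approximations from public sources (GPT-3's reported training budget of roughly 3,640 petaflop/s-days; an assumed sustained V100 throughput), not measurements:

```python
# Back-of-the-envelope estimates for GPT-3-scale training, illustrating
# why single-device training is impractical. Rough sketch only.

params = 175e9  # GPT-3 parameter count

# Memory: fp16 weights alone (2 bytes each), before optimizer state
# and activations -- already far beyond a single device's tens of GBs.
weights_gb = params * 2 / 1e9
print(f"fp16 weights alone: {weights_gb:.0f} GB")

# Compute: ~3,640 petaflop/s-days of training, converted to FLOPs.
total_flops = 3640e15 * 86400  # ~3.1e23 FLOPs

# Assumed sustained throughput on one V100 (mixed precision): ~28 TFLOP/s.
sustained = 28e12
gpu_years = total_flops / sustained / (365 * 86400)
print(f"single-V100 training time: ~{gpu_years:.0f} GPU-years")
```

Under these assumptions the estimate lands at roughly 355 GPU-years, consistent with the figure cited in the abstract, and the 350 GB of fp16 weights alone shows why the parameters must be sharded across devices.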
|Sunday, June 18th, 2023|
(all times are Eastern Time)
|9:00 - 9:15 AM||Introduction and Welcoming Remarks|
|9:15 - 10:00 AM||Invited Talk: "Building a Cloud-Native Platform for the Future of AI: Foundation Models"
Carlos Costa (IBM Research)
|10:00 - 10:15 AM||"PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks"
Samaneh Javadinia and Amirali Baniasadi (University of Victoria)
|10:15 - 10:30 AM||"Bit Error Characterization in Fault-Prone Homomorphic Encryption Applications"
Matias Mazzanti and Esteban Mocskos (University of Buenos Aires)
|10:30 - 10:45 AM||"Object Detection and Classification on a Heterogeneous Edge SoC"
Gracen Wallace, Aporva Amarnath, Nandhini Chandramoorthy and Augusto Vega (IBM Research)
|10:45 - 11:00 AM||"Diagnosis of Sports Injuries: A Hardware-Optimized Deep Learning Solution"
|11:00 - 11:30 AM||Break|
|11:30 - 12:15 PM||Invited Talk: "Codesigning Computing Systems for Artificial Intelligence"
Suvinay Subramanian (Google)
|12:15 - 12:30 PM||"Qualitative Study of Facial Recognition Algorithm through Hardware-Software Acceleration"
Mohammed Samiulla, Prithvi Naidu, Prajwal Naidu and Advaith Jagannath (New York University)
|12:30 - 2:00 PM||Lunch|
|2:00 - 2:45 PM||Invited Talk: "Modeling and Mitigating Communication Bottlenecks for Large Model Training at Scale"
Tushar Krishna (Georgia Institute of Technology)
|2:45 - 3:30 PM||Invited Talk: "Training Large Language Models on Cerebras Wafer Scale Clusters"
Natalia Vassilieva (Cerebras Systems)
|3:30 - 4:00 PM||Break|
|4:00 - 5:00 PM||Panel: "AI Futures: Rosy, Scary or Blah?"
This panel will debate the future of AI from the viewpoint of society in general and computer architects/scientists in particular. In many ways, the first indications of scary excitement (beyond normal science-fiction giddiness) came in 2011, when Ken Jennings (human Jeopardy champion) uttered those famous words: “I, for one, welcome our new computer overlords!”. Since then, with the steady and steep rise of AI/ML capabilities, the awe and excitement seem to be turning into downright panic, given the statements being made by some of the modern-day pioneers/catalysts of the AI/ML revolution, as well as other notable scientists and intellectuals.
After the panelists provide their position statements, the floor will be open for Q&A, with questions posed from the audience.
|5:00 PM||Concluding Remarks|
Roberto Gioiosa is a senior researcher in the HPC group and lead of the Scalable and Emerging Technologies team at Pacific Northwest National Laboratory. His current research focuses on hardware/software co-design methodologies, custom AI/ML accelerator designs, and distributed software for heterogeneous systems. Dr. Gioiosa leads the DOE co-design center for AI and graph analytics (ARIAA), as well as several other co-design efforts at PNNL. In the past, he worked at LANL, BSC, IBM Watson, and ORNL. He holds a Ph.D. from the University of Rome “Tor Vergata”.
David Trilla is a post-doctoral Researcher at IBM T. J. Watson Research Center. He has worked on critical-embedded real-time systems and his current research interests include security and agile hardware development. He obtained his Ph.D. at the Barcelona Supercomputing Center (BSC) granted by the Polytechnic University of Catalonia (UPC), Spain.
Subhankar Pal is a Research Staff Member at IBM T. J. Watson Research Center. His research is focused on SoC design methodologies and hardware-software co-design for privacy-preserving machine learning. He holds a Ph.D. and M.S. from the University of Michigan. His Ph.D. thesis looked at designing a reconfigurable, software-defined hardware solution that balances programmability with energy efficiency. Prior to that, Subhankar was with NVIDIA, where he worked on pre-silicon verification and bring-up of multiple generations of GPUs.
Karthik Swaminathan is a Research Staff Member at IBM T. J. Watson Research Center. His research interests include power-aware architectures, domain-specific accelerators and emerging device technologies in processor design. He is also interested in architectures for approximate and cognitive computing, particularly in aspects related to their reliability and energy efficiency. He holds a Ph.D. degree from Penn State University.
Carlos Costa is a Principal Research Staff Member at IBM T. J. Watson Research Center, where he leads the effort to build a serverless, cloud-native platform for emerging AI/ML workflows. His research is mainly focused on system software, programming models and middleware for next-generation distributed systems, working at the intersection of traditional HPC and emerging distributed computing paradigms. He has been involved in multiple projects in the areas of HPC and analytics, including the BlueGene/Q system, the Active Memory Cube (AMC) architecture for in-memory processing, and DoE ORNL’s Summit and LLNL’s Sierra supercomputer systems, among other projects with clients and academic partners. He is currently the lead of IBM Research’s Foundation Model Stack.
Alper Buyuktosunoglu is a Research Staff Member at IBM T. J. Watson Research Center. He has been involved in research and development work in support of IBM Power Systems and IBM z Systems in the area of high performance, reliability and power-aware computer architectures. He holds a Ph.D. degree from University of Rochester.
Pradip Bose is a Distinguished Research Staff Member and manager of Efficient and Resilient Systems at IBM T. J. Watson Research Center. He has over thirty-three years of experience at IBM, and was a member of the pioneering RISC superscalar project at IBM (a precursor to the first RS/6000 system product). He holds a Ph.D. degree from the University of Illinois at Urbana-Champaign.
Augusto Vega is a Research Staff Member at IBM T. J. Watson Research Center involved in research and development work in the areas of highly-reliable power-efficient embedded designs, cognitive systems and mobile computing. He holds a Ph.D. degree from Polytechnic University of Catalonia (UPC), Spain.
Ananda Samajdar is a Research Staff Member at IBM T. J. Watson Research Center working on accelerator design and compilation/mapping strategies for DNN workloads on IBM’s RaPiD AI accelerator. He holds a Ph.D. from Georgia Tech.
CogArch will be held in conjunction with the 50th International Symposium on Computer Architecture (ISCA 2023). Refer to the main venue to continue with the registration process.
Orlando World Center Marriott
8701 World Center Dr Orlando, FL 32821