Artificial Intelligence (AI) and Machine Learning (ML) techniques have become the de facto solution for driving human progress and, more specifically, automation. In recent years, the world's economy has been gravitating towards the AI/ML domain (from both industrial and scientific perspectives), and this growth shows no signs of slowing. More recently, a new trend towards very large AI models has been gaining traction: Foundation Models. Large Foundation Models with billions or trillions of parameters exhibit astonishing emergent skills, especially in fields like natural language processing, vision, and human-machine interaction. Examples like BERT (110M parameters), GPT-2 (1.5B parameters), GPT-3 (175B parameters), BLOOM (176B parameters), and Wu Dao (1.75T parameters!) indicate that the size of these models will keep growing rapidly and substantially in the foreseeable future. This growth brings all sorts of challenges, especially with respect to provisioning the computing resources needed for training and inference, and calls for thorough co-design across algorithms, models, software, and hardware systems.

In this context, this edition of the CogArch workshop aims to bring together the know-how needed to address co-designed hardware-software architectures from a holistic point of view: tackling every design consideration, from algorithms to platforms, across the many fields that large-scale cognitive systems will soon occupy, from natural language processing and speech to applications like protein folding, drug discovery, computer vision, and even music generation.

The CogArch workshop has already had six successful editions, bringing together experts and knowledge on the most novel design ideas for cognitive systems. The workshop capitalizes on the synergy between industrial and academic efforts to provide a better understanding of cognitive systems and the key concepts of their design.

Call for Papers

Hardware and software design considerations are gravitating towards AI applications, which have proven extremely useful in a wide variety of fields, from edge computing in autonomous cars to cloud-based computing for personalized medicine. Recent years have witnessed the emergence of AI at a very large scale: Foundation Models. Large Foundation Models with billions or trillions of parameters exhibit astonishing emergent skills, especially in fields like natural language processing, vision, content creation, and human-machine interaction. The unprecedented number of parameters in these models, though, generates all sorts of new challenges, especially with respect to provisioning the computing resources needed for training and inference, and calls for thorough co-design across algorithms, models, software, and hardware systems.

The CogArch workshop solicits formative ideas and new product offerings in the general space of AI systems, covering all design aspects of cognitive systems, with a particular focus this year on large-scale Foundation Models.

Topics of interest include (but are not limited to):
  • Hardware support for state-of-the-art AI models
  • Hardware-software co-design and acceleration of AI models
  • Parallelization strategies for AI models (e.g. transformers)
  • Accelerators and micro-architectural support for AI
  • Reliability and safety considerations, and security against adversarial attacks in cognitive architectures
  • Techniques for improving energy efficiency of AI applications, and battery life extension and endurance in mobile AI architectures
  • AI/ML for fast system modeling and AI/ML as design methodology
  • Leveraging 2.5D/3D chiplet designs, wafer-scale integration, and other heterogeneous integration techniques for designing scalable architectures for Foundation Models
  • Privacy-preserving inference on AI models
  • Prototype demonstrations in specific application domains: e.g., natural language processing and speech, protein folding, drug discovery, computer vision, code generation, music making, as well as applications of interest to defense and homeland security

The workshop shall consist of regular presentations and/or prototype demonstrations by authors of selected submissions. In addition, it will include invited keynotes by eminent researchers from industry and academia as well as interactive panel discussions to kindle further interest in these research topics. Submissions will be reviewed by a workshop Program Committee, in addition to the organizers.

Submitted manuscripts must be in English, up to 2 pages (with the same formatting guidelines as the main conference), and must indicate the type of submission: regular presentation or prototype demonstration. Submissions should be made through the following link by April 21st, 2023.
If you have questions regarding submission, please contact us: info@cogarchworkshop.org

Call for Prototype Demonstrations

CogArch will feature a session where researchers can showcase innovative prototype demonstrations or proof-of-concept designs in the cognitive architecture space. Examples of such demonstrations may include (but are not limited to):

  • Custom ASIC or FPGA-based demonstrations of machine learning, cognitive or neuromorphic architectures.
  • Innovative implementations of state-of-the-art cognitive algorithms/applications, and the underlying software-hardware co-design techniques.
  • Demonstration of end-to-end cognitive systems comprising edge devices backed by a cloud computing infrastructure.
  • Novel designs showcasing the adoption of emerging technologies for the design of cognitive systems.
  • Tools or frameworks to aid analysis, simulation and design of cognitive systems.
Submissions for the demonstration session may be made in the form of a 2-page manuscript highlighting key features and innovations of the prototype demonstration. Proposals accepted for demonstration during the workshop can be accompanied by a poster/short presentation. Authors should explicitly indicate that the submission is for prototype demonstration at submission time.

Important Dates

  • Paper submission deadline: April 21st, 2023
  • Notification of acceptance: May 8th, 2023
  • Workshop date: June 18th, 2023

Program Committee

  • Roberto Gioiosa, Pacific Northwest National Laboratory
  • David Trilla, IBM Research
  • Subhankar Pal, IBM Research
  • Karthik Swaminathan, IBM Research
  • Carlos Costa, IBM Research
  • Alper Buyuktosunoglu, IBM Research
  • Pradip Bose, IBM Research
  • Augusto Vega, IBM Research
  • Ananda Samajdar, IBM Research


Invited Speakers:

Training Large Language Models on Cerebras Wafer Scale Clusters

Natalia Vassilieva (Sr. Director of Product, Machine Learning - Cerebras Systems)

Large Language Models (LLMs) are shifting "what's possible", but training them on traditional hardware requires massive compute and the complexity of distributing training across thousands of accelerators. Cerebras Wafer-Scale Clusters make training LLMs faster and easier than on GPUs, thanks to near-perfect linear scaling and a simple data-parallel distribution strategy for models of any size. In this talk we will share our experience and insights from training various LLMs, including the open-source Cerebras-GPT family of models, on Cerebras hardware.
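To make the abstract's "data-parallel distribution strategy" concrete, here is a minimal sketch (our own illustration, not Cerebras code): every worker holds a full copy of the weights, processes a shard of each batch, and the per-worker gradients are averaged so all replicas stay in sync. A simple linear model stands in for the network.

```python
import numpy as np

def local_gradient(weights, x_shard, y_shard):
    """Gradient of mean squared error for a linear model on one batch shard."""
    preds = x_shard @ weights
    return 2 * x_shard.T @ (preds - y_shard) / len(x_shard)

def data_parallel_step(weights, x, y, n_workers, lr=0.1):
    """One synchronous step: shard the batch, average per-worker gradients."""
    x_shards = np.array_split(x, n_workers)
    y_shards = np.array_split(y, n_workers)
    grads = [local_gradient(weights, xs, ys)
             for xs, ys in zip(x_shards, y_shards)]
    avg_grad = np.mean(grads, axis=0)  # stands in for an all-reduce
    return weights - lr * avg_grad

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ true_w
w = np.zeros(4)
for _ in range(200):
    w = data_parallel_step(w, x, y, n_workers=4)
print(np.round(w, 2))
```

With equal shard sizes, the averaged gradient equals the full-batch gradient, which is why synchronous data parallelism preserves the single-device training trajectory while splitting the work.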

Suvinay Subramanian (Senior Software Engineer - Google)

Talk details to be announced soon!

Building a Cloud-Native Platform for the Future of AI: Foundation Models

Carlos Costa (Principal Research Staff Member - IBM)

Foundation Models are an emerging inflection point in the creation of powerful, very high-dimensional data representations, triggered by advances in AI. Foundation Models are billion-parameter-scale neural networks, powered by novel architectures and trained using a technique called self-supervision. This new paradigm presents unprecedented opportunities and challenges across the full computing stack. Hear how IBM Research is expanding and realizing the value of Foundation Models: from building a cloud-native supercomputing infrastructure and a simplified, cloud-native common stack to train and deploy Foundation Models in a multicloud environment, to applying this full stack to enable advances in the natural language domain and beyond, including time series and code generation.
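The "self-supervision" the abstract mentions means that training labels are derived from the data itself rather than from human annotation. A minimal illustration (ours, not IBM's stack) for language models is next-token prediction, where every position's target is simply the token that follows it:

```python
def next_token_pairs(tokens):
    """Build (context, target) training pairs from raw text tokens alone."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
for context, target in next_token_pairs(tokens):
    print(context, "->", target)
```

Because the labels come for free from raw text, this objective scales to web-sized corpora, which is what makes billion-parameter training data-feasible in the first place.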

Modeling and Mitigating Communication Bottlenecks for Large Model Training at Scale

Tushar Krishna (Associate Professor - Georgia Institute of Technology)

The unprecedented success of large language models (LLMs) such as OpenAI's GPT-3 and GPT-4, Google's Bard, Meta's LLaMA, Cerebras-GPT, and others is emphasizing the ever-growing demand to train them efficiently. These models leverage billions to trillions of parameters, and this trend continues to grow at an unforeseen rate. Such model sizes make it impossible for the parameters to fit within a single accelerator device, whose memory is usually capped at tens of GBs. Furthermore, even if we could fit the model into a single device, its tremendous compute requirements would lead to impractical training times: GPT-3, for example, consists of 175B parameters and would take 355 GPU-years to train on a single NVIDIA V100 GPU. This has led to growing interest in distributed training: sharding model weights and/or data samples across multiple accelerator devices. However, this comes at the expense of communication overhead to exchange gradients and activations, which has already become a key bottleneck for distributed training. We identify that this communication challenge will be exacerbated in future systems, which are expected to leverage multi-dimensional networks with heterogeneous bandwidths due to diverse fabric technologies (e.g., chiplets, rack-scale, and scale-out). We present our recent work on (i) modeling future training platforms to identify such bottlenecks, and (ii) a novel runtime scheduling policy to enhance network bandwidth utilization.
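A back-of-envelope model shows why gradient exchange becomes the bottleneck the abstract describes. The sketch below (our own illustrative numbers, not from the talk) estimates the per-step cost of synchronizing gradients with a bandwidth-optimal ring all-reduce, which moves roughly 2*(p-1)/p times the model's bytes per worker:

```python
def allreduce_seconds(n_params, p_workers, bytes_per_param=2, link_gbps=100):
    """Estimated time of one ring all-reduce over the model's gradients."""
    model_bytes = n_params * bytes_per_param
    # Ring all-reduce: each worker sends/receives ~2*(p-1)/p of the payload.
    traffic = 2 * (p_workers - 1) / p_workers * model_bytes
    return traffic / (link_gbps * 1e9 / 8)  # convert Gb/s to bytes/s

# GPT-3-scale model (175B params), fp16 gradients, 1024 workers, 100 Gb/s links
t = allreduce_seconds(175e9, 1024, bytes_per_param=2, link_gbps=100)
print(f"~{t:.1f} s of pure gradient communication per training step")
```

Tens of seconds of communication per step, before any computation, is exactly why overlap, scheduling, and topology-aware collectives matter at this scale.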


Sunday, June 18th, 2023
(all times are Eastern Time)
9:00 - 9:15 AM Introduction and Welcoming Remarks
9:15 - 10:00 AM Invited Talk: "Building a Cloud-Native Platform for the Future of AI: Foundation Models"
Carlos Costa (IBM Research)
10:00 - 10:15 AM "PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks"
Samaneh Javadinia and Amirali Baniasadi (University of Victoria)
10:15 - 10:30 AM "Bit Error Characterization in Fault-Prone Homomorphic Encryption Applications"
Matias Mazzanti and Esteban Mocskos (University of Buenos Aires)
10:30 - 10:45 AM "Object Detection and Classification on a Heterogeneous Edge SoC"
Gracen Wallace, Aporva Amarnath, Nandhini Chandramoorthy and Augusto Vega (IBM Research)
10:45 - 11:00 AM "Diagnosis of Sports Injuries: A Hardware-Optimized Deep Learning Solution"
Ronak Das
11:00 - 11:30 AM Break
11:30 - 12:15 PM Invited Talk: to be announced
Suvinay Subramanian (Google)
12:15 - 12:30 PM "Qualitative Study of Facial Recognition Algorithm through Hardware-Software Acceleration"
Mohammed Samiulla, Prithvi Naidu, Prajwal Naidu and Advaith Jagannath (New York University)
12:30 - 2:00 PM Lunch
2:00 - 2:45 PM Invited Talk: "Modeling and Mitigating Communication Bottlenecks for Large Model Training at Scale"
Tushar Krishna (Georgia Institute of Technology)
2:45 - 3:30 PM Invited Talk: "Training Large Language Models on Cerebras Wafer Scale Clusters"
Natalia Vassilieva (Cerebras Systems)
3:30 - 4:00 PM Break
4:00 - 5:00 PM Panel: "AI Futures: Rosy, Scary or Blah?"
(details to be announced)
5:00 PM Concluding Remarks

Past Editions:


Roberto Gioiosa is a senior researcher in the HPC group and lead of the Scalable and Emerging Technologies team at Pacific Northwest National Laboratory. His current research focuses on hardware/software co-design methodologies, custom AI/ML accelerator designs, and distributed software for heterogeneous systems. Currently, Dr. Gioiosa leads the DOE co-design center for AI and graph analytics (ARIAA) and leads several other co-design efforts at PNNL. In the past, Dr. Gioiosa worked at LANL, BSC, IBM Watson, and ORNL. Dr. Gioiosa holds a Ph.D. from the University of Rome “Tor Vergata”.

David Trilla is a post-doctoral Researcher at IBM T. J. Watson Research Center. He has worked on critical-embedded real-time systems and his current research interests include security and agile hardware development. He obtained his Ph.D. at the Barcelona Supercomputing Center (BSC) granted by the Polytechnic University of Catalonia (UPC), Spain.

Subhankar Pal is a Research Staff Member at IBM T. J. Watson Research Center. His research is focused on SoC design methodologies and hardware-software co-design for privacy-preserving machine learning. He holds a Ph.D. and M.S. from the University of Michigan. His Ph.D. thesis looked at designing a reconfigurable, software-defined hardware solution that balances programmability with energy efficiency. Prior to that, Subhankar was with NVIDIA, where he worked on pre-silicon verification and bring-up of multiple generations of GPUs.

Karthik Swaminathan is a Research Staff Member at IBM T. J. Watson Research Center. His research interests include power-aware architectures, domain-specific accelerators and emerging device technologies in processor design. He is also interested in architectures for approximate and cognitive computing, particularly in aspects related to their reliability and energy efficiency. He holds a Ph.D. degree from Penn State University.

Carlos Costa is a Principal Research Staff Member at IBM T. J. Watson Research Center, where he leads the effort to build a serverless, cloud-native platform for emerging AI/ML workflows. His research is mainly focused on system software, programming models and middleware for next-generation distributed systems, working at the intersection of traditional HPC and emerging distributed computing paradigms. He has been involved in multiple projects in the areas of HPC and analytics, including the BlueGene/Q system, the Active Memory Cube (AMC) architecture for in-memory processing, and DoE ORNL’s Summit and LLNL’s Sierra supercomputer systems, among other projects with clients and academic partners. He is currently the lead of IBM Research’s Foundation Model Stack.

Alper Buyuktosunoglu is a Research Staff Member at IBM T. J. Watson Research Center. He has been involved in research and development work in support of IBM Power Systems and IBM z Systems in the area of high performance, reliability and power-aware computer architectures. He holds a Ph.D. degree from University of Rochester.

Pradip Bose is a Distinguished Research Staff Member and manager of Efficient and Resilient Systems at IBM T. J. Watson Research Center. He has over thirty-three years of experience at IBM, and was a member of the pioneering RISC superscalar project at IBM (a precursor to the first RS/6000 system product). He holds a Ph.D. degree from University of Illinois at Urbana-Champaign.

Augusto Vega is a Research Staff Member at IBM T. J. Watson Research Center involved in research and development work in the areas of highly-reliable power-efficient embedded designs, cognitive systems and mobile computing. He holds a Ph.D. degree from Polytechnic University of Catalonia (UPC), Spain.

Ananda Samajdar is a Research Staff Member at IBM T. J. Watson Research Center working on accelerator design and compilation/mapping strategies for DNN workloads on IBM’s RaPiD AI accelerator. He holds a Ph.D. from Georgia Tech.


CogArch will be held in conjunction with the 50th International Symposium on Computer Architecture (ISCA 2023). Refer to the main venue to continue with the registration process.

Event Location

Orlando World Center Marriott
8701 World Center Dr Orlando, FL 32821

Check main venue site for more information.