Welcome

Welcome to the Intelligent Computing Lab at Yale University! 

We are part of the Electrical & Computer Engineering Department and are located in Dunham Labs. The lab is led by Prof. Priyadarshini Panda. Our research focuses on developing algorithm and hardware design solutions that enable energy-efficient, sustainable, and ubiquitous artificial intelligence (AI) technologies. We are inspired by the brain and embrace nature’s blueprint of integrating efficiency, robustness, and adaptability as we shape the future of sustainable AI.

Research Interests:

1. Neuromorphic Computing 

  • Bio-inspired Spiking Neural Networks 
  • Emerging Compute-in-Memory based Hardware Accelerators (Device-Circuit-Architecture-Algorithm Co-Simulation and Optimization)

2. Efficient Machine Learning

  • Compression-friendly (sparsity and quantization) algorithm design for Transformers, LLMs, and foundation models
  • Algorithm-Hardware Co-Design, System Integration, and Acceleration on FPGA, SoC & ASIC

Please check out our Research and Publications pages to learn more about our research focus and recent work.


Research Highlights:

  1. GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration, accepted at ICML 2025. We introduce a finetuning-free quantization method for compressing large-scale transformer architectures. Unlike the earlier GPTQ method, which calibrates each layer independently, we always match the quantized layer’s output to the exact output of the full-precision model, a scheme we call asymmetric calibration. This effectively reduces the quantization error accumulated in previous layers (see the GPTAQ sketch below the highlights). [Code, Paper]

    HIGHLIGHT: GPTAQ unlocks new ultra-low-bit quantization. GPTAQ (previously known as GPTQv2) is now supported in a production-ready LLM model compression/quantization toolkit (see Code), with accelerated CPU/GPU inference via HF, vLLM, and SGLang.

  2. MEADOW: Memory-Efficient Dataflow and Data Packing for Low Power Edge LLMs, presented at MLSys’25. We propose a framework that significantly reduces off-chip memory access for LLMs through a novel token-parallel head-sequential (TPHS) dataflow, combined with weight packing, a lossless decomposition of large weight matrices into their unique elements that cuts the enormous weight-fetch latency (see the weight-packing sketch below the highlights). [Paper]

  3. Spiking Transformer with Spatial-Temporal Attention, presented at CVPR 2025. Existing spike-based transformers predominantly focus on spatial attention while neglecting the crucial temporal dependencies inherent in spike-based processing. We propose STAtten, which introduces a block-wise computation strategy that processes information in spatial-temporal chunks, enabling comprehensive feature capture while maintaining the same computational complexity as previous spatial-only approaches (see the chunking sketch below the highlights). [Code, Paper]

  4. PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs, presented at DAC 2025. We investigate methods to accelerate GEMM operations involving packed low-precision INT weights and high-precision FP activations. Our approach co-optimizes tile-level packing and dataflow strategies for the INT weight matrices, and we design a specialized FP-INT multiplier unit tailored to these strategies, enabling parallel processing of multiple INT weights (see the packed-GEMM sketch below the highlights). [Paper]
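
Illustrative Code Sketches:

The snippets below are minimal, hedged sketches meant only to make the ideas in the highlights concrete; they are not the implementations released with the papers.

The first sketch contrasts the per-layer calibration objectives behind GPTAQ. It is not the GPTAQ algorithm itself (which builds on GPTQ-style weight updates); it only shows, for a single linear layer with synthetic data and a toy round-to-nearest quantizer, the difference between calibrating against the quantized model's own input and matching the full-precision model's exact output. All names, shapes, and the noise stand-in for upstream quantization error are illustrative assumptions.

```python
# Conceptual sketch of symmetric vs. asymmetric per-layer calibration.
# NOT the GPTAQ algorithm; it only compares the two objectives for one
# linear layer, using synthetic data and a toy round-to-nearest quantizer.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, n_bits=4):
    """Toy symmetric round-to-nearest quantizer (per-tensor)."""
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    q = np.round(w / scale).clip(-(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return q * scale

d_in, d_out, n_samples = 64, 32, 256
W = rng.normal(size=(d_out, d_in))         # full-precision layer weight
W_q = quantize(W)                          # quantized layer weight

X_fp = rng.normal(size=(n_samples, d_in))  # this layer's input in the FP model
# In a real pipeline the quantized model's input to this layer already carries
# error from earlier quantized layers; we emulate that here with small noise.
X_q = X_fp + 0.05 * rng.normal(size=X_fp.shape)

# Symmetric calibration (GPTQ-style): compare FP and quantized weights on the
# same (quantized-model) input, layer by layer.
sym_err = np.linalg.norm(X_q @ W.T - X_q @ W_q.T)

# Asymmetric calibration (GPTAQ idea): match the quantized layer's output,
# computed from the quantized model's input, to the full-precision model's
# exact output, so error accumulated in previous layers is also compensated.
asym_err = np.linalg.norm(X_fp @ W.T - X_q @ W_q.T)

print(f"symmetric objective:  {sym_err:.3f}")
print(f"asymmetric objective: {asym_err:.3f}")
```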
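
The next sketch illustrates the lossless "unique element" decomposition behind MEADOW's weight packing, using NumPy. The actual on-chip storage format, the token-parallel head-sequential dataflow, and the fetch scheduling are defined in the paper; the matrix size and dtypes here are illustrative assumptions.

```python
# Minimal sketch of lossless weight packing via unique elements: a low-bit
# quantized weight matrix has few distinct values (e.g. 16 for INT4), so it
# can be stored as a small codebook plus narrow per-element indices.
import numpy as np

rng = np.random.default_rng(0)
W_int4 = rng.integers(-8, 8, size=(1024, 1024), dtype=np.int8)  # toy INT4 weights

codebook, indices = np.unique(W_int4, return_inverse=True)
indices = indices.reshape(W_int4.shape).astype(np.uint8)  # 4-bit-wide ids in practice

# Lossless: the original matrix is reconstructed exactly from the packed form,
# so only the small codebook plus narrow indices need to be fetched off-chip.
W_rec = codebook[indices]
assert np.array_equal(W_rec, W_int4)
print("unique values stored:", codebook.size)  # at most 16 for INT4
```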
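
The third sketch shows only the chunking idea behind STAtten's spatial-temporal attention, in PyTorch: instead of attending over the spatial tokens of each timestep independently, tokens are grouped into chunks spanning several timesteps and attention is computed within each chunk. The real STAtten operates on spiking features with its own attention formulation; the tensor shapes, chunk size, and identity projections below are illustrative assumptions.

```python
# Conceptual sketch of block-wise attention over joint space-time chunks
# (not the STAtten spiking formulation).
import torch
import torch.nn.functional as F

T, B, N, D = 4, 2, 16, 32    # timesteps, batch, spatial tokens, feature dim
chunk_t = 2                  # timesteps grouped into each spatial-temporal chunk

x = torch.randn(T, B, N, D)  # stand-in for spike-derived features
q, k, v = x, x, x            # identity projections, for brevity

def to_chunks(t):
    # [T, B, N, D] -> [B, T//chunk_t, chunk_t*N, D]: each chunk mixes
    # information across both space and a window of time.
    t = t.permute(1, 0, 2, 3)                       # [B, T, N, D]
    return t.reshape(B, T // chunk_t, chunk_t * N, D)

out = F.scaled_dot_product_attention(to_chunks(q), to_chunks(k), to_chunks(v))
out = out.reshape(B, T, N, D).permute(1, 0, 2, 3)   # back to [T, B, N, D]
print(out.shape)                                    # torch.Size([4, 2, 16, 32])
```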
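
The last sketch covers only the numerics behind a packed FP-INT GEMM of the kind PacQ accelerates: two signed INT4 weights stored per byte and multiplied with higher-precision FP activations. PacQ itself is a SIMT microarchitecture with tile-level packing, dataflow co-optimization, and a dedicated FP-INT multiplier unit; the nibble layout and GEMM sizes below are illustrative assumptions.

```python
# NumPy sketch of the pack/unpack arithmetic for a hyper-asymmetric GEMM:
# low-precision INT4 weights (two per byte) times high-precision FP activations.
import numpy as np

rng = np.random.default_rng(0)
M, K, N = 8, 64, 32                                   # activations [M, K], weights [K, N]

W = rng.integers(-8, 8, size=(K, N), dtype=np.int8)   # signed INT4 range [-8, 7]
A = rng.standard_normal((M, K)).astype(np.float16)    # FP activations

# Pack two INT4 values per uint8: low nibble = even row, high nibble = odd row.
W_u = (W & 0x0F).astype(np.uint8)
packed = W_u[0::2] | (W_u[1::2] << 4)                 # shape [K//2, N], half the bytes

# Unpack back to signed INT4 before multiplying with the FP activations.
low = (packed & 0x0F).astype(np.int8)
high = (packed >> 4).astype(np.int8)
low[low >= 8] -= 16
high[high >= 8] -= 16
W_unpacked = np.empty_like(W)
W_unpacked[0::2], W_unpacked[1::2] = low, high
assert np.array_equal(W_unpacked, W)                  # packing is lossless

out = A.astype(np.float32) @ W_unpacked.astype(np.float32)  # reference FP-INT GEMM
print(out.shape)                                             # (8, 32)
```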