Haseeb Shah

Ph.D. Student · Reinforcement Learning

I am a Ph.D. student at the RLAI lab at the University of Alberta, advised by Prof. Martha White. My research focuses on foundational topics in reinforcement learning (policy gradient algorithms, general value functions, and recurrent learning) as well as on applying reinforcement learning to real-world physical systems, such as automating drinking-water treatment plants and controlling laser wakefield accelerators.

Curriculum Vitae

Education

University of Alberta · Sep 2024 – Present

Ph.D. in Computer Science

Ph.D. Recruitment Scholarship

Supervisor: Martha White

Teaching Assistant: AI Everywhere

University of Alberta · Jan 2021 – Jan 2023

M.Sc. in Computer Science

CGPA 4.00/4.00

Supervisor: Martha White

Teaching Assistant: Intermediate Machine Learning, Introduction to CS

Thesis Slides Code

National University of Sciences and Technology · Sep 2015 – May 2019

B.E. in Software Engineering

Merit Scholarship for High Achievers (awarded each semester)

DAAD Summer Research Scholarship, Germany

1st Place, ACM Digital Design Competition

Supervisor: Faisal Shafait

Teaching Assistant: Computer Networks; PPRS Autumn School on Deep Learning

Paper Code

Experience

Advanced Laser Light Source Laboratory, INRS · Sep 2025 – Oct 2025

RL Researcher · Montreal, Canada

RL Core Technologies · Jan 2024 – Aug 2024

Machine Learning Intern · Canada

University of Alberta · Feb 2023 – Jan 2024

Research Assistant (Full-time) · Canada

LAVIS Lab, Hochschule RheinMain · Jan 2020 – Dec 2020

Research Assistant · Remote

DCube Tech. · May 2020 – Aug 2020

Machine Learning Engineer · Pakistan

LAVIS Lab, Hochschule RheinMain · Jun 2018 – Sep 2018

DAAD Research Intern · Germany

TUKL-NUST R&D Center · Jun 2017 – Dec 2017

Undergraduate Research Intern · Pakistan

Under Review

Deconstructing Actor-Critic: A Large-Scale Empirical Study of Design Components for Practitioners

PNAS Under Review

H. Shah, L. Zhu, A. White, M. White

Submitted to the Proceedings of the National Academy of Sciences (PNAS)

We run and analyze over 33,000 experiments on a task derived from a real-world system to identify how actor-critic components affect performance and stability in deployment, and find that many existing default choices are comparatively unstable.

Stable Deployment in Offline-to-Online RL: Mitigating Degradation during Continual Fine-Tuning

RLC 2026 Under Review

H. Wang, H. Shah, A. White, M. White

Submitted to the Continual RL Workshop at the Reinforcement Learning Conference (RLC), 2026

We introduce a new offline-to-online fine-tuning algorithm that gradually allows more exploration based on off-policy estimates of performance. We also propose new metrics to properly measure performance degradation.

Symmetric Behavior Regularized Policy Optimization

NeurIPS 2026 Under Review

L. Zhu, H. Shah, C. Zheng, Y. Nagai, M. White

Submitted to the Conference on Neural Information Processing Systems (NeurIPS), 2026

We study symmetric divergences for behavior-regularized policy optimization and propose Symmetric f-Actor Critic, which avoids the per-environment failures encountered by other offline reinforcement learning methods.

Publications

q-Exponential Family for Policy Optimization

ICLR 2025

L. Zhu*, H. Shah*, H. Wang*, Y. Nagai, M. White (* Equal contribution)

In Proceedings of the International Conference on Learning Representations (ICLR), 2025

@inproceedings{ICLR2025_6507b115,
 author = {Zhu, Lingwei and Shah, Haseeb and Wang, Han and Nagai, Yukie and White, Martha},
 booktitle = {International Conference on Learning Representations},
 editor = {Y. Yue and A. Garg and N. Peng and F. Sha and R. Yu},
 pages = {40717--40744},
 title = {q-exponential family for policy optimization},
 url = {https://proceedings.iclr.cc/paper_files/paper/2025/file/6507b115562bb0a305f1958ccc87355a-Paper-Conference.pdf},
 volume = {2025},
 year = {2025}
}

We explore the effectiveness of q-exponential policies in policy optimization methods, finding that heavy-tailed policies (q > 1) are generally more effective and can consistently outperform the Gaussian policy.

Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

JMLR 2023

K. Javed, H. Shah, R. Sutton, M. White

Journal of Machine Learning Research (JMLR), 2023

@article{javed2023scalable,
  title={Scalable real-time recurrent learning using columnar-constructive networks},
  author={Javed, Khurram and Shah, Haseeb and Sutton, Richard S and White, Martha},
  journal={Journal of Machine Learning Research},
  volume={24},
  number={256},
  pages={1--34},
  year={2023}
}

We show that by either decomposing the network into independent modules or learning a recurrent network incrementally, we can make RTRL scale linearly with the number of parameters. Unlike prior scalable gradient estimation algorithms, our algorithms do not add noise or bias to the gradient estimate.

GVFs in the Real World: Making Predictions Online for Water Treatment

MLJ 2023

K. Janjua, H. Shah, M. White, E. Miahi, M. C. Machado, A. White

Machine Learning (MLJ), 2023

@article{janjua2024gvfs,
  title={GVFs in the real world: making predictions online for water treatment},
  author={Janjua, Muhammad Kamran and Shah, Haseeb and White, Martha and Miahi, Erfan and Machado, Marlos C and White, Adam},
  journal={Machine Learning},
  volume={113},
  number={8},
  pages={5151--5181},
  year={2024},
  publisher={Springer}
}

We propose a framework for making accurate predictions on a real-world water treatment plant based on general value functions. This work is one of the first to motivate the importance of adapting predictions in real-time for non-stationary, high-volume systems in the real world.

An Open-World Extension to Knowledge Graph Completion Models

AAAI 2019 ★ Oral

H. Shah, J. Villmow, A. Ulges, U. Schwanecke, F. Shafait

In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019

@inproceedings{shah2019open,
  title={An Open-World Extension to Knowledge Graph Completion Models},
  author={Shah, Haseeb and Villmow, Johannes and Ulges, Adrian and Schwanecke, Ulrich and Shafait, Faisal},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={33},
  pages={3044--3051},
  year={2019},
  doi={10.1609/aaai.v33i01.33013044}
}

We propose an extension that enables any existing knowledge graph completion model to predict facts about open-world entities. This approach is more robust, more portable, and has better performance than the published state of the art on most datasets. We also released a new dataset that overcomes the shortcomings of previous ones.

Workshops & Preprints

Relation Specific Transformations for Open World Knowledge Graph Completion

COLING 2020

H. Shah, J. Villmow, A. Ulges

TextGraphs Workshop at COLING, 2020

@inproceedings{shah-etal-2020-relation,
    title = "Relation Specific Transformations for Open World Knowledge Graph Completion",
    author = "Shah, Haseeb and Villmow, Johannes and Ulges, Adrian",
    booktitle = "Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.textgraphs-1.9",
    pages = "79--84",
}

We introduced relation-specific transformations to substantially improve the performance of open-world knowledge graph completion models. We also proposed an approach for clustering relations to reduce the training time and memory footprint.

Distillation Techniques for Pseudo-rehearsal Based Incremental Learning

arXiv 2018

H. Shah, K. Javed, F. Shafait

arXiv preprint arXiv:1807.02799, 2018

@article{shah2018distillation,
  title={Distillation techniques for pseudo-rehearsal based incremental learning},
  author={Shah, Haseeb and Javed, Khurram and Shafait, Faisal},
  journal={arXiv preprint arXiv:1807.02799},
  year={2018}
}

Standard neural networks suffer from catastrophic forgetting when trained on an incrementally arriving stream of i.i.d. data. One approach to combat this forgetting is to train GANs on previously seen data and feed the generated samples to the network again. In this paper, we highlight that this method is biased and propose an approach to mitigate this bias and reduce the effect of catastrophic forgetting.

Public Talks

Online Feature Decorrelation

AMII 2022 Talk

A significant proportion of the representations learned by current generate-and-test methods consist of highly redundant features. This talk demonstrates how the feature ranking criteria used by these methods are ineffective at addressing this problem, and presents a new approach for decorrelating features in an online setting. I show that this decorrelator can effectively eliminate redundant features and produce a statistically significant performance improvement in the low-capacity function approximation setting.