Ph.D. Student · Reinforcement Learning
I am a Ph.D. student at the RLAI lab at the University of Alberta, advised by Prof. Martha White. My research focuses on foundational topics in reinforcement learning — policy gradient algorithms, general value functions, and recurrent learning — as well as applying reinforcement learning to real-world physical systems, such as automating drinking water-treatment plants and controlling laser wakefield accelerators.
Submitted to the Proceedings of the National Academy of Sciences (PNAS)
We run and analyze over 33,000 experiments on a task derived from a real-world system to identify how actor-critic components affect performance and stability in deployment, and find that many commonly used default choices are comparatively unstable.
Submitted to the Continual RL Workshop at the Reinforcement Learning Conference (RLC), 2026
We introduce a new offline-to-online fine-tuning algorithm that gradually allows more exploration based on off-policy estimates of performance. We also propose new metrics to properly measure performance degradation.
Submitted to the Conference on Neural Information Processing Systems (NeurIPS), 2026
We study symmetric divergences for behavior-regularized policy optimization and propose Symmetric f-Actor Critic, which avoids the per-environment failures encountered by other offline reinforcement learning methods.
In Proceedings of the International Conference on Learning Representations (ICLR), 2025
@inproceedings{ICLR2025_6507b115,
author = {Zhu, Lingwei and Shah, Haseeb and Wang, Han and Nagai, Yukie and White, Martha},
booktitle = {International Conference on Learning Representations},
editor = {Y. Yue and A. Garg and N. Peng and F. Sha and R. Yu},
pages = {40717--40744},
title = {q-exponential family for policy optimization},
url = {https://proceedings.iclr.cc/paper_files/paper/2025/file/6507b115562bb0a305f1958ccc87355a-Paper-Conference.pdf},
volume = {2025},
year = {2025}
}
We explore the effectiveness of q-exponential policies in policy optimization methods, finding that heavy-tailed policies (q > 1) are generally more effective and can consistently outperform the Gaussian policy.
Journal of Machine Learning Research (JMLR), 2023
@article{javed2023scalable,
title={Scalable real-time recurrent learning using columnar-constructive networks},
author={Javed, Khurram and Shah, Haseeb and Sutton, Richard S and White, Martha},
journal={Journal of Machine Learning Research},
volume={24},
number={256},
pages={1--34},
year={2023}
}
We show that by either decomposing the network into independent modules or learning a recurrent network incrementally, we can make RTRL scale linearly with the number of parameters. Unlike prior scalable gradient estimation algorithms, our algorithms do not add noise or bias to the gradient estimate.
Machine Learning (MLJ), 2024
@article{janjua2024gvfs,
title={GVFs in the real world: making predictions online for water treatment},
author={Janjua, Muhammad Kamran and Shah, Haseeb and White, Martha and Miahi, Erfan and Machado, Marlos C and White, Adam},
journal={Machine Learning},
volume={113},
number={8},
pages={5151--5181},
year={2024},
publisher={Springer}
}
We propose a framework, based on general value functions, for making accurate predictions on a real-world water treatment plant. This work is one of the first to motivate the importance of adapting predictions in real time for non-stationary, high-volume real-world systems.
In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019
@inproceedings{shah2019open,
title={An Open-World Extension to Knowledge Graph Completion Models},
author={Shah, Haseeb and Villmow, Johannes and Ulges, Adrian and Schwanecke, Ulrich and Shafait, Faisal},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
pages={3044--3051},
year={2019},
doi={10.1609/aaai.v33i01.33013044}
}
We propose an extension that enables any existing knowledge graph completion model to predict facts about open-world entities. This approach is more robust and more portable than the published state of the art, and outperforms it on most datasets. We also release a new dataset that addresses the shortcomings of previous ones.
TextGraphs Workshop at COLING, 2020
@inproceedings{shah-etal-2020-relation,
title = "Relation Specific Transformations for Open World Knowledge Graph Completion",
author = "Shah, Haseeb and Villmow, Johannes and Ulges, Adrian",
booktitle = "Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)",
month = dec,
year = "2020",
address = "Barcelona, Spain (Online)",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.textgraphs-1.9",
pages = "79--84",
}
We introduce relation-specific transformations that substantially improve the performance of open-world knowledge graph completion models. We also propose an approach for clustering relations to reduce training time and memory footprint.
arXiv preprint arXiv:1807.02799, 2018
@article{shah2018distillation,
title={Distillation techniques for pseudo-rehearsal based incremental learning},
author={Shah, Haseeb and Javed, Khurram and Shafait, Faisal},
journal={arXiv preprint arXiv:1807.02799},
year={2018}
}
Standard neural networks suffer from catastrophic forgetting when trained on an incrementally arriving stream of non-i.i.d. data. One approach to combat this forgetting is pseudo-rehearsal: train GANs on previously seen data and replay their generated samples to the network. In this paper, we highlight that this method is biased and propose an approach that mitigates the bias and reduces the effect of catastrophic forgetting.
A significant proportion of the representations learned by current generate-and-test methods consist of highly redundant features. This talk demonstrates how the feature ranking criteria used by these methods are ineffective at addressing this problem, and presents a new approach for decorrelating features in an online setting. I show that this decorrelator can effectively eliminate redundant features and produce a statistically significant performance improvement in the low-capacity function approximation setting.