Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Posts

Building an AI Peer Review Assistant: Useful, but Not Yet Trustworthy by Default

9 minute read

Published: June 23, 2026

Peer review is one of the most important quality-control systems in science, but it is also under pressure. Submission volume keeps growing, reviewers are overloaded, and many authors wait months for feedback. At the same time, large language models are becoming much better at reading papers, summarizing arguments, checking structure, and generating detailed critiques.

Publications

MED12 loss activates endogenous retroelements to sensitise immunotherapy in pancreatic cancer

Published in Gut (IF: 24.5), 2024

This study characterizes the role of MED12 loss in activating endogenous retroelements, which sensitizes pancreatic ductal adenocarcinoma (PDAC) to immune checkpoint blockade. Using integrative multi-omics analysis, we demonstrate that MED12 deficiency rewires the tumor immune microenvironment, offering a potential biomarker for immunotherapy response in pancreatic cancer. Published as co-first author in Gut (2024).

Recommended citation: Tang Y*, Tang S*, Yang W, et al. MED12 loss activates endogenous retroelements to sensitise immunotherapy in pancreatic cancer. Gut. Published Online First: 31 August 2024. doi:10.1136/gutjnl-2024-332350 (*co-first author)

ARCADE: Controllable Codon Design from Foundation Models via Activation Engineering

Published in bioRxiv, 2025

ARCADE is a controllable multi-objective codon design framework that leverages pretrained genomic language models and extends activation engineering, originally developed for controllable text generation, to steer continuous-valued biological metrics. By deriving semantic steering vectors in the model’s activation space, ARCADE directly controls properties such as Codon Adaptation Index (CAI), Minimum Free Energy (MFE), and GC content, enabling programmable biological sequence design for applications including mRNA vaccines and gene therapies.

Recommended citation: Li J, Lai HS, Liang L, Du S, Tang S, Kingsford C. ARCADE: Controllable Codon Design from Foundation Models via Activation Engineering. bioRxiv. 2025. doi:10.1101/2025.08.19.668819

CodonRL: Multi-Objective Codon Sequence Optimization Using Demonstration-Guided Reinforcement Learning

Published in bioRxiv, 2026

CodonRL is a reinforcement learning framework for multi-objective mRNA codon optimization that learns a structural prior from efficient folding feedback and demonstration-guided replay. The framework uses LinearFold for fast intermediate reward computation and milestone-based rewards to address delayed feedback in long-range optimization, and it lets users control the trade-offs between translation efficiency, RNA stability, and compositional properties. On a benchmark of 55 human proteins, CodonRL outperforms GEMORNA, achieving 9.5% higher CAI, 25.4 kcal/mol more favorable MFE, and 3.4% lower uridine content.

Recommended citation: Du S, Kaynar G, Li J, You Z, Tang S, Kingsford C. CodonRL: Multi-Objective Codon Sequence Optimization Using Demonstration-Guided Reinforcement Learning. bioRxiv. 2026. doi:10.64898/2026.02.12.705465

Models Know Their Shortcuts: Deployment-Time Shortcut Mitigation

Published in arXiv preprint arXiv:2604.12277, 2026

This paper addresses shortcut learning in pretrained language models, where models rely on superficial token-level patterns that fail to generalize under distribution shift. “Shortcut Guardrail” is a deployment-time framework that requires no access to the original training data and no advance knowledge of shortcut types. It identifies problematic shortcuts through gradient-based attribution and employs a lightweight LoRA-based module trained with Masked Contrastive Learning to keep representations consistent regardless of individual tokens. The approach improves performance on sentiment classification, toxicity detection, and natural language inference tasks under distribution shift.

Recommended citation: Li J, Tang S, Kaynar G, Du S, Kingsford C. Models Know Their Shortcuts: Deployment-Time Shortcut Mitigation. arXiv preprint arXiv:2604.12277. 2026.

Shijie Tang
