Recent advances have demonstrated the effectiveness of self-evolving LLM agents on tasks such as program repair and scientific discovery. In this paradigm, a planner LLM synthesizes agent code that invokes parametric components (probabilistic generative models such as LLMs and smaller neural networks) alongside external tools such as SMT solvers. These components are then tuned per task to improve performance. However, unlike traditional constraint-guided program synthesis, existing self-evolving agent frameworks provide no formal guarantees of safety or correctness. Because such synthesized programs are often executed autonomously on unseen inputs, the lack of formal guarantees raises serious reliability and security concerns. To address this gap, we formulate agentic code generation as a constrained learning problem that combines hard formal specifications with soft objectives capturing task utility. We introduce Formally Guarded Generative Models (FGGM), which allow the planner LLM to specify a formal output contract for each generative-model call using first-order logic. Each FGGM call automatically wraps the underlying parametric generative model in a rejection sampler with a verified fallback, treating model outputs as samples from a proposal distribution. As a result, every returned output satisfies the specified contract for any input and any parameter setting of the underlying model. Building on FGGM, we present VeriSEA (Verified Self-Evolving Agents), a three-stage framework for solving the constrained learning problems arising from agent synthesis. In \emph{Search}, the planner LLM synthesizes candidate parametric programs that may contain multiple FGGM calls. In \emph{Verification}, we prove correctness with respect to the hard constraints for all parameter values, reducing the problem to unconstrained learning.
In \emph{Learning}, we apply scalable gradient-based optimization, including GRPO-style fine-tuning for LLMs, to improve the soft objective while preserving formal correctness. We evaluate VeriSEA on constrained symbolic regression, invariant generation for Dafny programs, symbolic mathematical expression synthesis, and policy-compliant agentic tool use ($\tau^2$-bench). Across all tasks, VeriSEA achieves zero constraint violations while simultaneously improving task performance over unconstrained and state-of-the-art baselines. Our results demonstrate that formal behavioral constraints not only guarantee correctness but also prune the space of candidate programs, steering synthesis toward higher-quality agents.
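The FGGM mechanism described above, a rejection sampler over the model's proposal distribution with a verified fallback, can be illustrated with a minimal sketch. All names here (`fggm_sample`, `propose`, `contract`, `fallback`, `max_tries`) are hypothetical and not taken from the paper's implementation; the contract is stood in for by an arbitrary Python predicate rather than a first-order logic formula with an attached verifier.

```python
import random

def fggm_sample(propose, contract, fallback, max_tries=8):
    """Rejection-sample from a generative model until the contract holds.

    propose:  draws one candidate output (the proposal distribution)
    contract: predicate that is True iff the output satisfies the spec
    fallback: a pre-verified default output known to satisfy the contract
    """
    for _ in range(max_tries):
        candidate = propose()
        if contract(candidate):
            return candidate
    # Budget exhausted: return the verified fallback, so the contract
    # holds for any input and any parameter setting of the proposal.
    return fallback

# Toy usage: the contract requires an even integer; the proposal is a
# stand-in for a tuned generative model.
contract = lambda x: x % 2 == 0
sample = fggm_sample(
    propose=lambda: random.randrange(10),
    contract=contract,
    fallback=0,
)
assert contract(sample)
```

Because the fallback is verified once up front, the guarantee is unconditional: tuning the proposal (e.g. via GRPO-style fine-tuning) can only change how often the fallback fires, never whether the contract holds.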