The Stanford Software Research Lunch is a weekly event held on Thursdays where students and researchers present their latest work to peers. Talks are open to anyone, but regular attendees are expected to give a presentation on their own work. Members of the Computer Forum are especially welcome.
Mailing list: software-research-lunch@lists.stanford.edu (subscribe via mailman)
Calendar: ical
Format: The lunch is held every week during fall, winter and spring quarter. The first week of every quarter is an organizational lunch where people can sign up to give a talk. If you'd like to give a talk, please contact Rohan Yadav.
Past quarters: Spring 2025, Winter 2025, Fall 2024, Spring 2024, Winter 2024, Fall 2023, Spring 2023, Winter 2023, Fall 2022, Winter 2021, Fall 2020, Winter 2020, Fall 2019, Spring 2019, Winter 2019, Fall 2018, Spring 2018, Winter 2018, Fall 2017, Spring 2017, Winter 2017, Fall 2016.
Ordering Food: For suggestions for those ordering food for the lunch, see here.
9/11: Noisy Quantum Simulation Using Tracking, Uncomputation and Sampling
Time: Thursday, September 11, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Siddharth Dangwal
Abstract: Quantum computers have grown rapidly in size and qubit quality in recent years, enabling the execution of complex quantum circuits. However, for most researchers, access to compute time on quantum hardware is limited. This creates the need for simulators that mimic the execution of quantum circuits on noisy quantum hardware accurately and scalably. In this work, we propose TUSQ - Tracking, Uncomputation, and Sampling for Noisy Quantum Simulation. TUSQ is a simulator that can perform noisy simulation of up to 30-qubit Adder circuits on a single Nvidia A100 GPU in less than 820 seconds. To represent the stochastic noise channels accurately, we average the output of multiple quantum circuits with fixed noisy gates sampled from the channels. However, this leads to a substantial increase in circuit overhead, which slows down the simulation. To eliminate this overhead, TUSQ uses two modules: the Error Characterization Module (ECM) and the Tree-based Execution Module (TEM). The ECM tracks the number of unique circuit executions needed to accurately represent the noise: if initially we needed n1 circuit executions, the ECM reduces that number to n2 < n1 by eliminating redundancies. This is followed by the TEM, which reuses computation across these n2 circuits by representing all of them as a tree. We sample the significant leaf nodes of this tree and prune the remaining ones, then traverse the tree using depth-first search, using uncomputation to perform rollback-recovery at several stages, which reduces simulation time. We evaluate TUSQ on a total of 186 benchmarks and report average speedups of 52.5× over Qiskit and 12.53× over CUDA-Q, rising to 7878.03× and 439.38× respectively. For larger benchmarks (more than 15 qubits), the average speedups are 55.42× over Qiskit and 23.03× over CUDA-Q respectively.
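For intuition, here is a toy sketch (mine, not the TUSQ implementation, and with no quantum simulation at all) of the tree-reuse idea the abstract describes: circuits that share a gate prefix are laid out as a tree, the tree is traversed by depth-first search, and on backtracking each gate is undone in place ("uncomputation") rather than copying the whole state at every branch.

```python
class Node:
    def __init__(self, gate=None):
        self.gate = gate          # an (apply, undo) pair; None at the root
        self.children = []

def add_circuit(root, gates):
    """Insert one circuit (a list of (apply, undo) gates), sharing prefixes."""
    node = root
    for gate in gates:
        for child in node.children:
            if child.gate is gate:
                node = child
                break
        else:
            new = Node(gate)
            node.children.append(new)
            node = new

def run(node, state, results):
    """DFS over the tree: apply each gate going down, undo it going back up."""
    if not node.children:
        results.append(list(state))       # leaf: record a copy of the output
    for child in node.children:
        apply_fn, undo_fn = child.gate
        apply_fn(state)
        run(child, state, results)
        undo_fn(state)                    # "uncomputation": roll back in place

# Two tiny invertible "gates" acting on a one-element state vector.
inc = (lambda s: s.__setitem__(0, s[0] + 1),
       lambda s: s.__setitem__(0, s[0] - 1))
dbl = (lambda s: s.__setitem__(0, s[0] * 2),
       lambda s: s.__setitem__(0, s[0] // 2))

root = Node()
add_circuit(root, [inc, dbl])   # circuit 1: (x + 1) * 2
add_circuit(root, [inc, inc])   # circuit 2: x + 2 -- shares the `inc` prefix

state, results = [3], []
run(root, state, results)       # `inc` is applied once, then reused
# results == [[8], [5]]; state is restored to [3] by uncomputation
```

The shared `inc` node is executed once for both circuits, which is the source of the computational reuse; in TUSQ the states are quantum state vectors on a GPU rather than toy integers.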
Food:
9/25: Organizational Lunch
Time: Thursday, September 25, 2025, 12 noon - 1pm
Location: CoDa E401
Organizational lunch. Come sign up to give a talk during the quarter.
Food:
10/2: Automated Formal Verification of a Software Fault Isolation System
Time: Thursday, October 2, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Matthew Sotoudeh
Abstract: Software fault isolation (SFI) is a popular way to sandbox untrusted software. A key component of SFI is the verifier, which checks that the untrusted code is written in a subset of the machine language that guarantees it never reads or writes outside of a region of memory dedicated to the sandbox. Soundness bugs in the SFI verifier would break the SFI security model and allow the supposedly sandboxed code to read protected memory. In this paper, we address the concern of SFI verifier bugs by performing an automated formal verification of a recent SFI system called Lightweight Fault Isolation (LFI). In particular, we formally verify that programs accepted by the LFI verifier never read or write memory outside of a designated sandbox region. Joint work with Zachary Yedidia.
The talk will be a short practice talk, and I'm hoping for feedback from attendees on the presentation. If time permits, I'll also give a practice talk for some garbage collection work and/or tell you about how I've been abusing dynamic linkers to make large software systems easier to hack on.
Food:
10/9: Building Programming Systems for Near-Data-Processing: Theory and Practice
Time: Thursday, October 9, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Yiwei Zhao
Abstract: Data movement has become the dominant cost in modern computer systems. Near-Data Processing (NDP)---also known as Processing-in-Memory (PIM)---is re-emerging as a promising approach to mitigate this cost by enabling computation resources embedded within memory modules. While considerable recent work has been published on the architectural and technological fronts, the theoretical and programming foundations of NDP remain under-explored. This absence of algorithmic analysis makes it difficult to identify the fundamental principles in the complicated design space. In this talk, we address two central questions in designing programming systems for NDP: How should programming models and algorithm design for NDP systems differ from traditional parallel or distributed settings? and What are the fundamental trade-offs and limitations inherent in NDP? To answer these questions, this talk focuses on NDP-friendly indexing structures: specifically, PIM-optimized B-trees, radix trees, and space-partitioning indexes. Our designs directly address the fundamental tension between minimizing communication and maintaining load balance, achieving provable guarantees under arbitrary query and data skew. Experimental results on UPMEM’s 2,560-module PIM system show up to 59× speedups over prior state-of-the-art PIM indexes. Finally, we discuss how these NDP techniques generalize to end-to-end transactional (OLTP) databases on NDP platforms, and extend to broader programming systems for distributed and/or heterogeneous architectures.
Food:
10/16: Keith Talks About Intro CS/EE
Time: Thursday, October 16, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Keith Winstein
Abstract: Keith Winstein will discuss ongoing work to create a new frosh-level CS/EE course, emphasizing the joy and playfulness of these disciplines and their fundamental concepts through real-time interaction. The students will create programs that understand sounds and play with them in real time by making noise with real-world musical instruments. We want students to learn about the Nyquist sampling theorem, discover Minsky's circle algorithm for themselves, create an interpreter for a simple programming language, understand concepts like computability, and write code that understands information encoded in the notes of a toddler xylophone playing in real time in the real world.
A big part of the course will involve programming in raw WebAssembly in a new live programming environment we call 'Codillon' -- a structured or projectional editor similar in spirit to systems like Hazel (POPL 2019) that ensures every input typechecks and is runnable, even if there are 'holes'. The environment will execute students' code as they type, showing the execution trajectory alongside the code (similar to some of Bret Victor's demos). Unlike Hazel, however, we are doing this for an existing, text-based programming language and allowing arbitrary text manipulations, which creates some new challenges -- not to mention the challenge of teaching freshman computer science in a typed assembly language that few students will have previously encountered. A prior lack of familiarity is probably helpful in the sense of creating an equalizer and encouraging students to think and learn, but not if it makes them hate computer science. I'll discuss some of our UI/IDE concepts and will be grateful for feedback on this ongoing work.
Food:
10/23: Fix: Externalizing Network I/O in Serverless Computing
Time: Thursday, October 23, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Yuhan Deng
Abstract: We describe a system for serverless computing where users, programs, and the underlying platform share a common representation of a computation: a deterministic procedure, run in an environment of well-specified data or the outputs of other computations. This representation externalizes I/O: data movement over the network is performed exclusively by the platform. Applications can describe the precise data needed at each stage, helping the provider schedule tasks and network transfers to reduce starvation. The design suggests an end-to-end argument for outsourced computing, shifting the service model from "pay-for-effort" to "pay-for-results."
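As a rough illustration of the service model (the names and structure here are mine, not Fix's API): if a computation is a deterministic procedure applied to content-addressed inputs, the platform can identify it by a hash and return a cached result whenever the same computation is requested again, charging for results rather than effort.

```python
import hashlib
import json

cache = {}   # the platform's content-addressed store: digest -> value

def digest(obj):
    """Content-address a JSON-serializable value."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def evaluate(proc_name, procs, *input_digests):
    """Run a deterministic procedure on named inputs; memoize by identity.

    Because the computation is fully described by (procedure, inputs), the
    platform performs all data movement itself and never re-runs work it
    has already paid for.
    """
    key = digest([proc_name, list(input_digests)])
    if key not in cache:
        args = [cache[d] for d in input_digests]
        cache[key] = procs[proc_name](*args)
    return key          # callers pass results around by digest, not value

procs = {"add": lambda a, b: a + b, "square": lambda a: a * a}
two = digest(2); cache[two] = 2
three = digest(3); cache[three] = 3

s = evaluate("add", procs, two, three)   # 2 + 3
out = evaluate("square", procs, s)       # (2 + 3) ** 2 == 25
```

Requesting `evaluate("add", procs, two, three)` a second time would hit the cache and return the same digest without redoing any work, which is the "pay-for-results" shift in miniature.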
This is a practice talk, and feedback would be greatly appreciated.
Food:
10/30: Programming Systems for Spatial Dataflow Architectures
Time: Thursday, October 30, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Souradip Ghosh
Abstract: Spatial dataflow architectures (SDAs) are a promising and versatile accelerator platform. They are software-programmable and achieve near-ASIC performance and energy efficiency, beating CPUs by orders of magnitude. Unfortunately, many SDAs still struggle to efficiently implement a variety of irregular computations.
In this talk, I’ll focus on a key source of this inefficiency for SDAs — an abstraction inversion in the hardware–software stack: many SDAs fail to capture coarse-grain dataflow semantics in the application — namely asynchronous communication, pipelining, and queueing — that are naturally supported by the dataflow execution model and existing SDA hardware. I’ll present Ripple, an asynchronous programming language and architecture that corrects the abstraction inversion by preserving dataflow semantics down the stack. Additionally, I’ll touch on some lessons learned in codesigning a programming system and architecture and how we’ve applied some of our findings to build an energy-efficient SDA for industry.
Food:
11/6: GPU-Accelerated Dependent Partitioning
Time: Thursday, November 6, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Rohan Chanani
Abstract: Scaling high-performance computing applications to massive computing clusters requires data partitioning. Given an initial partition of one aspect of the data, often divided equally amongst all nodes, it is natural to derive dependent partitions (Treichler 2016) for the rest of the data based on the first, independent partition. Prior to this project, algorithms to compute these dependent partitioning operations existed only for CPUs. However, the cost of these operations scales with the size of the program data, so CPU-only dependent partitioning blows up program initialization time and makes dynamic partitioning during the main loop of a program entirely impractical. We therefore developed GPU partitioning algorithms that deliver upwards of 20x speedup in program initialization and unlock previously impractical dynamic partitioning operations in high-performance applications.
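A minimal sketch of what a dependent partition is (mine, for illustration only; the talk concerns GPU algorithms for such operations, not this toy): given an independent partition of graph nodes, one can derive a partition of edges so that each edge lands in the subspace that owns its source node.

```python
def partition_by_field(elements, field, base_partition):
    """Derive a dependent partition: group `elements` by which subspace of
    `base_partition` the value of `field(element)` falls into."""
    owner = {}                              # base element -> subspace index
    for i, subspace in enumerate(base_partition):
        for x in subspace:
            owner[x] = i
    derived = [[] for _ in base_partition]
    for e in elements:
        derived[owner[field(e)]].append(e)
    return derived

# Independent partition of nodes {0,1,2,3}, e.g. split evenly across 2 nodes.
node_partition = [{0, 1}, {2, 3}]
edges = [(0, 2), (1, 3), (2, 0), (3, 1)]

# Dependent partition of edges, keyed by the owner of each edge's source.
edge_partition = partition_by_field(edges, lambda e: e[0], node_partition)
# edge_partition[0] holds the edges whose source lies in {0, 1}, and so on
```

The cost of building `owner` and scanning `elements` grows with the program data, which is why doing such operations only on the CPU becomes the bottleneck the abstract describes.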
Food:
11/13: TBD
Time: Thursday, November 13, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Pu (Luke) Yi
Abstract: TBD
Food:
11/20: TBD (Justin Lubin)
Time: Thursday, November 20, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Justin Lubin
Food:
12/4: TBD
Time: Thursday, December 4, 2025, 12 noon - 1pm
Location: CoDa E401
Speaker: Rupanshu Soi
Abstract: TBD
Food: