[ Software Research Lunch ]


The Stanford Software Research Lunch is a weekly event held on Thursdays where students and researchers present their latest work to their peers. Talks are open to anybody, but regular attendees are expected to give a presentation on their own work.

Mailing list: software-research-lunch@lists.stanford.edu (subscribe via mailman)

Calendar: ical

Format: The lunch is held every week during fall, winter and spring quarter. The first week of every quarter is an organizational lunch where people can sign up to give a talk. If you'd like to give a talk, please contact Rohan Yadav.

Past quarters: Fall 2023, Spring 2023, Winter 2023, Fall 2022, Winter 2021, Fall 2020, Winter 2020, Fall 2019, Spring 2019, Winter 2019, Fall 2018, Spring 2018, Winter 2018, Fall 2017, Spring 2017, Winter 2017, Fall 2016.

Ordering Food: For suggestions for those ordering food for the lunch, see here.


10/3: Proving Query Equivalence Using Linear Integer Arithmetic

Time: Thursday, October 3, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: Haoran Ding

Abstract: Proving the equivalence of SQL queries is a fundamental problem in database research. Existing provers model queries using algebraic representations and convert those representations into first-order logic formulas, so that query equivalence can be verified by solving a satisfiability problem. The main challenge lies in 'unbounded summations', which are used to model common SQL features. Unfortunately, existing provers handle unbounded summations in an ad-hoc manner, which severely limits their verification capability.
SQLSolver is a new SQL equivalence prover that handles unbounded summations in a principled way. Our key insight is to use the theory of LIA*, which extends linear integer arithmetic formulas with unbounded sums. We augment the basic LIA* theory to handle several complex scenarios that arise when modeling real-world queries. Evaluated on 400 equivalent query pairs derived from Apache Calcite, Spark SQL, TPC-C, and TPC-H, SQLSolver successfully proves 388 of them, significantly outperforming existing provers.
https://dl.acm.org/doi/abs/10.1145/3626768
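To make the problem concrete, here is a hypothetical pair of equivalent queries whose equivalence hinges on reasoning about aggregation over an unbounded number of rows (the table and query names are illustrative, not from the paper). A quick check with sqlite3 confirms they agree on one sample table, though testing a single instance is of course no substitute for a proof over all instances:

```python
import sqlite3

# Two queries that are equivalent for any table contents: counting rows
# per key via GROUP BY vs. via a correlated subquery. Proving equivalence
# for all instances requires reasoning about an unbounded sum over rows.
q1 = "SELECT k, COUNT(*) AS c FROM t GROUP BY k ORDER BY k"
q2 = """SELECT DISTINCT k,
               (SELECT COUNT(*) FROM t AS t2 WHERE t2.k = t.k) AS c
        FROM t ORDER BY k"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 30), (3, 40), (3, 50), (3, 60)])

r1 = conn.execute(q1).fetchall()
r2 = conn.execute(q2).fetchall()
print(r1)  # [(1, 2), (2, 1), (3, 3)]
assert r1 == r2
```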

Food:


10/10: Solving the Phase Ordering Problem != Generating the Globally Optimal Code

Time: Thursday, October 10, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: Ke Wang

Abstract: The phase ordering problem has been a long-standing challenge in compiler optimization. Over the past four decades, a significant amount of effort has been devoted to it, and indeed, substantial progress has been made. However, in this paper, we question the overall significance of solving the phase ordering problem in the first place, as pursuing a solution to this problem may not align with the fundamental goal of compiler optimization, i.e., generating the globally optimal code among all programs that compilers deem semantically equivalent to an input program.
Our findings, supported by both theoretical and empirical evidence, show that solving the phase ordering problem is not equivalent to generating such globally optimal code. To achieve this goal, we propose a theoretical approach, called infinitive iterative bi-directional optimizations (IIBO), which is guaranteed to converge to the globally optimal code for any input program. We realize IIBO as a practical algorithm and apply it to optimize real-world programs. Results show that IIBO frequently generates more efficient code than GCC and LLVM, two state-of-the-art industry compilers, as well as exhaustive search, which can be deemed the solution to the phase ordering problem.
We are currently in active discussions with LLVM engineers on the possible incorporation of our findings into their next release. Given the significance and impact of our results, we expect our work to inspire new design principles for compiler development in the pursuit of generating the globally optimal code.

Food:


10/17: New Algorithms for High-Precision Floating-Point Arithmetic

Time: Thursday, October 17, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: David Zhang

Abstract: Floating-point arithmetic is an indispensable tool for computational modeling and simulation. Modern computing platforms have ubiquitous support for IEEE 754 single (binary32) and double (binary64) precision arithmetic. However, when these precision levels are insufficient, users are forced to turn to software emulation of higher-precision floating-point formats, which can be many thousands of times slower than native machine precision. In this talk, I will present ongoing work to develop new algorithms for high-precision floating-point arithmetic that are simultaneously faster and more accurate than previous work. These algorithms are completely branch-free, making them suitable for data-parallel processors, such as GPUs and SIMD CPUs. I will also present counterexamples to several algorithms in the published literature, highlighting the difficulty of reasoning about floating-point edge cases and roundoff errors. To address this difficulty, I will introduce a novel technique for using SMT solvers to verify floating-point algorithms by coarsely overapproximating the behavior of the FPU. This technique enables previously-infeasible verification problems to be solved in milliseconds.
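The talk's new algorithms are not reproduced here, but a classic building block of the high-precision arithmetic it describes is the branch-free TwoSum error-free transformation (due to Knuth and Møller), which splits a floating-point addition into its rounded result plus an exactly representable error term. A minimal sketch in Python, whose floats are IEEE 754 binary64:

```python
from fractions import Fraction

def two_sum(a: float, b: float):
    """Branch-free error-free transformation: returns (s, e) with
    s = fl(a + b) and s + e == a + b exactly (barring overflow)."""
    s = a + b
    bb = s - a                       # the portion of s contributed by b
    e = (a - (s - bb)) + (b - bb)    # rounding error of the addition
    return s, e

s, e = two_sum(0.1, 0.2)
print(s, e)
# The pair (s, e) represents the sum exactly; verify with rationals:
assert Fraction(s) + Fraction(e) == Fraction(0.1) + Fraction(0.2)
assert e != 0.0  # 0.1 + 0.2 is not exactly representable in binary64
```

Because there are no branches, the same six operations run unchanged on GPUs and SIMD lanes; the subtle edge cases such transformations hide are exactly what makes the SMT-based verification described in the talk attractive.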

Food:


10/24: Automated Reasoning About Distributed Systems

Time: Thursday, October 24, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: Federico Mora

Abstract: Designing and implementing distributed systems is still an enormous task for software engineers. Much of this challenge stems from the fact that bugs can arise from complex combinations of machine failures and message interleavings that are difficult for humans to reason about manually. As distributed systems become increasingly critical infrastructure, engineers will need more and more computational support to correctly build and deploy them.
In this talk, I will present three automated reasoning techniques that can help software engineers build and deploy correct distributed systems. First, a verification framework for message-passing distributed systems that allows users to write specifications as monitors and execute them alongside deployed systems. Second, a decision procedure for a relevant fragment of logic. Third, a new approach for semi-automatically modelling distributed systems in formal languages.

Food:


10/31: Testing GPU Memory Consistency at Large

Time: Thursday, October 31, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: Reese Levine

Abstract: Memory consistency specifications (MCSs) are a difficult, yet critical, part of a concurrent programming framework. Existing MCS testing tools are not immediately accessible and thus have only been applied to a limited number of devices. However, in the post-Dennard scaling landscape, there has been an explosion of new architectures and frameworks, exemplified by graphics processing units (GPUs). Studying the shared memory semantics of these new platforms is important for understanding program behavior and ensuring conformance to framework specifications.
In this talk, I will discuss our work on widescale GPU MCS testing. We developed a new methodology, MC Mutants, which utilizes mutation testing to evaluate the effectiveness of MCS testing techniques. MC Mutants is built into an accessible testing tool, GPUHarbor, which we used to collect data from over 100 devices from seven GPU vendors. This massive testing campaign revealed bugs in several GPU compilers and provided insights into weak behavior characteristics across diverse architectures. Furthermore, these results were used to tune testing environments across different devices, allowing us to make testing portable and contribute our tests to the official conformance test suite for WebGPU. Our ongoing work is investigating how to increase the safety and security of GPU programming languages in the face of their weak shared memory guarantees, as well as the challenges and opportunities that come with evolving architectures.
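MCS testing tools are built around litmus tests: tiny concurrent programs whose forbidden outcomes reveal weak memory behavior. The sketch below shows the structure of the classic message-passing test in Python; note that CPython's global interpreter lock effectively gives sequentially consistent execution here, so this only illustrates the shape of such a test, not a working weak-memory detector (real GPU testing, as in the tools described above, runs shader code under carefully tuned parallel stress):

```python
import threading
from collections import Counter

def message_passing_iteration():
    """One run of the message-passing (MP) litmus test. Thread 0
    publishes data (x) then sets a flag (y); thread 1 reads the flag,
    then the data. Under a weak memory model the relaxed outcome
    (r0, r1) == (1, 0) may appear; sequential consistency forbids it."""
    shared = {"x": 0, "y": 0}
    result = {}

    def writer():
        shared["x"] = 1   # data
        shared["y"] = 1   # flag

    def reader():
        result["r0"] = shared["y"]
        result["r1"] = shared["x"]

    t0 = threading.Thread(target=writer)
    t1 = threading.Thread(target=reader)
    t0.start(); t1.start()
    t0.join(); t1.join()
    return result["r0"], result["r1"]

outcomes = Counter(message_passing_iteration() for _ in range(1000))
print(outcomes)
# Under CPython's sequentially consistent execution, the weak outcome
# (1, 0) should never be observed:
assert (1, 0) not in outcomes
```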

Food:


11/7: Democratizing High-Performance DSL development with BuildIt

Time: Thursday, November 7, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: Ajay Brahmakshatriya

Abstract: Today, applications that require high-performance rely on libraries of hand-optimized kernels, with thousands available across various domains and architectures, while Domain-Specific Languages (DSLs) and their accompanying compilers remain relatively rare. A well-designed DSL can describe a much wider variety of programs within a given domain than even the most comprehensive library, while also unlocking powerful cross-function and global domain-specific optimizations that hand-optimized kernels cannot achieve. However, building high-performance DSLs is complex and time-consuming, often requiring compiler experts to devote years to development.
In this talk, I will introduce BuildIt, a C++ framework designed for the rapid prototyping of high-performance DSLs. BuildIt uses a multi-stage programming approach to combine the flexibility of libraries with the performance and specialization of code generation. With BuildIt, domain experts can transform existing libraries into efficient, specialized compilers simply by modifying the types of variables. Moreover, it allows them to implement analyses and transformations without needing to write traditional compiler code. Currently, BuildIt supports code generation for multi-core CPUs and GPUs, with FPGA support coming soon. I will also showcase three DSLs created with BuildIt to highlight its power and ease of use: a reimplementation of the GraphIt graph computing language, the BREeze DSL for regular expressions, and NetBlocks, a DSL for custom network protocol development.
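The idea of staging by changing variable types can be sketched in a few lines. The toy `Dyn` class below is a hypothetical stand-in (not BuildIt's actual C++ API): operations on a `Dyn` value emit code instead of computing, so the same generic `power` function either evaluates directly on ints or, when passed a `Dyn`, generates straight-line code specialized to the static exponent:

```python
class Dyn:
    """A second-stage value: arithmetic on it emits code rather than
    computing. Toy illustration of staging-by-types; names hypothetical."""
    counter = 0

    def __init__(self, expr, code):
        self.expr, self.code = expr, code

    @staticmethod
    def fresh(expr, code):
        Dyn.counter += 1
        name = f"t{Dyn.counter}"
        return Dyn(name, code + [f"{name} = {expr}"])

    def __mul__(self, other):
        rhs = other.expr if isinstance(other, Dyn) else repr(other)
        return Dyn.fresh(f"{self.expr} * {rhs}", self.code)

def power(x, n):
    """Generic 'library' code (n >= 1): computes on ints, or stages
    when x is a Dyn -- no changes to the function body needed."""
    result = x
    for _ in range(n - 1):
        result = result * x
    return result

assert power(3, 4) == 81                 # first-stage evaluation
out = power(Dyn("x", []), 4)             # second stage: generate code
print("\n".join(out.code + [f"return {out.expr}"]))
# t1 = x * x
# t2 = t1 * x
# t3 = t2 * x
# return t3
```

The loop over the static `n` disappears from the generated code: it is fully unrolled at staging time, which is the essence of turning a library into a specializing compiler.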

Food:


11/14: Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers

Time: Thursday, November 14, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: Anjiang Wei

Abstract: Mapping computations to processors and assigning data to memory are critical for maximizing performance in parallel programming. These mapping decisions are managed through the development of specialized low-level system code, called mappers, crafted by performance engineers. Each mapper is tailored to a specific application and optimized for the underlying machine architecture, a process that requires days of refinement and tuning from an expert. Despite advances in system research, automating mapper generation remains a challenge due to the complexity of making millions of decisions to find the optimal solution and generate the solution as code. We introduce an approach that leverages recent advances in LLM-based optimizers for mapper design. In under ten minutes, our method automatically discovers mappers that surpass human expert designs on scientific applications, achieving up to 1.34X speedup. For parallel matrix multiplication algorithms, our mapper achieves up to 1.31X the performance of the expert-designed solution. To achieve this, we simplify the complexity of low-level code generation by introducing a domain-specific language (DSL) that abstracts the low-level system programming details and defines a structured search space for LLMs to explore. To maximize application performance, we use an LLM optimizer to improve an agentic system that generates the mapper code. As a result, this approach significantly reduces the workload for performance engineers while achieving substantial performance gains across diverse applications. Finally, our results demonstrate the effectiveness of LLM-based optimization in system design and suggest its potential for addressing other complex system challenges.

Food:


11/21: Composing Turn-Based Games

Time: Thursday, November 21, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: Scott Kovach

Abstract: This talk (on work in progress) will discuss a domain-specific language for turn-based games, with applications to GUI software and game design. We'll argue that games (as a model of computation) are more suitable than algorithms/functions for modeling interactive software like GUIs. Additionally, board games (as a subset of games) have several properties that are also desirable software properties: their logic is usually specified in small, easily learnable pieces, such as playing cards containing natural language rules; they are designed to be extended with new packs of rules that augment previous rules without replacing or editing them; their state is concrete and visual, and state changes bear simple causal explanations; they're fun :)
The language features are designed to support concise game definitions. A program in the language consists of a set of rules, each of which may observe the current state and then produce events in response. The language uses relational state to compose rule effects and defines higher-level concepts using relational queries. Actors and choice are first-class concepts, and choices can be interleaved with observations. Actors can be multiple and cooperative (as in games like 'Pandemic' or 'Google Sheets'), or can model other factors such as randomness and algorithmic entities like play strategies and heuristics. The language uses an explicit, simple model of time that divides action into episodes, which may be composed sequentially or concurrently. There will be a demo of the language in action.
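The rules-observe-state, emit-events execution model described in the abstract can be sketched in plain Python (all rule names and state fields below are hypothetical, invented for illustration; the actual DSL's syntax and semantics differ):

```python
# Toy sketch of a game engine where independent rules observe a state
# snapshot and emit events, and the engine applies all events afterwards.
# This mirrors the "rules augment each other without editing" property:
# adding a rule never requires modifying existing ones.

def draw_rule(state):
    # When a player's hand is empty, emit an event to draw a card.
    for player, hand in state["hands"].items():
        if not hand:
            yield ("draw", player)

def win_rule(state):
    # A player with 10 or more points wins.
    for player, score in state["scores"].items():
        if score >= 10:
            yield ("wins", player)

def step(state, rules):
    """One episode: every rule observes the same state snapshot,
    then all emitted events are applied to the state."""
    events = [e for rule in rules for e in rule(state)]
    for kind, player in events:
        if kind == "draw":
            state["hands"][player].append(state["deck"].pop())
        elif kind == "wins":
            state["winner"] = player
    return events

state = {"deck": ["ace", "king"], "hands": {"p1": [], "p2": ["queen"]},
         "scores": {"p1": 3, "p2": 10}, "winner": None}
events = step(state, [draw_rule, win_rule])
print(events)  # [('draw', 'p1'), ('wins', 'p2')]
```

Separating observation (rules yield events) from effect (the engine applies them) is what makes each state change carry a simple causal explanation: every mutation traces back to exactly one emitted event.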

Food:


11/28: Cancelled for Thanksgiving

Time: Thursday, November 28, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: No Speaker

Abstract: Enjoy Thanksgiving!

Food:


12/5: Social Lunch

Time: Thursday, December 5, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: The attendees

Abstract: Social Lunch

Food:


12/12: Bridging Safety and Performance

Time: Thursday, December 12, 2024, 12 noon - 1pm
Location: Gates 415

Speaker: Umut Acar

Abstract: As computer scientists, we have long wrestled with a perceived trade-off between safety and performance. Choosing performance meant using low-level languages, which help optimize for performance but at the expense of safety. Choosing safety meant using high-level languages, which guarantee safety at the expense of performance. Historically, we have chosen performance over safety. This choice has not aged well, because the societal significance and economic costs of safety have been steadily increasing. It is therefore (past) time to challenge this rigid dichotomy. The obvious question is: why can't we have both?
In this talk, I show that we can have both safety and performance by developing programming languages and systems that integrate the two. I present two advances in this direction by considering classic problems. First, I show that parallel functional programming languages, which are traditionally regarded as safe but slow, can deliver impressive performance. Second, I show that parallel programming languages, both functional and procedural, can automatically and safely optimize their own performance, reducing reliance on labor-intensive manual optimizations. The method underpinning these advances blends several 'ingredients': 1) 'Theory A' (algorithms and complexity, a.k.a. 'Turing' theory) for asymptotic efficiency, 2) 'Theory B' (theory of programming, a.k.a. 'Church' theory) for safety, and 3) system building and empirical computer science for practical efficiency.

Food: