Semantic Maintenance – Takeaways from Tower’s Conference on Synthetic Software

By: Neel Baronia

Last month, Tower Research Ventures hosted its inaugural Conference on Synthetic Software (CSS), which convened leading researchers, developers, and entrepreneurs working in the broad field of code generation. Speakers presented on a wide range of topics, from the types of infrastructure that software engineering agents require to push code, to how LLMs can be used to optimize low-level CUDA kernels.

As the researchers and entrepreneurs spoke about their respective areas of focus, one overarching theme emerged:

Code generation is already one of the premier LLM use cases in production environments today. Anthropic’s recent Economic Index shows code generation as a clear outlier among the use cases of its foundation models.

Meanwhile, companies like Google report that close to 25% of new code shipped for their products is being written by AI, and Cursor, the AI-native IDE, is reported to be the fastest company to generate $100M in annual revenue. What are the consequences of this trend toward code being written by increasingly capable models, tools, and agents?

“Programs must be written for people to read, and only incidentally for machines to execute.” – Hal Abelson, co-author of Structure and Interpretation of Computer Programs

During our conference, Henry Zhu, maintainer of Babel JS, highlighted the growing need for tools and UX that help humans parse and track edits to code, especially in open source projects. For most open source projects today, the number of maintainers (the community members actively responsible for stewarding the project) is dwarfed by the number of contributors – many popular projects have only one or two maintainers, and these individuals must filter, triage, and manage every bug report and proposal raised by the community. This is a very tall order. Zhu spoke about the risk of maintainers being overwhelmed once codegen agents begin contributing at scale.

Somewhat counterintuitively, it is not enough to verify that new code does what we expect it to do. It is equally important to assess how well the new code assimilates into the existing codebase. Does it adhere to style guides? Does it introduce breaking changes? Does it complicate the docs or degrade the API design? Hunter Brooks, Co-Founder at Ellipsis.dev, spoke at CSS about how engineering organizations use AI Code Review to enforce both a high bar for code quality and the company style guide at scale. Tools like these help humans better manage the code proposed by codegen models, since simply relying on code to manage other code may not be sufficient.
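To make the distinction concrete, here is a minimal, hypothetical Python sketch (the function and parameter names are invented for illustration, not drawn from any talk) of a change that passes a local "does it do what we expect?" check yet still breaks the code around it:

```python
# A hypothetical illustration: the rewritten helper below behaves exactly like
# the original in isolation, yet it is still a breaking change for the codebase.

def fetch_page_v1(url, timeout_s=5):
    """Existing public helper used throughout the repository."""
    return f"GET {url} (timeout={timeout_s}s)"

def fetch_page_v2(url, timeout=5):
    """Generated rewrite: identical behavior, but the keyword argument is renamed."""
    return f"GET {url} (timeout={timeout}s)"

# The new code passes a straightforward correctness check...
assert fetch_page_v2("https://example.com") == fetch_page_v1("https://example.com")

# ...but an untouched caller elsewhere in the repo still uses the old keyword and now fails.
try:
    fetch_page_v2("https://example.com", timeout_s=1)
except TypeError as err:
    print("breaking change slipped through:", err)
```

A reviewer (human or AI) focused only on the new function would sign off; the problem is only visible in the context of everything that calls it.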

One potential approach is to rely increasingly on “semantic diffs” – reviewing changes at the level of concepts rather than raw token edits (does it really matter that a blank line was deleted or a variable renamed?). While LLMs are generally adept at synthesizing and summarizing information, using code to summarize semantic differences may not be as easy as it sounds. As SemanticDiff notes on its recent blog, different languages (and even different compilers) don’t parse text in the same way, so the right level of abstraction at which to present changes is hard to nail down. Use case specifics complicate this further. Two matrix multiplication implementations in C may be similar enough for someone building a consumer application, yet contain critically important differences for high performance use cases like high frequency trading or foundation model training. By contrast, Saurabh Misra (Founder of CodeFlash) noted in his talk at CSS that performance and legibility need not be at odds when improving Python – simplifying logic or optimizing library usage can actually result in better readability and quality.
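As a rough illustration of why the abstraction level matters, the self-contained Python sketch below (our own example, not SemanticDiff’s tooling) compares two versions of the same function at the text, syntax-tree, and behavior levels:

```python
# Two versions of the same function, compared at three levels of abstraction.
import ast
import difflib

VERSION_A = """
def total(prices):
    result = 0
    for p in prices:
        result += p
    return result
"""

# Version B only renames a variable and adds a blank line.
VERSION_B = """
def total(prices):
    acc = 0

    for p in prices:
        acc += p
    return acc
"""

# 1. Text level: the diff is noisy even though nothing meaningful changed.
changed = [
    line for line in difflib.unified_diff(
        VERSION_A.splitlines(), VERSION_B.splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(f"{len(changed)} added/removed lines at the text level")

# 2. Syntax-tree level: the blank line disappears, but the rename still shows up,
#    because identifiers are part of the AST.
print("identical ASTs?", ast.dump(ast.parse(VERSION_A)) == ast.dump(ast.parse(VERSION_B)))

# 3. Behavior level: the two versions are indistinguishable on any input.
scope_a, scope_b = {}, {}
exec(VERSION_A, scope_a)
exec(VERSION_B, scope_b)
print("same result?", scope_a["total"]([1, 2, 3]) == scope_b["total"]([1, 2, 3]))
```

A line-based diff flags several changes, an AST-based diff still reports the rename, and a fully behavioral comparison reports nothing at all – picking which of these views to show a reviewer is exactly the hard part.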

While we may not yet know what form semantic management of unwieldy codebases and CI/CD pipelines will take in an increasingly synthetic world, it looks like a logical next step, and one that should enable great products and companies in the codegen space (after the foundation models themselves, the AI-native IDEs, and the agents that use them). If this is a space you’re thinking about or actively building in, we’d love to hear from you. You can reach us at ventures@tower-research.com.


The views expressed herein are solely the views of the author(s), are as of the date they were originally posted, and are not necessarily the views of Tower Research Ventures LLC, or any of its affiliates. They are not intended to provide, and should not be relied upon for, investment advice, nor is any information herein any offer to buy or sell any security or intended as the basis for the purchase or sale of any investment. The information herein has not been and will not be updated or otherwise revised to reflect information that subsequently becomes available, or circumstances existing or changes occurring after the date of preparation. Certain information contained herein is based on published and unpublished sources. The information has not been independently verified by TRV or its representatives, and the accuracy or completeness of such information is not guaranteed. Your linking to or use of any third-party websites is at your own risk. Tower Research Ventures disclaims any responsibility for the products or services offered or the information contained on any third-party websites.