On October 29, 2024, Tower Research Ventures had the pleasure of co-hosting another edition of the AI Research Roundtable series with the GenAI Collective. Speakers included Professor Volodymyr Kuleshov, Jack Morris, and Avner May, covering topics such as applying diffusion models to language, enhancing retrieval-augmented generation (RAG) systems by adding context to embeddings, and accelerating LLM inference.
The first talk, by Professor Kuleshov of Cornell, highlighted his team’s latest work on Language Diffusion Models.[1] While diffusion models are prominent in continuous-data applications (e.g., Midjourney for images), discrete-data applications such as text generation are still dominated by autoregressive models today. Professor Kuleshov’s team is pioneering ways to apply diffusion models to language, motivated by their potential advantages: (i) faster, more efficient generation, (ii) improved controllability, and (iii) “native multi-modality.” Professor Kuleshov detailed the team’s novel masking techniques and shared impressive results from their recently published masked diffusion language models.[2] In closing, he discussed exciting directions for future research, particularly in understanding how these models scale with increased computational power. We look forward to new findings in this area.
Professor Kuleshov discussing Language Diffusion Models
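To make the masking idea concrete, here is a minimal sketch of the forward (noising) process behind masked diffusion language models: each token is independently replaced by a mask token with a probability set by the noise level, and the model is trained to reconstruct the original tokens from the partially masked sequence. This is a toy illustration of the general recipe, not the MDLM implementation; the token list and mask symbol are invented for the example.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol for illustration

def forward_mask(tokens, t, rng):
    """Forward noising at level t in [0, 1]: each token is independently
    replaced by MASK with probability t. At t=0 the sequence is clean;
    at t=1 it is fully masked. A masked diffusion LM learns the reverse
    process, predicting the original tokens at masked positions."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(forward_mask(tokens, 0.5, rng))  # partially masked sample
```

During training, a noise level t is sampled per example and the model is penalized for its reconstruction error on the masked positions; at generation time, the model starts from a fully masked sequence and iteratively unmasks it.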
Jack Morris, currently a researcher at Meta, presented his work on Contextual Document Embeddings (CDE). RAG, a commonly used method for grounding large language models in external knowledge, relies on embedding models to encode the documents in a knowledge corpus. Traditional embedding models encode each document independently, without considering similar documents or the overall corpus. By introducing two key elements – contextual training and a contextual architecture[3] – Morris’ work demonstrates how to make document embeddings context-aware. This technique significantly improves RAG document selection and performance across various domains. Morris and his team released a small version of their contextual document embedding model, cde-small-v1, on Hugging Face. We look forward to seeing further applications of these models in live production settings.
Jack Morris on contextual document embeddings
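The core intuition – that a document’s vector should depend on the corpus it lives in – can be shown with a toy sketch. The code below is not the CDE architecture (which learns contextualization end to end); it only illustrates how conditioning on corpus statistics makes the same document embed differently in different corpora. The vocabulary, documents, and word-count “embeddings” are all invented for illustration.

```python
def toy_embed(doc, vocab):
    """Context-free toy 'embedding': word counts over a fixed vocabulary."""
    words = doc.lower().split()
    return [float(words.count(w)) for w in vocab]

def contextual_embed(doc, corpus, vocab):
    """Context-aware variant: shift the document vector by the corpus mean,
    so the same document maps to different vectors in different corpora."""
    base = toy_embed(doc, vocab)
    corpus_vecs = [toy_embed(d, vocab) for d in corpus]
    mean = [sum(col) / len(corpus_vecs) for col in zip(*corpus_vecs)]
    return [b - m for b, m in zip(base, mean)]

vocab = ["bank", "river", "loan"]
geo = ["river bank trail", "bank of the river"]        # geography corpus
fin = ["loan from the bank", "bank approved the loan"]  # finance corpus
print(contextual_embed("the bank", geo, vocab))  # differs from the line below
print(contextual_embed("the bank", fin, vocab))
```

The design point is that surprisingness relative to the corpus carries signal: a word that is common everywhere in the corpus is down-weighted, while a word that is distinctive for this document stands out, which is what helps retrieval pick better documents.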
Lastly, Avner May from Together AI discussed accelerating LLM inference with the Sequoia framework,[4] which uses a dynamic programming algorithm to construct an optimal tree of speculated tokens (hence the name Sequoia) for speculative decoding. While the details of the algorithm are beyond the scope of this post,[5] the speed improvements it achieves are remarkable, reaching up to a 10x decoding speedup for Llama2-7B on an A100 GPU. The team is actively working on further enhancements to the framework and anticipates sharing updates publicly soon.
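For readers unfamiliar with speculative decoding, the sketch below shows the simpler single-chain variant that Sequoia builds on: a cheap draft model proposes several tokens, and the target model verifies them, keeping the longest agreeing prefix plus one token of its own. Sequoia generalizes this by verifying an optimally chosen tree of candidate continuations rather than a single chain. The toy models and tokens here are invented for illustration.

```python
def speculative_step(draft, target, prefix, k):
    """One round of chain-style speculative decoding. `draft` and `target`
    map a token context to the next token; `draft` proposes k tokens, and
    `target` verifies them in order, accepting until the first mismatch
    (where it substitutes its own token) or appending a bonus token if
    every proposal is accepted. In a real system the k verifications run
    as one batched forward pass, which is the source of the speedup."""
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        correct = target(ctx)
        if tok == correct:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(correct)  # target's correction ends the round
            break
    else:
        accepted.append(target(ctx))  # all accepted: one free bonus token
    return accepted

# Toy deterministic "models" keyed on the last token (illustrative only).
target_model = lambda ctx: {"a": "b", "b": "c", "c": "d", "d": "e"}.get(ctx[-1], "?")
draft_model = lambda ctx: {"a": "b", "b": "c", "c": "x"}.get(ctx[-1], "?")
print(speculative_step(draft_model, target_model, ["a"], 3))  # ['b', 'c', 'd']
```

Every accepted token matches what the target model would have produced on its own, so output quality is unchanged; the gain is that several target-quality tokens come out of a single verification round.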
If you are an engineer/founder working on language diffusion models, contextual embeddings or LLM inference speed, we’d love to chat. Please reach out to Tower Research Ventures at ventures@tower-research.com!
[1] https://arxiv.org/abs/2406.07524
[2] https://github.com/kuleshov-group/mdlm
[3] https://arxiv.org/abs/2410.02525
[4] https://www.together.ai/blog/sequoia
[5] https://infini-ai-lab.github.io/Sequoia-Page/
The views expressed herein are solely the views of the author(s), are as of the date they were originally posted, and are not necessarily the views of Tower Research Ventures LLC, or any of its affiliates. They are not intended to provide, and should not be relied upon for, investment advice, nor is any information herein any offer to buy or sell any security or intended as the basis for the purchase or sale of any investment. The information herein has not been and will not be updated or otherwise revised to reflect information that subsequently becomes available, or circumstances existing or changes occurring after the date of preparation. Certain information contained herein is based on published and unpublished sources. The information has not been independently verified by TRV or its representatives, and the accuracy or completeness of such information is not guaranteed. Your linking to or use of any third-party websites is at your own risk. Tower Research Ventures disclaims any responsibility for the products or services offered or the information contained on any third-party websites.