AI Engineer World's Fair (2024)

GPU & Inference Track

Head back to all of our AI Engineer World's Fair recaps

Covalent Launch: The GPU Cheatcode: Fine-tune 20 Llama Models in 5 Minutes

Santosh Radha #santoshkumarradha / Agnostiq (Covalent)
Watch it on YouTube | AI.Engineer Talk Details

This talk introduces Covalent, an open-source platform that allows developers to run Python code on various compute backends, including GPUs, without needing to manage Docker or Kubernetes. The speaker demonstrates how Covalent can be used to easily fine-tune and deploy multiple AI models directly from Python, with automatic scaling and cost-effective resource allocation.
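As a rough illustration of the pattern the demo relies on, here is a minimal sketch of Covalent's electron/lattice workflow API. The executor name and the model and dataset identifiers are placeholder assumptions, not details taken from the talk.

```python
# Minimal sketch of the Covalent workflow pattern described in the talk.
# The executor name and the model/dataset identifiers are illustrative
# assumptions, not details from the talk.
import covalent as ct

@ct.electron(executor="awsbatch")  # assumption: any configured remote/GPU executor
def fine_tune(base_model: str, dataset: str) -> str:
    # Fine-tuning logic (e.g. a LoRA run on a Llama checkpoint) would go here;
    # Covalent ships the function and its dependencies to the chosen backend.
    return f"{base_model}-tuned-on-{dataset}"

@ct.lattice
def fine_tune_many(base_model: str, datasets: list[str]) -> list[str]:
    # Each electron call becomes an independently scheduled task, so these
    # runs can fan out across GPUs in parallel.
    return [fine_tune(base_model, d) for d in datasets]

# Dispatch runs asynchronously on the configured backends.
dispatch_id = ct.dispatch(fine_tune_many)(
    "llama-3-8b", [f"dataset-{i}" for i in range(20)]
)
result = ct.get_result(dispatch_id, wait=True)
print(result.result)
```

Because every electron is its own task, many fine-tuning runs can be scheduled onto GPU backends in parallel from a single dispatch call, which is the style of workflow the demo walks through.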

Compute & System Design for Next Generation Frontier Models

Dylan Patel @dylan522p / SemiAnalysis
Watch it on YouTube | AI.Engineer Talk Details

This talk examines the compute and system design challenges of building frontier-scale AI models. The speaker surveys today's models, the much larger models on the horizon, and the enormous amounts of compute required to train and serve them.
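To give a sense of the scale at issue, a common back-of-envelope rule puts training compute at roughly 6 × parameters × tokens FLOPs. The sketch below works that arithmetic through with illustrative numbers; none of the figures come from the talk.

```python
# Back-of-envelope training compute estimate using the ~6*N*D rule of thumb.
# All concrete numbers are illustrative assumptions, not figures from the talk.
params = 70e9                       # model parameters (e.g. a 70B-parameter model)
tokens = 15e12                      # training tokens
train_flops = 6 * params * tokens   # ~6 FLOPs per parameter per token

gpu_peak_flops = 989e12             # NVIDIA H100 dense BF16 peak, FLOP/s
mfu = 0.4                           # assumed model FLOPs utilization
gpu_hours = train_flops / (gpu_peak_flops * mfu) / 3600

print(f"Total training compute: {train_flops:.2e} FLOPs")
print(f"~{gpu_hours:,.0f} H100-hours at {mfu:.0%} utilization")
```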

Breaking AI’s 1 Gigahertz Barrier

Sunny Madra @sundeep / Groq
Watch it on YouTube | AI.Engineer Talk Details

This talk discusses the rapid advancement of Large Language Models (LLMs) and their potential to revolutionize computing, drawing parallels with the historical progress in microprocessor speeds. The speaker explores how increasingly faster and more efficient LLMs could lead to a paradigm shift in technology, enabling personalized experiences, universal natural language processing, advanced virtual assistants, and transformative applications across various industries.

Accelerating Mixture of Experts Training With Rail-Optimized InfiniBand Networking in Crusoe Cloud

Ievgen Vakulenko #evakulen / Crusoe
Watch it on YouTube | AI.Engineer Talk Details

This talk by Ievgen Vakulenko from Crusoe introduces their AI cloud platform, which focuses on providing high-performance, easy-to-use infrastructure for AI workloads while using renewable and stranded energy sources to minimize environmental impact. The speaker discusses Crusoe's approach to optimizing GPU networking, particularly highlighting their use of NVIDIA's PXN feature to improve communication between GPUs, resulting in significant performance gains for distributed AI training tasks.
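For readers who want to experiment on their own clusters, PXN is an NCCL-level feature controlled through environment variables. The sketch below shows one way a multi-node PyTorch launch might surface those knobs; the specific values and the training script are illustrative assumptions, not Crusoe's production configuration.

```python
# Illustrative sketch: exposing NCCL's PXN-related knobs before initializing
# distributed training. Values are assumptions, not Crusoe's settings; check
# the NCCL documentation for your version.
import os
import torch.distributed as dist

# PXN routes inter-node traffic through an NVLink-connected peer GPU so the
# message leaves on the NIC attached to that peer's PCIe switch (rail-aligned).
os.environ.setdefault("NCCL_PXN_DISABLE", "0")    # 0 keeps PXN enabled
os.environ.setdefault("NCCL_P2P_PXN_LEVEL", "2")  # assumption: always allow PXN
os.environ.setdefault("NCCL_DEBUG", "INFO")       # log which paths NCCL chooses

def main() -> None:
    # torchrun supplies RANK, WORLD_SIZE, and MASTER_ADDR for each process.
    dist.init_process_group(backend="nccl")
    # ... build the MoE model and run the training loop here ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```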

⭐ Unveiling the latest Gemma model advancements

Kathleen Kenealy #kathleen-kenealy / Google
Watch it on YouTube | AI.Engineer Talk Details

This talk introduces Google DeepMind's latest advancements in the Gemma model family, particularly highlighting the release of the Gemma 2 models (9B and 27B parameters) and PaliGemma for multimodal capabilities. The speaker emphasizes Gemma's focus on responsible AI, state-of-the-art performance, extensibility across various frameworks, and open access, encouraging developers to innovate and build upon these models in diverse applications.
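Since the talk stresses open weights and broad framework support, a minimal sketch of loading Gemma 2 through Hugging Face Transformers is a natural starting point. The model id and generation settings below are assumptions, and the weights are gated, so the license must be accepted on the Hub first.

```python
# Minimal sketch: running the instruction-tuned Gemma 2 9B checkpoint with
# Hugging Face Transformers. Model id and settings are assumptions; the
# weights are gated behind a license acceptance on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # assumption: 9B instruction-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What changed between Gemma 1 and Gemma 2?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```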

Read our Deep Dive on this talk as well.

⭐ Making Open Models 10x faster and better for Modern Application Innovation

Lin Qiao @lqiao / Fireworks
Dmytro (Dima) Dzhulgakov @dzhulgakov / Fireworks
Watch it on YouTube | AI.Engineer Talk Details

This talk presents Fireworks AI, a platform focused on productionizing and customizing open-source AI models for efficient inference. The speaker highlights Fireworks' custom serving stack, support for various open-source models, and capabilities for fine-tuning and deploying models, emphasizing their focus on performance optimization and the development of compound AI systems that combine multiple models and external tools for more complex applications.
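Fireworks serves these models behind an OpenAI-compatible inference endpoint, so trying one of them looks roughly like the sketch below. The base URL and model name are assumptions drawn from Fireworks' public documentation and should be verified before use.

```python
# Minimal sketch of calling an open model served by Fireworks through its
# OpenAI-compatible API. Base URL and model id are assumptions; verify them
# against Fireworks' current docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="FIREWORKS_API_KEY",  # replace with a real API key
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3-8b-instruct",  # assumed model id
    messages=[{"role": "user", "content": "In one sentence, what is a compound AI system?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```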

Read our Deep Dive on this talk as well.

Scott Wu and the Making of Devin by Cognition AI

Scott Wu @scottwu46 / Cognition (Devin)
Watch it on YouTube | AI.Engineer Talk Details

This talk by Scott Wu of Cognition AI introduces Devin, an autonomous software engineering AI agent capable of building and modifying complex applications from natural language instructions. The speaker discusses the potential of AI agents like Devin to transform software development, allowing engineers to focus on high-level problem-solving while the AI handles implementation details, and explores the implications for the future of the software engineering profession.