1 LLMs: One Piece of the Larger AI Puzzle
Artificial Intelligence (AI) refers to a system’s ability to accurately interpret external data, learn from it, and apply that learning to achieve specific goals through flexible adaptation. The concept of AI emerged in the 1950s, leading to an initial “golden age” before 1970. In the following decades, the success of systems like Deep Blue and AlphaGo in specific domains reinforced the value of using AI algorithms to solve well-defined problems.
With the rapid rise of Large Language Models (LLMs), such as GPT and its successors, a second golden age of AI has clearly arrived. LLMs have brought unprecedented advancements in natural language understanding and generation, and their applications are now among the hottest topics in both research and industry.
However, this success has also led to a common misconception: that LLMs are synonymous with AI itself. In reality, LLMs are just one powerful example within the broader AI landscape. While they specialize in language-related tasks, AI as a whole spans many other domains—such as vision, planning, reasoning, robotics, and autonomous decision-making. In practical systems, LLMs are often integrated with other AI components to tackle complex, multi-modal tasks. Thus, LLMs should be viewed as a specialized tool within a much larger and more diverse AI ecosystem.
Therefore, we, as the PhoebeDB engineering team, prefer to consider the role of the DBMS in the current LLM era from the broader perspective of AI as a comprehensive system.
2 LLM Systems Leverage the DBMS as a Keystone
There are several typical scenarios in which DBMS deployments serve as keystones. By studying them, we can characterize the relationship between modern applications, especially AI applications, and the DBMS.
Training LLMs. While a DBMS is not directly involved in the numerical training of LLMs, it plays a critical role in the supporting data infrastructure—enabling dataset curation, version tracking, monitoring, and efficient retrieval. These functions are essential for building reliable, scalable, and reproducible LLM training workflows. For example, organizations like OpenAI, Google DeepMind, Meta, and Anthropic commonly use systems such as PostgreSQL or BigQuery to manage curated corpora, track dataset versions, and support internal audits—key components of responsible LLM development. In this context, multi-model DBMS capabilities become essential, as LLM training systems often need to handle structured metadata, unstructured content, vector representations, and graph-based relationships in an integrated manner.
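As a minimal sketch of the version-tracking side of this infrastructure, the snippet below records immutable dataset snapshots in PostgreSQL. It assumes a reachable PostgreSQL instance and psycopg2; the `dataset_versions` table and `register_version()` helper are illustrative, not any particular organization's schema.

```python
# Minimal sketch: tracking LLM training dataset versions in PostgreSQL.
# All table and column names here are illustrative assumptions.
import hashlib

import psycopg2

conn = psycopg2.connect("dbname=corpus_meta user=ml")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS dataset_versions (
            dataset     TEXT        NOT NULL,
            version     INT         NOT NULL,
            content_sha TEXT        NOT NULL,   -- hash of the shard manifest
            row_count   BIGINT      NOT NULL,
            created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
            PRIMARY KEY (dataset, version)
        )
    """)

def register_version(dataset: str, manifest: bytes, row_count: int) -> int:
    """Record an immutable, auditable snapshot of a training corpus."""
    sha = hashlib.sha256(manifest).hexdigest()
    with conn, conn.cursor() as cur:
        # Next-version computation is kept simple; concurrent writers would
        # need a sequence or advisory lock in a real deployment.
        cur.execute(
            """
            INSERT INTO dataset_versions (dataset, version, content_sha, row_count)
            VALUES (%s,
                    (SELECT COALESCE(MAX(version), 0) + 1
                       FROM dataset_versions WHERE dataset = %s),
                    %s, %s)
            RETURNING version
            """,
            (dataset, dataset, sha, row_count),
        )
        return cur.fetchone()[0]
```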
RAG system. In a retrieval-augmented generation (RAG) system, DBMSs are critical for enabling intelligent and controlled retrieval, combining semantic similarity (vector DBs) with structured filters (relational DBs) and optionally supporting rich relationships (graph DBs). This retrieval layer is what grounds LLM outputs in accurate, up-to-date, and domain-specific knowledge.
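The sketch below shows this hybrid pattern in minimal form, assuming PostgreSQL with the pgvector extension; the `docs(id, product, published_at, body, embedding vector(384))` table and the `embed()` placeholder are illustrative assumptions, not a prescribed schema.

```python
# Sketch of hybrid retrieval for RAG: semantic similarity plus structured
# filters in a single query, assuming PostgreSQL with pgvector.
import psycopg2

def embed(text: str) -> list[float]:
    # Placeholder: plug in any sentence-embedding model here.
    raise NotImplementedError

def retrieve(question: str, product: str, k: int = 5) -> list[tuple]:
    qvec = embed(question)
    vec_literal = "[" + ",".join(map(str, qvec)) + "]"
    with psycopg2.connect("dbname=kb") as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, body
              FROM docs
             WHERE product = %s                        -- structured filter
               AND published_at > now() - interval '1 year'
             ORDER BY embedding <=> %s::vector         -- cosine distance (pgvector)
             LIMIT %s
            """,
            (product, vec_literal, k),
        )
        return cur.fetchall()
```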
AI agent. An AI agent using an LLM acts autonomously or semi-autonomously to perform tasks. In AI agent systems, DBMSs act as persistent, structured memory and control infrastructure — supporting long-term memory, tool use, task planning, prompt engineering, and auditability. This structured foundation enables LLM agents to go beyond one-shot completions and operate autonomously over time and across tasks.
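A minimal sketch of the memory pattern follows, using SQLite from Python's standard library as a stand-in for whatever DBMS an agent platform actually uses; the schema is illustrative.

```python
# Minimal sketch of DBMS-backed agent memory: every observation, tool call,
# and decision is persisted so the agent can resume work and be audited.
import json
import sqlite3

db = sqlite3.connect("agent_memory.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS memory (
        task_id TEXT    NOT NULL,
        step    INTEGER NOT NULL,
        kind    TEXT    NOT NULL,   -- 'observation' | 'tool_call' | 'decision'
        payload TEXT    NOT NULL,   -- JSON blob
        PRIMARY KEY (task_id, step)
    )
""")

def remember(task_id: str, step: int, kind: str, payload: dict) -> None:
    with db:  # one transaction per write: memory survives crashes
        db.execute("INSERT INTO memory VALUES (?, ?, ?, ?)",
                   (task_id, step, kind, json.dumps(payload)))

def recall(task_id: str) -> list[dict]:
    """Rebuild the agent's context for a task, in order, for the next prompt."""
    rows = db.execute("SELECT kind, payload FROM memory "
                      "WHERE task_id = ? ORDER BY step", (task_id,))
    return [{"kind": kind, **json.loads(payload)} for kind, payload in rows]
```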
Integrating with traditional applications. LLMs integrate into enterprise applications by serving as natural-language interfaces, intelligent content processors, or reasoning layers, enhancing how users interact with data and systems. However, these capabilities rely fundamentally on traditional DBMSs, which continue to manage the enterprise's core structured and unstructured data. In addition to their established strengths, modern use cases introduce new requirements for DBMSs, such as hybrid query support, vector search, fine-grained access control, and real-time responsiveness. Despite the rise of LLMs, the DBMS remains the backbone of enterprise data infrastructure, now evolving to support a more intelligent and AI-driven future.
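One hedged sketch of the natural-language-interface pattern: the LLM only proposes SQL, while the DBMS enforces the real guarantees through a read-only transaction and a statement timeout. The `llm_to_sql()` function and the `nl_readonly` role are hypothetical placeholders, not features of any particular product.

```python
# Sketch: LLM as a natural-language interface over a traditional DBMS.
# The model proposes SQL; the database's own controls bound the damage a
# badly generated query can do. Assumes PostgreSQL via psycopg2.
import psycopg2

def llm_to_sql(question: str) -> str:
    # Placeholder: plug in any text-to-SQL model here.
    raise NotImplementedError

def answer(question: str):
    sql = llm_to_sql(question)
    # Defense in depth: a read-only transaction plus a statement timeout
    # means generated SQL cannot mutate data or monopolize the server.
    with psycopg2.connect("dbname=erp user=nl_readonly") as conn:
        with conn.cursor() as cur:
            cur.execute("SET TRANSACTION READ ONLY")
            cur.execute("SET LOCAL statement_timeout = '2s'")
            cur.execute(sql)
            return cur.fetchall()
```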
In conclusion, across both the LLM training pipeline and a wide range of inference-driven applications, the DBMS remains a foundational component at the backend. Its capabilities in managing data reliably, efficiently, and at scale are essential. As AI applications evolve, the importance of DBMSs is further reinforced. They are now expected to support emerging computational workloads—such as tensor data handling—and to deliver higher performance and adaptability to meet the growing demands of AI-integrated systems.
3 More Is Expected of the DBMS
When considering the actual technical requirements of a DBMS, we approach the problem from the perspective of AI, grounded in both its proven successes and its most promising current advancement, LLMs.
- Multi-Model Data Handling. Multi-model data management is important for modern LLM applications because these systems increasingly need to interact with and reason over diverse types of data: not just structured tables, but also unstructured text, vectors, documents, and even graph relationships. A multi-model DBMS allows these different data types to be managed within a single, unified system, which improves efficiency, consistency, and scalability; a single query spanning relational, document, and vector data is sketched after this list.
- Real-Time, High-Performance HTAP Access. Real-time, low-latency access is critical for DBMSs serving LLM applications, where timely data retrieval directly impacts user experience. In systems like RAG-based chatbots or AI customer assistants, the DBMS must quickly return relevant documents, user history, or analytics results, often combining semantic vector search with structured filters. These use cases reflect real-time HTAP (Hybrid Transactional and Analytical Processing) workloads, where LLMs depend on both fresh transactional data and fast analytical queries; a single round trip combining the two is sketched after this list. Without fast DB access, the responsiveness and effectiveness of the LLM system degrade significantly. As a result, low-latency, hybrid-capable DBMSs are essential infrastructure for modern AI applications.
- Fine-Grained Versioning and Time Travel. Fine-grained versioning and time travel are critical for LLM applications, enabling reproducibility, auditability, and consistent retrieval. They allow systems to trace responses back to the exact data and context used, supporting reliable evaluation, debugging, and compliance (a replay query is sketched after this list). These features also strengthen DBMS resilience for disaster recovery, enabling point-in-time restores, partial rollbacks, safer schema changes, and root-cause analysis after failures. In LLM-powered systems, a version-aware, time-travel-capable DBMS is essential for maintaining accuracy, accountability, and stability.
- Scalable Feature Store Integration. In machine learning systems, especially those supporting LLMs, recommendation engines, or real-time prediction services, a feature store acts as a centralized repository for managing and serving model features. To support this effectively, a DBMS must integrate with feature stores in a scalable and efficient way: high-performance storage, versioning, and retrieval of features across both offline training and online inference pipelines. This ensures consistency between training and production, supports low-latency lookups, and enables versioned access for reproducibility, as sketched after this list.
- In-DB Compute/UDFs/Model Inference. In-DB compute, UDFs, and model inference are foundational capabilities for DBMSs in AI-centric systems. They enable fast, flexible, and intelligent interactions with data by integrating ML/LLM operations directly into SQL workflows, allowing databases to become active participants in inference rather than passive data stores (a UDF-based example is sketched after this list). This capability is especially valuable in applications like RAG, LLM-driven agents, adaptive recommendation systems, and real-time customer support automation.
- Scalability and Cost Efficiency. Scalability ensures that the DBMS can handle growth. Cost efficiency ensures it can do so wisely. In the context of LLM and AI systems, cost efficiency goes far beyond simply extending storage or compute. A DBMS designed for modern AI workloads must scale not just in volume, but in variety, velocity, and value—while keeping costs under control through design choices that support multi-purpose reuse, smart resource allocation, and low-friction operations.
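To make the multi-model point concrete, here is a sketch of one query spanning relational predicates, a JSON document field, and vector similarity, assuming a PostgreSQL-style system with JSONB and pgvector; the `tickets` table is illustrative.

```python
# One query, three models: relational predicates, a JSON-document field, and
# vector similarity, all answered by a single multi-model system.
import psycopg2

QUERY = """
    SELECT t.id,
           t.meta ->> 'customer_tier' AS tier         -- document model (JSONB)
      FROM tickets t
     WHERE t.status = 'open'                          -- relational model
       AND t.meta @> '{"region": "EU"}'               -- JSON containment
     ORDER BY t.embedding <=> %s::vector              -- vector model
     LIMIT 10
"""

def similar_open_eu_tickets(query_vector: list[float]):
    vec = "[" + ",".join(map(str, query_vector)) + "]"
    with psycopg2.connect("dbname=support") as conn, conn.cursor() as cur:
        cur.execute(QUERY, (vec,))
        return cur.fetchall()
```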
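For the HTAP item, a minimal sketch of a single round trip that mixes a fresh transactional read with an analytical aggregate; SQLite stands in for a real-time HTAP engine, and the `orders` schema is invented for illustration.

```python
# Sketch of an HTAP-style access pattern: one statement returns both the
# user's latest order (OLTP-style point read) and a 90-day spend aggregate
# (OLAP-style), so an assistant's prompt can cite both.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (user_id TEXT, amount REAL, placed_at TEXT);
    INSERT INTO orders VALUES
        ('u1', 19.99, date('now', '-40 days')),
        ('u1',  5.00, date('now', '-10 days')),
        ('u1', 42.50, date('now'));
""")

def user_context(user_id: str) -> dict:
    latest, total = db.execute("""
        SELECT (SELECT amount FROM orders
                 WHERE user_id = :u ORDER BY placed_at DESC LIMIT 1),
               (SELECT SUM(amount) FROM orders
                 WHERE user_id = :u AND placed_at >= date('now', '-90 days'))
    """, {"u": user_id}).fetchone()
    return {"latest_order": latest, "recent_spend": total}

print(user_context("u1"))
```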
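For versioning and time travel, a sketch of replaying retrieval against a past snapshot using the SQL:2011 FOR SYSTEM_TIME AS OF form (as in MariaDB, SQL Server, or Db2; other systems, such as Snowflake with its AT clause, spell this differently); the `docs` table is illustrative.

```python
# Sketch: re-run retrieval against the exact corpus state that produced a
# given LLM answer, using a system-versioned (temporal) table.
def replay_retrieval(conn, topic: str, as_of: str):
    """Fetch documents exactly as they existed at `as_of`.

    Assumes a DB-API connection with format-style parameters; `as_of` comes
    from our own audit log (the timestamp stored with each LLM response),
    so inlining it as a literal is acceptable in this sketch.
    """
    cur = conn.cursor()
    cur.execute(
        f"SELECT id, body FROM docs "
        f"FOR SYSTEM_TIME AS OF TIMESTAMP '{as_of}' "
        f"WHERE topic = %s",
        (topic,),
    )
    return cur.fetchall()
```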
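For feature store integration, a sketch of the dual read path that keeps training and serving consistent; SQLite again stands in for the production store, and the `user_features` schema is illustrative.

```python
# Sketch of DBMS-backed feature serving: the same versioned rows feed both
# offline training and online inference, which is what keeps them consistent.
import sqlite3

db = sqlite3.connect("features.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS user_features (
        user_id  TEXT    NOT NULL,
        version  INTEGER NOT NULL,   -- bumped on every recomputation
        features TEXT    NOT NULL,   -- JSON, e.g. {"avg_spend": 12.3}
        PRIMARY KEY (user_id, version)
    )
""")

def online_lookup(user_id: str) -> str:
    """Low-latency read for inference: the latest version wins."""
    row = db.execute(
        "SELECT features FROM user_features WHERE user_id = ? "
        "ORDER BY version DESC LIMIT 1", (user_id,)).fetchone()
    return row[0] if row else "{}"

def training_snapshot(version: int) -> list[tuple]:
    """Pinned read for training: every job sees the same feature version."""
    return db.execute(
        "SELECT user_id, features FROM user_features WHERE version = ?",
        (version,)).fetchall()
```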
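Finally, for in-DB compute, a runnable sketch in which a model is registered as a SQL function so scoring happens inside the query itself; SQLite's `create_function` stands in for a real system's UDF mechanism (e.g., PL/Python in PostgreSQL), and the tiny `sentiment()` scorer is a placeholder model.

```python
# Sketch of in-DB inference: the model runs as a SQL function, so filtering
# and scoring happen where the data lives instead of shipping rows out.
import sqlite3

def sentiment(text: str) -> float:
    """Placeholder model: swap in a real ML/LLM call in production."""
    t = text.lower()
    return 1.0 if "great" in t else -1.0 if "bad" in t else 0.0

db = sqlite3.connect(":memory:")
db.create_function("sentiment", 1, sentiment, deterministic=True)
db.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
db.executemany("INSERT INTO reviews (body) VALUES (?)",
               [("great product",), ("bad support",), ("okay overall",)])

# The UDF participates in the query plan: scoring and filtering in one pass.
for row in db.execute(
        "SELECT id, body, sentiment(body) AS score "
        "FROM reviews WHERE sentiment(body) < 0"):
    print(row)
```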
4 Summary
The modern AI era—especially one shaped by LLMs—demands more from DBMSs than ever before. They are no longer just data storage engines, but must act as intelligent, scalable, and AI-aware systems that power the entire lifecycle of LLM development, deployment, and interaction. As such, the future of AI depends not only on more powerful models, but also on more capable, adaptive, and integrated DBMSs.
In addition to scalability and performance, cost-efficiency has become a defining requirement. A modern DBMS must not only scale with data volume and compute demand, but do so intelligently and sustainably: avoiding wasteful data movement, supporting reuse across applications, and enabling resource-aware query execution. A truly AI-ready DBMS must therefore deliver strong performance, flexibility, and operational efficiency to support long-term, economically viable AI systems.