Releases: OpenSPG/KAG
Version 0.6
Version 0.6 (2025-01-07)
On January 7, 2025, OpenSPG officially released version 0.6, bringing updates across multiple areas, including domain knowledge mounting, vertical-domain schema management, visual knowledge exploration, and support for summary generation tasks. For user experience, it adds resumable execution for knowledge base tasks, a user login and permission system, and optimized scheduling of knowledge base construction tasks. In developer mode, it supports configuring different models for different stages and schema-constraint extraction, significantly enhancing the system's flexibility, usability, performance, and security. This release provides users with a more powerful knowledge management platform that adapts to diverse application scenarios.
🌟 New Features
- Support for Summary Generation Tasks
- Native support for abstractive summarization tasks without sacrificing multi-hop factual reasoning accuracy. On the CSQA dataset, KAG's comprehensiveness, diversity, and empowerment metrics are slightly lower than LightRAG's (-1.2/10), while its factual accuracy metric is higher (+0.1/10). On multi-hop question answering datasets such as HotpotQA, TwoWiki, and MuSiQue, LightRAG and GraphRAG do not provide an evaluation entry for factual QA, so their EM scores when tested through the default entry are close to 0. To reproduce the quantitative evaluation results, follow the steps in examples/csqa/README.md in the KAG code repository.
- Domain Schema Management
- The product provides SPG schema management capabilities, allowing users to optimize knowledge base construction and inference Q&A performance by customizing schemas.
- Knowledge Exploration
- Added a knowledge exploration feature to enable visual query and analysis of knowledge base data, and provided an HTTP API for integration with other systems.
- Support for Mounting Domain Knowledge in KAG-Builder (Developer Mode)
- In developer mode, the system supports injecting domain knowledge (domain vocabulary and relationships between terms) into the knowledge base, which can significantly improve knowledge base construction and inference Q&A performance (a 10%+ improvement in the medical domain).
- Knowledge Alignment Component Added to the KAG-Builder Pipeline
- KAG-Builder provides a default knowledge alignment component with features such as filtering out invalid data and linking similar entities, improving the structure and data quality of the graph.
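The notes do not describe how KAG-Builder's alignment component decides that two entities are "similar". As a rough illustration only, the sketch below drops empty names as invalid data and links a name to an earlier canonical name when their normalized string similarity (Python's difflib.SequenceMatcher) passes a threshold. The names align_entities and normalize and the 0.85 threshold are hypothetical; the real component may rely on embeddings or schema information instead.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial spelling variants compare equal."""
    return " ".join(name.lower().split())


def align_entities(names: list[str], threshold: float = 0.85) -> dict[str, str]:
    """Map each valid entity name to a canonical name (hypothetical alignment step).

    Empty or whitespace-only names are treated as invalid and dropped. A name is
    linked to the first canonical name whose normalized similarity ratio is at
    least `threshold`; otherwise it becomes a new canonical name.
    """
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for name in names:
        if not name.strip():
            continue  # invalid data: nothing to link
        key = normalize(name)
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, key, normalize(c)).ratio() >= threshold),
            None,
        )
        if match is None:
            canonical.append(name)  # first occurrence becomes the canonical form
            match = name
        mapping[name] = match
    return mapping
```

For example, align_entities(["Alibaba Group", "alibaba  group", "Ant Group", "  "]) links the second name to "Alibaba Group", keeps "Ant Group" separate, and drops the blank entry. A pure string-similarity rule is cheap but will miss synonyms, which is one reason a production aligner would likely use richer signals.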
⚙️ User Experience Optimizations
- Resumable Tasks
- Knowledge base construction tasks can resume from a breakpoint at the file level in product mode and at the chunk level in developer mode, reducing the time and token cost of full re-runs after a task failure.
- User Login & Permission System
- A user login and permission system prevents unauthorized access to and operations on knowledge base data.
- Optimized Knowledge Base Construction Task Scheduling
- Database-backed scheduling of knowledge base construction tasks prevents task anomalies or interruptions after container restarts.
- Support for Configuring Different Models at Different Stages (Developer Mode)
- A registry-based component management mechanism lets users instantiate component objects from configuration files, so they can develop custom components and embed them into the KAG-Builder and KAG-Solver workflows. It also allows models of different sizes to be configured for different workflow stages, improving overall reasoning and question-answering performance.
- Optimization of Layout Analysis for Markdown, PDF, and Word Files
- For Markdown, PDF, and Word files, the system prioritizes dividing the content into chunks based on the file's sections, so that the content within each chunk is more cohesive.
- Global Configuration and Knowledge Base Configuration
- Provide global configuration for the knowledge base, allowing unified settings for storage engines, generation models, and representation model access information.
- Support for Schema-Constrained Extraction and Linking (Developer Mode)
- Provide a schema-constraint mode that strictly adheres to schema definitions during the knowledge base construction phase, enabling finer-grained and more complex knowledge extraction.
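Chunk-level resumable construction can be pictured as a persisted set of finished chunk IDs that a re-run consults before spending any tokens. The sketch below assumes a JSON checkpoint file and a caller-supplied process function; build_resumable and the file layout are hypothetical and are not KAG's actual checkpoint format.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def build_resumable(
    chunks: Iterable[tuple[str, str]],
    process: Callable[[str], object],
    checkpoint: Path,
) -> int:
    """Process (chunk_id, text) pairs, skipping IDs already recorded in `checkpoint`.

    Returns the number of chunks processed in this run.
    """
    done: set[str] = set()
    if checkpoint.exists():
        done = set(json.loads(checkpoint.read_text()))
    processed = 0
    for chunk_id, text in chunks:
        if chunk_id in done:
            continue  # finished in an earlier run; no tokens spent again
        process(text)  # if this raises, the checkpoint still holds earlier chunks
        done.add(chunk_id)
        # Persist after every chunk so a crash loses at most the chunk in progress.
        checkpoint.write_text(json.dumps(sorted(done)))
        processed += 1
    return processed
```

Writing the checkpoint after every chunk trades some I/O for minimal rework; a file-level variant would record file IDs instead of chunk IDs.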
Version 0.6 (2025-01-07)
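On January 7, 2025, OpenSPG officially released v0.6. The release brings updates across several areas, including domain knowledge mounting, vertical-domain schema management, visual knowledge exploration, and support for summary generation tasks. For user experience, it adds resumable execution for knowledge base tasks, a user login and permission system, and optimized construction task scheduling. In developer mode, it supports configuring different models for different stages and schema-constraint extraction. Together these changes greatly improve the system's flexibility, usability, performance, and security, giving users a more powerful knowledge management platform that adapts to diverse application scenarios.
🌟 New Features
- Support for Summary Generation Tasks
- Native support for summary generation tasks without sacrificing multi-hop factual reasoning accuracy.
On the CSQA dataset, the comprehensiveness, diversity, and empowerment metrics trail LightRAG (-1.2/10), while the factuality metric is better than LightRAG (+0.1/10). On multi-hop QA datasets such as HotpotQA, TwoWiki, and MuSiQue, LightRAG and GraphRAG provide no evaluation entry for factual QA, so testing them through their default entry gives EM scores close to 0.
To reproduce KAG's quantitative evaluation results, follow the steps in examples/csqa/README.md in the KAG code repository.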
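- Domain Schema Management
- The product provides SPG schema management, allowing users to optimize knowledge base construction and inference Q&A through custom schemas.
- Knowledge Exploration
- Adds a knowledge exploration feature for visual query and analysis of knowledge base data, plus an HTTP API for integration with other systems.
- Mounting Domain Knowledge for Knowledge Base Construction (Developer Mode)
- In developer mode, domain knowledge (domain vocabulary and relationships between terms) can be injected into the knowledge base, significantly improving knowledge base construction and inference Q&A (a 10%+ improvement in medical scenarios).
- Knowledge Alignment Component Added to the Construction Pipeline
- KAG-Builder provides a default knowledge alignment component with built-in invalid-data filtering and similar-entity linking to improve the graph's structure and data quality.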
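⚙️ User Experience Optimizations
- Resumable Tasks
- Knowledge base construction tasks can resume from a breakpoint at the file level in product mode and at the chunk level in developer mode, reducing the time and token consumption of full re-runs after a task fails.
- User Login & Permission System
- Provides a user login and permission system that prevents unauthorized access to and operations on knowledge base data.
- Optimized Knowledge Base Construction Task Scheduling
- Provides database-backed scheduling of knowledge base construction tasks, preventing tasks from failing or being interrupted after container restarts.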
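- Support for Configuring Different Models at Different Stages (Developer Mode)
- Provides a registry-based component management mechanism that lets users instantiate component objects from configuration files and develop and embed custom components into the KAG-Builder and KAG-Solver workflows. Large models of different sizes can also be configured for different workflow stages to improve overall reasoning and Q&A performance.
- Layout Analysis Optimization for Markdown, PDF, and Word Files
- Markdown, PDF, and Word files are split into chunks primarily by document section, so that the content within each chunk is more cohesive.
- Project Global Configuration and Knowledge Base Configuration
- Provides global configuration for the knowledge base to set access information for the storage engine, generation model, and representation model in one place.
- Support for Schema-Constraint Extraction and Linking (Developer Mode)
- Provides a schema-constraint mode in which the knowledge base construction phase strictly follows the schema definitions, enabling finer-grained and more complex knowledge extraction.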
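The registry-based component mechanism in this release can be illustrated with a generic pattern: classes register themselves under a name, and a config entry's type key selects which class to instantiate, so each workflow stage can point at a different model. REGISTRY, register, from_config, LLMClient, and the model names below are hypothetical placeholders, not KAG's own registry API or configuration schema.

```python
from typing import Any, Callable

# Hypothetical registry mapping a config "type" string to a component class.
REGISTRY: dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a component class under `name`."""
    def wrap(cls: type) -> type:
        REGISTRY[name] = cls
        return cls
    return wrap


def from_config(config: dict[str, Any]) -> Any:
    """Instantiate the class named by config["type"], passing the other keys as kwargs."""
    params = dict(config)  # copy so the caller's config is not mutated
    cls = REGISTRY[params.pop("type")]
    return cls(**params)


@register("llm")
class LLMClient:
    """Placeholder model client; a real one would wrap an LLM endpoint."""

    def __init__(self, model: str, temperature: float = 0.0) -> None:
        self.model = model
        self.temperature = temperature


# Each workflow stage gets its own entry, so stages can use differently sized models.
STAGE_CONFIG = {
    "extractor": {"type": "llm", "model": "large-model"},
    "answer_generator": {"type": "llm", "model": "small-model", "temperature": 0.3},
}
components = {stage: from_config(cfg) for stage, cfg in STAGE_CONFIG.items()}
```

Because construction is driven by data, adding a custom component only requires registering its class; no workflow code has to change.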
Version 0.5.1
Version 0.5.1 (2024-11-21)
OpenSPG released version 0.5.1 on November 21, 2024. This version focuses on addressing user feedback and introduces a series of new features and user experience optimizations.
🌟 New Features
- Support for Word Documents
- Users can now directly upload .doc or .docx files to streamline the knowledge base construction process.
- New Project Deletion API
- Quickly clear and delete a project and its related data through an API. Requires updating the openspg-neo4j image to the latest version.
- Model Call Concurrency Setting
- Added the builder.model.execute.num parameter, with a default concurrency of 5, to improve efficiency in large-scale knowledge base construction.
- Improved Logging
- Added a startup success marker in the logs to help users quickly verify if the service is running correctly.
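The notes do not show where builder.model.execute.num is set. Conceptually, the parameter caps how many model calls run at once; the sketch below expresses the same idea with a thread pool whose size defaults to 5. The names run_model_calls and call_model are hypothetical, and the real builder may enforce the limit differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Mirrors the documented default of builder.model.execute.num.
DEFAULT_MODEL_CONCURRENCY = 5


def run_model_calls(
    prompts: Sequence[str],
    call_model: Callable[[str], str],
    max_concurrency: int = DEFAULT_MODEL_CONCURRENCY,
) -> list[str]:
    """Call `call_model` on each prompt with at most `max_concurrency` calls in flight.

    Results are returned in the same order as `prompts`.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(call_model, prompts))
```

Raising the limit speeds up construction only while the model service keeps up; past that point, extra concurrency mostly produces timeouts, which matches the notes' motivation for a conservative default.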
⚙️ User Experience Optimizations
- Neo4j Memory Overflow Issues
- Addressed memory overflow problems in Neo4j during large-scale data processing, ensuring stable operation for extensive datasets.
- Concurrent Neo4j Query Execution Issues
- Optimized the query execution strategy to resolve Graph Data Science (GDS) library loading conflicts or failures when Neo4j queries run in high-concurrency scenarios.
- Schema Preview Prefix Issue
- Fixed issues where extracted schema preview entities lacked necessary prefixes, ensuring consistency between extracted entities and predefined schemas.
- Default Neo4j Password for Project Creation/Modification
- Automatically fills a secure default password if none is specified during project creation or modification, simplifying the configuration process.
- Frontend Bug Fixes
- Resolved issues with JS dependencies relying on external addresses and embedded all frontend files into the image. Improved the knowledge base management interface for a smoother user experience.
- Empty Node/Edge Type in Neo4j Writes
- Enhanced writing logic to handle empty node or edge types during knowledge graph construction, preventing errors or data loss in such scenarios.
Version 0.5.1 (2024-11-21)
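OpenSPG released v0.5.1 on November 21, 2024. This version focuses on resolving issues reported by users and brings a series of new features and user experience optimizations.
🌟 New Features
- Word Document Support for Knowledge Base Construction
- Users can now upload files with the .doc or .docx extension directly from the knowledge base management page to run the knowledge base construction process, making knowledge import more convenient and efficient.
- Project Deletion API
- To help users manage projects more efficiently, we added a project deletion API. Users can quickly clear and delete a project by accessing http://127.0.0.1:8887/project/api/delete?projectId=xx. The API also cleans up all schemas, knowledge base tasks, and knowledge base Q&A tasks under the project, as well as the associated Neo4j database.
Tip: before using this feature, make sure the openspg-neo4j image has been updated to the latest version.
- Configurable Model Call Concurrency
- To improve efficiency during large-scale knowledge base construction, we introduced concurrency control for model calls. Users can adjust the number of concurrent calls with the builder.model.execute.num parameter (default: 5). This helps avoid task failures or system stalls caused by performance bottlenecks in the model service.
- Startup Success Marker in Logs
- To make it easier to tell whether the OpenSPG service started successfully, we added an explicit startup success marker to the log output, printed once openspg-server has started.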
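The project deletion endpoint described above can be called from a short script. This sketch uses only Python's standard library and assumes a plain GET request, since the notes only say to access the URL; build_delete_url and delete_project are hypothetical helpers. The call permanently removes the project's schemas, knowledge base tasks, Q&A tasks, and associated Neo4j data, so double-check the project ID first.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Host, port, and path as given in the release notes; adjust for your deployment.
DELETE_ENDPOINT = "http://127.0.0.1:8887/project/api/delete"


def build_delete_url(project_id: int, endpoint: str = DELETE_ENDPOINT) -> str:
    """Return the deletion URL with the projectId query parameter filled in."""
    query = urlencode({"projectId": project_id})
    return f"{endpoint}?{query}"


def delete_project(project_id: int, timeout: float = 30.0) -> str:
    """Request deletion of a project (assumed: plain GET) and return the response body.

    Irreversible: the server also removes the project's schemas, tasks, and Neo4j data.
    """
    with urlopen(build_delete_url(project_id), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping URL construction separate from the network call makes it easy to print and verify the target before running delete_project against a live server.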
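⚙️ User Experience Optimizations
- Fixed Neo4j Memory Limit Issues During Large-Scale Construction
- We analyzed the Neo4j out-of-memory problems that occurred when processing large datasets and implemented a fix. Neo4j now runs stably on large datasets, preventing service interruptions caused by insufficient memory.
- Fixed GDS Loading Issues When Running Concurrent Neo4j Queries
- When Neo4j queries were executed concurrently, loading the Graph Data Science (GDS) library could conflict or fail. We optimized the query execution strategy to ensure query performance and stability under high concurrency.
- Fixed Missing Prefixes on Entities in the Extraction Result Schema Preview
- In previous versions, some users reported that entity names in the schema preview of extraction results lacked the required prefix, so extracted entities did not match the predefined schema. This update fixes the issue and ensures all entity names are complete and accurate.
- Default Neo4j Password When Creating or Modifying a Project
- If no Neo4j password is specified when creating or modifying a project, the system automatically fills in a secure default, simplifying configuration and reducing user input.
- Frontend Bug Fixes
- Fixed JS dependencies that relied on external addresses; all frontend files are now bundled into the image. The knowledge base management page also received several improvements for a smoother experience.
- Fixed Neo4j Write Failures Caused by Empty Node or Edge Types
- For cases where node or relationship types are empty during knowledge graph construction, we optimized the write logic so data is still written successfully, avoiding data loss or errors caused by missing types.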
Version 0.5
Version 0.5 (2024-10-25)
Retrieval-Augmented Generation (RAG) technology promotes the integration of domain applications with large models. However, RAG suffers from a large gap between vector similarity and the relevance needed for knowledge reasoning, and from insensitivity to knowledge logic (such as numerical values, temporal relationships, and expert rules), which hinders the adoption of professional knowledge services. On October 25, 2024, KAG (Knowledge Augmented Generation), a knowledge service framework for professional domains, was officially released.
Release Highlights:
1. KAG: Knowledge Augmented Generation
KAG aims to make full use of the strengths of knowledge graphs and vector retrieval, and bidirectionally enhances large language models and knowledge graphs in four ways to address RAG's challenges:
(1) LLM-friendly semantic knowledge management
(2) Mutual indexing between the knowledge graph and the original text chunks
(3) Logical symbol-guided hybrid inference engine
(4) Knowledge alignment based on semantic reasoning
KAG significantly outperforms NaiveRAG, HippoRAG, and other methods on multi-hop question answering tasks, with a relative F1 improvement of 19.6% on HotpotQA and 33.5% on 2wiki.
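"Relative" improvement is measured against the baseline's own F1 score rather than in absolute points; the notes do not state which baseline each figure is computed against. Under that reading:

```latex
\Delta_{\text{rel}} = \frac{F_1^{\text{KAG}} - F_1^{\text{base}}}{F_1^{\text{base}}}
\qquad\Longleftrightarrow\qquad
F_1^{\text{KAG}} = \left(1 + \Delta_{\text{rel}}\right) F_1^{\text{base}}
```

So a 19.6% relative gain means KAG's F1 is 1.196 times the baseline's, and 33.5% means 1.335 times.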
The KAG framework consists of three parts: kg-builder, kg-solver, and kag-model. This release covers only the first two; kag-model will be open-sourced gradually in future releases.
kg-builder
implements a knowledge representation that is friendly to large language models (LLMs). Building on the DIKW hierarchy (data, information, knowledge, and wisdom), it upgrades SPG's knowledge representation capability and, for the same knowledge type (such as an entity type or event type), supports both schema-free information extraction and schema-constrained professional knowledge construction. It also supports a mutual-index representation between the graph structure and the original text chunks, enabling efficient retrieval during the reasoning and question-answering stage.
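The mutual index between graph structure and text chunks can be thought of as two maps kept in sync: entity to source chunks, and chunk to extracted entities. The class below is a hypothetical in-memory sketch of that idea (MutualIndex and its methods are illustrative names), not kg-builder's storage layout.

```python
from collections import defaultdict


class MutualIndex:
    """Hypothetical bidirectional index between graph entities and text chunks."""

    def __init__(self) -> None:
        self._entity_to_chunks: dict[str, set[str]] = defaultdict(set)
        self._chunk_to_entities: dict[str, set[str]] = defaultdict(set)

    def link(self, entity: str, chunk_id: str) -> None:
        """Record that `entity` was extracted from `chunk_id`, in both directions."""
        self._entity_to_chunks[entity].add(chunk_id)
        self._chunk_to_entities[chunk_id].add(entity)

    def chunks_for(self, entity: str) -> set[str]:
        """Graph to text: the source chunks that mention an entity."""
        return set(self._entity_to_chunks.get(entity, set()))

    def entities_for(self, chunk_id: str) -> set[str]:
        """Text to graph: the entities extracted from a chunk."""
        return set(self._chunk_to_entities.get(chunk_id, set()))


index = MutualIndex()
index.link("Aspirin", "chunk-1")
index.link("Aspirin", "chunk-2")
index.link("Headache", "chunk-1")
# A graph hit on "Aspirin" leads back to its supporting text, and a retrieved
# chunk leads to the entities it mentions.
print(sorted(index.chunks_for("Aspirin")))    # ['chunk-1', 'chunk-2']
print(sorted(index.entities_for("chunk-1")))  # ['Aspirin', 'Headache']
```

Updating both maps in a single link call keeps the two directions consistent, so retrieval can hop from graph to text and back without a separate join step.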
kg-solver
uses a logical symbol-guided hybrid solving and reasoning engine with three types of operators (planning, reasoning, and retrieval) to transform natural language questions into a problem-solving process that combines language and symbols. Each step can use a different operator, such as exact-match retrieval, text retrieval, numerical calculation, or semantic reasoning, integrating four different problem-solving processes: retrieval, knowledge graph reasoning, language reasoning, and numerical calculation.
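To make the operator idea concrete, the sketch below executes a hand-written plan in which each step names an operator and later steps consume earlier results. In kg-solver the plan would come from the planning operator and the operators would query the graph or an LLM; here retrieve and calculate are hypothetical stand-ins over a dictionary, and only two of the operator kinds are shown.

```python
from typing import Callable

# Shared state passed between steps: known facts plus results of earlier steps.
Memory = dict[str, dict]


def retrieve(arg: str, memory: Memory) -> float:
    """Stand-in for exact-match or text retrieval: look the key up in a fact table."""
    return memory["facts"][arg]


def calculate(arg: str, memory: Memory) -> float:
    """Stand-in for numerical calculation: add up earlier step results named in `arg`."""
    return sum(memory["results"][name] for name in arg.split("+"))


OPERATORS: dict[str, Callable[[str, Memory], float]] = {
    "retrieve": retrieve,
    "calculate": calculate,
}


def solve(plan: list[tuple[str, str, str]], facts: dict[str, float]) -> float:
    """Execute (step_name, operator, argument) steps in order; return the last result."""
    memory: Memory = {"facts": facts, "results": {}}
    for step_name, op, arg in plan:
        memory["results"][step_name] = OPERATORS[op](arg, memory)
    return memory["results"][plan[-1][0]]


# A hand-written plan for "What is the combined revenue of A and B?"
plan = [
    ("a", "retrieve", "revenue of A"),
    ("b", "retrieve", "revenue of B"),
    ("total", "calculate", "a+b"),
]
print(solve(plan, {"revenue of A": 3.0, "revenue of B": 4.5}))  # prints 7.5
```

Dispatching through a name-to-function table keeps each step's operator choice explicit, which is the property that lets a symbolic plan mix retrieval with exact computation instead of leaving arithmetic to the language model.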
Version 0.5 (2024-10-25)
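Retrieval-augmented generation (RAG) has driven the integration of domain applications with large models. However, RAG suffers from a large gap between vector similarity and knowledge-reasoning relevance, and from insensitivity to knowledge logic (such as numerical values, temporal relationships, and expert rules), all of which hinder the adoption of professional knowledge services. On October 25, KAG (Knowledge Augmented Generation), a knowledge service framework for professional domains, was officially released.
Release Highlights
1. KAG: A Knowledge Service Framework for Professional Domains
KAG aims to make full use of the strengths of knowledge graphs and vector retrieval, and bidirectionally enhances large language models and knowledge graphs in four ways to address RAG's challenges:
(1) LLM-friendly semantic knowledge management
(2) Mutual indexing between the knowledge graph and original text chunks
(3) Logical symbol-guided hybrid reasoning engine
(4) Knowledge alignment based on semantic reasoning
KAG significantly outperforms NaiveRAG, HippoRAG, and other methods on multi-hop question answering tasks, with relative F1 improvements of 19.6% on hotpotQA and 33.5% on 2wiki.
The KAG framework consists of three parts: kg-builder, kg-solver, and kag-model. This release covers only the first two; kag-model will be open-sourced gradually in future releases.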
kg-builder
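implements an LLM-friendly knowledge representation. Building on the DIKW (data, information, knowledge, and wisdom) hierarchy, it upgrades SPG's knowledge representation capability and, within the same knowledge type (such as an entity type or event type), supports both schema-free information extraction and schema-constrained professional knowledge construction. It also supports a mutual-index representation between the graph structure and the original text chunks, supporting efficient retrieval in the reasoning and question-answering stage.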
kg-solver
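uses a logical symbol-guided hybrid solving and reasoning engine with three types of operators (planning, reasoning, and retrieval) to turn natural language questions into a problem-solving process that combines language and symbols. Each step in this process can use a different operator, such as exact-match retrieval, text retrieval, numerical calculation, or semantic reasoning, integrating four different problem-solving processes: retrieval, knowledge graph reasoning, language reasoning, and numerical calculation.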