Dataset collection forbid (#1883) #1884

Closed
wants to merge 1 commit into from
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -1,5 +1,5 @@
blank_issues_enabled: false
contact_links:
- name: WeChat group
url: https://oss.laf.run/htr4n1-images/fastgpt-qr-code.jpg
- name: Feishu topic group
url: https://oss.laf.run/otnvvf-imgs/1719505774252.jpg
about: FastGPT Q&A group
4 changes: 2 additions & 2 deletions README.md
@@ -120,9 +120,9 @@ https://github.com/labring/FastGPT/assets/15308462/7d3a38df-eb0e-4388-9250-2409b

## 🏘️ Community Group

Scan with WeChat to join
Scan the QR code to join the Feishu topic group (newly opened; the WeChat group is gradually being phased out)

![](https://oss.laf.run/htr4n1-images/wechat-qr-code.jpg)
![](https://oss.laf.run/otnvvf-imgs/1719505774252.jpg)

<a href="#readme">
<img src="https://img.shields.io/badge/-返回顶部-7d09f1.svg" alt="#" align="right">
4 changes: 2 additions & 2 deletions docSite/content/zh-cn/docs/development/upgrading/484.md
@@ -1,5 +1,5 @@
---
title: 'V4.8.4'
title: 'V4.8.4 (requires initialization)'
description: 'FastGPT V4.8.4 release notes'
icon: 'upgrade'
draft: false
@@ -35,4 +35,4 @@ curl --location --request POST 'https://{{host}}/api/admin/init/484' \
6. Fix - scheduled-execution initialization error.
7. Fix - abnormal parameter passing when invoking apps.
8. Fix - wrong nodeId when copy-pasting (Ctrl+C/V) complex nodes.
9. Adjust the component library's global theme.
9. Adjust the component library's global theme.
4 changes: 2 additions & 2 deletions docSite/content/zh-cn/docs/development/upgrading/485.md
@@ -1,5 +1,5 @@
---
title: 'V4.8.5'
title: 'V4.8.5 (requires initialization)'
description: 'FastGPT V4.8.5 release notes'
icon: 'upgrade'
draft: false
@@ -58,4 +58,4 @@ curl --location --request POST 'https://{{host}}/api/admin/init/485' \
12. Fix - scheduled tasks could not actually be disabled
13. Fix - special characters in the input guide caused regex errors
14. Fix - files containing the special character `%` caused a page crash when it was escaped
15. Fix - page crash when selecting a dataset reference via custom input
15. Fix - page crash when selecting a dataset reference via custom input
14 changes: 14 additions & 0 deletions docSite/content/zh-cn/docs/development/upgrading/486.md
@@ -0,0 +1,14 @@
---
title: 'V4.8.6 (in progress)'
description: 'FastGPT V4.8.6 release notes'
icon: 'upgrade'
draft: false
toc: true
weight: 818
---


## V4.8.6 Release Notes

1. New - datasets support disabling individual collections
2.
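The disable feature in item 1 works by excluding forbidden collection ids from vector recall. In this PR the exclusion happens inside the database query itself, but the idea can be illustrated post-hoc with a simplified sketch (the `RecallHit` shape and function name are assumptions for illustration):

```typescript
// A recall hit as returned from the vector store (shape assumed for illustration).
type RecallHit = { collectionId: string; score: number };

// Drop hits that belong to disabled (forbidden) collections.
const filterForbidden = (hits: RecallHit[], forbidIds: Set<string>): RecallHit[] =>
  hits.filter((h) => !forbidIds.has(h.collectionId));

const kept = filterForbidden(
  [
    { collectionId: 'a', score: 0.9 },
    { collectionId: 'b', score: 0.8 }
  ],
  new Set(['b'])
);
console.log(JSON.stringify(kept)); // [{"collectionId":"a","score":0.9}]
```

Filtering in the store query (as the Milvus and pg changes below do) is preferable in practice, since post-hoc filtering can shrink the result set below the requested `limit`.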
11 changes: 11 additions & 0 deletions packages/global/core/dataset/collection/utils.ts
@@ -1,3 +1,4 @@
import { DatasetCollectionTypeEnum, TrainingModeEnum, TrainingTypeMap } from '../constants';
import { CollectionWithDatasetType, DatasetCollectionSchemaType } from '../type';

export const getCollectionSourceData = (
@@ -12,3 +13,13 @@ export const getCollectionSourceData = (
sourceName: collection?.name || ''
};
};

export const checkCollectionIsFolder = (type: DatasetCollectionTypeEnum) => {
return type === DatasetCollectionTypeEnum.folder || type === DatasetCollectionTypeEnum.virtual;
};

export const getTrainingTypeLabel = (type?: TrainingModeEnum) => {
if (!type) return '';
if (!TrainingTypeMap[type]) return '';
return TrainingTypeMap[type].label;
};
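The two helpers added above can be exercised in isolation. A minimal sketch with the enums and map stubbed locally (the real ones live in `../constants`; the label strings here are placeholders, not the project's actual labels):

```typescript
// Local stand-ins for the real enums/maps in packages/global/core/dataset/constants.
enum DatasetCollectionTypeEnum {
  folder = 'folder',
  virtual = 'virtual',
  file = 'file',
  link = 'link'
}
enum TrainingModeEnum {
  chunk = 'chunk',
  qa = 'qa'
}
const TrainingTypeMap: Record<TrainingModeEnum, { label: string }> = {
  [TrainingModeEnum.chunk]: { label: 'Chunk' }, // placeholder label
  [TrainingModeEnum.qa]: { label: 'QA' } // placeholder label
};

// Same logic as the helpers added in the diff.
const checkCollectionIsFolder = (type: DatasetCollectionTypeEnum) =>
  type === DatasetCollectionTypeEnum.folder || type === DatasetCollectionTypeEnum.virtual;

const getTrainingTypeLabel = (type?: TrainingModeEnum) => {
  if (!type) return '';
  if (!TrainingTypeMap[type]) return '';
  return TrainingTypeMap[type].label;
};

console.log(checkCollectionIsFolder(DatasetCollectionTypeEnum.virtual)); // true
console.log(getTrainingTypeLabel(TrainingModeEnum.qa)); // QA
```

Note that `virtual` collections count as folders here, so a single predicate covers both container types in list views.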
2 changes: 2 additions & 0 deletions packages/global/core/dataset/type.d.ts
@@ -48,6 +48,7 @@ export type DatasetCollectionSchemaType = {
type: DatasetCollectionTypeEnum;
createTime: Date;
updateTime: Date;
forbid?: boolean;

trainingType: TrainingModeEnum;
chunkSize: number;
@@ -89,6 +90,7 @@ export type DatasetDataSchemaType = {
updateTime: Date;
q: string; // large chunks or question
a: string; // answer or custom content
forbid?: boolean;
fullTextToken: string;
indexes: DatasetDataIndexItemType[];
rebuilding?: boolean;
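With the optional `forbid` flag on collections, the recall path needs the ids of disabled collections up front. One way to derive that list — a sketch; the helper name and minimal shape are assumptions, not code from this PR:

```typescript
// Minimal shape of a stored collection, matching the optional forbid field added above.
type CollectionLike = { _id: string; forbid?: boolean };

// Collect the ids of disabled collections, to be passed into vector
// recall as forbidCollectionIdList (helper name is hypothetical).
const getForbidCollectionIdList = (collections: CollectionLike[]): string[] =>
  collections.filter((col) => col.forbid === true).map((col) => String(col._id));

const ids = getForbidCollectionIdList([
  { _id: 'a' },
  { _id: 'b', forbid: true },
  { _id: 'c', forbid: false }
]);
console.log(JSON.stringify(ids)); // ["b"]
```

Because `forbid` is optional, legacy documents without the field behave exactly like `forbid: false`, so no data migration is needed for the flag itself.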
3 changes: 3 additions & 0 deletions packages/service/common/vectorStore/controller.d.ts
@@ -24,6 +24,9 @@ export type InsertVectorControllerProps = InsertVectorProps & {
export type EmbeddingRecallProps = {
teamId: string;
datasetIds: string[];

forbidCollectionIdList: string[];
// forbidEmbIndexIdList: string[];
// similarity?: number;
// efSearch?: number;
};
9 changes: 7 additions & 2 deletions packages/service/common/vectorStore/milvus/class.ts
@@ -213,14 +213,19 @@ export class MilvusCtrl {
};
embRecall = async (props: EmbeddingRecallCtrlProps): Promise<EmbeddingRecallResponse> => {
const client = await this.getClient();
const { teamId, datasetIds, vector, limit, retry = 2 } = props;
const { teamId, datasetIds, vector, limit, forbidCollectionIdList, retry = 2 } = props;

const forbidColQuery =
forbidCollectionIdList.length > 0
? `and (collectionId not in [${forbidCollectionIdList.map((id) => `"${String(id)}"`).join(',')}])`
: '';

try {
const { results } = await client.search({
collection_name: DatasetVectorTableName,
data: vector,
limit,
filter: `(teamId == "${teamId}") and (datasetId in [${datasetIds.map((id) => `"${String(id)}"`).join(',')}])`,
filter: `(teamId == "${teamId}") and (datasetId in [${datasetIds.map((id) => `"${String(id)}"`).join(',')}]) ${forbidColQuery}`,
output_fields: ['collectionId']
});

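The Milvus filter string assembled above can be reproduced in isolation. A sketch of the expression builder with the same quoting as the diff (the standalone function is an illustration; in the PR this logic lives inline in `embRecall`):

```typescript
// Build the Milvus boolean-expression filter used by embRecall,
// excluding forbidden collections only when the list is non-empty.
const buildMilvusFilter = (
  teamId: string,
  datasetIds: string[],
  forbidCollectionIdList: string[]
): string => {
  const forbidColQuery =
    forbidCollectionIdList.length > 0
      ? `and (collectionId not in [${forbidCollectionIdList.map((id) => `"${String(id)}"`).join(',')}])`
      : '';
  return `(teamId == "${teamId}") and (datasetId in [${datasetIds.map((id) => `"${String(id)}"`).join(',')}]) ${forbidColQuery}`;
};

console.log(buildMilvusFilter('t1', ['d1'], ['c1']));
// (teamId == "t1") and (datasetId in ["d1"]) and (collectionId not in ["c1"])
```

When the forbid list is empty the expression is left unchanged apart from a trailing space, so the common no-forbidden-collections case pays no filtering cost in Milvus.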
23 changes: 22 additions & 1 deletion packages/service/common/vectorStore/pg/class.ts
@@ -118,9 +118,29 @@ export class PgVectorCtrl {
}
};
embRecall = async (props: EmbeddingRecallCtrlProps): Promise<EmbeddingRecallResponse> => {
const { teamId, datasetIds, vector, limit, retry = 2 } = props;
const { teamId, datasetIds, vector, limit, forbidCollectionIdList, retry = 2 } = props;

const forbidCollectionSql =
forbidCollectionIdList.length > 0
? `AND collection_id NOT IN (${forbidCollectionIdList.map((id) => `'${String(id)}'`).join(',')})`
: 'AND collection_id IS NOT NULL';
// const forbidDataSql =
// forbidEmbIndexIdList.length > 0 ? `AND id NOT IN (${forbidEmbIndexIdList.join(',')})` : '';

try {
// const explan: any = await PgClient.query(
// `BEGIN;
// SET LOCAL hnsw.ef_search = ${global.systemEnv?.pgHNSWEfSearch || 100};
// EXPLAIN ANALYZE select id, collection_id, vector <#> '[${vector}]' AS score
// from ${DatasetVectorTableName}
// where team_id='${teamId}'
// AND dataset_id IN (${datasetIds.map((id) => `'${String(id)}'`).join(',')})
// ${forbidCollectionSql}
// order by score limit ${limit};
// COMMIT;`
// );
// console.log(explan[2].rows);

const results: any = await PgClient.query(
`
BEGIN;
@@ -129,6 +149,7 @@
from ${DatasetVectorTableName}
where team_id='${teamId}'
AND dataset_id IN (${datasetIds.map((id) => `'${String(id)}'`).join(',')})
${forbidCollectionSql}
order by score limit ${limit};
COMMIT;`
);
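The pg variant builds an extra `WHERE` fragment instead of a Milvus expression. Note its fallback: when nothing is forbidden it emits `AND collection_id IS NOT NULL`, a predicate that is always true for stored rows — presumably to keep the query shape stable across both cases (the diff does not state the reason). A sketch of the clause builder:

```typescript
// Build the extra WHERE clause for pg recall: exclude forbidden collections,
// falling back to a trivially-true predicate when the list is empty.
const buildForbidCollectionSql = (forbidCollectionIdList: string[]): string =>
  forbidCollectionIdList.length > 0
    ? `AND collection_id NOT IN (${forbidCollectionIdList.map((id) => `'${String(id)}'`).join(',')})`
    : 'AND collection_id IS NOT NULL';

console.log(buildForbidCollectionSql(['c1', 'c2']));
// AND collection_id NOT IN ('c1','c2')
```

Since the ids are interpolated directly into the SQL string, this is only safe when they come from trusted server-side ObjectIds rather than raw user input; a parameterized query would be the defensive alternative.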
35 changes: 20 additions & 15 deletions packages/service/core/dataset/collection/schema.ts
@@ -48,12 +48,15 @@ const DatasetCollectionSchema = new Schema({
type: Date,
default: () => new Date()
},
forbid: {
type: Boolean,
default: false
},

// chunk field
trainingType: {
type: String,
enum: Object.keys(TrainingTypeMap),
required: true
enum: Object.keys(TrainingTypeMap)
},
chunkSize: {
type: Number,
@@ -91,23 +94,25 @@
}
});

export const MongoDatasetCollection: Model<DatasetCollectionSchemaType> =
models[DatasetColCollectionName] || model(DatasetColCollectionName, DatasetCollectionSchema);

try {
// auth file
DatasetCollectionSchema.index({ teamId: 1, fileId: 1 }, { background: true });
DatasetCollectionSchema.index({ teamId: 1, fileId: 1 });

// list collection; deep find collections
DatasetCollectionSchema.index(
{
teamId: 1,
datasetId: 1,
parentId: 1,
updateTime: -1
},
{ background: true }
);
DatasetCollectionSchema.index({
teamId: 1,
datasetId: 1,
parentId: 1,
updateTime: -1
});

// get forbid
// DatasetCollectionSchema.index({ teamId: 1, datasetId: 1, forbid: 1 });

MongoDatasetCollection.syncIndexes({ background: true });
} catch (error) {
console.log(error);
}

export const MongoDatasetCollection: Model<DatasetCollectionSchemaType> =
models[DatasetColCollectionName] || model(DatasetColCollectionName, DatasetCollectionSchema);
32 changes: 16 additions & 16 deletions packages/service/core/dataset/data/schema.ts
@@ -77,27 +77,27 @@ const DatasetDataSchema = new Schema({
rebuilding: Boolean
});

export const MongoDatasetData: Model<DatasetDataSchemaType> =
models[DatasetDataCollectionName] || model(DatasetDataCollectionName, DatasetDataSchema);

try {
// list collection and count data; list data; delete collection(relate data)
DatasetDataSchema.index(
{ teamId: 1, datasetId: 1, collectionId: 1, chunkIndex: 1, updateTime: -1 },
{ background: true }
);
DatasetDataSchema.index({
teamId: 1,
datasetId: 1,
collectionId: 1,
chunkIndex: 1,
updateTime: -1
});
// full text index
DatasetDataSchema.index({ teamId: 1, datasetId: 1, fullTextToken: 'text' }, { background: true });
DatasetDataSchema.index({ teamId: 1, datasetId: 1, fullTextToken: 'text' });
// Recall vectors after data matching
DatasetDataSchema.index(
{ teamId: 1, datasetId: 1, collectionId: 1, 'indexes.dataId': 1 },
{ background: true }
);
DatasetDataSchema.index({ updateTime: 1 }, { background: true });
DatasetDataSchema.index({ teamId: 1, datasetId: 1, collectionId: 1, 'indexes.dataId': 1 });
DatasetDataSchema.index({ updateTime: 1 });
// rebuild data
DatasetDataSchema.index({ rebuilding: 1, teamId: 1, datasetId: 1 }, { background: true });
DatasetDataSchema.index({ rebuilding: 1, teamId: 1, datasetId: 1 });

MongoDatasetData.syncIndexes({ background: true });
} catch (error) {
console.log(error);
}

export const MongoDatasetData: Model<DatasetDataSchemaType> =
models[DatasetDataCollectionName] || model(DatasetDataCollectionName, DatasetDataSchema);

MongoDatasetData.syncIndexes();