feat(interactive): Support more data types for vertex/edge property (#3346)

Add support for more vertex/edge property data types. The following types
are now supported:
- DT_SIGNED_INT32
- DT_UNSIGNED_INT32
- DT_SIGNED_INT64
- DT_UNSIGNED_INT64
- DT_BOOL
- DT_FLOAT
- DT_DOUBLE
- DT_STRING
- DT_DATE32

A vertex property column of any of the following types can be used as the
primary key column:
- DT_SIGNED_INT32
- DT_UNSIGNED_INT32
- DT_SIGNED_INT64
- DT_UNSIGNED_INT64
- DT_STRING
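
As a rough illustration of the two lists above, the following C++ sketch encodes the supported types and the primary-key restriction. The `DataType` enum and the functions `is_valid_primary_key_type`, `to_schema_name` and `require_primary_key_type` are hypothetical names invented for this example; they are not the engine's actual `PropertyType` API.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical enum mirroring the types named in the commit message.
// The enumerator names are illustrative, not the engine's definitions.
enum class DataType : std::uint8_t {
  kSignedInt32,
  kUnsignedInt32,
  kSignedInt64,
  kUnsignedInt64,
  kBool,
  kFloat,
  kDouble,
  kString,
  kDate32,
};

// True when a vertex property column of this type may be the primary key,
// following the second list in the commit message.
constexpr bool is_valid_primary_key_type(DataType t) {
  switch (t) {
    case DataType::kSignedInt32:
    case DataType::kUnsignedInt32:
    case DataType::kSignedInt64:
    case DataType::kUnsignedInt64:
    case DataType::kString:
      return true;
    default:
      return false;
  }
}

// Maps each type to the spelling used in the schema YAML.
constexpr std::string_view to_schema_name(DataType t) {
  switch (t) {
    case DataType::kSignedInt32:   return "DT_SIGNED_INT32";
    case DataType::kUnsignedInt32: return "DT_UNSIGNED_INT32";
    case DataType::kSignedInt64:   return "DT_SIGNED_INT64";
    case DataType::kUnsignedInt64: return "DT_UNSIGNED_INT64";
    case DataType::kBool:          return "DT_BOOL";
    case DataType::kFloat:         return "DT_FLOAT";
    case DataType::kDouble:        return "DT_DOUBLE";
    case DataType::kString:        return "DT_STRING";
    case DataType::kDate32:        return "DT_DATE32";
  }
  return {};  // unreachable for valid enumerators
}

// Example guard a schema loader might apply (assumption, not engine code).
// It only fires in builds where assert is enabled.
inline void require_primary_key_type(DataType t) {
  assert(is_valid_primary_key_type(t) && "unsupported primary key type");
}

// Compile-time checks of the rules stated above.
static_assert(is_valid_primary_key_type(DataType::kUnsignedInt32));
static_assert(!is_valid_primary_key_type(DataType::kDate32));
static_assert(to_schema_name(DataType::kString) == "DT_STRING");
```

Keeping the rule in one `constexpr` predicate is only one possible design. The diff below instead dispatches on the type at each call site with `if`/`else if` chains.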
zhanglei1949 authored Nov 13, 2023
1 parent d386676 commit df64984
Showing 28 changed files with 1,116 additions and 297 deletions.
23 changes: 20 additions & 3 deletions .github/workflows/flex.yml
@@ -57,6 +57,12 @@ jobs:
export FLEX_DATA_DIR=../../../../interactive/examples/modern_graph/
./run_grin_test 'flex://schema_file=../../../../interactive/examples/modern_graph/modern_graph.yaml&bulk_load_file=../../../../interactive/examples/modern_graph/bulk_load.yaml'
- name: Prepare test dataset
env:
GS_TEST_DIR: ${{ github.workspace }}/gstest/
run: |
git clone -b master --single-branch --depth=1 https://github.com/GraphScope/gstest.git ${GS_TEST_DIR}
- name: Test Graph Loading on modern graph
env:
FLEX_DATA_DIR: ${{ github.workspace }}/flex/interactive/examples/modern_graph/
@@ -67,16 +73,27 @@ jobs:
BULK_LOAD_FILE=../interactive/examples/modern_graph/bulk_load.yaml
GLOG_v=10 ./bin/graph_loader ${SCHEMA_FILE} ${BULK_LOAD_FILE} /tmp/csr-data-dir/
- name: Test Graph Loading on type_test graph
env:
GS_TEST_DIR: ${{ github.workspace }}/gstest/
FLEX_DATA_DIR: ${{ github.workspace }}/gstest/flex/type_test/
run: |
# remove modern graph indices
rm -rf /tmp/csr-data-dir/
cd ${GITHUB_WORKSPACE}/flex/build/
SCHEMA_FILE=${GS_TEST_DIR}/type_test/graph.yaml
BULK_LOAD_FILE=${GS_TEST_DIR}/type_test/import.yaml
GLOG_v=10 ./bin/graph_loader ${SCHEMA_FILE} ${BULK_LOAD_FILE} /tmp/csr-data-dir/ 2
- name: Test Graph Loading on LDBC SNB sf0.1
env:
GS_TEST_DIR: ${{ github.workspace }}/gstest/
FLEX_DATA_DIR: ${{ github.workspace }}/gstest/flex/ldbc-sf01-long-date/
run: |
# remove modern graph indices
# remove previous graph indices
rm -rf /tmp/csr-data-dir/
git clone -b master --single-branch --depth=1 https://github.com/GraphScope/gstest.git ${GS_TEST_DIR}
cd ${GITHUB_WORKSPACE}/flex/build/
SCHEMA_FILE=${FLEX_DATA_DIR}/audit_graph_schema.yaml
BULK_LOAD_FILE=${FLEX_DATA_DIR}/audit_bulk_load.yaml
27 changes: 16 additions & 11 deletions docs/flex/interactive/data_model.md
@@ -11,22 +11,24 @@ The graph data encompasses two fundamental elements:

Note: This graph model aligns with the property graph model, which offers a detailed explanation [here](https://subscription.packtpub.com/book/data/9781784393441/1/ch01lvl1sec09/the-property-graph-model). However, it's essential to note our terminology: we use "vertex" instead of "node" and "edge" instead of "relationship", and we only support **directed** edge instead of both directed and undirected edge.

Within the `gs_interactive_image.yaml` file, vertices are delineated under the `vertex_types` section. Each vertex type is structured with mandatory fields: `type_name`, `properties`, and `primary_keys`. For instance:
Within the `graph.yaml` file, vertices are delineated under the `vertex_types` section. Each vertex type is structured with mandatory fields: `type_name`, `properties`, and `primary_keys`. For instance:

```yaml
- type_name: person
properties:
- property_name: id
property_data_type:
primitive_type: DT_SIGNED_INT64
property_type:
primitive_type: DT_SIGNED_INT64
- property_name: name
property_data_type:
primitive_type: DT_STRING
property_type:
primitive_type: DT_STRING
primary_keys: # these must also be listed in the properties
- id
```
Note: In the current version, only one single primary key can be specified, but we plan to support multiple primary keys in the future.
Note:
- In the current version, only one single primary key can be specified, but we plan to support multiple primary keys in the future.
- The data type of primary key column must be one of `DT_SIGNED_INT32`, `DT_UNSIGNED_INT32`, `DT_SIGNED_INT64` or `DT_UNSIGNED_INT64`.

Edges are defined within the `edge_types` section, characterized by the mandatory fields: `type_name`, `vertex_type_pair_relations`, and `properties`. The type_name and properties fields function similarly to those in vertices. However, the vertex_type_pair_relations field is exclusive to edges, specifying the permissible source and destination vertex types, as well as the relationship detailing how many source and destination vertices can be linked by this edge. Here's an illustrative example:
```yaml
@@ -41,6 +43,7 @@ vertex_type_pair_relations:
Note:
- A single edge type can have multiple `vertex_type_pair_relations`. For instance, a "knows" edge might connect one person to another, symbolizing their friendship. Alternatively, it could associate a person with a skill, indicating their proficiency in that skill.
- The permissible relations include: `ONE_TO_ONE`, `ONE_TO_MANY`, `MANY_TO_ONE`, and `MANY_TO_MANY`. These relations can be utilized by the optimizer to generate more efficient execution plans.
- Currently we only support at most one property for each edge triplet.


## Entity Data
@@ -58,22 +61,24 @@ Entity data pertains to the properties associated with vertices and edges. In Gr
- DT_STRING
- DT_DATE32

In the `gs_interactive_image.yaml`, a primitive type, such as `DT_STRING`, can be written as:
In the `graph.yaml`, a primitive type, such as `DT_STRING`, can be written as:
```yaml
property_data_type:
property_type:
primitive_type: DT_STRING
```

Please note that we currently do not support the use of string data type for properties on edges.

### Array Types

Array types are currently not supported, but are planned to be supported in the near future.
Once supported, albeit requiring that every element within the array adheres to one of the previously mentioned primitive types.
It's crucial that all elements within a single array share the same type. In `gs_interactive_image.yaml`, user can describe designating a property as an array of the `DT_STRING` type as:
It's crucial that all elements within a single array share the same type. In `graph.yaml`, user can describe designating a property as an array of the `DT_STRING` type as:

```yaml
property_data_type:
property_type:
array:
component_type:
primitive_type: DT_UNSIGNED_INT64
max_length: 10 # overflowed elements will be truncated
```
```
15 changes: 15 additions & 0 deletions flex/engines/graph_db/database/graph_db_session.cc
@@ -69,6 +69,21 @@ std::shared_ptr<RefColumnBase> GraphDBSession::get_vertex_id_column(
return std::make_shared<TypedRefColumn<int64_t>>(
dynamic_cast<const TypedColumn<int64_t>&>(
db_.graph().lf_indexers_[label].get_keys()));
} else if (db_.graph().lf_indexers_[label].get_type() ==
PropertyType::kInt32) {
return std::make_shared<TypedRefColumn<int32_t>>(
dynamic_cast<const TypedColumn<int32_t>&>(
db_.graph().lf_indexers_[label].get_keys()));
} else if (db_.graph().lf_indexers_[label].get_type() ==
PropertyType::kUInt64) {
return std::make_shared<TypedRefColumn<uint64_t>>(
dynamic_cast<const TypedColumn<uint64_t>&>(
db_.graph().lf_indexers_[label].get_keys()));
} else if (db_.graph().lf_indexers_[label].get_type() ==
PropertyType::kUInt32) {
return std::make_shared<TypedRefColumn<uint32_t>>(
dynamic_cast<const TypedColumn<uint32_t>&>(
db_.graph().lf_indexers_[label].get_keys()));
} else if (db_.graph().lf_indexers_[label].get_type() ==
PropertyType::kString) {
return std::make_shared<TypedRefColumn<std::string_view>>(
24 changes: 24 additions & 0 deletions flex/engines/graph_db/database/transaction_utils.h
@@ -26,9 +26,15 @@ namespace gs {

inline void serialize_field(grape::InArchive& arc, const Any& prop) {
switch (prop.type) {
case PropertyType::kBool:
arc << prop.value.b;
break;
case PropertyType::kInt32:
arc << prop.value.i;
break;
case PropertyType::kUInt32:
arc << prop.value.ui;
break;
case PropertyType::kDate:
arc << prop.value.d.milli_second;
break;
@@ -40,19 +46,31 @@ inline void serialize_field(grape::InArchive& arc, const Any& prop) {
case PropertyType::kInt64:
arc << prop.value.l;
break;
case PropertyType::kUInt64:
arc << prop.value.ul;
break;
case PropertyType::kDouble:
arc << prop.value.db;
break;
case PropertyType::kFloat:
arc << prop.value.f;
break;
default:
LOG(FATAL) << "Unexpected property type";
}
}

inline void deserialize_field(grape::OutArchive& arc, Any& prop) {
switch (prop.type) {
case PropertyType::kBool:
arc >> prop.value.b;
break;
case PropertyType::kInt32:
arc >> prop.value.i;
break;
case PropertyType::kUInt32:
arc >> prop.value.ui;
break;
case PropertyType::kDate:
arc >> prop.value.d.milli_second;
break;
@@ -64,9 +82,15 @@ inline void deserialize_field(grape::OutArchive& arc, Any& prop) {
case PropertyType::kInt64:
arc >> prop.value.l;
break;
case PropertyType::kUInt64:
arc >> prop.value.ul;
break;
case PropertyType::kDouble:
arc >> prop.value.db;
break;
case PropertyType::kFloat:
arc >> prop.value.f;
break;
default:
LOG(FATAL) << "Unexpected property type";
}
21 changes: 20 additions & 1 deletion flex/engines/graph_db/database/update_transaction.cc
@@ -41,6 +41,15 @@ UpdateTransaction::UpdateTransaction(MutablePropertyFragment& graph,
if (graph_.lf_indexers_[idx].get_type() == PropertyType::kInt64) {
added_vertices_.emplace_back(
std::make_shared<IdIndexer<int64_t, vid_t>>());
} else if (graph_.lf_indexers_[idx].get_type() == PropertyType::kUInt64) {
added_vertices_.emplace_back(
std::make_shared<IdIndexer<uint64_t, vid_t>>());
} else if (graph_.lf_indexers_[idx].get_type() == PropertyType::kInt32) {
added_vertices_.emplace_back(
std::make_shared<IdIndexer<int32_t, vid_t>>());
} else if (graph_.lf_indexers_[idx].get_type() == PropertyType::kUInt32) {
added_vertices_.emplace_back(
std::make_shared<IdIndexer<uint32_t, vid_t>>());
} else if (graph_.lf_indexers_[idx].get_type() == PropertyType::kString) {
added_vertices_.emplace_back(
std::make_shared<IdIndexer<std::string_view, vid_t>>());
@@ -431,11 +440,21 @@ void UpdateTransaction::IngestWal(MutablePropertyFragment& graph,
if (graph.lf_indexers_[idx].get_type() == PropertyType::kInt64) {
added_vertices.emplace_back(
std::make_shared<IdIndexer<int64_t, vid_t>>());
} else if (graph.lf_indexers_[idx].get_type() == PropertyType::kUInt64) {
added_vertices.emplace_back(
std::make_shared<IdIndexer<uint64_t, vid_t>>());
} else if (graph.lf_indexers_[idx].get_type() == PropertyType::kInt32) {
added_vertices.emplace_back(
std::make_shared<IdIndexer<int32_t, vid_t>>());
} else if (graph.lf_indexers_[idx].get_type() == PropertyType::kUInt32) {
added_vertices.emplace_back(
std::make_shared<IdIndexer<uint32_t, vid_t>>());
} else if (graph.lf_indexers_[idx].get_type() == PropertyType::kString) {
added_vertices.emplace_back(
std::make_shared<IdIndexer<std::string_view, vid_t>>());
} else {
LOG(FATAL) << "Only int64 and string_view types for pk are supported..";
LOG(FATAL) << "Only int64, uint64, int32, uint32 and string_view types "
"for pk are supported..";
}
}

1 change: 1 addition & 0 deletions flex/engines/graph_db/grin/predefine.h
@@ -53,6 +53,7 @@ typedef enum {
Date32 = 8, ///< date
Time32 = 9, ///< Time32
Timestamp64 = 10, ///< Timestamp
Bool = 11, ///< bool
} GRIN_DATATYPE;

/// Enumerates the error codes of grin
15 changes: 15 additions & 0 deletions flex/engines/graph_db/grin/src/index/pk.cc
@@ -39,6 +39,21 @@ GRIN_VERTEX grin_get_vertex_by_primary_keys_row(GRIN_GRAPH g,
if (!_g->g.get_lid(label, oid, vid)) {
return GRIN_NULL_VERTEX;
}
} else if (type == gs::PropertyType::kInt32) {
auto oid = *static_cast<const int32_t*>((*_r)[0]);
if (!_g->g.get_lid(label, oid, vid)) {
return GRIN_NULL_VERTEX;
}
} else if (type == gs::PropertyType::kUInt32) {
auto oid = *static_cast<const uint32_t*>((*_r)[0]);
if (!_g->g.get_lid(label, oid, vid)) {
return GRIN_NULL_VERTEX;
}
} else if (type == gs::PropertyType::kUInt64) {
auto oid = *static_cast<const uint64_t*>((*_r)[0]);
if (!_g->g.get_lid(label, oid, vid)) {
return GRIN_NULL_VERTEX;
}
} else if (type == gs::PropertyType::kString) {
auto oid = *static_cast<const std::string_view*>((*_r)[0]);
if (!_g->g.get_lid(label, oid, vid)) {
26 changes: 25 additions & 1 deletion flex/engines/graph_db/grin/src/predefine.cc
@@ -1,16 +1,24 @@
#include "grin/src/predefine.h"

GRIN_DATATYPE _get_data_type(const gs::PropertyType& type) {
if (type == gs::PropertyType::kInt32) {
if (type == gs::PropertyType::kBool) {
return GRIN_DATATYPE::Bool;
} else if (type == gs::PropertyType::kInt32) {
return GRIN_DATATYPE::Int32;
} else if (type == gs::PropertyType::kUInt32) {
return GRIN_DATATYPE::UInt32;
} else if (type == gs::PropertyType::kInt64) {
return GRIN_DATATYPE::Int64;
} else if (type == gs::PropertyType::kUInt64) {
return GRIN_DATATYPE::UInt64;
} else if (type == gs::PropertyType::kString) {
return GRIN_DATATYPE::String;
} else if (type == gs::PropertyType::kDate) {
return GRIN_DATATYPE::Timestamp64;
} else if (type == gs::PropertyType::kDouble) {
return GRIN_DATATYPE::Double;
} else if (type == gs::PropertyType::kFloat) {
return GRIN_DATATYPE::Float;
} else {
return GRIN_DATATYPE::Undefined;
}
@@ -32,6 +40,18 @@ void init_cache(GRIN_GRAPH_T* g) {
tmp.emplace_back(std::dynamic_pointer_cast<gs::LongColumn>(
table.get_column_by_id(idx))
.get());
} else if (type == gs::PropertyType::kUInt32) {
tmp.emplace_back(std::dynamic_pointer_cast<gs::UIntColumn>(
table.get_column_by_id(idx))
.get());
} else if (type == gs::PropertyType::kUInt64) {
tmp.emplace_back(std::dynamic_pointer_cast<gs::ULongColumn>(
table.get_column_by_id(idx))
.get());
} else if (type == gs::PropertyType::kBool) {
tmp.emplace_back(std::dynamic_pointer_cast<gs::BoolColumn>(
table.get_column_by_id(idx))
.get());
} else if (type == gs::PropertyType::kString) {
tmp.emplace_back(std::dynamic_pointer_cast<gs::StringColumn>(
table.get_column_by_id(idx))
@@ -44,6 +64,10 @@ void init_cache(GRIN_GRAPH_T* g) {
tmp.emplace_back(std::dynamic_pointer_cast<gs::DoubleColumn>(
table.get_column_by_id(idx))
.get());
} else if (type == gs::PropertyType::kFloat) {
tmp.emplace_back(std::dynamic_pointer_cast<gs::FloatColumn>(
table.get_column_by_id(idx))
.get());
} else {
tmp.emplace_back((const void*) NULL);
}
6 changes: 6 additions & 0 deletions flex/engines/graph_db/grin/src/property/primarykey.cc
@@ -36,6 +36,12 @@ GRIN_VERTEX_PROPERTY_LIST grin_get_primary_keys_by_vertex_type(
vp += (label * 1u) << 8;
if (type == gs::PropertyType::kInt64) {
vp += (GRIN_DATATYPE::Int64 * 1u) << 16;
} else if (type == gs::PropertyType::kInt32) {
vp += (GRIN_DATATYPE::Int32 * 1u) << 16;
} else if (type == gs::PropertyType::kUInt64) {
vp += (GRIN_DATATYPE::UInt64 * 1u) << 16;
} else if (type == gs::PropertyType::kUInt32) {
vp += (GRIN_DATATYPE::UInt32 * 1u) << 16;
} else if (type == gs::PropertyType::kString) {
vp += (GRIN_DATATYPE::String * 1u) << 16;
} else {