This project provides MongoDB source/sink connectors for Pulsar. Enjoy!
Parameters of the Mongo source:
- The URI of the MongoDB deployment that the connector connects to (see:
- The name of the database to watch
  - null: watch all databases
  - not null: watch only this database
- The collection to watch
  - if database is null, collection is ignored
  - if database is not null and collection is not null, only that single collection is watched
  - if database is not null and collection is null, all collections of the database are watched
- The batch size used by the MongoDB cursor when reading from the database
  - default 100
- Whether to copy existing MongoDB data
  - only takes effect on the first run
  - if the copy task does not complete successfully, it starts copying again from the beginning on the next run
  - once the copy task has completed successfully, this value is ignored from then on
  - default false
- The regular expression used to filter namespaces when copying existing documents
  - default empty string
- The number of threads used by the copy executorService; the effective value also depends on the number of CPUs and of MongoDB namespaces
  - default 8
- The size of the queue into which this source puts MongoDB documents
  - default 2000
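The database/collection rules above boil down to three cases. As an illustration only, the hypothetical bash helper `watch_scope` below (not part of the connector) maps a database/collection pair to the scope that would be watched, using an empty string to stand for null.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the connector: prints the scope that the
# source would watch for a database/collection pair, following the rules
# listed above. An empty argument stands for null.
watch_scope() {
  local database="$1" collection="$2"
  if [ -z "$database" ]; then
    # database is null: watch everything, collection is ignored
    echo "all databases"
  elif [ -z "$collection" ]; then
    # database set, collection null: every collection of that database
    echo "all collections of database $database"
  else
    # both set: a single collection
    echo "collection $database.$collection"
  fi
}

watch_scope "" "mytable"      # all databases (collection ignored)
watch_scope "test" ""         # all collections of database test
watch_scope "test" "mytable"  # collection test.mytable
```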
Run the following commands to build the nar file:
git clone
cd pulsar-mongodb-source-connector
mvn clean package -Dmaven.test.skip=true
The nar file can then be found at pulsar-mongodb-source-connector/target/pulsar-io-mongo-[the version].nar.
Create a MongoDB test environment, for example (on CentOS 8):
mkdir -p /root/mongodb && cd /root/mongodb
mkdir dbpath0 dbpath1 dbpath2
wget
tar -zvxf mongodb-linux-x86_64-rhel80-5.0.4.tgz
/root/mongodb/mongodb-linux-x86_64-rhel80-5.0.4/bin/mongod --fork --replSet myreplSet --bind_ip= --port=27017 --dbpath=/root/mongodb/dbpath0 --logpath=/root/mongodb/mongodb0.log --logappend
/root/mongodb/mongodb-linux-x86_64-rhel80-5.0.4/bin/mongod --fork --replSet myreplSet --bind_ip= --port=27018 --dbpath=/root/mongodb/dbpath1 --logpath=/root/mongodb/mongodb1.log --logappend
/root/mongodb/mongodb-linux-x86_64-rhel80-5.0.4/bin/mongod --fork --replSet myreplSet --bind_ip= --port=27019 --dbpath=/root/mongodb/dbpath2 --logpath=/root/mongodb/mongodb2.log --logappend
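The three mongod commands above differ only in port and data/log index, so a loop keeps them consistent. This is a sketch: `start_members` is a hypothetical helper, the bind address is passed in because this guide leaves the `--bind_ip` value open, and `MONGOD` can be overridden (for example `MONGOD=echo`) to print the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: starts the three replica set members (ports 27017-27019)
# with the same options as the commands above.
# Usage: start_members <mongo_home> <bind_ip>
# Set MONGOD to override the mongod binary, e.g. MONGOD=echo for a dry run.
start_members() {
  local home="$1" bind_ip="$2"
  local mongod="${MONGOD:-$home/mongodb-linux-x86_64-rhel80-5.0.4/bin/mongod}"
  local i port
  for i in 0 1 2; do
    port=$((27017 + i))
    mkdir -p "$home/dbpath$i"   # data directory for member i
    "$mongod" --fork --replSet myreplSet --bind_ip="$bind_ip" --port="$port" \
      --dbpath="$home/dbpath$i" --logpath="$home/mongodb$i.log" --logappend || return 1
  done
}
```

For example: `start_members /root/mongodb <your bind_ip>`.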
Enter the mongo shell.
Once in the shell, type the following and press Enter:
config_myreplSet = {
  _id: "myreplSet",
  members: [
    {_id: 0, host: "localhost:27017", priority: 4},
    {_id: 1, host: "localhost:27018", priority: 2},
    {_id: 2, host: "localhost:27019", arbiterOnly: true}
  ]
};
Then execute the following:
Check the status:
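If you prefer not to type into the interactive shell, the replica set setup can be scripted. The sketch below assumes the standard mongo shell helpers `rs.initiate()` and `rs.status()` and the legacy `mongo` binary shipped in the 5.0.4 tarball; `init_replset` is a hypothetical helper name, and `MONGO` can be overridden (for example with `echo` for a dry run).

```bash
#!/usr/bin/env bash
# Assumption: the legacy `mongo` shell sits next to mongod in the 5.0.4 tarball.
MONGO="${MONGO:-/root/mongodb/mongodb-linux-x86_64-rhel80-5.0.4/bin/mongo}"

# Hypothetical helper: initiates the replica set with the config shown above,
# then prints each member's name and state.
init_replset() {
  "$MONGO" --port 27017 --quiet --eval '
    var config_myreplSet = { _id: "myreplSet", members: [
      { _id: 0, host: "localhost:27017", priority: 4 },
      { _id: 1, host: "localhost:27018", priority: 2 },
      { _id: 2, host: "localhost:27019", arbiterOnly: true } ] };
    printjson(rs.initiate(config_myreplSet));
    rs.status().members.forEach(function (m) { print(m.name + " " + m.stateStr); });'
}
```

Members may report transitional states for a few seconds before a PRIMARY is elected; rerun the status check if needed.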
Create a database and insert a document into the collection mytable, for example:
use test
db.createCollection("mytable")
db.mytable.insert({"text":"hello world"})
db.mytable.find()
Bingo, the test environment has been built successfully!
Start Pulsar in standalone mode:
mkdir -p /root/pulsar && cd /root/pulsar
wget
tar -zvxf apache-pulsar-2.9.1-bin.tar.gz
mv apache-pulsar-2.9.1 pulsar-2.9.1-bin
/root/pulsar/pulsar-2.9.1-bin/bin/pulsar standalone
To run it in the background (daemon mode), you can try:
nohup /root/pulsar/pulsar-2.9.1-bin/bin/pulsar standalone 1>/dev/null 2>&1 &
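When Pulsar runs in the background, the broker may need some time before it accepts connections. A minimal sketch for waiting on the broker port (6650, the port used by `--broker-service-url` in the localrun command of this guide) is shown here; `wait_for_port` is a hypothetical helper that relies on bash's `/dev/tcp` redirection.

```bash
#!/usr/bin/env bash
# Hypothetical helper: polls once per second until host:port accepts a TCP
# connection, or gives up after timeout_seconds.
# Usage: wait_for_port <host> <port> <timeout_seconds>
wait_for_port() {
  local host="$1" port="$2" timeout="$3" waited=0
  # Opening /dev/tcp/host/port in a subshell attempts a TCP connection.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "port $host:$port not reachable after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "port $host:$port is open"
}
```

For example, run `wait_for_port localhost 6650 60` before starting the localrun task.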
Start the Pulsar IO Mongo source task in localrun mode (for testing only):
mkdir -p /root/pulsar/pulsar-2.9.1-bin/connectors
Move the nar file to /root/pulsar/pulsar-2.9.1-bin/connectors.
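The move step can be wrapped in a small function so that it works whatever version number the build produced. This is a sketch assuming the maven output path given earlier; `install_nar` is a hypothetical helper, and both directories can be passed explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: moves the built pulsar-io-mongo-*.nar into the Pulsar
# connectors directory. Defaults are the paths used in this guide.
# Usage: install_nar [build_dir] [connectors_dir]
install_nar() {
  local build_dir="${1:-pulsar-mongodb-source-connector/target}"
  local dest="${2:-/root/pulsar/pulsar-2.9.1-bin/connectors}"
  local nar
  mkdir -p "$dest"
  for nar in "$build_dir"/pulsar-io-mongo-*.nar; do
    if [ ! -e "$nar" ]; then
      # The glob matched nothing, so the build output is missing.
      echo "no pulsar-io-mongo-*.nar in $build_dir; run the maven build first" >&2
      return 1
    fi
    mv "$nar" "$dest/" || return 1
    echo "installed $dest/$(basename "$nar")"
  done
}
```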
Now prepare the config file: create /root/pulsar/pulsar-2.9.1-bin/connectors/mongo-source-config.yaml with the following content:
tenant: public
namespace: default
name: pulsar-mongo-source
parallelism: 1
topicName: mongo-source-topic-test
archive: /root/pulsar/pulsar-2.9.1-bin/connectors/pulsar-io-mongo-2.9.1.nar #used by your mongo source
configs: {"mongoUri":"mongodb://localhost:27017,localhost:27018,localhost:27019","database":null,"collection":null,"batchSize":1000}
Very important:
- parallelism must be 1, otherwise an error will occur
- once the task is running, the resume token is saved periodically. Its key in the state store is derived from the tenant/namespace/name values, and the token is read back when the task starts. So, to start a source task with a different resume token, use another config file with a different tenant/namespace/name. For more details, see
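To check which tenant/namespace/name triple a config file maps to, and to derive a second config file that gets its own resume token, something like the following works for the one-key-per-line layout of mongo-source-config.yaml. Both `state_key` and `clone_config` are hypothetical helpers; the exact key format inside the state store is not documented here, so `state_key` only prints the triple.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints tenant/namespace/name from a config file that
# uses one "key: value" pair per line.
state_key() {
  local file="$1" tenant namespace name
  tenant=$(sed -n 's/^tenant: *//p' "$file")
  namespace=$(sed -n 's/^namespace: *//p' "$file")
  name=$(sed -n 's/^name: *//p' "$file")   # does not match "namespace:"
  echo "$tenant/$namespace/$name"
}

# Hypothetical helper: copies a config file, replacing only the name.
# new_name must not contain '/' or '&' (they are special in sed replacements).
# Usage: clone_config <src.yaml> <dst.yaml> <new_name>
clone_config() {
  sed "s/^name: .*/name: $3/" "$1" > "$2"
}
```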
OK, let us start the Pulsar IO Mongo source task in localrun mode:
export PULSAR_HOME=/root/pulsar/pulsar-2.9.1-bin
/root/pulsar/pulsar-2.9.1-bin/bin/pulsar-admin sources localrun --source-config-file /root/pulsar/pulsar-2.9.1-bin/connectors/mongo-source-config.yaml --state-storage-service-url bk:// --broker-service-url pulsar://localhost:6650/
Start a Pulsar consumer client:
/root/pulsar/pulsar-2.9.1-bin/bin/pulsar-client consume mongo-source-topic-test -s "first-subscription" -n 0
Operate on MongoDB documents as described in 1).
Then observe the shell output of 4), for example:
----- got message ----- key:[{"_id": {"$oid": "62112e29ad9d5ad972f2bbad"}}], properties:[], content:{"clusterTime":7068881897035661795,"fullDocument":"{\"_id\": {\"$oid\": \"62112e29ad9d5ad972f2bbad\"}, \"text\": \"123456\"}","ns":{"databaseName":"test","collectionName":"mytable1","fullName":"test.mytable1"},"operation":"copy"}
----- got message ----- key:[{"_id": {"$oid": "6219b897fd2cdc8aa3ac8791"}}], properties:[], content:{"clusterTime":7068884048814276609,"fullDocument":"{\"_id\": {\"$oid\": \"6219b897fd2cdc8aa3ac8791\"}, \"text\": \"insert\"}","ns":{"databaseName":"test","collectionName":"mytable","fullName":"test.mytable"},"operation":"insert"}
----- got message ----- key:[{"_id": {"$oid": "6219bb2efd2cdc8aa3ac8792"}}], properties:[], content:{"clusterTime":7068886947917201409,"fullDocument":"{\"_id\": {\"$oid\": \"6219bb2efd2cdc8aa3ac8792\"}, \"text\": \"123\", \"title\": \"456\"}","ns":{"databaseName":"test","collectionName":"mytable","fullName":"test.mytable"},"operation":"update"}
db.mytable.find({ _id:ObjectId("6219c8d8fd2cdc8aa3ac8793") }).forEach(function(item){;; });
----- got message ----- key:[{"_id": {"$oid": "6219c8d8fd2cdc8aa3ac8793"}}], properties:[], content:{"clusterTime":7068903457771487233,"fullDocument":"{\"_id\": {\"$oid\": \"6219c8d8fd2cdc8aa3ac8793\"}, \"text\": 1.0, \"aaa\": 1.0}","ns":{"databaseName":"test","collectionName":"mytable","fullName":"test.mytable"},"operation":"replace"}
----- got message ----- key:[{"_id": {"$oid": "620fc75879d0dfe57de5e4c9"}}], properties:[], content:{"clusterTime":7068895902924013569,"ns":{"databaseName":"test","collectionName":"mytable","fullName":"test.mytable"},"operation":"delete"}
----- got message ----- key:[], properties:[], content:{"clusterTime":7068919778647212034,"operation":"invalidate"}
----- got message ----- key:[], properties:[], content:{"clusterTime":7068897174234333185,"ns":{"databaseName":"test","collectionName":"mytable","fullName":"test.mytable"},"operation":"drop"}
----- got message ----- key:[], properties:[], content:{"clusterTime":7068899124149485570,"operation":"dropDatabase"}
----- got message ----- key:[], properties:[], content:{"clusterTime":7068888300831899649,"destNamespace":{"databaseName":"test","collectionName":"orders2022","fullName":"test.orders2022"},"ns":{"databaseName":"test","collectionName":"mytable","fullName":"test.mytable"},"operation":"rename"}
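The `operation` field in these messages is what a consumer would typically branch on. The sketch below extracts it from a printed message line with plain `sed` (no JSON tool assumed, and it assumes the field appears once, as in the samples above) and groups the operations shown above; `message_op` and `classify_op` are hypothetical names, and a real consumer would parse the JSON payload instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the value of "operation" from one
# "----- got message -----" line.
message_op() {
  sed -n 's/.*"operation":"\([A-Za-z]*\)".*/\1/p' <<< "$1"
}

# Hypothetical helper: groups operations by what the message carries.
classify_op() {
  case "$1" in
    copy|insert|update|replace) echo "document" ;;           # has fullDocument
    delete) echo "delete" ;;                                  # key holds the _id only
    drop|dropDatabase|rename|invalidate) echo "namespace" ;;  # empty key
    *) echo "unknown" ;;
  esac
}
```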
All done, enjoy!
lzqdename [email protected]
If you have questions, please open a new issue in this project and we will answer as soon as possible.