blog: Using Node-RED as an ETL tool #2175
base: main
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 8%, saving 7.25 KB.
1056 images did not require optimisation.
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 48.4%, saving 14.81 KB.
1058 images did not require optimisation.
- business intelligence
---
ETL (Extract, Transform, Load) is the process of pulling data out of source systems, reshaping it, and writing it to a destination where it can be analyzed, which makes it central to data integration and reporting. Node-RED is best known for building IoT applications with its visual editor, but it can also handle ETL tasks. An IBM blog post on using Node-RED for ETL drew a lot of attention to this use case. In this guide, we'll walk through how to use Node-RED for ETL and cover its strengths and weaknesses along the way.
@sumitshinde-84 This feels very ChatGPT, can you tone down the bravado?
### Extracting
Node-RED can extract data from many sources, including APIs, databases, local filesystems, and IoT devices, using both built-in and community-contributed nodes. For example, the HTTP Request node pulls data from web services, while nodes for MySQL, MongoDB, and PostgreSQL read from databases. MQTT and Kafka nodes consume data from message brokers. File nodes read from the local filesystem, and cloud nodes for platforms such as AWS, GCP, and IBM Watson read from cloud storage services. Node-RED running on edge devices can also read directly from sensors connected to them.
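When the URL field of an HTTP Request node is left empty, the node takes its target from `msg.url`, so a Function node placed in front of it can drive extraction. The sketch below builds one request message per page of a paginated API. The base URL and the `page` query parameter are placeholders rather than a real endpoint, and `buildPageRequests` is a hypothetical helper name.

```javascript
// Hypothetical Function node logic: build one message per page so that an
// HTTP Request node with an empty URL field fetches each page in turn.
// The base URL and the "page" query parameter are placeholders, and the
// base URL is assumed to have no query string of its own.
function buildPageRequests(baseUrl, pageCount) {
  const messages = [];
  for (let page = 1; page <= pageCount; page++) {
    messages.push({
      url: `${baseUrl}?page=${encodeURIComponent(page)}`, // read by the HTTP Request node
      page, // kept on the message so downstream nodes know which page it belongs to
    });
  }
  return messages;
}

// In a Function node the body would end with
//   return [buildPageRequests("https://example.com/api/customers", 3)];
// Wrapping the array in an outer array sends every message on output 1.
const requests = buildPageRequests("https://example.com/api/customers", 3);
console.log(requests.map((m) => m.url));
```

Generating the requests in a Function node keeps the page count or date range in one place, so the same HTTP Request node can be reused for every page.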
Modbus, OPC-UA, etc etc aren't mentioned. Please lean into the industrial applicability more
## Benefits of Using Node-RED as an ETL Tool
1. **Ease of Use:** Its visual programming interface makes it accessible to non-developers.
2. **Flexibility:** A wide range of nodes, plus the ability to write custom JavaScript in Function nodes, allows flexible data processing.
3. **Integration:** Node-RED is strong at connecting IoT devices and handling real-time data, which makes it well suited to combining diverse data sources into unified workflows.
4. **Cost-Effective:** Node-RED is open source, so it can be a cost-effective alternative to commercial ETL tools.
5. **Community Support:** A large community provides a wealth of nodes, examples, and support.
ChatGPT? Feels uninspired and machine generated.
### Customer Data Deduplication
1. Drag a Change node onto the canvas and add two rules: set `msg.data` to `msg.payload`, then set `msg.payload` to `[]`.
2. Drag a MongoDB4 node onto the canvas and configure it with your connection details. If you haven't used MongoDB with Node-RED before, refer to [Using MongoDB with Node-RED](/blog/2024/04/using-mongodb-with-node-red/). Enter `find` into the Operation field.
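After the `find` query, the flow holds both the incoming records (in `msg.data`, saved by the Change node) and the existing documents. A Function node can then keep only the new customers. This sketch assumes the MongoDB4 node returns the `find` results as an array in `msg.payload` and that customers are identified by an `email` field; both are assumptions, and `deduplicateCustomers` is a hypothetical name. In a Function node you would paste the function body and end with `return msg;`.

```javascript
// Sketch of a deduplication step placed after the MongoDB4 "find" node.
// Assumptions (not from the original flow): msg.data holds the incoming
// customer records, msg.payload holds the documents returned by "find" as an
// array, and customers are identified by a hypothetical "email" field.
function deduplicateCustomers(msg) {
  // Compare emails case-insensitively and ignore surrounding whitespace.
  const normalise = (email) => String(email || "").trim().toLowerCase();

  // Keys that already exist in the database.
  const seen = new Set(msg.payload.map((doc) => normalise(doc.email)));

  const fresh = [];
  for (const record of msg.data) {
    const key = normalise(record.email);
    if (!key || seen.has(key)) continue; // skip blanks and known customers
    seen.add(key); // also drops repeats within the incoming batch
    fresh.push(record);
  }

  msg.payload = fresh; // only new customers continue down the flow
  delete msg.data;
  return msg;
}

const result = deduplicateCustomers({
  data: [{ email: "a@example.com" }, { email: "B@example.com " }, { email: "b@example.com" }],
  payload: [{ email: "A@example.com" }],
});
console.log(result.payload);
```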
I don't think MongoDB is used very often in an ETL context, precisely as it's structureless. Do we have a better example? An event based sensor perhaps?
## What Are the Limitations of Using Node-RED as an ETL Tool?
While Node-RED is versatile, there are some limitations to consider when using it as an ETL tool:
- **Advanced Features:** Some advanced ETL features, such as automated schema detection and sophisticated error handling, may require additional customization or external modules.
- **Data Governance:** Node-RED does not provide robust data governance or lineage tracking out of the box, both of which are often essential in enterprise ETL tools.
- **Scalability:** Node-RED can handle many heavy workloads, but it may not be as optimized for processing extremely large datasets as dedicated ETL tools.
ChatGPT?
Co-authored-by: Zeger-Jan van de Weg <[email protected]>
## Building a Simple ETL Project Using Node-RED
Let's walk through a simple project that uses Node-RED as an ETL tool: we'll extract sample customer data from an API, transform it, and then load the cleaned data into a MongoDB database and write it to a local file.
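The transform step in a project like this is usually a small Function node that normalizes fields before loading. The sketch below trims names, lowercases emails, and drops records without an email. It assumes `msg.payload` is an array of customer objects with hypothetical `first_name`, `last_name`, and `email` fields; the sample API's actual field names may differ, and `cleanCustomers` is an illustrative name.

```javascript
// Sketch of a "transform" step as Function node logic.
// Assumption: msg.payload is an array of customer objects with the
// hypothetical fields first_name, last_name and email.
function cleanCustomers(msg) {
  // Trim strings; leave other types unchanged.
  const trim = (value) => (typeof value === "string" ? value.trim() : value);

  msg.payload = msg.payload
    .map((customer) => ({
      first_name: trim(customer.first_name),
      last_name: trim(customer.last_name),
      // Lowercase emails so later lookups and deduplication are consistent.
      email: typeof customer.email === "string" ? customer.email.trim().toLowerCase() : "",
    }))
    .filter((customer) => customer.email !== ""); // drop records without an email
  return msg;
}

const cleaned = cleanCustomers({
  payload: [
    { first_name: " Ada ", last_name: "Lovelace", email: " ADA@Example.com " },
    { first_name: "No", last_name: "Mail" },
  ],
});
console.log(cleaned.payload);
```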
I don't think we should advise using MongoDB? PG, Snowflake, and others are more often a target for the data.
*Note: The goal of this project is to show how to use Node-RED as an ETL tool. We assume that the reader has a basic knowledge of Node-RED.*
## Extracting Data
1. Drag an Inject node onto the canvas.
2. Drag an HTTP Request node onto the canvas, double-click it, and set the URL to `https://api.slingacademy.com/v1/sample-data/files/customers.json`.
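Before transforming anything, it can help to check the response. The HTTP Request node sets `msg.statusCode` (and `msg.responseUrl`) on its output, and with its "Return" option set to "a parsed JSON object" the body arrives in `msg.payload`. This sketch of a follow-on Function node drops failed or unexpected responses; it assumes the endpoint returns a JSON array of records, which is an assumption about the sample API, and `checkResponse` is a hypothetical name. Inside a Function node you would call `node.warn(...)` instead of the `warn` parameter used here.

```javascript
// Sketch of a response check placed directly after the HTTP Request node.
// Assumption: the endpoint returns a JSON array of customer records and the
// HTTP Request node's "Return" option is "a parsed JSON object".
// The warn parameter stands in for node.warn so the sketch runs outside Node-RED.
function checkResponse(msg, warn = console.warn) {
  const ok = msg.statusCode >= 200 && msg.statusCode < 300;
  if (!ok) {
    warn(`HTTP ${msg.statusCode} from ${msg.responseUrl || "request"}`);
    return null; // returning null stops the flow for this message
  }
  if (!Array.isArray(msg.payload)) {
    warn("Expected a JSON array of records");
    return null;
  }
  return msg; // healthy responses continue to the transform step
}

const passed = checkResponse({ statusCode: 200, payload: [{ id: 1 }] });
console.log(passed !== null);
```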
This feels like cheating. Not a real world scenario
@sumitshinde-84 This has gone stale? Any updates pending?
Not sure which example I should use. Using the temperature example every time is not good, though.
@sumitshinde-84 What about machine logs?
Description
Related Issue(s)
Checklist