
Commit

Merge branch 'master' of https://github.com/andkret/Cookbook
andkret committed Jul 17, 2024
2 parents 060a51e + 13aabcb commit b736d57
Showing 6 changed files with 131 additions and 68 deletions.
39 changes: 21 additions & 18 deletions README.md
@@ -196,24 +196,27 @@ Find the change log with all recent updates here: [SEE UPDATES](sections/10-Upda
- [Apache Drill](sections/03-AdvancedSkills.md#apache-drill)
- [StreamSets](sections/03-AdvancedSkills.md#streamsets)
- [Store](sections/03-AdvancedSkills.md#store)
- [Data Warehouse vs Data Lake](sections/03-AdvancedSkills.md#data-warehouse-vs-data-lake)
- [SQL Databases](sections/03-AdvancedSkills.md#sql-databases)
- [PostgreSQL DB](sections/03-AdvancedSkills.md#postgresql-db)
- [Database Design](sections/03-AdvancedSkills.md#database-design)
- [SQL Queries](sections/03-AdvancedSkills.md#sql-queries)
- [Stored Procedures](sections/03-AdvancedSkills.md#stored-procedures)
- [ODBC/JDBC Server Connections](sections/03-AdvancedSkills.md#odbc-jdbc-server-connections)
- [NoSQL Stores](sections/03-AdvancedSkills.md#nosql-stores)
- [HBase KeyValue Store](sections/03-AdvancedSkills.md#keyvalue-stores-hbase)
- [HDFS Document Store](sections/03-AdvancedSkills.md#document-stores-hdfs)
- [MongoDB Document Store](sections/03-AdvancedSkills.md#document-stores-mongodb)
- [Elasticsearch Document Store](sections/03-AdvancedSkills.md#Elasticsearch-search-engine-and-document-store)
- [Hive Warehouse](sections/03-AdvancedSkills.md#hive-warehouse)
- [Impala](sections/03-AdvancedSkills.md#impala)
- [Kudu](sections/03-AdvancedSkills.md#kudu)
- [Apache Druid](sections/03-AdvancedSkills.md#apache-druid)
- [InfluxDB Time Series Database](sections/03-AdvancedSkills.md#influxdb-time-series-database)
- [Greenplum MPP Database](sections/03-AdvancedSkills.md#mpp-databases-greenplum)
- [Analytical Data Stores](sections/03-AdvancedSkills.md#analytical-data-stores)
- [Data Warehouse vs Data Lake](sections/03-AdvancedSkills.md#data-warehouse-vs-data-lake)
- [Snowflake and dbt](sections/03-AdvancedSkills.md#snowflake-and-dbt)
- [Transactional Data Stores](sections/03-AdvancedSkills.md#transactional-data-stores)
- [SQL Databases](sections/03-AdvancedSkills.md#sql-databases)
- [PostgreSQL DB](sections/03-AdvancedSkills.md#postgresql-db)
- [Database Design](sections/03-AdvancedSkills.md#database-design)
- [SQL Queries](sections/03-AdvancedSkills.md#sql-queries)
- [Stored Procedures](sections/03-AdvancedSkills.md#stored-procedures)
- [ODBC/JDBC Server Connections](sections/03-AdvancedSkills.md#odbc-jdbc-server-connections)
- [NoSQL Stores](sections/03-AdvancedSkills.md#nosql-stores)
- [HBase KeyValue Store](sections/03-AdvancedSkills.md#keyvalue-stores-hbase)
- [HDFS Document Store](sections/03-AdvancedSkills.md#document-stores-hdfs)
- [MongoDB Document Store](sections/03-AdvancedSkills.md#document-stores-mongodb)
- [Elasticsearch Document Store](sections/03-AdvancedSkills.md#Elasticsearch-search-engine-and-document-store)
- [Hive Warehouse](sections/03-AdvancedSkills.md#hive-warehouse)
- [Impala](sections/03-AdvancedSkills.md#impala)
- [Kudu](sections/03-AdvancedSkills.md#kudu)
- [Apache Druid](sections/03-AdvancedSkills.md#apache-druid)
- [InfluxDB Time Series Database](sections/03-AdvancedSkills.md#influxdb-time-series-database)
- [Greenplum MPP Database](sections/03-AdvancedSkills.md#mpp-databases-greenplum)
- [Visualize](sections/03-AdvancedSkills.md#visualize)
- [Android and IOS](sections/03-AdvancedSkills.md#android-and-ios)
- [API Design for Mobile Apps](sections/03-AdvancedSkills.md#how-to-design-apis-for-mobile-apps)
Binary file added images/03/Snowflake-dbt-thumbnail.jpeg
Binary file added images/03/Snowflake-ui.jpeg
Binary file added images/03/dbt-ui.jpeg
155 changes: 105 additions & 50 deletions sections/03-AdvancedSkills.md
@@ -72,27 +72,30 @@ Advanced Data Engineering Skills
- [Apache Drill](03-AdvancedSkills.md#apache-drill)
- [StreamSets](03-AdvancedSkills.md#streamsets)
- [Store](03-AdvancedSkills.md#store)
- [Data Warehouse vs Data Lake](03-AdvancedSkills.md#data-warehouse-vs-data-lake)
- [SQL Databases](03-AdvancedSkills.md#sql-databases)
- [PostgreSQL DB](03-AdvancedSkills.md#postgresql-db)
- [Database Design](03-AdvancedSkills.md#database-design)
- [SQL Queries](03-AdvancedSkills.md#sql-queries)
- [Stored Procedures](03-AdvancedSkills.md#stored-procedures)
- [ODBC/JDBC Server Connections](03-AdvancedSkills.md#odbc-jdbc-server-connections)
- [NoSQL Stores](03-AdvancedSkills.md#nosql-stores)
- [HBase KeyValue Store](03-AdvancedSkills.md#keyvalue-stores-hbase)
- [HDFS Document Store](03-AdvancedSkills.md#document-stores-hdfs)
- [MongoDB Document Store](03-AdvancedSkills.md#document-stores-mongodb)
- [Elasticsearch Document Store](03-AdvancedSkills.md#Elasticsearch-search-engine-and-document-store)
- [Graph Databases (Neo4j)](03-AdvancedSkills.md#graph-db-neo4j)
- [Impala](03-AdvancedSkills.md#impala)
- [Kudu](03-AdvancedSkills.md#kudu)
- [Apache Druid](03-AdvancedSkills.md#apache-druid)
- [InfluxDB Time Series Database](03-AdvancedSkills.md#influxdb-time-series-database)
- [Greenplum MPP Database](03-AdvancedSkills.md#mpp-databases-greenplum)
- [NoSQL Data Warehouses](03-AdvancedSkills.md#nosql-data-warehouses)
- [Hive Warehouse](03-AdvancedSkills.md#hive-warehouse)
- [Impala](03-AdvancedSkills.md#impala)
- [Analytical Data Stores](03-AdvancedSkills.md#analytical-data-stores)
- [Data Warehouse vs Data Lake](03-AdvancedSkills.md#data-warehouse-vs-data-lake)
- [Snowflake and dbt](03-AdvancedSkills.md#snowflake-and-dbt)
- [Transactional Data Stores](03-AdvancedSkills.md#transactional-data-stores)
- [SQL Databases](03-AdvancedSkills.md#sql-databases)
- [PostgreSQL DB](03-AdvancedSkills.md#postgresql-db)
- [Database Design](03-AdvancedSkills.md#database-design)
- [SQL Queries](03-AdvancedSkills.md#sql-queries)
- [Stored Procedures](03-AdvancedSkills.md#stored-procedures)
- [ODBC/JDBC Server Connections](03-AdvancedSkills.md#odbc-jdbc-server-connections)
- [NoSQL Stores](03-AdvancedSkills.md#nosql-stores)
- [HBase KeyValue Store](03-AdvancedSkills.md#keyvalue-stores-hbase)
- [HDFS Document Store](03-AdvancedSkills.md#document-stores-hdfs)
- [MongoDB Document Store](03-AdvancedSkills.md#document-stores-mongodb)
- [Elasticsearch Document Store](03-AdvancedSkills.md#Elasticsearch-search-engine-and-document-store)
- [Graph Databases (Neo4j)](03-AdvancedSkills.md#graph-db-neo4j)
- [Impala](03-AdvancedSkills.md#impala)
- [Kudu](03-AdvancedSkills.md#kudu)
- [Apache Druid](03-AdvancedSkills.md#apache-druid)
- [InfluxDB Time Series Database](03-AdvancedSkills.md#influxdb-time-series-database)
- [Greenplum MPP Database](03-AdvancedSkills.md#mpp-databases-greenplum)
- [NoSQL Data Warehouses](03-AdvancedSkills.md#nosql-data-warehouses)
- [Hive Warehouse](03-AdvancedSkills.md#hive-warehouse)
- [Impala](03-AdvancedSkills.md#impala)
- [Visualize](03-AdvancedSkills.md#visualize)
- [Android and IOS](03-AdvancedSkills.md#android-and-ios)
- [API Design for Mobile Apps](03-AdvancedSkills.md#how-to-design-apis-for-mobile-apps)
@@ -1070,21 +1073,21 @@ it a lot easier to configure the resource management.

### Samza

![Link to Apache Samza Homepage](http://samza.apache.org/)
[Link to Apache Samza Homepage](http://samza.apache.org/)

### AWS Lambda

![Link to AWS Lambda Homepage](https://aws.amazon.com/lambda/)
[Link to AWS Lambda Homepage](https://aws.amazon.com/lambda/)


### Apache Flink

![Link to Apache Flink Homepage](https://flink.apache.org/)
[Link to Apache Flink Homepage](https://flink.apache.org/)


### Elasticsearch

![Link to Elatsicsearch Homepage](https://www.elastic.co/products/elastic-stack)
[Link to Elasticsearch Homepage](https://www.elastic.co/products/elastic-stack)

### Graph DB

@@ -1133,12 +1136,12 @@ https://neo4j.com/use-cases/

### Apache Solr

![Link to Solr Homepage](https://lucene.apache.org/solr/)
[Link to Solr Homepage](https://solr.apache.org)


### Apache Drill

![Link to Apache Drill Homepage](https://drill.apache.org/)
[Link to Apache Drill Homepage](https://drill.apache.org)


### Apache Storm
@@ -1156,16 +1159,68 @@ https://storm.apache.org/

## Store

### Data Warehouse vs Data Lake
### Analytical Data Stores

#### Data Warehouse vs Data Lake

| Podcast Episode: #055 Data Warehouse vs Data Lake
|------------------|
|In this podcast episode we talk about data warehouses and data lakes. When do people use which? What are the pros and cons of both? We look at architecture examples for both and ask whether it makes sense to move completely to a data lake.
| [Watch on YouTube](https://youtu.be/8gNQTrUUwMk) \ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/055-Data-Warehouse-vs-Data-Lake-e45iem)|

### SQL Databases
#### Snowflake and dbt

![Snowflake and dbt thumbnail](/images/03/Snowflake-dbt-thumbnail.jpeg)

In the rapidly evolving landscape of data engineering, staying ahead means continuously expanding your skill set with the latest tools and technologies. Among the myriad of options available, dbt (data build tool) and Snowflake have emerged as indispensable for modern data engineering workflows. Understanding and leveraging these tools can significantly enhance your ability to manage and transform data, making you a more effective and valuable data engineer. Let's dive into why dbt and Snowflake should be at the top of your learning list and explore how the "dbt for Data Engineers" and "Snowflake for Data Engineers" courses from the Learn Data Engineering Academy can help you achieve mastery in these tools.

##### The Power of Snowflake in Data Engineering

Snowflake has revolutionized the data warehousing space with its cloud-native architecture. It offers a scalable, flexible, and highly performant platform that simplifies data management and analytics. Here’s why Snowflake is a critical skill for data engineers:

1. **Cloud-Native Flexibility:** Snowflake’s architecture allows you to scale resources up or down based on your needs, ensuring optimal performance and cost-efficiency.
2. **Unified Data Platform:** It unifies data silos, enabling seamless data sharing and collaboration across the organization.
3. **Integration Capabilities:** Snowflake integrates with various data tools and platforms, enhancing its versatility in different data workflows.
4. **Advanced Analytics:** With its robust support for data querying, transformation, and integration, Snowflake is ideal for complex analytical workloads.

The "Snowflake for Data Engineers" course in my Learn Data Engineering Academy provides comprehensive training on Snowflake. From the basics of setting up your Snowflake environment to advanced data automation with Snowpipe, the course equips you with practical skills to leverage Snowflake effectively in your data projects.

Learn more about the course [here](https://learndataengineering.com/p/snowflake-for-data-engineers).

![Snowflake UI](/images/03/Snowflake-ui.jpeg)


##### Why dbt is a Game-Changer for Data Engineers

dbt is a powerful transformation tool that allows data engineers to transform, test, and document data directly within their data warehouse using simple SQL. Unlike traditional ETL tools, dbt operates on the principle of ELT (Extract, Load, Transform), which aligns perfectly with modern cloud data warehousing paradigms. Here are a few reasons why dbt is a must-have skill for data engineers:

1. **SQL-First Approach:** dbt allows you to write transformations in SQL, the lingua franca of data manipulation, making it accessible to a broad range of data professionals.
2. **Collaboration:** Teams can collaborate seamlessly, creating trusted datasets for reporting, machine learning, and operational workflows.
3. **Ease of Use:** With dbt, you can transform, test, and document your data with ease, streamlining the data pipeline process.
4. **Integration:** dbt integrates effortlessly with your existing data warehouse, such as Snowflake, making it a versatile addition to your toolkit.
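
To make the ELT idea concrete, here is a minimal sketch in Python. It is not dbt itself: it uses the standard-library `sqlite3` module as a stand-in for a warehouse such as Snowflake, and the table names `raw_orders` and `customer_revenue`, the sample rows, and the function `run_elt` are all made up for illustration. The raw data is loaded first, then transformed and checked inside the database with plain SQL, which is roughly what a dbt model and a dbt test do.

```python
import sqlite3


def run_elt(rows):
    """Load raw rows first, then transform them inside the database with SQL."""
    conn = sqlite3.connect(":memory:")

    # Extract + Load: the raw data lands in the store untransformed.
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # Transform: a SELECT materialized as a table, similar in spirit to a dbt model.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )

    # Test: a simple data-quality check, comparable to a not-null test.
    null_count = conn.execute(
        "SELECT COUNT(*) FROM customer_revenue WHERE customer IS NULL"
    ).fetchone()[0]
    if null_count != 0:
        raise ValueError("customer_revenue contains rows without a customer")

    result = conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data: (order_id, customer, amount_cents)
RAW_ORDERS = [(1, "alice", 1250), (2, "bob", 800), (3, "alice", 750)]

if __name__ == "__main__":
    print(run_elt(RAW_ORDERS))
```

In a real dbt project the `SELECT` statement would live in its own `.sql` model file, and dbt would take care of materializing it in the warehouse.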

In my Learn Data Engineering Academy you will find the perfect starting point for mastering dbt: the course "dbt for Data Engineers". The course covers everything from the basics of ELT processes to advanced features like continuous integration and deployment (CI/CD) pipelines. With hands-on training, you'll learn to create data pipelines, configure dbt materializations, test dbt models, and much more.

Learn more about the course [here](https://learndataengineering.com/p/dbt-for-data-engineers).

![dbt UI](/images/03/dbt-ui.jpeg)

##### dbt and Snowflake: A Winning Combination

When used together, dbt and Snowflake offer a powerful combination for data engineering. Here’s why:

1. **Seamless Integration:** dbt’s SQL-first transformation capabilities integrate perfectly with Snowflake’s scalable data warehousing, creating a streamlined ELT workflow.
2. **Efficiency:** Together, they enhance the efficiency of data transformation and analytics, reducing the time and effort required to prepare data for analysis.
3. **Scalability:** The combined power of dbt’s model management and Snowflake’s dynamic scaling ensures that your data pipelines can handle large and complex datasets with ease.
4. **Collaboration and Documentation:** dbt’s ability to document and test data transformations directly within Snowflake ensures that data workflows are transparent, reliable, and collaborative.

Get right into it with our Academy!

By integrating Snowflake and dbt into your skill set, you position yourself at the forefront of data engineering innovation. These tools not only simplify and enhance your data workflows but also open up new possibilities for data transformation and analysis.

### Transactional Data Stores
#### SQL Databases

#### PostgreSQL DB
##### PostgreSQL DB

Homepage:

@@ -1175,17 +1230,17 @@ PostgreSQL vs MongoDB:

<https://blog.panoply.io/postgresql-vs-mongodb>

#### Database Design
##### Database Design

#### SQL Queries
##### SQL Queries

#### Stored Procedures
##### Stored Procedures

#### ODBC/JDBC Server Connections
##### ODBC/JDBC Server Connections

### NoSQL Stores
#### NoSQL Stores

#### KeyValue Stores (HBase)
##### KeyValue Stores (HBase)


| Podcast Episode: #056 NoSQL Key Value Stores Explained with HBase
@@ -1194,7 +1249,7 @@ PostgreSQL vs MongoDB:
| [Watch on YouTube](https://youtu.be/67hIkbpzFc8) \ [Listen on Anchor](https://anchor.fm/andreaskayy/episodes/056-NoSQL-Key-Value-Stores-Explained-With-HBase-e45ifb)|


#### Document Store HDFS
##### Document Store HDFS

The Hadoop distributed file system, or HDFS, allows you to store files
in Hadoop. The difference between HDFS and other file systems like NTFS
@@ -1252,7 +1307,7 @@ This mechanic of splitting a large file in blocks and distributing them
over the servers is great for processing. See the MapReduce section for
an example.
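
The splitting and distribution described above can be sketched in a few lines of Python. This is a simplified illustration, not how HDFS is implemented: the function names `split_into_blocks` and `place_blocks` and the server names are hypothetical, and the round-robin placement stands in for the rack-aware placement policy that real HDFS uses. The defaults of 128 MB blocks and three replicas match common HDFS defaults.

```python
def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]


def place_blocks(num_blocks, servers, replication=3):
    """Assign every block to `replication` distinct servers, round-robin.

    Assumption: simple round-robin placement; real HDFS placement is rack-aware.
    """
    if replication > len(servers):
        raise ValueError("replication factor exceeds the number of servers")
    return {
        block: [servers[(block + i) % len(servers)] for i in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    placement = place_blocks(len(blocks), ["s1", "s2", "s3", "s4"])
    for block_id, (offset, length) in enumerate(blocks):
        print(block_id, offset, length, placement[block_id])
```

Because each block lives on several servers, a processing job can work on different blocks in parallel and still has a copy available if one server fails.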

#### Document Store MongoDB
##### Document Store MongoDB


| Podcast Episode: #093 What is MongoDB
@@ -1299,7 +1354,7 @@ MongoDB vs Cassandra:

<https://blog.panoply.io/cassandra-vs-mongodb>

#### Elasticsearch Search Engine and Document Store
##### Elasticsearch Search Engine and Document Store

Elasticsearch is not a database first and foremost, but a search engine
that indexes JSON documents.
@@ -1356,21 +1411,21 @@ Google Trends Grafana vs Kibana:\
<https://trends.google.com/trends/explore?geo=US&q=%2Fg%2F11fy132gmf,%2Fg%2F11cknd0blr>


#### Apache Impala
##### Apache Impala

![Apache Impala Homepage](https://impala.apache.org/)
[Apache Impala Homepage](https://impala.apache.org/)

#### Kudu
##### Kudu

#### Apache Druid
##### Apache Druid

| Podcast Episode: Druid NoSQL DB and Analytics DB Introduction
|------------------|
|In this video I explain what Druid is and how it works. We look into the architecture of a Druid cluster and check out how clients access the data.
|[Watch on YouTube](https://youtu.be/EiEIeBXSWjM)


#### InfluxDB Time Series Database
##### InfluxDB Time Series Database

What is time-series data?

@@ -1396,21 +1451,21 @@ Performance Dashboard Spark and InfluxDB:
Other alternatives for time series databases are: DalmatinerDB,
QuestDB, Prometheus, Riak TS, OpenTSDB, KairosDB

#### MPP Databases (Greenplum)
##### MPP Databases (Greenplum)

#### Azure Cosmos DB
##### Azure Cosmos DB

https://azure.microsoft.com/en-us/services/cosmos-db/

#### Azure Table-Storage
##### Azure Table-Storage

https://azure.microsoft.com/en-us/services/storage/tables/

### NoSQL Data warehouse
#### NoSQL Data warehouse

#### Hive Warehouse
##### Hive Warehouse

#### Impala
##### Impala

## Visualize

5 changes: 5 additions & 0 deletions sections/10-Updates.md
@@ -3,6 +3,11 @@ Updates

What's new? Here you can find a list of all the updates with links to the sections

- **2024-07-08**
- Added large article about Snowflake and dbt for Data Engineers [click here](03-AdvancedSkills.md#analytical-data-stores)
- Added new section "Analytical Data Stores" to Advanced Skills with the Snowflake & dbt information.
- Put SQL and NoSQL datastores into a new section "Transactional Data Stores"

- **2024-03-20**
- Added roadmap for Software Engineers / Computer Scientists [click here](01-Introduction.md#roadmap-for-software-engineers)
- Added many questions and answers from my interview on the Super Data Science Podcast (plus links to YouTube and the Podcast) [click here](01-Introduction.md#Interview-with-Andreas-on-the-Super-Data-Science-Podcast)