From d99090a749b549f5eced43356d61dd4680374bd1 Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Sat, 6 Jul 2024 07:46:15 -0400
Subject: [PATCH 1/4] Update README.md

---
 README.md | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 96b2b5fc..467cd9af 100644
--- a/README.md
+++ b/README.md
@@ -11,9 +11,10 @@ Optimize data for fast AI model training.**
 Transform                              Optimize
   
-✅ Parallelize data processing       ✅ Stream large cloud datasets    
-✅ Create vector embeddings          ✅ Accelerate training by 20x     
-✅ Transform any data type           ✅ Pause and resume data streaming
+✅ Parallelize data processing       ✅ Stream large cloud datasets                
+✅ Create vector embeddings          ✅ Accelerate training by 20x                 
+✅ Chain transforms                  ✅ Pause and resume data streaming            
+✅ Transform any data type           ✅ Work with remote data without local loading
 
---
@@ -45,19 +46,32 @@ Transform                              Optimize
  
 # Transform data at scale. Optimize for fast model training.
-LitData helps scale and speed up two key data workflows:
+Use LitData to speed up data processing (resizing, embedding, etc) or to optimize datasets to speed up model training.
-[Transform datasets](#transform-datasets) - Parallelize (map) transforms across 1000s of machines.
-[Optimize datasets](#speed-up-model-training)   - Accelerate AI model training by 20x.
+### Transform datasets
+Parallelize (map) transforms across many machines at once to process large datasets much faster.
-
+
+✅ Parallelize processing:    Reduce processing time by transforming data across multiple machines simultaneously.    
+✅ Scale to large data:       Increase the size of datasets you can efficiently handle.    
+✅ Flexible use cases:        Resize images, create embeddings, scrape the internet, etc...    
+✅ Run local or cloud:        Run on your own machines or auto-scale to 1000s of cloud GPUs with Lightning Studios.         
+✅ Enterprise security:       Self host or process data on your cloud account with Lightning Studios.  
+
+
+ 
+
+### Optimize datasets
+Accelerate model training (20x faster) by optimizing datasets for streaming directly from cloud storage. Work with remote data without local downloads with features like loading data subsets, accessing individual samples, and resumable streaming.
+
+
 ✅ Speed up training:         Speed up model training by 20x with optimized datasets.   
-✅ Stream cloud datasets:     Work with huge datasets directly from cloud storage without downloading.    
+✅ Stream cloud datasets:     Work with cloud data without downloading it.    
 ✅ Pytorch-first:             Works with PyTorch libraries like PyTorch Lightning, Lightning Fabric, Hugging Face.    
 ✅ Easy collaboration:        Share and access datasets in the cloud, streamlining team projects.     
 ✅ Scale across GPUs:         Streamed data automatically scales to all GPUs.      
 ✅ Flexible storage:          Use S3, GCS, Azure, or your own cloud account for data storage.    
-✅ Run local or cloud:        Auto-scale to 1000s of cloud GPUs with Lightning Studios.     
+✅ Run local or cloud:        Run on your own machines or auto-scale to 1000s of cloud GPUs with Lightning Studios.         
 ✅ Enterprise security:       Self host or process data on your cloud account with Lightning Studios.  
 
@@ -92,7 +106,8 @@ pip install 'litdata[extras]'
 ----
 # Speed up model training
-Significantly speed up model training by optimizing datasets for fast loading (20x faster) and streaming from cloud storage.
+Accelerate model training (20x faster) by optimizing datasets for streaming directly from cloud storage. Work with remote data without local downloads with features like loading data subsets, accessing individual samples, and resumable streaming.
+
 **Step 1: Optimize the data**
 This step will format the dataset for fast loading (binary, chunked, etc...)

From e7f9f2147f5a37104a8f2ebf752044b3074ded06 Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Sat, 6 Jul 2024 07:47:06 -0400
Subject: [PATCH 2/4] Update README.md

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 467cd9af..c7554eff 100644
--- a/README.md
+++ b/README.md
@@ -11,10 +11,10 @@ Optimize data for fast AI model training.**
 Transform                              Optimize
   
-✅ Parallelize data processing       ✅ Stream large cloud datasets                
-✅ Create vector embeddings          ✅ Accelerate training by 20x                 
-✅ Chain transforms                  ✅ Pause and resume data streaming            
-✅ Transform any data type           ✅ Work with remote data without local loading
+✅ Parallelize data processing       ✅ Stream large cloud datasets          
+✅ Create vector embeddings          ✅ Accelerate training by 20x           
+✅ Chain transforms                  ✅ Pause and resume data streaming      
+✅ Transform any data type           ✅ Use remote data without local loading
 
---

From d9509fb96a6290a45777e7981eb0b7f5cdefb25b Mon Sep 17 00:00:00 2001
From: William Falcon
Date: Sat, 6 Jul 2024 07:54:58 -0400
Subject: [PATCH 3/4] Update README.md

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index c7554eff..4aecfcd2 100644
--- a/README.md
+++ b/README.md
@@ -46,10 +46,10 @@ Transform                              Optimize
  
 # Transform data at scale. Optimize for fast model training.
-Use LitData to speed up data processing (resizing, embedding, etc) or to optimize datasets to speed up model training.
+LitData scales data processing tasks (data scraping, image resizing, distributed inference, embedding creation) on local or cloud machines. It also enables optimizing datasets to accelerate AI model training and work with large remote datasets without local loading.
-### Transform datasets
-Parallelize (map) transforms across many machines at once to process large datasets much faster.
+### Transform datasets
+Accelerate data processing tasks (data scraping, image resizing, embedding creation) by parallelizing (map) the work across many machines at once.
 ✅ Parallelize processing:    Reduce processing time by transforming data across multiple machines simultaneously.    
@@ -166,7 +166,7 @@ for sample in dataloader:
 ----    
 
 # Transform datasets    
-Use LitData to apply transforms to large datasets across multiple machines in parallel. Common usecases are to create vector embeddings, run distributed inference and more.   
+Accelerate data processing tasks (data scraping, image resizing, embedding creation, distributed inference) by parallelizing (map) the work across many machines at once.   
 
 Here's an example that resizes and crops a large image dataset:
 

From bb3d46a6d49ea888966cedb1af0d9cec7f43a01f Mon Sep 17 00:00:00 2001
From: William Falcon 
Date: Sat, 6 Jul 2024 07:55:46 -0400
Subject: [PATCH 4/4] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 4aecfcd2..ddbfc259 100644
--- a/README.md
+++ b/README.md
@@ -13,8 +13,8 @@ Transform                              Optimize
   
 ✅ Parallelize data processing       ✅ Stream large cloud datasets          
 ✅ Create vector embeddings          ✅ Accelerate training by 20x           
-✅ Chain transforms                  ✅ Pause and resume data streaming      
-✅ Transform any data type           ✅ Use remote data without local loading
+✅ Run distributed inference         ✅ Pause and resume data streaming      
+✅ Scrape websites at scale          ✅ Use remote data without local loading
 
---