Add GenAI packages #7169

Open · 8 of 13 tasks
LittleLittleCloud opened this issue Jun 6, 2024 · 16 comments
Labels
Deep Learning enhancement New feature or request NLP Issues / questions around text processing untriaged New issue has not been triaged

Comments

LittleLittleCloud (Contributor) commented Jun 6, 2024


Describe the solution you'd like
The GenAI packages will provide TorchSharp implementations of a series of popular GenAI models. The goal is to load the same weights as the corresponding Python models.

The following models will be added in the first wave:

MEAI integration

Fine-tuning #7287

Along with benchmarks.
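
Roughly, the intended consumption pattern looks like this (a minimal sketch based on the sample code discussed later in this thread; the exact namespaces, the config.json name, and the placeholder path are assumptions and may differ in the published packages):

using Microsoft.ML.GenAI.Core;
using Microsoft.ML.GenAI.LLaMA;
using Microsoft.ML.Tokenizers;

// Load local weights exported from the corresponding Python model,
// then wrap tokenizer + model in a text-generation pipeline.
var weightFolder = @"path\to\Meta-Llama-3.1-8B-Instruct"; // placeholder path
var tokenizer = LlamaTokenizerHelper.FromPretrained(weightFolder, "tokenizer.model");
var model = LlamaForCausalLM.FromPretrained(weightFolder, "config.json", layersOnTargetDevice: -1, targetDevice: "cpu");
var pipeline = new CausalLMPipeline<TiktokenTokenizer, LlamaForCausalLM>(tokenizer, model, "cpu");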


lostmsu commented Sep 14, 2024

Can you guys publish a preview of the Microsoft.ML.GenAI.LLaMA package?

LittleLittleCloud (Contributor, Author) commented Sep 15, 2024

@lostmsu You should be able to consume it from the daily build below.

Oh, I just noticed that the GenAI packages haven't had IsPackable set to true, so they aren't available on the daily build. I'll publish a PR to enable the package flag.

aforoughi1 commented

Can you please publish a preview of the Microsoft.ML.GenAI.Core package? It is not available on

https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-libraries/nuget/v3/index.json

The sample Microsoft.ML.GenAI.Samples/Llama/LLaMA3_1.cs is broken without it.

Furthermore, the sample has a hard-coded weight folder:
var weightFolder = @"C:\Users\xiaoyuz\source\repos\Meta-Llama-3.1-8B-Instruct";
I have downloaded the model and config from the Meta site. Maybe a few comments would be helpful.

LittleLittleCloud (Contributor, Author) commented

Oh, sorry, I'll make the fix

aforoughi1 commented

I am getting a System.IO.FileNotFoundException: it couldn't find model.safetensors.index.json when calling Microsoft.ML.GenAI.LLaMA.LlamaForCausalLM.FromPretrained(String modelFolder, String configName, String checkPointName, ScalarType torchDtype, String device).
I can't get the example working. Please explain where/what this file is.

LittleLittleCloud (Contributor, Author) commented Oct 1, 2024

@aforoughi1 Which Llama? I suppose you are running Llama 3.2 1B?

aforoughi1 commented

Llama3.1-8B

LittleLittleCloud (Contributor, Author) commented Oct 1, 2024

@aforoughi1

The error basically says it can't find {ModelFolder}/model.safetensors.index.json. Could you share the full code that calls the model, the stack trace, and a screenshot of the Llama 3.1 8B model folder?
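
In other words, a quick pre-flight check like this will tell you whether the folder contains the sharded-checkpoint index (plain System.IO; the file name comes straight from the exception, and weightFolder is whatever path you pass to FromPretrained):

using System.IO;

// model.safetensors.index.json maps tensor names to the .safetensors shards,
// so FromPretrained needs it to locate the weights.
var indexPath = Path.Combine(weightFolder, "model.safetensors.index.json");
if (!File.Exists(indexPath))
{
    Console.WriteLine($"Missing {indexPath}: this looks like the original .pth checkpoint folder, not the .safetensors export.");
}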

aforoughi1 commented

Model folder:

// issue 7169
// Meta-Llama-3.1-8B-Instruct/original
using Microsoft.ML.GenAI.Core;
using Microsoft.ML.GenAI.LLaMA;
using Microsoft.ML.Tokenizers;

string weightFolder = @"C:\Users\abbas.llama\checkpoints\Llama3.1-8B";
string configName = "params.json";
string modelFile = "tokenizer.model";

// Load the tokenizer and the model weights from the local folder.
TiktokenTokenizer tokenizer = LlamaTokenizerHelper.FromPretrained(weightFolder, modelFile);
LlamaForCausalLM model = LlamaForCausalLM.FromPretrained(weightFolder, configName, layersOnTargetDevice: -1, targetDevice: "cpu");
Console.WriteLine("Loading Llama from model weight folder");

// Build a CPU inference pipeline from the tokenizer and model.
var pipeline = new CausalLMPipeline<TiktokenTokenizer, LlamaForCausalLM>(tokenizer, model, "cpu");

System.IO.FileNotFoundException
HResult=0x80070002
Message=Could not find file 'C:\Users\abbas.llama\checkpoints\Llama3.1-8B\model.safetensors.index.json'.
Source=System.Private.CoreLib
StackTrace:
at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
at System.IO.Strategies.FileStreamHelpers.ChooseStrategyCore(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
at System.IO.Strategies.FileStreamHelpers.ChooseStrategy(FileStream fileStream, String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, Int64 preallocationSize)
at System.IO.StreamReader.ValidateArgsAndOpenPath(String path, Encoding encoding, Int32 bufferSize)
at System.IO.File.InternalReadAllText(String path, Encoding encoding)
at System.IO.File.ReadAllText(String path)
at TorchSharp.PyBridge.PyBridgeModuleExtensions.load_checkpoint(Module module, String path, String checkpointName, Boolean strict, IList`1 skip, Dictionary`2 loadedParameters, Boolean useTqdm)
at Microsoft.ML.GenAI.LLaMA.LlamaForCausalLM.FromPretrained(String modelFolder, String configName, String checkPointName, ScalarType torchDtype, String device)
at Test.GenAITest.LLaMATest1() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\Test\GenAITest.cs:line 35
at Test.Program.GenAI() in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\Test\Program.cs:line 425
at Test.Program.Main(String[] args) in C:\Users\abbas\OneDrive\Documents\WorkingProgress\MLStcokMarketPrediction\Test\Program.cs:line 54

LittleLittleCloud (Contributor, Author) commented Oct 1, 2024

@aforoughi1
LlamaForCausalLM loads the .safetensors model weights, while your code is targeting the original .pth model weight folder.

The .safetensors model weights should be located in Meta-Llama-3.1-8B-Instruct; maybe update the weight folder to that path when loading LlamaForCausalLM:

LlamaForCausalLM model = LlamaForCausalLM.FromPretrained("Meta-Llama-3.1-8B-Instruct", configName, layersOnTargetDevice: -1, targetDevice: "cpu");

aforoughi1 commented

I sorted out the following missing files and the directory structure:
model.safetensors.index.json
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors

The model loads successfully ONLY if I use the defaults:
layersOnTargetDevice: -1
quantizeToInt8: false
quantizeToInt4: false

Setting layersOnTargetDevice: 26, quantizeToInt8: true causes a memory corruption exception.

The example is also missing stopWatch.Stop();

I also don't see RegisterPrintMessage() print any messages to the console.
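
For reference, the timing pattern the sample presumably intends looks like this (a sketch; the generation call is a placeholder for the sample's actual inference code):

using System.Diagnostics;

var stopWatch = Stopwatch.StartNew();
// ... the sample's generation call goes here ...
stopWatch.Stop();
Console.WriteLine($"Inference took {stopWatch.ElapsedMilliseconds} ms");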

LittleLittleCloud (Contributor, Author) commented

@aforoughi1 Are you using the nightly build or trying the example from the main branch?

aforoughi1 commented Oct 7, 2024 via email

LittleLittleCloud (Contributor, Author) commented Oct 7, 2024

@aforoughi1 And your GPU device/platform?

aforoughi1 commented Oct 7, 2024 via email

LittleLittleCloud (Contributor, Author) commented

The layersOnTargetDevice option is for GPU, so I haven't tested values other than -1 in the CPU scenario. As for quantizeToInt8 and quantizeToInt4, you probably won't gain any benefit in CPU scenarios either, so maybe just keep them as false.
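
For a GPU run, the call would look something like this (a sketch reusing the parameter names from earlier in this thread; I haven't verified it on your hardware):

// Place 26 transformer layers on the GPU; the remaining layers stay off-device.
// quantizeToInt8/quantizeToInt4 mainly pay off on GPU, so leave them false on CPU.
LlamaForCausalLM model = LlamaForCausalLM.FromPretrained(
    weightFolder,
    configName,
    layersOnTargetDevice: 26,
    quantizeToInt8: false,
    targetDevice: "cuda");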
