The focus of this thesis is to evaluate the chain-of-thought (CoT) prompting architecture of Large Language Models (LLMs) on one or more datasets. I used Iterative Chain-of-Thought (Iter-CoT) for the evaluation, and chose LLMs other than ChatGPT and the GPT models, since those were used to analyze the prompt architecture in the original Iter-CoT paper. Specifically, I used two low-rank adaptation (LoRA) versions of Stanford Alpaca, 13B and 30B, and two 13B LLMs fine-tuned from Meta's LLaMA 2 model. I employed these pre-trained LLMs, tuned the prompt to obtain better performance under the Iter-CoT architecture, and then analyzed the output performance and carried out an error analysis. The evaluation showed that iterative instructions improved answer accuracy: Alpaca LoRA 13B improved correctness from 44.77% to 92.53% over three revisions. The approach also reduced the failure rate in answering questions, with Alpaca LoRA 13B achieving the lowest failure rate at 7.46%. Furthermore, the Iter-CoT method outperformed manual CoT, and Alpaca LoRA 13B and 30B outperformed the ChatGPT model.
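The revision loop behind Iter-CoT can be sketched as below. This is a minimal illustrative sketch, not the thesis's exact implementation: `query_llm` is a hypothetical stand-in for a real model call (e.g. to Alpaca LoRA 13B), and the feedback wording is an assumption.

```python
def query_llm(prompt: str) -> str:
    # Placeholder for a real model call (e.g. Alpaca LoRA 13B).
    # This stub answers wrongly until corrective feedback appears
    # in the prompt, to illustrate the revision mechanism.
    return "4" if "Hint" in prompt else "5"

def iter_cot(question: str, gold: str, max_revisions: int = 3) -> tuple[str, int]:
    """Ask with a step-by-step prompt; on a wrong answer, append
    corrective feedback and retry, up to max_revisions rounds
    (three revisions in the thesis experiments)."""
    prompt = f"Q: {question}\nLet's think step by step.\nA:"
    answer = ""
    for revision in range(max_revisions + 1):
        answer = query_llm(prompt).strip()
        if answer == gold:
            return answer, revision
        # Iterative instruction: tell the model its answer was wrong.
        prompt += (f" {answer}\nHint: that answer is incorrect. "
                   "Revise your reasoning and try again.\nA:")
    return answer, max_revisions

answer, revisions = iter_cot("What is 2 + 2?", gold="4")
```

With the stub above, the first attempt fails and the first revision succeeds; with a real LLM, the loop simply bounds how many corrective rounds are attempted before reporting a failure.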
mahbubur-87/llm-prompt-architecture-evaluation
About
This thesis evaluates Large Language Models (LLMs) using the Chain-of-Thought (CoT) prompting architecture. It employs Iterative Chain-of-Thought (Iter CoT) for analysis, utilizing various LLMs including Alpaca LoRA 13B and 30B. Results show improved accuracy with Iter CoT, notably with Alpaca LoRA 13B, outperforming ChatGPT models.