llm-prompt-architecture-evaluation

This thesis evaluates the chain-of-thought (CoT) prompting architecture for Large Language Models (LLMs) on question-answering tasks. I used Iterative Chain-of-Thought (Iter-CoT) for the evaluation and deliberately chose LLMs other than ChatGPT and the GPT family, since those were the models analyzed in the original Iter-CoT paper. Specifically, I used two Stanford Alpaca low-rank adaptation (LoRA) models, at 13B and 30B parameters, as well as two 13B LLMs fine-tuned from Meta's LLaMA 2. Working with these pre-trained LLMs, I refined the prompts to obtain better performance under the Iter-CoT architecture, then analyzed the resulting outputs and performed an error analysis.

The evaluation showed that iterative instructions improved answer accuracy: Alpaca LoRA 13B raised correctness from 44.77% to 92.53% over three revisions. The approach also reduced the rate of failed answers, with Alpaca LoRA 13B achieving the lowest failure rate at 7.46%. Overall, Iter-CoT outperformed manual CoT, and Alpaca LoRA 13B and 30B outperformed the ChatGPT model.
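The revision loop at the heart of Iter-CoT is simple to sketch. The snippet below is a minimal illustration, not the thesis's actual code: the `generate` callable stands in for whichever model is queried (e.g., Alpaca LoRA 13B), and the prompt wording and the substring-based `check_answer` heuristic are assumptions made for this example.

```python
from typing import Callable, Tuple


def check_answer(answer: str, reference: str) -> bool:
    """Crude correctness check: does the reference answer appear in the output?"""
    return reference.strip().lower() in answer.lower()


def iter_cot(
    generate: Callable[[str], str],
    question: str,
    reference: str,
    max_revisions: int = 3,
) -> Tuple[str, int]:
    """Ask for step-by-step reasoning; when the answer is wrong, feed the
    previous attempt back with a revision instruction, up to max_revisions."""
    prompt = f"Question: {question}\nLet's think step by step."
    answer = generate(prompt)
    for revision in range(max_revisions):
        if check_answer(answer, reference):
            return answer, revision  # correct after `revision` revisions
        # Build the iterative prompt: show the failed attempt and ask for a fix.
        prompt = (
            f"Question: {question}\n"
            f"Previous attempt:\n{answer}\n"
            "The final answer above is incorrect. Revise the reasoning "
            "step by step and give a corrected final answer."
        )
        answer = generate(prompt)
    return answer, max_revisions  # revision budget exhausted


# Example with a stand-in "model" that corrects itself on the first revision:
attempts = iter(["The answer is 40.", "Rechecking... the answer is 42."])
result, revisions = iter_cot(lambda p: next(attempts), "What is 6 * 7?", "42")
```

In the actual evaluation, `generate` would wrap a real inference call to the chosen LLM, and the correctness check would match the grading procedure used for the dataset.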
