Commit

changed table 1
RyanLi0802 committed Oct 20, 2024
1 parent 9d7a608 commit 322f102
Showing 3 changed files with 8 additions and 3 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -22,3 +22,7 @@ pnpm-debug.log*

# jetbrains setting folder
.idea/
+
+# temporary files
+temp.*
+tmp.*
Binary file added src/assets/img1.png
7 changes: 4 additions & 3 deletions src/pages/index.mdx
@@ -26,6 +26,7 @@ export const components = {pre: CodeBlock}
import demo_placeholder from "../assets/demo_placeholder_trimed.mp4";
import flowchart12 from "../assets/flowchart12.png";
import table1 from "../assets/table1.png";
+import img1 from "../assets/img1.png";
import multi_turn_examples_5 from "../assets/multi_turn_examples_5.svg";
import multiturn_results_main_4 from "../assets/multiturn_results_main_4.jpg";
import exp1 from "../assets/exp1.png";
@@ -108,23 +109,23 @@ We evaluated 8 commercial models (GPT-4o, GPT-4o mini, Gemini 1.5 Pro, Gemini 1.
<Figure
caption="Results: Direct Generation"
>
-<Image src={table1} alt="Table 1" />
+<Image src={img1} alt="Figure 2" />
</Figure>

## Benchmark Performance: Multi-turn Evaluations

<Figure
caption="Multi-turn Generation Examples"
>
-<Image src={multi_turn_examples_5} alt="Figure 2" />
+<Image src={multi_turn_examples_5} alt="Figure 3" />
</Figure>

We find that all models displayed noticeable improvements in feedback following. The best commercial models achieve improvements of up to 7.1% in visual similarity and 2.7% in IoU-based layout similarity within five rounds of interaction. Question asking, however, appears to be a more challenging task, as all models struggled to pose effective questions about the sketches and showed very few statistically significant improvements.

<Figure
caption="Results: Multi-turn Evaluations"
>
-<Image src={multiturn_results_main_4} alt="Figure 3" />
+<Image src={multiturn_results_main_4} alt="Figure 4" />
</Figure>

{/* <TwoColumns>