copa+…As a result, C1 or C2? prompting error #65

Closed
StellaAthena opened this issue May 17, 2022 · 9 comments

@StellaAthena
Collaborator

Constructing 'copa+plausible_alternatives' contexts and requests
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 308.64it/s]
Constructing 'copa+…As a result, C1 or C2?' contexts and requests
  0%|                                                                                                                                                        | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 142, in <module>
    main()
  File "main.py", line 109, in main
    results = evaluator.simple_evaluate(
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/utils.py", line 181, in _wrapper
    return fn(*args, **kwargs)
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/evaluator.py", line 93, in simple_evaluate
    results = evaluate(
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/utils.py", line 181, in _wrapper
    return fn(*args, **kwargs)
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/evaluator.py", line 209, in evaluate
    ctx, fewshotex_logging_info = task.fewshot_context(
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/utils.py", line 181, in _wrapper
    return fn(*args, **kwargs)
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/base.py", line 933, in fewshot_context
    example = self.doc_to_text(doc)
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/base.py", line 663, in doc_to_text
    text, _ = self.prompt.apply(doc)
  File "/home/mchorse/.local/lib/python3.8/site-packages/promptsource/templates.py", line 194, in apply
    raise ValueError("Prompt did not produce an input and at least one target.")
ValueError: Prompt did not produce an input and at least one target.
@jon-tow
Collaborator

jon-tow commented May 18, 2022

Hey, Stella! Can you make sure you're using the latest eval-hackathon branch of promptsource? We ran into this problem before they updated prompt.apply to return a Tuple[str, List[str]]. Let me know if this problem persists!

Charlie suggested the following in Slack:

pip uninstall "promptsource @ git+https://github.com/bigscience-workshop/promptsource@eval-hackathon"
pip install "promptsource @ git+https://github.com/bigscience-workshop/promptsource@eval-hackathon"

@jon-tow
Collaborator

jon-tow commented May 18, 2022

Update: Some of the copa prompt templates have conditionals like {% if question == \"cause\" %} (see full template here) where no output is produced if the condition is not met. One hack around this is to try-catch these errors and ignore the exception. What do you think?
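A minimal sketch of that try-catch idea, assuming direct access to a promptsource template and the task's documents (the names prompt and docs are illustrative, not the harness's actual control flow):

    # Sketch only: apply the template and silently drop docs for which the
    # conditional produces no input/target pair. `prompt` is a promptsource
    # Template; `docs` is any iterable of COPA examples (both assumed here).
    kept = []
    for doc in docs:
        try:
            text, targets = prompt.apply(doc)
        except ValueError:
            # "Prompt did not produce an input and at least one target."
            # i.e. the template's {% if ... %} branch did not match this doc.
            continue
        kept.append((doc, text, targets))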

@StellaAthena
Collaborator Author

@jon-tow sorry I didn’t see this response. I’m a little confused by your most recent comment… if that conditional fails, does that mean that there isn’t a prompt? In the event that your try-catch code activates, what does the prompt end up being?

@jon-tow
Collaborator

jon-tow commented May 21, 2022

@StellaAthena

if that conditional fails, does that mean that there isn’t a prompt?

Yes because promptsource's apply method does not properly handle these "skipped" examples.

In the event that your try-catch code activates, what does the prompt end up being?

The prompt ends up being the filled-in template described in the consequent block of the conditional.

Repro: https://colab.research.google.com/drive/1PlQx_2fkyZxHECqRZ8aTGdA0RiFQgvQS?usp=sharing

@StellaAthena
Collaborator Author

I am dissatisfied with this solution but haven't had the bandwidth to work on it at all. Leaving this comment mostly for my future reference.

@Muennighoff

Also getting this one

[default0]:  Assigning unique IDs to 'copa+…As a result, C1 or C2?' docs
[default0]:100%|██████████| 100/100 [00:00<00:00, 15571.37ex/s]
[default0]:  Filtering invalid docs from 'copa+…As a result, C1 or C2?'
[default0]:100%|██████████| 1/1 [00:00<00:00,  4.48ba/s]
[default0]:  Constructing 'copa+…As a result, C1 or C2?' contexts and requests
[default0]:  0%|          | 0/100 [00:00<?, ?it/s]
[default0]:Traceback (most recent call last):
[default0]:  File "./tasks/eval_harness/evaluate_bsevalharness.py", line 499, in <module>
[default0]:    main()
[default0]:  File "./tasks/eval_harness/evaluate_bsevalharness.py", line 479, in main
[default0]:    results = evaluator.evaluate(lm=adaptor, task_dict={task_name: task}, bootstrap_iters=args.bootstrap_iters, rng=np.random.default_rng(args.seed))
[default0]:  File "/gpfsssd/scratch/rech/six/commun/experiments/muennighoff/lm-evaluation-harness/lm_eval/evaluator.py", line 180, in evaluate
[default0]:    ctx, fewshotex_logging_info = task.fewshot_context(
[default0]:  File "/gpfsssd/scratch/rech/six/commun/experiments/muennighoff/lm-evaluation-harness/lm_eval/api/task.py", line 404, in fewshot_context
[default0]:    prompt = self.doc_to_text(doc)
[default0]:  File "/gpfsssd/scratch/rech/six/commun/experiments/muennighoff/lm-evaluation-harness/lm_eval/api/task.py", line 286, in doc_to_text
[default0]:    text, _ = self.prompt_template.apply(doc)
[default0]:ValueError: not enough values to unpack (expected 2, got 1)

@jon-tow
Collaborator

jon-tow commented Jul 12, 2022

@Muennighoff Thanks for reporting! We can't robustly resolve this until it gets fixed on the promptsource side, but PR #104 addresses your issue.

NOTE: you may need to clear the cache files in your HF datasets COPA path (e.g. ~/.cache/huggingface/datasets/super_glue/copa/cache-*)

@Muennighoff

Muennighoff commented Jul 16, 2022

Ah yeah, the issue comes from prompts like the one below, where only "cause" or only "effect" works. When the example is the other type, apply returns a list with just one element, which then fails to unpack.

    jinja: "{% if question == \"effect\" %} \n{{ premise }} As a result, \"{{ answer_choices[0]\
      \ }}\" or \"{{ answer_choices[1] }}\"? ||| {% if label != -1 %}{{ answer_choices[label]\
      \ }}{%endif%}\n{% endif %}"
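
One way to guard against that up front is to pre-filter the docs, sketched here under the assumption that the doc list can be touched before contexts are built (prompt_template and docs are illustrative names, not the harness API):

    # Sketch only: keep docs whose applied template yields both an input and
    # at least one target; "cause" docs hit the effect-only template above
    # and come back with a single element.
    def is_valid(prompt_template, doc):
        applied = prompt_template.apply(doc)
        return isinstance(applied, (list, tuple)) and len(applied) >= 2

    docs = [d for d in docs if is_valid(prompt_template, d)]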

@stephenbach

@stephenbach
Member

Sorry for the slow reply! This is something we're still working out the finer points of. It's related to bigscience-workshop/promptsource#749 and bigscience-workshop/promptsource#792. I don't really have a good answer right now, but in the T0 codebase I believe we had to manually check for this case.
