copa+…As a result, C1 or C2? prompting error #65

Closed
StellaAthena opened this issue May 17, 2022 · 9 comments

@StellaAthena
Collaborator

Constructing 'copa+plausible_alternatives' contexts and requests
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 308.64it/s]
Constructing 'copa+…As a result, C1 or C2?' contexts and requests
  0%|                                                                                                                                                        | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 142, in <module>
    main()
  File "main.py", line 109, in main
    results = evaluator.simple_evaluate(
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/utils.py", line 181, in _wrapper
    return fn(*args, **kwargs)
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/evaluator.py", line 93, in simple_evaluate
    results = evaluate(
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/utils.py", line 181, in _wrapper
    return fn(*args, **kwargs)
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/evaluator.py", line 209, in evaluate
    ctx, fewshotex_logging_info = task.fewshot_context(
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/utils.py", line 181, in _wrapper
    return fn(*args, **kwargs)
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/base.py", line 933, in fewshot_context
    example = self.doc_to_text(doc)
  File "/home/mchorse/bigsci/lm-evaluation-harness/lm_eval/base.py", line 663, in doc_to_text
    text, _ = self.prompt.apply(doc)
  File "/home/mchorse/.local/lib/python3.8/site-packages/promptsource/templates.py", line 194, in apply
    raise ValueError("Prompt did not produce an input and at least one target.")
ValueError: Prompt did not produce an input and at least one target.
@jon-tow
Collaborator

jon-tow commented May 18, 2022

Hey, Stella! Can you make sure you're using the latest eval-hackathon branch of promptsource? We ran into this problem before they updated prompt.apply to return a Tuple[str, List[str]]. Let me know if this problem persists!

Charlie suggested the following in Slack:

pip uninstall "promptsource @ git+https://github.com/bigscience-workshop/promptsource@eval-hackathon"
pip install "promptsource @ git+https://github.com/bigscience-workshop/promptsource@eval-hackathon"

@jon-tow
Collaborator

jon-tow commented May 18, 2022

Update: Some of the copa prompt templates have conditionals like {% if question == \"cause\" %} (see full template here) where no output is produced if the condition is not met. One hack around this is to try-catch these errors and ignore the exception. What do you think?
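A minimal sketch of that try-catch idea, assuming direct access to a promptsource template and the task's documents (the names prompt and docs are illustrative, not the harness's actual control flow):

    # Sketch only: apply the template and silently drop docs for which the
    # conditional produces no input/target pair. `prompt` is a promptsource
    # Template; `docs` is any iterable of COPA examples (both assumed here).
    kept = []
    for doc in docs:
        try:
            text, targets = prompt.apply(doc)
        except ValueError:
            # "Prompt did not produce an input and at least one target."
            # i.e. the template's {% if ... %} branch did not match this doc.
            continue
        kept.append((doc, text, targets))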

@StellaAthena
Collaborator Author

@jon-tow sorry I didn’t see this response. I’m a little confused by your most recent comment… if that conditional fails, does that mean that there isn’t a prompt? In the event that your try-catch code activates, what does the prompt end up being?

@jon-tow
Collaborator

jon-tow commented May 21, 2022

@StellaAthena

if that conditional fails, does that mean that there isn’t a prompt?

Yes because promptsource's apply method does not properly handle these "skipped" examples.

In the event that your try-catch code activates, what does the prompt end up being?

The prompt ends up being the filled-in template described in the consequent block of the conditional.

Repro: https://colab.research.google.com/drive/1PlQx_2fkyZxHECqRZ8aTGdA0RiFQgvQS?usp=sharing

@StellaAthena
Collaborator Author

I am dissatisfied with this solution but haven't had the bandwidth to work on it at all. Leaving this comment mostly for my future reference.

@Muennighoff

Also getting this one

[default0]:  Assigning unique IDs to 'copa+…As a result, C1 or C2?' docs
[default0]:100%|██████████| 100/100 [00:00<00:00, 15571.37ex/s]
[default0]:  Filtering invalid docs from 'copa+…As a result, C1 or C2?'
[default0]:100%|██████████| 1/1 [00:00<00:00,  4.48ba/s]
[default0]:  Constructing 'copa+…As a result, C1 or C2?' contexts and requests
[default0]:  0%|          | 0/100 [00:00<?, ?it/s]
[default0]:Traceback (most recent call last):
[default0]:  File "./tasks/eval_harness/evaluate_bsevalharness.py", line 499, in <module>
[default0]:    main()
[default0]:  File "./tasks/eval_harness/evaluate_bsevalharness.py", line 479, in main
[default0]:    results = evaluator.evaluate(lm=adaptor, task_dict={task_name: task}, bootstrap_iters=args.bootstrap_iters, rng=np.random.default_rng(args.seed))
[default0]:  File "/gpfsssd/scratch/rech/six/commun/experiments/muennighoff/lm-evaluation-harness/lm_eval/evaluator.py", line 180, in evaluate
[default0]:    ctx, fewshotex_logging_info = task.fewshot_context(
[default0]:  File "/gpfsssd/scratch/rech/six/commun/experiments/muennighoff/lm-evaluation-harness/lm_eval/api/task.py", line 404, in fewshot_context
[default0]:    prompt = self.doc_to_text(doc)
[default0]:  File "/gpfsssd/scratch/rech/six/commun/experiments/muennighoff/lm-evaluation-harness/lm_eval/api/task.py", line 286, in doc_to_text
[default0]:    text, _ = self.prompt_template.apply(doc)
[default0]:ValueError: not enough values to unpack (expected 2, got 1)

@jon-tow
Collaborator

jon-tow commented Jul 12, 2022

@Muennighoff Thanks for reporting! We can't robustly resolve this until it gets fixed on the promptsource side, but PR #104 addresses your issue.

NOTE: you may need to clear the cache files in your HF datasets COPA path (e.g. ~/.cache/huggingface/datasets/super_glue/copa/cache-*)

@Muennighoff

Muennighoff commented Jul 16, 2022

Ah yeah, the issue comes from prompts like the one below, where only "cause" or only "effect" works. When the example is the other type, apply returns a list with just one element, which then fails to unpack.

    jinja: "{% if question == \"effect\" %} \n{{ premise }} As a result, \"{{ answer_choices[0]\
      \ }}\" or \"{{ answer_choices[1] }}\"? ||| {% if label != -1 %}{{ answer_choices[label]\
      \ }}{%endif%}\n{% endif %}"
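
One way to guard against that up front is to pre-filter the docs, sketched here under the assumption that the doc list can be touched before contexts are built (prompt_template and docs are illustrative names, not the harness API):

    # Sketch only: keep docs whose applied template yields both an input and
    # at least one target; "cause" docs hit the effect-only template above
    # and come back with a single element.
    def is_valid(prompt_template, doc):
        applied = prompt_template.apply(doc)
        return isinstance(applied, (list, tuple)) and len(applied) >= 2

    docs = [d for d in docs if is_valid(prompt_template, d)]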

@stephenbach

@stephenbach
Member

Sorry for the slow reply! This is something we're still working out the finer points of. It's related to bigscience-workshop/promptsource#749 and bigscience-workshop/promptsource#792. I don't really have a good answer right now, but in the T0 codebase I believe we had to manually check for this case.
