📊 LM Eval Harness Tasks

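Evaluate Keyformer on tasks from the lm-eval-harness framework. The commands below use shell variables such as ${task}, ${shots}, and ${model_name} as placeholders; set them before running.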

Generate task data

python -u generate_task_data.py \
    --output-file ./task_data/${task}-${shots}.jsonl \
    --task-name ${task} \
    --num-fewshot ${shots}
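
The command above expects ${task} and ${shots} to be set already. A minimal sketch of that setup follows; the task name and shot count are example values chosen for illustration, and creating ./task_data up front is a precaution, since it is not confirmed here whether the script creates the directory itself.

```shell
# Example values only (assumptions, not taken from the repository).
task="openbookqa"   # any task name accepted by lm-eval-harness
shots=0             # number of few-shot examples

# Create the output directory in case the script does not do so itself.
mkdir -p ./task_data

# Same path pattern as the --output-file argument above.
task_file="./task_data/${task}-${shots}.jsonl"
echo "task data will be written to: ${task_file}"
```

With these variables exported in the current shell, the generate_task_data.py command can be pasted as written.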

Generate output data with the model

Full Attention

python -u run_lm_eval_harness.py \
    --input-path ./task_data/${task}-${shots}.jsonl \
    --output-path ./model_eval_data/${task}-${shots}-${model_type}-${keyformer}-${kv_cache}-${recent}.jsonl \
    --model-name ${model_name} \
    --model-path ${model_path} \
    --dtype ${dtype} \
    --kv_cache ${kv_cache} \
    --recent ${recent}

Keyformer

python -u run_lm_eval_harness.py \
    --input-path ./task_data/${task}-${shots}.jsonl \
    --output-path ./model_eval_data/${task}-${shots}-${model_type}-${keyformer}-${kv_cache}-${recent}.jsonl \
    --model-name ${model_name} \
    --model-path ${model_path} \
    --dtype ${dtype} \
    --keyformer \
    --kv_cache ${kv_cache} \
    --recent ${recent}
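
The two commands above differ only in the --keyformer flag, so one way to drive both from a single script is to derive the flag from a variable. The sketch below is a dry run that only prints the derived values. Every value is a placeholder, and treating ${keyformer} as a 0/1 label in the file name is an assumption rather than something the scripts require.

```shell
# Placeholder values (assumptions); check the script's --help for valid ones.
task="openbookqa"
shots=0
model_type="example-model"
keyformer=1      # assumed label: 1 = Keyformer run, 0 = full attention
kv_cache=60
recent=30

# Add --keyformer only for Keyformer runs.
if [ "$keyformer" = "1" ]; then
    mode_flag="--keyformer"
else
    mode_flag=""
fi

# Same naming pattern as the --output-path argument above.
output_path="./model_eval_data/${task}-${shots}-${model_type}-${keyformer}-${kv_cache}-${recent}.jsonl"

echo "mode flag: ${mode_flag:-<none>}"
echo "output path: ${output_path}"
```

Encoding the run settings in the file name keeps full-attention and Keyformer results side by side without overwriting each other.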

Evaluate the performance

python -u evaluate_task_result.py \
    --result-file ./model_eval_data/${task}-${shots}-${model_type}-${keyformer}-${kv_cache}-${recent}.jsonl \
    --output-file ./output/${task}-${shots}-${model_type}-${keyformer}-${kv_cache}-${recent}.jsonl \
    --task-name ${task} \
    --num-fewshot ${shots} \
    --model-name ${model_name}
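
Evaluation reads the file produced by the previous step, so it can help to confirm that file exists before starting. The following is a small sketch of such a check; require_file is a hypothetical helper, not part of the repository, and the path is a placeholder.

```shell
# Hypothetical helper: succeeds only if the file exists and is non-empty.
require_file() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
}

result_file="./model_eval_data/example.jsonl"   # placeholder path

# Create the output directory in case the script does not do so itself.
mkdir -p ./output

if require_file "$result_file"; then
    echo "ready to evaluate ${result_file}"
else
    echo "run the generation step first"
fi
```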
