Commit

Add results with margin measurement
pomonam committed Jul 16, 2024
1 parent 3e8550b commit 9b4367d
Showing 15 changed files with 22,126 additions and 2 deletions.
2,440 changes: 2,440 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/ai.txt

Large diffs are not rendered by default.

2,410 changes: 2,410 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/canada.txt

Large diffs are not rendered by default.

2,348 changes: 2,348 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/cow.txt

Large diffs are not rendered by default.

1,854 changes: 1,854 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/doctor.txt

Large diffs are not rendered by default.

20 changes: 20 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/factor_arguments.json
@@ -0,0 +1,20 @@
{
"strategy": "ekfac",
"use_empirical_fisher": false,
"amp_dtype": "torch.bfloat16",
"amp_scale": 65536.0,
"has_shared_parameters": false,
"covariance_max_examples": 100000,
"covariance_data_partitions": 4,
"covariance_module_partitions": 2,
"activation_covariance_dtype": "torch.bfloat16",
"gradient_covariance_dtype": "torch.bfloat16",
"eigendecomposition_dtype": "torch.float64",
"lambda_max_examples": 100000,
"lambda_data_partitions": 4,
"lambda_module_partitions": 4,
"use_iterative_lambda_aggregation": true,
"offload_activations_to_cpu": true,
"per_sample_gradient_dtype": "torch.bfloat16",
"lambda_dtype": "torch.bfloat16"
}
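
As a rough illustration of what these settings imply, the sketch below parses a subset of the same JSON with the standard library and works out the partition arithmetic for the covariance and lambda stages. It assumes that each (data partition, module partition) pair is processed as one separate pass and that examples are split evenly across data partitions; neither is confirmed by this commit, and the helper name `summarize` is hypothetical.

```python
import json

# Subset of the keys from factor_arguments.json above.
FACTOR_ARGUMENTS = """
{
  "covariance_max_examples": 100000,
  "covariance_data_partitions": 4,
  "covariance_module_partitions": 2,
  "lambda_max_examples": 100000,
  "lambda_data_partitions": 4,
  "lambda_module_partitions": 4
}
"""


def summarize(config, stage):
    """Hypothetical helper: partition arithmetic for "covariance" or "lambda"."""
    data_parts = config[f"{stage}_data_partitions"]
    module_parts = config[f"{stage}_module_partitions"]
    return {
        # Assumption: one pass per (data partition, module partition) pair.
        "passes": data_parts * module_parts,
        # Assumption: examples are split evenly across data partitions.
        "examples_per_data_partition": config[f"{stage}_max_examples"] // data_parts,
    }


config = json.loads(FACTOR_ARGUMENTS)
covariance = summarize(config, "covariance")
lambda_stage = summarize(config, "lambda")
print(covariance)
print(lambda_stage)
```

Under those assumptions the lambda stage is split into twice as many passes as the covariance stage, while each data partition covers the same number of examples.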
2,022 changes: 2,022 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/inflation.txt

Large diffs are not rendered by default.

2,452 changes: 2,452 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/math.txt

Large diffs are not rendered by default.

2,280 changes: 2,280 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/ml.txt

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
{
"type": "Dataset",
"dataset_size": 10,
"indices": null
}
2,096 changes: 2,096 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/science.txt

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/score_arguments.json
@@ -0,0 +1,19 @@
{
"damping_factor": null,
"amp_dtype": "torch.bfloat16",
"offload_activations_to_cpu": true,
"data_partitions": 1,
"module_partitions": 1,
"compute_per_module_scores": false,
"compute_per_token_scores": false,
"query_gradient_accumulation_steps": 10,
"query_gradient_low_rank": 64,
"use_full_svd": true,
"aggregate_query_gradients": false,
"aggregate_train_gradients": false,
"use_measurement_for_self_influence": false,
"query_gradient_svd_dtype": "torch.float32",
"per_sample_gradient_dtype": "torch.float32",
"precondition_dtype": "torch.float32",
"score_dtype": "torch.bfloat16"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
{
"type": "Dataset",
"dataset_size": 100000,
"indices": null
}
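
Read together, the two dataset metadata blocks and `score_dtype` give a back-of-the-envelope size for the pairwise score matrix. The sketch below assumes the 10-example dataset is the query set, the 100000-example dataset is the training set, and the scores are held as one dense matrix; the commit does not state any of these, and `score_matrix_bytes` is a hypothetical helper.

```python
# Sizes taken from the two dataset metadata blocks in this commit.
# Assumption: 10 examples = query set, 100000 examples = training set.
NUM_QUERIES = 10
NUM_TRAIN = 100_000

# Bytes per element for the two dtypes involved.
BYTES_PER_ELEMENT = {"torch.bfloat16": 2, "torch.float32": 4}


def score_matrix_bytes(num_queries, num_train, dtype):
    """Size in bytes of a dense (num_queries x num_train) score matrix."""
    return num_queries * num_train * BYTES_PER_ELEMENT[dtype]


# "score_dtype" in score_arguments.json is torch.bfloat16.
stored = score_matrix_bytes(NUM_QUERIES, NUM_TRAIN, "torch.bfloat16")
# inspect_scores.py calls .float() on the loaded tensor, i.e. float32.
in_memory = score_matrix_bytes(NUM_QUERIES, NUM_TRAIN, "torch.float32")
print(stored, in_memory)
```

The estimate covers the tensor payload only and ignores any file-format overhead.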
1,970 changes: 1,970 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/water.txt

Large diffs are not rendered by default.

2,200 changes: 2,200 additions & 0 deletions examples/openwebtext/files/scores_raw_margin/water_korean.txt

Large diffs are not rendered by default.

7 changes: 5 additions & 2 deletions examples/openwebtext/inspect_scores.py
@@ -11,15 +11,18 @@


 def main():
-    scores = Analyzer.load_file("influence_results/openwebtext/scores_raw/pairwise_scores.safetensors")[
+    # scores = Analyzer.load_file("influence_results/openwebtext/scores_raw/pairwise_scores.safetensors")[
+    #     "all_modules"
+    # ].float()
+    scores = Analyzer.load_file("influence_results/scores_raw_margin_scores/pairwise_scores.safetensors")[
         "all_modules"
     ].float()

     train_dataset = get_openwebtext_dataset()
     eval_dataset = get_custom_dataset()
     tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True, trust_remote_code=True)

-    eval_idx = 5
+    eval_idx = 0
     sorted_scores = torch.sort(scores[eval_idx], descending=True)
     top_indices = sorted_scores.indices
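
The ranking step in this hunk selects one query row of the score matrix and sorts it from highest to lowest, keeping the original positions as indices into the training set. Below is a minimal standard-library sketch of that idea, not the script's actual code: `rank_descending` is a hypothetical stand-in for `torch.sort(..., descending=True)`, and the scores are made up.

```python
def rank_descending(row):
    """Return (values, indices) sorted from highest to lowest score."""
    indices = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    values = [row[i] for i in indices]
    return values, indices


# Made-up scores: 2 queries x 5 training examples.
scores = [
    [0.1, -2.0, 3.5, 0.0, 1.2],
    [4.0, 0.3, -1.0, 2.2, 0.9],
]

eval_idx = 0  # which query row to inspect, as in the diff above
values, top_indices = rank_descending(scores[eval_idx])
print(top_indices[:3])  # positions of the three highest-scoring training examples
```

Changing `eval_idx` selects a different query row, which is the only effect of the `eval_idx = 5` to `eval_idx = 0` edit in this hunk.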
