Added switch to include Chi-squared p-value in terse output #5

Open
Wants to merge 1 commit into base: master
src/ent.c: 35 changes (25 additions, 10 deletions)
@@ -59,6 +59,7 @@ static void help(void)
     printf("\n -c Print occurrence counts");
     printf("\n -f Fold upper to lower case letters");
     printf("\n -t Terse output in CSV format");
+    printf("\n -p Include Chi-square p-value in terse output (as decimal)");
     printf("\n -u Print this message\n");
     printf("\nVersion " VERSION);
     printf("\nBy John Walker");
@@ -107,9 +108,10 @@ int main(int argc, char *argv[])
     int counts = FALSE, /* Print character counts */
         fold = FALSE, /* Fold upper to lower */
         binary = FALSE, /* Treat input as a bitstream */
-        terse = FALSE; /* Terse (CSV format) output */
+        terse = FALSE, /* Terse (CSV format) output */
+        csp = FALSE; /* Terse includes Chi^2 p-value */

-    while ((opt = getopt(argc, argv, "bcftuv?BCFTUV")) != -1) {
+    while ((opt = getopt(argc, argv, "bcfptuv?BCFPTUV")) != -1) {
         switch (toISOlower(opt)) {
         case 'b':
             binary = TRUE;
@@ -123,6 +125,10 @@ int main(int argc, char *argv[])
             fold = TRUE;
             break;

+        case 'p':
+            csp = TRUE;
+            break;
+
         case 't':
             terse = TRUE;
             break;
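A minimal sketch of the option handling these hunks extend, assuming a POSIX getopt(). Because the option string lists every letter in both cases and each returned option is folded to lower case before the switch, -P behaves exactly like -p. The ent_flags struct and parse_ent_flags() are hypothetical names, and tolower() stands in for ent's toISOlower(); unlike ent, the sketch simply returns -1 for -u, -v, -? or an unrecognized option instead of handling them.

```c
#define _POSIX_C_SOURCE 200809L /* expose getopt() under strict C modes */
#include <ctype.h>
#include <unistd.h>

/* Hypothetical flag holder; ent itself keeps these as separate ints. */
struct ent_flags {
    int binary, counts, fold, terse, csp;
};

/* Parse ent-style options. Upper-case letters are folded to lower case,
   so -p and -P both set csp. Returns 0 on success, -1 for any option
   this sketch does not handle (-u, -v, -? or an unknown letter). */
static int parse_ent_flags(int argc, char *argv[], struct ent_flags *f)
{
    int opt;

    f->binary = f->counts = f->fold = f->terse = f->csp = 0;
    while ((opt = getopt(argc, argv, "bcfptuv?BCFPTUV")) != -1) {
        switch (tolower(opt)) {
        case 'b': f->binary = 1; break;
        case 'c': f->counts = 1; break;
        case 'f': f->fold = 1; break;
        case 'p': f->csp = 1; break;   /* the new switch */
        case 't': f->terse = 1; break;
        default:  return -1;           /* left to the caller here */
        }
    }
    return 0;
}
```

With the command line ent -t -P, parse_ent_flags() sets both terse and csp, so the -p column is only meaningful when terse is also set.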
@@ -200,22 +206,31 @@ int main(int argc, char *argv[])
     }
     fclose(fp);

-    /* Complete calculation and return sequence metrics */
+    /* Complete calculation */

     rt_end(&ent, &chisq, &mean, &montepi, &scc);

-    if (terse) {
-        printf("0,File-%ss,Entropy,Chi-square,Mean,Monte-Carlo-Pi,Serial-Correlation\n",
-            binary ? "bit" : "byte");
-        printf("1,%ld,%f,%f,%f,%f,%f\n",
-            totalc, ent, chisq, mean, montepi, scc);
-    }
-
     /* Calculate probability of observed distribution occurring from
        the results of the Chi-Square test */

     chip = pochisq(chisq, (binary ? 1 : 255));

+    /* Return sequence metrics */
+
+    if (terse) {
+        if (csp) {
+            printf("0,File-%ss,Entropy,Chi-square,Chi-square-p-val,Mean,Monte-Carlo-Pi,Serial-Correlation\n",
+                binary ? "bit" : "byte");
+            printf("1,%ld,%f,%f,%f,%f,%f,%f\n",
+                totalc, ent, chisq, chip, mean, montepi, scc);
+        } else {
+            printf("0,File-%ss,Entropy,Chi-square,Mean,Monte-Carlo-Pi,Serial-Correlation\n",
+                binary ? "bit" : "byte");
+            printf("1,%ld,%f,%f,%f,%f,%f\n",
+                totalc, ent, chisq, mean, montepi, scc);
+        }
+    }
+
     /* Print bin counts if requested */

     if (counts) {
src/ent.html: 23 changes (21 additions, 2 deletions)
@@ -127,7 +127,7 @@ <h3>NAME</h3>

 <h3>SYNOPSIS</h3>

-<b>ent</b> [ <b>-b -c -f -t -u</b> ] [ <i>infile</i> ]
+<b>ent</b> [ <b>-b -c -f -p -t -u</b> ] [ <i>infile</i> ]

 <h3>DESCRIPTION</h3>

@@ -304,6 +304,12 @@ <h3>OPTIONS</h3>
 <a href="#Terse">Terse Mode Output Format</a>
 below for additional details.</dd>

+<dt><b>-p</b></dt> <dd>Used in conjunction with <b>-t</b> to
+include the Chi-squared p-value in the terse
+output (as decimal). See
+<a href="#Terse">Terse Mode Output Format</a>
+below for additional details.</dd>
+
 <dt><b>-u</b></dt> <dd>Print how-to-call information.</dd>
 </dl>

@@ -340,7 +346,20 @@ <h3><a name="Terse" class="i">TERSE MODE OUTPUT FORMAT</a></h3>
 column title record. If the <b>-b</b> option is specified, the second
 field of the type 0 record will be &ldquo;<tt>File-bits</tt>&rdquo;, and
 the <em>file_length</em> field in type 1 record will be given
-in bits instead of bytes. If the <b>-c</b> option is specified,
+in bits instead of bytes.
+</p>
+
+<p>
+Specifying <b>-p</b> in conjunction with <b>-t</b> includes the Chi-squared p-value in the CSV output. Note that it is provided as decimal, not as a percentage. When specified, the output becomes:
+</p>
+
+<pre>
+0,File-bytes,Entropy,Chi-square,Chi-square-p-val,Mean,Monte-Carlo-Pi,Serial-Correlation
+1,<em>file_length</em>,<em>entropy</em>,<em>chi_square</em>,<em>chi_square_p_val</em>,<em>mean</em>,<em>Pi_value</em>,<em>correlation</em>
+</pre>
+
+<p>
+If the <b>-c</b> option is specified,
 additional records are appended to the terse mode output which
 contain the character counts:
 </p>
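A consumer of the new layout has to know whether -p was used, because the extra Chi-square-p-val column shifts Mean, Monte-Carlo-Pi and Serial-Correlation one position to the right. Below is a minimal sketch of parsing the type 1 record of ent -t -p output with sscanf(); the struct and function names are hypothetical. A record in the 7-column layout (without -p) is rejected because sscanf() converts only six fields from it, and the type 0 title record is rejected because its first field is 0.

```c
#include <stdio.h>

/* Hypothetical container for one type 1 record from "ent -t -p". */
struct ent_terse_record {
    long file_length;        /* bytes, or bits with -b */
    double entropy;
    double chi_square;
    double chi_square_p_val; /* decimal fraction, not a percentage */
    double mean;
    double pi_value;
    double correlation;
};

/* Parse a type 1 record in the 8-column -t -p layout.
   Returns 1 on success, 0 if the line does not match that layout. */
static int parse_terse_p_record(const char *line, struct ent_terse_record *r)
{
    int n = sscanf(line, "1,%ld,%lf,%lf,%lf,%lf,%lf,%lf",
                   &r->file_length, &r->entropy, &r->chi_square,
                   &r->chi_square_p_val, &r->mean, &r->pi_value,
                   &r->correlation);
    return n == 7;   /* all seven conversions must succeed */
}
```

Since the p-value is a decimal fraction, r.chi_square_p_val * 100.0 gives the percentage form.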