-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
executable file
·170 lines (114 loc) · 6.21 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
PREREQS
Python Packages: MySQLdb, PyGUI
You'll also need gnuplot with a png terminal installed. To get the png
terminal, install zlib, libpng, freetype, and libgd before installing
gnuplot (2).
You should have mysql installed and the general log should be divided
by month in tables named yyyy_mm, such as '2010_04' for April
2010. These tables should be in a database called 'general_log'.
================================================================================
CONFIGURING THE TOOL
Edit config.json as necessary.
What can I specify in config.json? config.json is a dictionary with
all the configuration information. Here are the keys allowed (other
keys are ignored) and what their values should be:
1.) "db_conn_params": dict of kwargs to be passed to
MySQLdb.connect(). See MySQLdb documentation for the allowed
kwargs. Note: the "cursorclass" kwarg can be specified here as a
string for json compatability. "Cursor" or "SSCursor" are the only
ones that the tool can use.
2.) "reducer": dict defining the behavior of the query reducer
(preprocessor). The default config.json has reducer options that
are more or less suitable for LSST database developers.
Keys and values -
"ignore_queries": List of full queries to discard
"ignore_users": List of users whose queries will be discarded
"unwanted_terms": List of strings. If any of these strings are
found in a query, that query will be discarded
"unwanted_starts": List of strings. If any of these strings
are found at the start of a query, that
query will be discarded
3.) "numtop": Number of top queries to show for each user (defaults to
200 if not specified)
4.) "plot_dir": directory to write data files, gnuplot scripts, and
plot image files. Can be absolute (start with '/') or relative.
================================================================================
PREPARING THE LOG
To create the reduced_log schema ('reduced_log' db and tables 'users'
and 'servers' therein):
$ python create_reduced_log.py --initialize
To reduce tables from general_log into their reduced versions in
reduced_log:
$ python create_reduced_log.py --reduce
To create the 'unified' table in the reduced_log db
$ python create_reduced_log.py --create_unified
Note: this requires at least 2 tables to be already reduced.
Their data will go into the unified table.
To unify all the individual month tables in the reduced_log db (add a
new month(s)'s data in)
$ python create_reduced_log.py --unify
To display these commands:
$ python create_reduced_log.py
================================================================================
USING THE TOOL
Run tool.py
*** Filters ***
Most of the filters are pretty self explanatory. Here are a few notes:
Enter dates in mm/dd/yyyy format (single digit month/day are okay). Be
careful using group by week, because weeks can get cut off at year
boundaries.
Deselecting all of a certain filter allows anything to pass for that
category (ie, is equivalent to selecting all).
The number in parenthesis next to each user/server is the number of
queries in the current data set from that user/server.
For user/server search strings: typing in the text field will
automatically select the users/servers for which your search string is
a substring. IMPORTANT: On some systems, this may be one keystroke
behind, so if the checkboxes don't light up when expected, hit enter
(or another key). For more info, see footnote (1).
Query search strings are passed to mysql's LIKE, so appropriate
wildcards are needed (ie, "%yFluxSigma%", not just "yFluxSigma"). If
you don't type anything in a query search string box, it won't be
included (leave the text as "Query Search String #"). If you start
typing and decide you don't want to include that term, erase the text:
leaving it empty will also exclude it.
Be careful using the "Negate Filter" button above the "Refresh" and
"Update" buttons - this puts a "NOT" in front of the "WHERE" clause of
the sql statement that selects data. It's usually safer to use the
individual negations/inversions.
*** "Refresh" vs "Update" ***
The "Refresh" button simply updates the graphics and top query lists
based on the currently defined filter. The "Update" button does this
and also causes the next filter to be cascaded on top of this one (ie,
the next filter will select data from the new result set). Once data
has been filtered out with "Update", future filters will execute
faster, but any data that was filtered out won't show up.
*** The graph panel ***
The graph panel contains visualizations of the current result set. The
radio buttons below the graph select the data view. They are as
follows:
All users: total queries over time -> queries by type over time ->
breakdown of total queries by type -> breakdown of total queries by
user
Per user, query type: Breakdown of each user's query traffic by query
type
Per user, time: Line graph showing each user's total queries over time
Per user, query type and time: Line graph showing each user's queries
over time with a separate line for each query type
To cycle between the different graphs in a view, click the graph.
*** The top queries panel ***
The top queries panel shows the top 200 most common queries and the
number of times each query was run. To change between each user and
the total, use the "prev" and "next" buttons.
================================================================================
================================================================================
(1): Some systems (eg Windows) require events to be handled by the
main system event loop, in which case the value of the text field
isn't updated immediately upon the keystroke. However, the code that
checks for updates in the textbox runs immediately upon the keystroke,
which causes the discrepancy. See
http://mail.python.org/pipermail/pygui/2010-November/000102.html and
the "next message"
(2): See
https://mailman.cae.wisc.edu/pipermail/help-octave/2005-August/017633.html
and http://www.physics.buffalo.edu/phy410-505/tools/install/