-
Notifications
You must be signed in to change notification settings - Fork 0
VBaratham/lsst_log_analysis_tool
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
PREREQS Python Packages: MySQLdb, PyGUI You'll also need gnuplot with a png terminal installed. To get the png terminal, install zlib, libpng, freetype, and libgd before installing gnuplot (2). You should have mysql installed and the general log should be divided by month in tables named yyyy_mm, such as '2010_04' for April 2010. These tables should be in a database called 'general_log'. ================================================================================ CONFIGURING THE TOOL Edit config.json as necessary. What can I specify in config.json? config.json is a dictionary with all the configuration information. Here are the keys allowed (other keys are ignored) and what their values should be: 1.) "db_conn_params": dict of kwargs to be passed to MySQLdb.connect(). See MySQLdb documentation for the allowed kwargs. Note: the "cursorclass" kwarg can be specified here as a string for json compatability. "Cursor" or "SSCursor" are the only ones that the tool can use. 2.) "reducer": dict defining the behavior of the query reducer (preprocessor). The default config.json has reducer options that are more or less suitable for LSST database developers. Keys and values - "ignore_queries": List of full queries to discard "ignore_users": List of users whose queries will be discarded "unwanted_terms": List of strings. If any of these strings are found in a query, that query will be discarded "unwanted_starts": List of strings. If any of these strings are found at the start of a query, that query will be discarded 3.) "numtop": Number of top queries to show for each user (defaults to 200 if not specified) 4.) "plot_dir": directory to write data files, gnuplot scripts, and plot image files. Can be absolute (start with '/') or relative. ================================================================================ PREPARING THE LOG To create the reduced_log schema ('reduced_log' db and tables 'users' and 'servers' therein): $ python create_reduced_log.py --initialize To reduce tables from general_log into their reduced versions in reduced_log: $ python create_reduced_log.py --reduce To create the 'unified' table in the reduced_log db $ python create_reduced_log.py --create_unified Note: this requires at least 2 tables to be already reduced. Their data will go into the unified table. To unify all the individual month tables in the reduced_log db (add a new month(s)'s data in) $ python create_reduced_log.py --unify To display these commands: $ python create_reduced_log.py ================================================================================ USING THE TOOL Run tool.py *** Filters *** Most of the filters are pretty self explanatory. Here are a few notes: Enter dates in mm/dd/yyyy format (single digit month/day are okay). Be careful using group by week, because weeks can get cut off at year boundaries. Deselecting all of a certain filter allows anything to pass for that category (ie, is equivalent to selecting all). The number in parenthesis next to each user/server is the number of queries in the current data set from that user/server. For user/server search strings: typing in the text field will automatically select the users/servers for which your search string is a substring. IMPORTANT: On some systems, this may be one keystroke behind, so if the checkboxes don't light up when expected, hit enter (or another key). For more info, see footnote (1). Query search strings are passed to mysql's LIKE, so appropriate wildcards are needed (ie, "%yFluxSigma%", not just "yFluxSigma"). If you don't type anything in a query search string box, it won't be included (leave the text as "Query Search String #"). If you start typing and decide you don't want to include that term, erase the text: leaving it empty will also exclude it. Be careful using the "Negate Filter" button above the "Refresh" and "Update" buttons - this puts a "NOT" in front of the "WHERE" clause of the sql statement that selects data. It's usually safer to use the individual negations/inversions. *** "Refresh" vs "Update" *** The "Refresh" button simply updates the graphics and top query lists based on the currently defined filter. The "Update" button does this and also causes the next filter to be cascaded on top of this one (ie, the next filter will select data from the new result set). Once data has been filtered out with "Update", future filters will execute faster, but any data that was filtered out won't show up. *** The graph panel *** The graph panel contains visualizations of the current result set. The radio buttons below the graph select the data view. They are as follows: All users: total queries over time -> queries by type over time -> breakdown of total queries by type -> breakdown of total queries by user Per user, query type: Breakdown of each user's query traffic by query type Per user, time: Line graph showing each user's total queries over time Per user, query type and time: Line graph showing each user's queries over time with a separate line for each query type To cycle between the different graphs in a view, click the graph. *** The top queries panel *** The top queries panel shows the top 200 most common queries and the number of times each query was run. To change between each user and the total, use the "prev" and "next" buttons. ================================================================================ ================================================================================ (1): Some systems (eg Windows) require events to be handled by the main system event loop, in which case the value of the text field isn't updated immediately upon the keystroke. However, the code that checks for updates in the textbox runs immediately upon the keystroke, which causes the discrepancy. See http://mail.python.org/pipermail/pygui/2010-November/000102.html and the "next message" (2): See https://mailman.cae.wisc.edu/pipermail/help-octave/2005-August/017633.html and http://www.physics.buffalo.edu/phy410-505/tools/install/
About
A tool for viewing and analyzing MySQL's general_log for LSST database developers. Developed at SLAC, summer 2013 as a SULI intern
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published