Merge pull request #2 from joshgabriel/python-R-bokeh

Python r bokeh
usnistgov · Mar 1, 2017 · bfaeaa3 · bfaeaa3
2 parents ec5e8ab + 29ae7a1
commit bfaeaa3
Showing 37 changed files with 44,959 additions and 49 deletions.
diff --git a/APP_USAGE.md b/APP_USAGE.md
@@ -0,0 +1,33 @@
+Crossfiltering user interactive workflow:
+
+1. Looks at the periodic table and structure widget
+
+2. Selects the structure widget
+   * the elements that can be selected get highlighted.
+
+3. Selects the element widget (periodic table)
+   * updates the property, code, and exchange choices that can
+     be selected in the respective widgets
+
+4. Selects the property widget
+   * updates the code widget (will remain the same mostly)
+
+5. Selects the code widgets
+   * updates the exchange widget
+
+6. Selects the exchange widgets
+   * final selection, updates the plottables (mostly fixed)
+     - value vs. k-point density
+     - value_error vs
+
+     this means
+     x = ['k-point density', 'value', 'value_error']
+
+7. Selects the plottables for x-y, in future x-y-z
+   - options are values vs. k-point
+   - update what statistical tools can be used on the data
+   - update what plot types are available too
+
+8. Selects plot type
+   * histogram for data in the database of the chosen specs
+   * scatter
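
The cascade described in APP_USAGE.md (each selection narrows what the later widgets can offer) can be sketched in plain Python. This is only an illustrative sketch, not the app's implementation: `RECORDS`, `WIDGET_ORDER`, and `available_options` are hypothetical names, and the sample rows are made up. Only the column names follow the ones used elsewhere in this PR.

```python
# Hypothetical sample rows; the keys mirror the column names used in this PR.
RECORDS = [
    {"structure": "fcc", "element": "Nb", "property": "B", "code": "VASP", "exchange": "PBE"},
    {"structure": "fcc", "element": "Al", "property": "B", "code": "VASP", "exchange": "PBE"},
    {"structure": "bcc", "element": "Nb", "property": "v0", "code": "VASP", "exchange": "LDA"},
]

# Widgets in the order the workflow selects them.
WIDGET_ORDER = ["structure", "element", "property", "code", "exchange"]


def available_options(records, selections):
    """Filter rows by the selections made so far and list what each
    remaining widget can still offer."""
    rows = [r for r in records
            if all(r[key] == value for key, value in selections.items())]
    options = {
        widget: sorted({r[widget] for r in rows})
        for widget in WIDGET_ORDER
        if widget not in selections
    }
    return rows, options


if __name__ == "__main__":
    # Step 2: the structure is chosen, so the element choices narrow.
    rows, options = available_options(RECORDS, {"structure": "fcc"})
    print(options["element"])
    # Step 3: the element is chosen, so property/code/exchange narrow.
    rows, options = available_options(RECORDS, {"structure": "fcc", "element": "Nb"})
    print(options["property"], len(rows))
```

Calling the function again after every widget change keeps the remaining widgets consistent with the data that is actually left.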
diff --git a/SETUP.md b/SETUP.md
@@ -8,7 +8,7 @@ environment where we will install all the paltform dependencies.
 
 We recommend that you have three terminal windows to run the following
 sections. Each will produce a trace in the terminal that you might want
-to look into. All three of them are servers and will not return the 
+to look into. All three of them are servers and will not return the
 terminal input unless you detach them with '&'.  The commands in every section
 should be run after activating the environment we just created.
 
@@ -43,14 +43,24 @@ Then run it.
     $ pip install -r requirements.txt
     $ python run.py --host 0.0.0.0 --port 7000
 
-In a browser go to [API data entry](http://0.0.0.0:7000/bench/push/csv). 
+In a browser go to [API data entry](http://0.0.0.0:7000/bench/push/csv).
 This is the API frontend for uploading the DFT data. Click on 'Choose File'
 and navigate to: dft-crossfilter/benchmark/data/francesca_data_full.csv.
 This will push the DFT data set into the MongoDB database 'benchmark-production'.
 To test it out go to [Data description dictionary](http://0.0.0.0:7000/bench/desc/all).
 
 ## bokeh setup
 
+(Still under testing, but this should work; it worked with Anaconda on Windows.)
+
+    $ conda install bokeh
+    $ cd benchmark-view
+    $ bokeh serve crossfilter_app --show
+
+This should automatically open a browser window with the app rendered in it. The host
+and port can be tuned later.
+
+(Old setup, which may no longer be necessary.)
 At this point you are set for the data access part. For the visualization part
 you will have to build this modified bokeh snapshot. You will need to have gulp
 installed. When asked, select the full install with the option:

diff --git a/benchmark-api/requirements.txt b/benchmark-api/requirements.txt
@@ -5,13 +5,13 @@ Flask-Mongoengine==0.7.1
 Flask-Wtf==0.11
 Flask-Testing==0.4.2
 Pymongo==2.8
-Twill==1.8.0
+#Twill==1.8.0
 Cssselect==0.9.1
 Mongoengine==0.8.7
 Nose==1.3.4
 Werkzeug==0.9.6
 itsdangerous==0.24
-wsgiref==0.1.2
+#wsgiref==0.1.2
 Requests==2.4.1
 hurry.filesize == 0.9
-python-daemon==2.0.1
+#python-daemon==2.0.1
diff --git a/benchmark-db/requirements.txt b/benchmark-db/requirements.txt
@@ -1,11 +1,11 @@
 Docopt==0.6.2
 Pymongo==2.8
-Twill==1.8.0
+#Twill==1.8.0
 Cssselect==0.9.1
 Mongoengine==0.8.7
 Nose==1.3.4
 Werkzeug==0.9.6
 itsdangerous==0.24
-wsgiref==0.1.2
+#wsgiref==0.1.2
 Requests==2.4.1
-Click==3.3
+Click==3.3
diff --git a/benchmark-view/Big_Picture_To_Do.md b/benchmark-view/Big_Picture_To_Do.md
@@ -1,8 +1,7 @@
 What we want to do in this branch:
 
-  "create a flask app using the new bokeh server that runs on Apache and can render an Iframe from Shiny R"
-
-    - which uses bokeh's crossfilter model classes (that makes use of pandas dataframe tools)
+  create a flask app using the new bokeh server that runs on Apache and can render an Iframe from Shiny R
+    - which uses bokeh's crossfilter model classes (that make use of pandas dataframe tools)
       to crossfilter data
     - use the new bokeh server to interact directly with the REST api of benchmark-db
     - use Shiny R to create html that can be Iframed into python served bokeh app for the data analysis.
@@ -21,3 +20,8 @@ What we want to do in this branch:
              statistics.
       - an about page linked that summarizes the project
       - a contact page linked that summarizes whom to contact
+
+  ** widgets
+     - plotting checkboxes (log ? )
+     - zoom/pan/download
+     - text entry of queries
diff --git a/benchmark-view/ShinyApps/Numerical_Precs_Methods_Scripts/Pade.py b/benchmark-view/ShinyApps/Numerical_Precs_Methods_Scripts/Pade.py
@@ -0,0 +1,212 @@
+## python script to crossfilter a database and run Pade approximations on it
+## inputs : Pade script nls.R,
+##          and one of: the database source csv, or the path to the
+##          crossfiltered named files
+## these provide the necessary details to fit a Pade approximation for
+## a variety of functions.
+
+import pandas as pd
+import numpy as np
+import os
+from glob import glob
+import sys
+
+def crossfilters(database):
+    """
+    Split a collection into one CSV file per unique combination of
+    code, structure, exchange, element and property.
+    Returns a list of (file name, tags) tuples.
+    """
+    # walk every code/structure/exchange/element/property combination
+
+    names = []
+
+    codes = np.unique(database['code'])    
+
+    for c in codes:
+        code = database[database['code']==c]
+        structures = np.unique(code['structure'])
+        for struct in structures:
+            struct_code = code[code['structure']==struct]
+            exchanges = np.unique(struct_code['exchange'])
+            for ex in exchanges:
+                ex_struct_code = struct_code[struct_code['exchange']==ex]
+                elements = np.unique(ex_struct_code['element'])
+                for el in elements:
+                     el_ex_struct_code = ex_struct_code[ex_struct_code['element']==el]
+                     properties = np.unique(el_ex_struct_code['property'])
+                     for pr in properties:
+                         pr_el_ex_struct_code = el_ex_struct_code[el_ex_struct_code['property']==pr]
+
+                         prop = list(pr_el_ex_struct_code['value'])
+                         kpts = list(pr_el_ex_struct_code['k-point'])
+
+                         k_atom = [ k**3 for k in kpts ]
+
+                         Pade_df = pd.DataFrame({'Kpts_atom': k_atom, 'P': prop})
+
+                         TAG =   {'element':el,
+                                  'structure':struct,
+                                  'exchange':ex,
+                                  'code':c,
+                                  'property':pr}
+
+                         NAME = 'Crossfilts/' + '_'.join([pr, el, ex, struct, c]) + '.csv'
+                         names.append( (NAME,TAG) )
+                         print ("Writing {} ..".format(NAME))
+                         Pade_df.to_csv(NAME, index=False)
+
+    return names
+
+
+def read_crossfilts_from_file(filename):
+    """
+    decomposes the name of a crossfiltered file into its tags and
+    returns the filename together with the tags
+    """
+    parts = filename[11:-4].split('_')  # strip 'Crossfilts/' and '.csv'
+    if len(parts) == 6:  # the exchange name contains an underscore
+        pr, el, ex, ex_tail, struct, c = parts
+        ex = '_'.join([ex, ex_tail])
+    else:
+        pr, el, ex, struct, c = parts
+
+    tags = {'element': el,
+            'property': pr,
+            'exchange': ex, 
+            'code': c,
+            'structure':struct}
+    return filename, tags
+
+def run_pade_through_R(rscript, crossfilt, tags):
+    """
+    runs the Pade through a python subprocess call to nls.R
+    on the input crossfilt
+    - copies the input to Rdata.csv for input to nls.R
+    - retrieves the output of nls.R that is pasted out into csv file
+      that can be read back into pandas
+      .. element, structure, exchange, code, property, extrapolate, fit error
+      which can serve as another reference collection for calculation of
+      the precision from the main database.
+    """
+
+    result = {'element':tags['element'],
+              'structure':tags['structure'],
+              'exchange':tags['exchange'],
+              'code':tags['code'],
+              'property':tags['property']}
+
+    os.system('cp {} Rdata.csv'.format(crossfilt))
+    # for making the first database 
+    # os.system('cp Crossfilts/{} Rdata.csv'.format(crossfilt))
+    # os.mkdir(crossfilt) 
+    #os.chdir(crossfilt)   
+    #os.system('cp ../{} Rdata.csv'.format(crossfilt))
+    #os.system('cp ../{0} {0}'.format(rscript))
+
+    print ('copied {}'.format(crossfilt))
+
+    try:
+       if os.system('Rscript {}'.format(rscript)) != 0:  # os.system does not raise on failure
+           raise RuntimeError('Rscript exited with a non-zero status')
+       print ('R executed')
+       R_result = pd.read_csv('Result.csv')
+       key = list(R_result['Error']).index(min(list(R_result['Error'])))
+       result['extrapolate'] = list(R_result['Extrapolate'])
+       result['best_extrapolate'] = list(R_result['Extrapolate'])[key]
+       result['best_error'] = list(R_result['Error'])[key]
+       result['best_order'] = list(R_result['Order'])[key]
+       result['fit_error'] = list(R_result['Error'])
+       result['pade_order'] = list(R_result['Order'])
+       #result['precision'] = list(R_result['Precisions'])
+       print ("R success")
+    except Exception:
+       print ("R failure")
+       result['best_extrapolate'] = 'xxx'
+       result['best_error'] = 'xxx'
+       result['best_order'] = 'xxx'
+       result['extrapolate'] = 'xxx'
+       result['fit_error'] = 'xxx'
+       result['pade_order'] = 'xxx'
+
+    # os.chdir('../')
+    #print (result, type(result))
+    #pade_result = pd.DataFrame(result)
+
+    return result
+
+
+
+if __name__=='__main__':
+    """
+    calculate the fit for a given crossfiltered set
+    for different Pade sets
+
+    first Milestone - one crossfiltered set :
+     Nb B for m+n orders (m, n =2-4)  .. output file Pade.csv
+
+    
+    """
+
+    #database_path = 'MainCollection_v2moreclean.csv'
+
+    rscript = 'hennig_nls.R'#'nls_kpts_choices.R'
+    database_path = None
+    crossfilts_path = 'Crossfilts/*.csv'
+    #crossfilts_path = None
+
+    output_filename = 'Pade_extrapolates_v2.csv'#'Pade_kpts_choices_leave3_10.csv'
+
+    if database_path:
+        print ("Performing crossfiltering on {}..".format(database_path))
+        filetags = crossfilters(pd.read_csv(database_path))
+    elif crossfilts_path:
+        print ("Reading crossfilters from {}..".format(crossfilts_path))
+        filetags = [read_crossfilts_from_file(f) for f in glob(crossfilts_path) ]
+        length_crossfilts = len(filetags)
+    else:
+        print ('input not provided')
+        sys.exit(0)
+
+    records = []
+
+    print ("Running Pade..")
+
+    for n, (f,t) in enumerate(filetags):
+        print ("Running through {0} of {1}".format(n + 1, len(filetags)))
+        records.append( run_pade_through_R(rscript, f, t) )
+
+    Pade_analysis= pd.DataFrame({'element': [r['element'] for r in records], 
+                  'structure': [r['structure'] for r in records],
+                  'exchange': [r['exchange'] for r in records],
+                  'code': [r['code'] for r in records],
+                  'property': [r['property'] for r in records],
+                  'best_extrapolate': [r['best_extrapolate'] for r in records],
+                  'best_error': [r['best_error'] for r in records], 
+                  'best_order': [r['best_order'] for r in records],
+                  'extrapolate': [r['extrapolate'] for r in records],
+                  'fit_error': [r['fit_error'] for r in records],
+                  'pade_order': [r['pade_order'] for r in records]  })
+
+#    pade_analysis = pd.concat(records)
+
+    # remove the index and duplicates 
+
+    print ("Writing out Pade analysis... ")
+
+    Pade_analysis.to_csv(output_filename)
+
+    Pade_analysis = pd.read_csv(output_filename)
+
+    del Pade_analysis['Unnamed: 0']
+
+    Pade_analysis.drop_duplicates(inplace=True)
+
+    Pade_analysis.to_csv(output_filename)
+
+
+
+
+
+
+
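
Review note on `read_crossfilts_from_file`: the fixed slice `filename[11:-4]` only works while the files live in a directory whose prefix is exactly 11 characters long (`Crossfilts/`). A possible alternative is sketched below. It assumes the naming scheme `property_element_exchange_structure_code.csv` written by `crossfilters`, and that only the exchange field may itself contain underscores. The function name `parse_crossfilt_name` and the example paths are hypothetical.

```python
import os


def parse_crossfilt_name(path):
    """Split '<property>_<element>_<exchange>_<structure>_<code>.csv'
    into a tag dictionary, independent of the directory prefix."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("_")
    if len(parts) < 5:
        raise ValueError("unexpected crossfilter file name: {}".format(path))
    # The first two and last two fields are fixed; anything in between
    # is assumed to belong to the exchange name.
    return {
        "property": parts[0],
        "element": parts[1],
        "exchange": "_".join(parts[2:-2]),
        "structure": parts[-2],
        "code": parts[-1],
    }


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(parse_crossfilt_name("Crossfilts/B_Nb_PBE_fcc_VASP.csv"))
    print(parse_crossfilt_name("other_dir/B_Nb_GGA_PBE_fcc_VASP.csv"))
```

Anchoring on the first two and last two fields avoids the special case for six-part names, at the cost of assuming that property, element, structure, and code never contain underscores.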
diff --git a/benchmark-view/ShinyApps/Numerical_Precs_Methods_Scripts/README.md b/benchmark-view/ShinyApps/Numerical_Precs_Methods_Scripts/README.md
@@ -0,0 +1,13 @@
+Sequence of data processing to widget activated analysis and plotting
+
+* Pade.py run in crossfilter mode on main collection
+
+* Pade.py run in Pade extrapolate mode -> intermediate Pade_extrapolates.csv file 
+
+* create_precisions.py run to calculate numerical precisions from the Pade_extrapolates and Main collection files 
+
+* calculate power law fits with histogram_debug.py 
+
+* plot histogram for complete dataset with histogram_percs_plotter.py
+
+* transform for creating k-point density choice recommendations