Merge pull request #2 from joshgabriel/python-R-bokeh
Python r bokeh
joshgabriel authored Mar 1, 2017
2 parents ec5e8ab + 29ae7a1 commit bfaeaa3
Showing 37 changed files with 44,959 additions and 49 deletions.
33 changes: 33 additions & 0 deletions APP_USAGE.md
@@ -0,0 +1,33 @@
Crossfiltering interactive user workflow:

1. Looks at the periodic table and structure widget

2. Selects the structure widget
* updates the periodic table: the elements that can be selected get highlighted

3. Selects the element widget (periodic table)
* updates the property, code, and exchange choices that can
be selected in the respective widgets

4. Selects the property widget
* updates the code widget (will mostly remain the same)

5. Selects the code widget
* updates the exchange widget

6. Selects the exchange widget
* final selection; updates the plottables (mostly fixed)
- value vs. k-point density
- value_error vs. k-point density

This means:
x = ['k-point density', 'value', 'value_error']

7. Selects the plottables for x-y (in the future, x-y-z)
- options are value vs. k-point density
- updates what statistical tools can be used on the data
- updates what plot types are available

8. Selects the plot type
* histogram of the data in the database matching the chosen specs
* scatter
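
The cascading updates in steps 2-6 can be sketched as a plain pandas filter. Nothing below is from the app itself; the frame and the `options_after` helper are hypothetical illustrations of the narrowing behavior:

```python
import pandas as pd

# Toy slice of the benchmark collection; the column names come from the
# workflow above, the rows are made up for illustration.
df = pd.DataFrame({
    'structure': ['fcc', 'bcc', 'fcc'],
    'element':   ['Nb',  'Nb',  'Al'],
    'code':      ['VASP', 'VASP', 'DMol3'],
})

def options_after(df, **chosen):
    """Selectable values for each remaining widget once the given
    choices are applied (the cascade of steps 2-6)."""
    subset = df
    for col, val in chosen.items():
        subset = subset[subset[col] == val]
    return {c: sorted(subset[c].unique()) for c in df.columns if c not in chosen}

# Picking structure 'fcc' leaves Al and Nb selectable in the element widget.
print(options_after(df, structure='fcc')['element'])  # ['Al', 'Nb']
```

Each widget selection simply adds one more `column == value` constraint; the remaining widgets re-render from whatever survives the filter.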
14 changes: 12 additions & 2 deletions SETUP.md
Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@ environment where we will install all the platform dependencies.

We recommend that you have three terminal windows open to run the following
sections. Each will produce a trace in the terminal that you may want
to inspect. All three of them are servers and will not return control of the
terminal unless you detach them with '&'. The commands of every section
should be run after activating the environment we just created.

@@ -43,14 +43,24 @@ Then run it.
$ pip install -r requirements.txt
$ python run.py --host 0.0.0.0 --port 7000

In a browser go to [API data entry](http://0.0.0.0:7000/bench/push/csv).
This is the API frontend for uploading the DFT data. Click on 'Choose File'
and navigate to: dft-crossfilter/benchmark/data/francesca_data_full.csv.
This will push the DFT data set into the MongoDB database 'benchmark-production'.
To test it out, go to [Data description dictionary](http://0.0.0.0:7000/bench/desc/all).

## bokeh setup

(under testing, but this should work; it worked with Anaconda on Windows)

$ conda install bokeh
$ cd benchmark-view
$ bokeh serve crossfilter_app --show

This should automatically open a browser window with the app rendered in it;
the host and port can be tuned later.

(old setup, which may no longer be necessary)
At this point you are set for the data access part. For the visualization part
you will have to build this modified bokeh snapshot. You will need to have gulp
installed. When asked, select the full install with the option:
6 changes: 3 additions & 3 deletions benchmark-api/requirements.txt
@@ -5,13 +5,13 @@ Flask-Mongoengine==0.7.1
Flask-Wtf==0.11
Flask-Testing==0.4.2
Pymongo==2.8
Twill==1.8.0
#Twill==1.8.0
Cssselect==0.9.1
Mongoengine==0.8.7
Nose==1.3.4
Werkzeug==0.9.6
itsdangerous==0.24
wsgiref==0.1.2
#wsgiref==0.1.2
Requests==2.4.1
hurry.filesize == 0.9
python-daemon==2.0.1
#python-daemon==2.0.1
6 changes: 3 additions & 3 deletions benchmark-db/requirements.txt
@@ -1,11 +1,11 @@
Docopt==0.6.2
Pymongo==2.8
Twill==1.8.0
#Twill==1.8.0
Cssselect==0.9.1
Mongoengine==0.8.7
Nose==1.3.4
Werkzeug==0.9.6
itsdangerous==0.24
wsgiref==0.1.2
#wsgiref==0.1.2
Requests==2.4.1
Click==3.3
10 changes: 7 additions & 3 deletions benchmark-view/Big_Picture_To_Do.md
@@ -1,8 +1,7 @@
What we want to do in this branch:

"create a flask app using the new bokeh server that runs on Apache and can render an Iframe from Shiny R"

- which uses bokeh's crossfilter model classes (that makes use of pandas dataframe tools)
create a flask app using the new bokeh server that runs on Apache and can render an Iframe from Shiny R
- which uses bokeh's crossfilter model classes (that make use of pandas dataframe tools)
to crossfilter data
- use the new bokeh server to interact directly with the REST api of benchmark-db
- use Shiny R to create html that can be Iframed into python served bokeh app for the data analysis.
@@ -21,3 +20,8 @@ What we want to do in this branch:
statistics.
- an about page linked that summarizes the project
- a contact page linked that summarizes whom to contact

** widgets
- plotting checkboxes (log scale?)
- zoom/pan/download
- text entry of queries
212 changes: 212 additions & 0 deletions benchmark-view/ShinyApps/Numerical_Precs_Methods_Scripts/Pade.py
@@ -0,0 +1,212 @@
## Python script to crossfilter a database and run a Pade approximation on it.
## Inputs: the Pade R script (nls.R), and either a database source CSV or a
## path to crossfiltered, named files carrying the details needed to fit a
## Pade approximation for a variety of functions.

import pandas as pd
import numpy as np
import os
from glob import glob
import sys

def crossfilters(database):
"""
crossfilter out a collection completely, writing one CSV per
(code, structure, exchange, element, property) combination
"""
# database is a pandas DataFrame, e.g. pd.read_csv('MainCollection.csv')

# crossfilter down to every code / structure / exchange / element / property subset

names = []

codes = np.unique(database['code'])

for c in codes:
code = database[database['code']==c]
structures = np.unique(code['structure'])
for struct in structures:
struct_code = code[code['structure']==struct]
exchanges = np.unique(struct_code['exchange'])
for ex in exchanges:
ex_struct_code = struct_code[struct_code['exchange']==ex]
elements = np.unique(ex_struct_code['element'])
for el in elements:
el_ex_struct_code = ex_struct_code[ex_struct_code['element']==el]
properties = np.unique(el_ex_struct_code['property'])  # unique, else the loop repeats once per row
for pr in properties:
pr_el_ex_struct_code = el_ex_struct_code[el_ex_struct_code['property']==pr]

prop = list(pr_el_ex_struct_code['value'])
kpts = list(pr_el_ex_struct_code['k-point'])

k_atom = [ k**3 for k in kpts ]

Pade_df = pd.DataFrame({'Kpts_atom': k_atom, 'P': prop})

TAG = {'element':el,
'structure':struct,
'exchange':ex,
'code':c,
'property':pr}

NAME = '_'.join([pr, el, ex, struct, c])+'.csv'
names.append( (NAME,TAG) )
print ("Writing {} ..".format(NAME))
Pade_df.to_csv('Crossfilts/'+NAME, index=False)

return names
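
The five nested loops in `crossfilters` are equivalent to a single pandas groupby over the tag columns. A minimal sketch with made-up rows (not the real collection):

```python
import pandas as pd

# Toy frame with the same columns the nested loops iterate over
# (the values are invented for illustration).
df = pd.DataFrame({
    'code':      ['VASP', 'VASP'],
    'structure': ['fcc', 'fcc'],
    'exchange':  ['PBE', 'PBE'],
    'element':   ['Nb', 'Nb'],
    'property':  ['B', 'B'],
    'value':     [170.1, 171.0],
    'k-point':   [4, 6],
})

names = []
for (c, struct, ex, el, pr), group in df.groupby(
        ['code', 'structure', 'exchange', 'element', 'property']):
    # same per-group frame the loops build: k**3 approximates k-points/atom
    pade_df = pd.DataFrame({'Kpts_atom': [k**3 for k in group['k-point']],
                            'P': list(group['value'])})
    names.append('_'.join([pr, el, ex, struct, c]) + '.csv')

print(names)  # ['B_Nb_PBE_fcc_VASP.csv']
```

groupby visits each unique tag combination exactly once, so the duplicated-work pitfall of iterating a raw column disappears by construction.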


def read_crossfilts_from_file(filename):
"""
reads the crossfiltered file and also decomposes the filename
into the tags and sends the crossfilt and the tags
"""

if len(filename[11:-4].split('_')) == 6:
pr, el, ex, _, struct, c = filename[11:-4].split('_')
ex = '_'.join([ex,_])
else:
pr, el, ex, struct, c = filename[11:-4].split('_')

tags = {'element': el,
'property': pr,
'exchange': ex,
'code': c,
'structure':struct}
return filename, tags
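
The magic numbers in `filename[11:-4]` encode `len('Crossfilts/')` and `len('.csv')`. A hypothetical helper making that explicit, including the six-token case where the exchange name itself contains an underscore (the 'PBE_sol' value below is an assumption for illustration):

```python
def parse_crossfilt_name(filename):
    # 'Crossfilts/' is 11 characters and '.csv' is 4 -- the same slice the
    # script uses, written so the intent is visible.
    stem = filename[len('Crossfilts/'):-len('.csv')]
    parts = stem.split('_')
    if len(parts) == 6:
        # the exchange itself contains one underscore, e.g. 'PBE_sol'
        pr, el, ex_a, ex_b, struct, c = parts
        ex = '_'.join([ex_a, ex_b])
    else:
        pr, el, ex, struct, c = parts
    return {'property': pr, 'element': el, 'exchange': ex,
            'structure': struct, 'code': c}

tags = parse_crossfilt_name('Crossfilts/B_Nb_PBE_sol_fcc_VASP.csv')
print(tags['exchange'])  # PBE_sol
```

Note the scheme only tolerates one underscore inside the exchange field; any underscore in an element, structure, or code name would mis-split.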

def run_pade_through_R(rscript, crossfilt, tags):
"""
runs the Pade through a python subprocess call to nls.R
on the input crossfilt
- copies the input to Rdata.csv for input to nls.R
- retrieves the output of nls.R that is pasted out into csv file
that can be read back into pandas
.. element, structure, exchange, code, property, extrapolate, fit error
which can serve as another reference collection for calculation of
the precision from the main database.
"""

result = {'element':tags['element'],
'structure':tags['structure'],
'exchange':tags['exchange'],
'code':tags['code'],
'property':tags['property']}

os.system('cp {} Rdata.csv'.format(crossfilt))
# for making the first database
# os.system('cp Crossfilts/{} Rdata.csv'.format(crossfilt))
# os.mkdir(crossfilt)
#os.chdir(crossfilt)
#os.system('cp ../{} Rdata.csv'.format(crossfilt))
#os.system('cp ../{0} {0}'.format(rscript))

print ('copied {}'.format(crossfilt))

try:
os.system('Rscript {}'.format(rscript))
print ('R executed')
R_result = pd.read_csv('Result.csv')
key = list(R_result['Error']).index(min(list(R_result['Error'])))  # row with the smallest fit error
result['extrapolate'] = list(R_result['Extrapolate'])#[key]
result['best_extrapolate'] = list(R_result['Extrapolate'])[key]
result['best_error'] = list(R_result['Error'])[key]
result['best_order'] = list(R_result['Order'])[key]
result['fit_error'] = list(R_result['Error'])#[key]
result['pade_order'] = list(R_result['Order'])#[key]
#result['precision'] = list(R_result['Precisions'])
print ("R success")

except Exception:  # any failure in the R fit falls back to placeholders
print ("R failure")
result['best_extrapolate'] = 'xxx'
result['best_error'] = 'xxx'
result['best_order'] = 'xxx'
result['extrapolate'] = 'xxx'
result['fit_error'] = 'xxx'
result['pade_order'] = 'xxx'

# os.chdir('../')
#print (result, type(result))
#pade_result = pd.DataFrame(result)

return result
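
The `os.system` calls above discard exit codes, so failures only surface when Result.csv cannot be parsed. A sketch of the same copy-run-read cycle with `subprocess` and explicit error handling (`run_rscript` is hypothetical; the Rdata.csv / Result.csv convention is taken from the function above):

```python
import shutil
import subprocess

import pandas as pd

def run_rscript(rscript, crossfilt):
    """Copy the crossfilt to Rdata.csv, run the R script, and return the
    Result.csv row with the smallest fit error, or None on any failure."""
    shutil.copy(crossfilt, 'Rdata.csv')
    try:
        # check=True raises if Rscript exits nonzero instead of silently
        # continuing the way os.system does
        subprocess.run(['Rscript', rscript], check=True)
        r = pd.read_csv('Result.csv')
        return r.loc[r['Error'].idxmin()]
    except (subprocess.CalledProcessError, OSError):
        # R missing, the script failing, or Result.csv unreadable
        return None
```

Returning None (rather than the 'xxx' placeholders) lets the caller decide how to record a failed fit.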



if __name__=='__main__':
"""
calculate the fit for a given crossfiltered set
for different Pade sets
first Milestone - one crossfiltered set :
Nb B for m+n orders (m, n =2-4) .. output file Pade.csv
"""

#database_path = 'MainCollection_v2moreclean.csv'

rscript = 'hennig_nls.R'  # alternative: 'nls_kpts_choices.R'
database_path = None
crossfilts_path = 'Crossfilts/*.csv'
#crossfilts_path = None

output_filename = 'Pade_extrapolates_v2.csv'  # alternative: 'Pade_kpts_choices_leave3_10.csv'

if database_path:
print ("Performing crossfiltering on {}..".format(database_path))
filetags = crossfilters(pd.read_csv(database_path))
elif crossfilts_path:
print ("Reading crossfilters from {}..".format(crossfilts_path))
filetags = [read_crossfilts_from_file(f) for f in glob(crossfilts_path)]
else:
print ('input not provided')
sys.exit(0)

length_crossfilts = len(filetags)  # defined for both input modes

records = []

print ("Running Pade..")

for n, (f,t) in enumerate(filetags):
print ("Running through {0} of {1}".format(n, length_crossfilts))
records.append( run_pade_through_R(rscript, f, t) )

Pade_analysis= pd.DataFrame({'element': [r['element'] for r in records],
'structure': [r['structure'] for r in records],
'exchange': [r['exchange'] for r in records],
'code': [r['code'] for r in records],
'property': [r['property'] for r in records],
'best_extrapolate': [r['best_extrapolate'] for r in records],
'best_error': [r['best_error'] for r in records],
'best_order': [r['best_order'] for r in records],
'extrapolate': [r['extrapolate'] for r in records],
'fit_error': [r['fit_error'] for r in records],
'pade_order': [r['pade_order'] for r in records] })

# pade_analysis = pd.concat(records)

# remove the index and duplicates

print ("Writing out Pade analysis... ")

Pade_analysis.to_csv(output_filename)

Pade_analysis = pd.read_csv(output_filename)

del Pade_analysis['Unnamed: 0']

Pade_analysis.drop_duplicates(inplace=True)

Pade_analysis.to_csv(output_filename)
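
The write / read back / delete 'Unnamed: 0' / rewrite sequence above exists only because `to_csv` writes the row index by default; deduplicating first and writing once with `index=False` avoids the round-trip. A minimal sketch with made-up rows (the file name is altered here so as not to clobber the real output):

```python
import pandas as pd

records = pd.DataFrame({'element': ['Nb', 'Nb'], 'best_error': [0.1, 0.1]})
records = records.drop_duplicates()
# index=False keeps the stray 'Unnamed: 0' column from ever appearing
records.to_csv('Pade_extrapolates_sketch.csv', index=False)

back = pd.read_csv('Pade_extrapolates_sketch.csv')
print(list(back.columns), len(back))  # ['element', 'best_error'] 1
```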







13 changes: 13 additions & 0 deletions benchmark-view/ShinyApps/Numerical_Precs_Methods_Scripts/README.md
@@ -0,0 +1,13 @@
Sequence of data processing to widget activated analysis and plotting

* Pade.py run in crossfilter mode on the main collection

* Pade.py run in Pade extrapolate mode -> intermediate Pade_extrapolates.csv file

* create_precisions.py run to calculate numerical precisions from the Pade_extrapolates and Main collection files

* calculate power law fits with histogram_debug.py

* plot histogram for complete dataset with histogram_percs_plotter.py

* transform for creating k-point density choice recommendations
