Using TMVA

Ostap hosts a couple of classes that simplify training and using TMVA.

Training TMVA

```python
import ROOT
from Ostap.PyTMVA import Trainer

tSignal  = ... ## signal     TTree/TChain
tBkg     = ... ## background TTree/TChain

## book TMVA trainer
trainer = Trainer (
   name    = 'TestTMVA' ,
   methods = [
   # type                   name   configuration
   ( ROOT.TMVA.Types.kMLP        , 'MLP'        , 'H:!V:EstimatorType=CE:VarTransform=N:NCycles=200:HiddenLayers=N+3:TestRate=5:!UseRegulator' ) ,
   ( ROOT.TMVA.Types.kBDT        , 'BDTG'       , 'H:!V:NTrees=100:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=2' ) ,
   ( ROOT.TMVA.Types.kCuts       , 'Cuts'       , 'H:!V:FitMethod=MC:EffSel:SampleSize=200000:VarProp=FSmart' ) ,
   ( ROOT.TMVA.Types.kFisher     , 'Fisher'     , 'H:!V:Fisher:VarTransform=None:CreateMVAPdfs:PDFInterpolMVAPdf=Spline2:NbinsMVAPdf=50:NsmoothMVAPdf=10' ),
   ( ROOT.TMVA.Types.kLikelihood , 'Likelihood' , 'H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50' )
   ] ,
   variables  = [ 'var1' , 'var2' ,  'var3' ] , ## variables to be used for training
   signal     = tSignal                       , ## ``signal'' sample
   background = tBkg                          , ## ``background'' sample
   verbose    = False )
```
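Each method above is configured by a TMVA option string: colon-separated tokens, where Key=Value sets an option, a bare Flag switches a boolean option on and !Flag switches it off (e.g. H and !V). The helper parse_tmva_options below is hypothetical, not part of Ostap or TMVA; it is a small sketch that splits such a string into a dictionary, which can be handy for checking what a configuration actually sets. It does not validate option names or values.

```python
def parse_tmva_options(options):
    """Split a TMVA-style option string into a dict (illustrative helper only)."""
    parsed = {}
    for token in options.split(':'):
        token = token.strip()
        if not token:
            continue                      # tolerate empty tokens, e.g. a trailing ':'
        if '=' in token:
            key, value = token.split('=', 1)
            parsed[key.strip()] = value.strip()   # values are kept as strings
        elif token.startswith('!'):
            parsed[token[1:]] = False     # '!Flag' switches a boolean option off
        else:
            parsed[token] = True          # bare 'Flag' switches it on
    return parsed


bdtg = parse_tmva_options(
    'H:!V:NTrees=100:MinNodeSize=2.5%:BoostType=Grad:Shrinkage=0.10:'
    'UseBaggedBoost:BaggedSampleFraction=0.5:nCuts=20:MaxDepth=2')
print(bdtg['NTrees'], bdtg['V'], bdtg['UseBaggedBoost'])   # 100 False True
```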

Optionally, one can also specify signal_cuts and background_cuts.

Training TMVA itself is trivial: one just needs to invoke the method train:

```python
weights_files = trainer.train ()
```

It returns the list/tuple of weight XML files, the output of the TMVA trainer. Optionally, one can also retrieve the list of C++ class files using the property class_files, or everything together as a tar file using the property tar_file:

```python
weight_files = trainer.weight_files ## XML weights
class_files  = trainer.class_files  ## C++ classes
tar_file     = trainer.tar_file     ## everything together
```
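Assuming tar_file gives the path of an ordinary tar archive on disk (an assumption, not stated above), its content can be inspected with the standard tarfile module. TMVA conventionally names its outputs <job>_<method>.weights.xml and <job>_<method>.class.C; the helper split_tmva_outputs is hypothetical and simply sorts the archive members by these suffixes.

```python
import tarfile


def split_tmva_outputs(tar_path):
    """Sort members of a TMVA tar archive into XML weight files and C++ class files."""
    weights, classes = [], []
    with tarfile.open(tar_path) as archive:   # default mode 'r' auto-detects compression
        for name in archive.getnames():
            if name.endswith('.weights.xml'):
                weights.append(name)
            elif name.endswith('.class.C'):
                classes.append(name)
    return sorted(weights), sorted(classes)


# usage (assuming tar_file is a path):
# weights, classes = split_tmva_outputs(tar_file)
```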

Using TMVA

To use the trained TMVA methods, one uses the TMVA reader:

```python
from Ostap.PyTMVA import Reader
reader = Reader(
   'MyMLP' ,
   variables     = [ ('var1' , lambda s : s.var1 )   ,
                     ('var2' , lambda s : s.var2 )   ,
                     ('var3' , lambda s : s.var3 ) ] ,
   weights_files = tar_file  )
```

{% discussion "What is lambda s : s.var1 here?" %} The first element of each pair is, obviously, the variable name. The second element is an accessor function, which is used when the method is called with a single argument. E.g. in this example, one can apply the method to a TTree/TChain/RooAbsData/RooArgSet, and the variable var1 from this TTree/TChain/RooAbsData/RooArgSet will be used as 'var1' for TMVA. Accessor functions can be trivial, as in this case, but they can also be less trivial:

```python
from math import atan2
variables     = [ ('var1' , lambda s : s.rapidity     )   , ## use another name
                  ('var2' , lambda s : s.pt/1000      )   , ## make some rescaling
                  ('var3' , lambda s : atan2(s.y,s.x) ) ]   ## make more complicated calculations
```
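Since accessors are plain Python callables, they can be checked without ROOT at all. The sketch below applies the accessors from the snippet above to a stand-in object built with types.SimpleNamespace (an assumption purely for illustration; in real use the argument would be a TTree entry, a RooArgSet, etc.) and collects the values in the order of the variable list, which is presumably the order in which they are fed to TMVA.

```python
from math import atan2
from types import SimpleNamespace

variables = [('var1', lambda s: s.rapidity),          # use another name
             ('var2', lambda s: s.pt / 1000),         # make some rescaling
             ('var3', lambda s: atan2(s.y, s.x))]     # more complicated calculation

# hypothetical stand-in for one entry: only attribute access matters here
event = SimpleNamespace(rapidity=2.5, pt=3500.0, x=1.0, y=1.0)

inputs = [(name, accessor(event)) for name, accessor in variables]
print(inputs)   # var3 is atan2(1, 1) = pi/4
```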

If one wants to call the methods with other kinds of objects as the single argument, a different set of accessor functions needs to be supplied. E.g. if the data are expected to be supplied as a tuple/list/std::vector<...>, one can use:

```python
variables     = [ ('var1' , lambda s : s[0] )   , ## first element
                  ('var2' , lambda s : s[1] )   , ## second element
                  ('var3' , lambda s : s[2] ) ]   ## third element
```
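For sequence-like inputs the same accessors can also be written with operator.itemgetter from the standard library, which leaves no room for a lambda that refers to the wrong name. The sketch applies them to a plain tuple of hypothetical values; a PyROOT std::vector proxy also supports [] indexing, so the same accessors would be expected to work there as well.

```python
from operator import itemgetter

variables = [('var1', itemgetter(0)),   # same as lambda s : s[0]
             ('var2', itemgetter(1)),   # same as lambda s : s[1]
             ('var3', itemgetter(2))]   # same as lambda s : s[2]

point = (0.5, 12.0, -1.5)               # hypothetical values of var1, var2, var3
values = [accessor(point) for _, accessor in variables]
print(values)   # [0.5, 12.0, -1.5]
```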

One can also use just the plain list of variable names:

```python
variables     = [ 'var1' , 'var2' , 'var3' ]
```

This list will be automatically transformed into

```python
variables     = [ ('var1' , lambda s : getattr( s , 'var1') ) ,
                  ('var2' , lambda s : getattr( s , 'var2') ) ,
                  ('var3' , lambda s : getattr( s , 'var3') ) ]
```
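A conversion of this kind could be sketched as follows; normalize_variables is a hypothetical name and not Ostap's actual code. One detail matters when the lambdas are built in a loop: a bare lambda s : getattr ( s , name ) looks up name only when it is called, so every accessor would end up reading the last variable. Binding the name through a default argument avoids this.

```python
from types import SimpleNamespace


def normalize_variables(variables):
    """Turn plain names into (name, accessor) pairs; keep existing pairs unchanged."""
    result = []
    for item in variables:
        if isinstance(item, str):
            # bind the current name via a default argument (avoids late binding)
            result.append((item, lambda s, _name=item: getattr(s, _name)))
        else:
            name, accessor = item
            result.append((name, accessor))
    return result


pairs = normalize_variables(['var1', 'var2', ('var3', lambda s: s.var1 + s.var2)])
entry = SimpleNamespace(var1=1.0, var2=2.0)   # stand-in for one entry
print([(name, acc(entry)) for name, acc in pairs])
# [('var1', 1.0), ('var2', 2.0), ('var3', 3.0)]
```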

{% enddiscussion %}

As the weights_files argument one can use either the list of weight files from the trainer or, much easier, the single tar file from the trainer. The methods available from the weight files can be checked as:

```python
print ( reader.methods )
```

The usage of the reader is rather trivial, e.g. one can explicitly request the response for a certain set of arguments:

```python
v1,  v2,  v3 = ....                      ## values of var1, var2, var3
mlp  = reader['MLP']                     ## get one method
print ( 'MLP value is %s'  % mlp ( v1 , v2 , v3 ) )
```

In practice, one almost always uses it with TTree/TChain/RooAbsData/RooArgSet; in this case one uses the 1-argument call, assuming that proper accessor functions have been supplied:

```python
tree = ...                                   ## the tree
mlp  = reader['MLP']                         ## get one method
for i in tree :                              ## loop over the entries
    print ( 'MLP value is %s'  % mlp ( i ) ) ## get the value
```
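The two calling conventions can be mimicked in pure Python to see how they relate. In this sketch MethodSketch is a hypothetical stand-in for the object returned by reader['MLP'], and its "response" is a toy weighted sum rather than a real TMVA evaluation: with one argument it runs the accessor functions to collect the inputs, with one value per variable it uses the values directly.

```python
from types import SimpleNamespace


class MethodSketch:
    """Toy stand-in for a trained method; NOT a real TMVA evaluation."""

    def __init__(self, variables, weights):
        self.accessors = [accessor for _, accessor in variables]
        self.weights = weights            # toy linear "response" coefficients

    def __call__(self, *args):
        if len(args) == 1 and len(self.accessors) > 1:
            # one argument: an entry/object, so extract inputs with the accessors
            values = [accessor(args[0]) for accessor in self.accessors]
        elif len(args) == len(self.accessors):
            values = list(args)           # explicit values, in variable order
        else:
            raise TypeError('expected 1 object or %d values' % len(self.accessors))
        return sum(w * v for w, v in zip(self.weights, values))


variables = [('var1', lambda s: s.var1),
             ('var2', lambda s: s.var2),
             ('var3', lambda s: s.var3)]
mlp = MethodSketch(variables, weights=[0.5, 1.0, -2.0])

print(mlp(1.0, 2.0, 3.0))                                  # explicit values: -3.5
tree = [SimpleNamespace(var1=1.0, var2=2.0, var3=3.0)]     # stand-in for a tree
for entry in tree:
    print(mlp(entry))                                      # via accessors: -3.5
```

The dispatch on the number of arguments is a design choice of this sketch only; it treats a single argument as an object whenever more than one variable is defined.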