Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random start points option for profiled POIs #713

Merged
merged 41 commits into from
Jan 25, 2024
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
45fc551
Add ability to try rand start pts for prof POIs
kmohrman Dec 3, 2021
3326195
Option for specifying number of random start points
kmohrman Dec 3, 2021
cf60fa4
If using pointsRandProf_, turn on saveInactivePOI_
kmohrman Dec 3, 2021
4c10a7c
Update nonstandard.md
nucleosynthesis May 31, 2022
018fe0e
Update nonstandard.md
nucleosynthesis May 31, 2022
eff8006
Update commonstatsmethods.md
nucleosynthesis Jun 10, 2022
032dc17
Update index.md
nucleosynthesis Jun 10, 2022
6ed9c9a
Update commonstatsmethods.md
nucleosynthesis Jun 14, 2022
8feba2d
Do not save nll if fit fails, also add ability to pass ranges for ran…
kmohrman Jul 5, 2022
2f19e02
Add option for skipping default start point
kmohrman Jul 5, 2022
e1f2b66
Save each point if best nll so far to avoid rerunning fit at end
kmohrman Jul 6, 2022
3177041
Pass rand seed for rand start points
kmohrman Jul 6, 2022
2c1e05b
Small cleanups
kmohrman Jul 8, 2022
4657a55
Update systematicsAnalyzer.py
nucleosynthesis Jul 21, 2022
a22bb47
Include ATLAS ref for Impacts
nucleosynthesis Nov 15, 2022
a17df94
Update commonstatsmethods.md
nucleosynthesis Nov 28, 2022
42c3c82
Update commonstatsmethods.md
nucleosynthesis Nov 28, 2022
1889f09
Update index.md
nucleosynthesis Dec 7, 2022
56c4471
Update runningthetool.md
nucleosynthesis Dec 7, 2022
1491724
Update runningthetool.md
nucleosynthesis Dec 7, 2022
ce5c9d2
Update runningthetool.md
nucleosynthesis Dec 13, 2022
b34142c
Update commonstatsmethods.md
nucleosynthesis Dec 22, 2022
abb232a
Allow ChannelCompatibilityCheck to skip channels with no signal templ…
Dominic-Stafford Jan 5, 2023
8cfc616
Merge pull request #805 from Dominic-Stafford/102x
ajgilbert Jan 12, 2023
0963769
random starting point implementation for 2d grid scan
abasnet97 Mar 3, 2023
58bbff0
Add ability to try rand start pts for prof POIs
kmohrman Dec 3, 2021
0af5422
Option for specifying number of random start points
kmohrman Dec 3, 2021
09ad136
If using pointsRandProf_, turn on saveInactivePOI_
kmohrman Dec 3, 2021
3360d4c
Do not save nll if fit fails, also add ability to pass ranges for ran…
kmohrman Jul 5, 2022
709c637
Add option for skipping default start point
kmohrman Jul 5, 2022
8e7e17d
Save each point if best nll so far to avoid rerunning fit at end
kmohrman Jul 6, 2022
e2fda6e
Pass rand seed for rand start points
kmohrman Jul 6, 2022
c0fcac9
Small cleanups
kmohrman Jul 8, 2022
2c31c72
random starting point implementation for 2d grid scan
abasnet97 Mar 3, 2023
b6b0125
Merge branch 'rand-start-pts-v2' of github.com:kmohrman/HiggsAnalysis…
abasnet97 Mar 5, 2023
1f7b676
created a separate class for random starting point implementation
abasnet97 Apr 10, 2023
98c9983
fixed merge conflicts
abasnet97 Jun 9, 2023
0e06eb3
Making changes to be able to compile with CMSSW_11_X plus some other …
abasnet97 Jan 2, 2024
1932f09
resolved merge conflicts after merging changes from main
abasnet97 Jan 8, 2024
2561169
edited behavior of random starting point range to not override POI ra…
abasnet97 Jan 24, 2024
25db367
Update MultiDimFit.cc
nucleosynthesis Jan 25, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions interface/MultiDimFit.h
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,8 @@ class MultiDimFit : public FitterAlgoBase {
static std::string robustHesseLoad_;
static std::string robustHesseSave_;

static int pointsRandProf_;

static std::string saveSpecifiedFuncs_;
static std::string saveSpecifiedNuis_;
static std::string saveSpecifiedIndex_;
Expand Down
157 changes: 139 additions & 18 deletions src/MultiDimFit.cc
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ float MultiDimFit::centeredRange_ = -1.0;
bool MultiDimFit::robustHesse_ = false;
std::string MultiDimFit::robustHesseLoad_ = "";
std::string MultiDimFit::robustHesseSave_ = "";
int MultiDimFit::pointsRandProf_ = 0;


std::string MultiDimFit::saveSpecifiedFuncs_;
Expand Down Expand Up @@ -110,6 +111,7 @@ MultiDimFit::MultiDimFit() :
("robustHesse", boost::program_options::value<bool>(&robustHesse_)->default_value(robustHesse_), "Use a more robust calculation of the hessian/covariance matrix")
("robustHesseLoad", boost::program_options::value<std::string>(&robustHesseLoad_)->default_value(robustHesseLoad_), "Load the pre-calculated Hessian")
("robustHesseSave", boost::program_options::value<std::string>(&robustHesseSave_)->default_value(robustHesseSave_), "Save the calculated Hessian")
("pointsRandProf", boost::program_options::value<int>(&pointsRandProf_)->default_value(pointsRandProf_), "Number of random start points to try for the profiled POIs")
;
}

Expand Down Expand Up @@ -139,6 +141,11 @@ void MultiDimFit::applyOptions(const boost::program_options::variables_map &vm)
if (vm["floatOtherPOIs"].defaulted()) floatOtherPOIs_ = true;
if (vm["saveInactivePOI"].defaulted()) saveInactivePOI_ = true;
} else throw std::invalid_argument(std::string("Unknown algorithm: "+algo));
if (pointsRandProf_ > 0) {
// Probably not the best way of doing this
// But right now the pointsRandProf_ option relies on saveInactivePOI_ being true to include the profiled POIs in specifiedVars_
saveInactivePOI_ = true;
}
fastScan_ = (vm.count("fastScan") > 0);
squareDistPoiStep_ = (vm.count("squareDistPoiStep") > 0);
skipInitialFit_ = (vm.count("skipInitialFit") > 0);
Expand Down Expand Up @@ -186,6 +193,7 @@ bool MultiDimFit::runSpecific(RooWorkspace *w, RooStats::ModelConfig *mc_s, RooS
if (verbose <= 3) RooAbsReal::setEvalErrorLoggingMode(RooAbsReal::CountErrors);
bool doHesse = (algo_ == Singles || algo_ == Impact) || (saveFitResult_) ;
if ( !skipInitialFit_){
std::cout << "Doing initial fit: " << std::endl;
res.reset(doFit(pdf, data, (doHesse ? poiList_ : RooArgList()), constrainCmdArg, (saveFitResult_ && !robustHesse_), 1, true, false));
if (!res.get()) {
std::cout << "\n " <<std::endl;
Expand Down Expand Up @@ -606,7 +614,10 @@ void MultiDimFit::doGrid(RooWorkspace *w, RooAbsReal &nll)
//minim.setStrategy(minimizerStrategy_);
std::auto_ptr<RooArgSet> params(nll.getParameters((const RooArgSet *)0));
RooArgSet snap; params->snapshot(snap);
//snap.Print("V");
if (verbose > 1) {
std::cout << "Print snap: " << std::endl;
snap.Print("V");
}

// check if gridPoints are defined
std::vector<unsigned int> pointsPerPoi;
Expand All @@ -626,8 +637,12 @@ void MultiDimFit::doGrid(RooWorkspace *w, RooAbsReal &nll)
}

if (n == 1) {
if (verbose > 1) std::cout << "\nStarting n==1. The nll0 from initial fit: " << nll0 << std::endl;
unsigned int points = pointsPerPoi.size() == 0 ? points_ : pointsPerPoi[0];

// Set seed for random points
srand(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure right now if this could have other knock-on effects, but perhaps it would be better to do this only if pointsRandProf > 1 just to be safe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.


double xspacing = (pmax[0]-pmin[0]) / points;
double xspacingOffset = 0.5;
if (alignEdges_) {
Expand Down Expand Up @@ -665,42 +680,148 @@ void MultiDimFit::doGrid(RooWorkspace *w, RooAbsReal &nll)
}

//if (verbose > 1) std::cout << "Point " << i << "/" << points << " " << poiVars_[0]->GetName() << " = " << x << std::endl;
std::cout << "Point " << i << "/" << points << " " << poiVars_[0]->GetName() << " = " << x << std::endl;
std::cout << "Point " << i << "/" << points << " " << poiVars_[0]->GetName() << " = " << x << std::endl;
*params = snap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From here to L788 or so I think this should also be wrapped in an if (pointsRandProf > 1) statement, to avoid this being invoked when the option is not explicitly called.

Here I think it would also good to print a warning / log message to state that this option is active (hopefully avoids people playing with random options and being surprised the fit is taking much longer than before)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a certain caveat to this suggestion. In our current implementation, the default behavior (i.e. just using the default starting point) is incorporated within the random starting point code block. If one wants to retrieve this behavior, one can either choose to set pointsRandProf to 0 or just entirely omit the option while running combine MultiDimFit. The reason why we chose to have the implementation set up this way is to avoid a lot of code repetition. In order to make sure that the users do not get confused about this, we have added some comment blocks in MultiDimFit method stating what I just explained above. Also, there are some printout code blocks that tell the user when pointsRandProf is set to a non-zero value (and also when the option is omitted). Let us know what you think about this approach.

poiVals_[0] = x;
poiVars_[0]->setVal(x);


///////////////////////////////////////////////////////////////////////////////////////////////////
/////////////////// Loop over rand points for each profiled POI to get best nll ///////////////////

bool ok;
int n_prof_params = specifiedVars_.size();
std::vector<float> nll_at_alt_start_pts;

// Get vector of points to try
float prof_start_pt_range_max = 20.0; // Is this what we want?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a generic version of this algorithm, I think you'd want to potentially have a different range for each of the POIs, maybe the range can be read from the ws for each POI? (or the range set by --setParameterRanges)

std::vector<std::vector<float>> wc_vals_vec_of_vec = {};

// Append the defualt start pt to the list of points to try
std::vector<float> default_start_pt_vec;
for (int prof_param_idx=0; prof_param_idx<n_prof_params; prof_param_idx++) {
default_start_pt_vec.push_back(specifiedVars_[prof_param_idx]->getVal());
}
wc_vals_vec_of_vec.push_back(default_start_pt_vec);

// Append the random points to the vecotr of points to try
for (int pt_idx=0; pt_idx<pointsRandProf_; pt_idx++) {
std::vector<float> wc_vals_vec;
for (int prof_param_idx=0; prof_param_idx<n_prof_params; prof_param_idx++) {
// Get a random number in the range [-prof_start_pt_range_max,prof_start_pt_range_max]
float rand_num = (rand()*2.0*prof_start_pt_range_max)/RAND_MAX - prof_start_pt_range_max;
wc_vals_vec.push_back(rand_num);
}
wc_vals_vec_of_vec.push_back(wc_vals_vec);
}

// Print vector of points to try
if (verbose > 1) {
std::cout << "List of points to try:" << std::endl;
for (auto vals_vec: wc_vals_vec_of_vec) {
std::cout << "\tThe vals at this point: " << std::endl;
for (auto val: vals_vec) {
std::cout << "\t\tPoint val: " << val << std::endl;
}
}
}

// Loop over starting points to try for the prof WCs
for (unsigned int start_pt_idx=0; start_pt_idx<wc_vals_vec_of_vec.size(); start_pt_idx++){
*params = snap;
poiVals_[0] = x;
poiVars_[0]->setVal(x);

// Loop over prof POIs and set their values
if (verbose > 1) std::cout << "\n\tStart pt idx: " << start_pt_idx << std::endl;
for (unsigned int var_idx=0; var_idx<specifiedVars_.size(); var_idx++){
if (verbose > 1) std::cout << "\t\tThe var name: " << specifiedVars_[var_idx]->GetName() << std::endl;
if (strcmp(specifiedVars_[var_idx]->GetName(),"r")==0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be skipped in general?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of skipping "r" has been removed. We agreed that there is no need to skip this in general.

// Don't bother setting r to anything
if (verbose > 1) std::cout << "\t\t\tSkipping var " << specifiedVars_[var_idx]->GetName() << std::endl;
continue;
}
if (verbose > 1) std::cout << "\t\t\tRange before: " << specifiedVars_[var_idx]->getMin() << " " << specifiedVars_[var_idx]->getMax() << std::endl;
if (verbose > 1) std::cout << "\t\t\t" << specifiedVars_[var_idx]->GetName() << " before setting: " << specifiedVars_[var_idx]->getVal() << " += " << specifiedVars_[var_idx]->getError() << std::endl;
if (pointsRandProf_ > 0) {
specifiedVars_[var_idx]->removeRange(); // Do not impose a range if we're trying multiple random start points
}
specifiedVars_[var_idx]->setVal(wc_vals_vec_of_vec.at(start_pt_idx).at(var_idx));
if (verbose > 1) std::cout << "\t\t\tRange after: " << specifiedVars_[var_idx]->getMin() << " " << specifiedVars_[var_idx]->getMax() << std::endl;
if (verbose > 1) std::cout << "\t\t\t" << specifiedVars_[var_idx]->GetName() << " after setting: " << specifiedVars_[var_idx]->getVal() << " += " << specifiedVars_[var_idx]->getError() << std::endl;
}

// now we minimize
nll.clearEvalErrorLog();
deltaNLL_ = nll.getVal() - nll0;
if (nll.numEvalErrors() > 0) {
deltaNLL_ = 9990;
for(unsigned int j=0; j<specifiedNuis_.size(); j++){
specifiedVals_[j]=specifiedVars_[j]->getVal();
}
for(unsigned int j=0; j<specifiedFuncNames_.size(); j++){
specifiedFuncVals_[j]=specifiedFunc_[j]->getVal();
}
Combine::commitPoint(true, /*quantile=*/0);
continue;
}
ok = fastScan_ || (hasMaxDeltaNLLForProf_ && (nll.getVal() - nll0) > maxDeltaNLLForProf_) || utils::countFloating(*params)==0 ?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just from looking at this I think this instance of ok is not used at all as L803 is always invoked afterwards, thus no need to set it here?

true :
minim.minimize(verbose-1);

if (verbose > 1) std::cout << "\t\tThe nll.getVal() is: " << nll.getVal() << std::endl;
nll_at_alt_start_pts.push_back(nll.getVal());

}

// Rerun the fit for the point with the min nll
int min_nll_idx;
min_nll_idx = std::min_element(nll_at_alt_start_pts.begin(),nll_at_alt_start_pts.end()) - nll_at_alt_start_pts.begin();
if (verbose > 1) std::cout << "\tMin nll at idx: " << min_nll_idx << " " << nll_at_alt_start_pts.at(min_nll_idx) << std::endl;
if (verbose > 1) std::cout << "\tResetting for rerunning at the point that gives best nll:" << std::endl;
*params = snap;
poiVals_[0] = x;
poiVars_[0]->setVal(x);
for (unsigned int var_idx=0; var_idx<specifiedVars_.size(); var_idx++){
if (verbose > 1) std::cout << "\t\tSetting " << specifiedVars_[var_idx]->GetName() << " to " << wc_vals_vec_of_vec.at(min_nll_idx).at(var_idx) << std::endl;
specifiedVars_[var_idx]->setVal(wc_vals_vec_of_vec.at(min_nll_idx).at(var_idx));
}
// now we minimize
nll.clearEvalErrorLog();
deltaNLL_ = nll.getVal() - nll0;
if (nll.numEvalErrors() > 0) {
deltaNLL_ = 9990;
for(unsigned int j=0; j<specifiedNuis_.size(); j++){
specifiedVals_[j]=specifiedVars_[j]->getVal();
}
for(unsigned int j=0; j<specifiedFuncNames_.size(); j++){
specifiedFuncVals_[j]=specifiedFunc_[j]->getVal();
}
for(unsigned int j=0; j<specifiedNuis_.size(); j++){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restore original spacing?

specifiedVals_[j]=specifiedVars_[j]->getVal();
}
for(unsigned int j=0; j<specifiedFuncNames_.size(); j++){
specifiedFuncVals_[j]=specifiedFunc_[j]->getVal();
}
Combine::commitPoint(true, /*quantile=*/0);
continue;
}
bool ok = fastScan_ || (hasMaxDeltaNLLForProf_ && (nll.getVal() - nll0) > maxDeltaNLLForProf_) || utils::countFloating(*params)==0 ?
ok = fastScan_ || (hasMaxDeltaNLLForProf_ && (nll.getVal() - nll0) > maxDeltaNLLForProf_) || utils::countFloating(*params)==0 ?
true :
minim.minimize(verbose-1);

///////////////////////////////////////////////////////////////////////////////////////////////////

if (ok) {
deltaNLL_ = nll.getVal() - nll0;
double qN = 2*(deltaNLL_);
double prob = ROOT::Math::chisquared_cdf_c(qN, n+nOtherFloatingPoi_);
for(unsigned int j=0; j<specifiedNuis_.size(); j++){
specifiedVals_[j]=specifiedVars_[j]->getVal();
}
for(unsigned int j=0; j<specifiedFuncNames_.size(); j++){
specifiedFuncVals_[j]=specifiedFunc_[j]->getVal();
}
for(unsigned int j=0; j<specifiedCatNames_.size(); j++){
specifiedCatVals_[j]=specifiedCat_[j]->getIndex();
}
for(unsigned int j=0; j<specifiedNuis_.size(); j++){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restore original spacing?

specifiedVals_[j]=specifiedVars_[j]->getVal();
}
for(unsigned int j=0; j<specifiedFuncNames_.size(); j++){
specifiedFuncVals_[j]=specifiedFunc_[j]->getVal();
}
for(unsigned int j=0; j<specifiedCatNames_.size(); j++){
specifiedCatVals_[j]=specifiedCat_[j]->getIndex();
}
Combine::commitPoint(true, /*quantile=*/prob);
}

}
} else if (n == 2) {
RooAbsReal::setEvalErrorLoggingMode(RooAbsReal::CountErrors);
Expand Down