discrete choice modeling blogpost #11

Open
wants to merge 2 commits into
base: main


drbenvincent
Contributor

@drbenvincent drbenvincent commented Oct 31, 2024

This PR adds a notebook which will form the second Colgate client write-up blog post.

The first post was Causal sales analytics: Are my sales incremental or cannibalistic?

NOTE: I'll be pretty aggressive about hiding most of the code cells in the final blogpost in order to maximise readability.

Current state: At this point (2024/10/31) I've basically written the first half of the blog post. It outlines the basic discrete choice model and sets up the core limitation of producing uninteresting cannibalization effects.
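To make the limitation concrete, here is a minimal sketch (not the notebook's code) of why a plain multinomial logit gives uninteresting cannibalization: when an item is removed, every remaining item's share is scaled by the same factor. The item names and utility values are made up for illustration.

```python
import math


def mnl_shares(utilities):
    """Multinomial logit market shares: a softmax over item utilities."""
    m = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {k: math.exp(v - m) for k, v in utilities.items()}
    total = sum(exp_u.values())
    return {k: v / total for k, v in exp_u.items()}


# Hypothetical utilities for four items (assumed values, not from the notebook)
utilities = {"A": 1.0, "B": 0.5, "C": 0.0, "D": -2.0}

before = mnl_shares(utilities)
# What-if scenario: item D is delisted
after = mnl_shares({k: v for k, v in utilities.items() if k != "D"})

# Ratio of share after removal to share before removal, per remaining item
ratios = {k: after[k] / before[k] for k in after}
expected = 1.0 / (1.0 - before["D"])

for item, ratio in ratios.items():
    print(f"{item}: share x {ratio:.4f}")
```

Every ratio equals `1 / (1 - share of the removed item)`, so no item is a closer substitute for D than any other.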

TODO

  • We might want to play with the random seed to get the synthetic data nice
  • We might also want to tweak the synthetic price data to allow for better parameter identifiability
  • Potentially add a manufacturer (or benefit) effect to really show the lack of interesting cannibalization effects.
  • I'm hoping that either @ricardoV94 or @lucianopaz or @cluhmann will take over the reins and continue the blog post to talk about the core innovations of what we did. We are allowed to talk about the maths of the nested logit, but we're not allowed to present code to implement it.
  • Hoping someone can write a nice overview of the cool new stuff that was done. I'll then come back in and wrap it up with the executive summary at the start and a conclusion summary at the end.
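
For reference while drafting the second half, this is the textbook two-level nested logit, maths only in line with the constraint above. The notation is an assumption, not the project's parameterization: V_i is the utility of item i, B_k is the set of items in nest k, and λ_k is the nest's scale parameter.

```latex
% Textbook two-level nested logit (notation assumed, not taken from the notebook)
\begin{aligned}
I_k &= \log \sum_{j \in B_k} \exp\!\left(V_j / \lambda_k\right) \\
P(k) &= \frac{\exp(\lambda_k I_k)}{\sum_{l} \exp(\lambda_l I_l)} \\
P(i \mid k) &= \frac{\exp(V_i / \lambda_k)}{\sum_{j \in B_k} \exp(V_j / \lambda_k)} \\
P(i) &= P(k)\, P(i \mid k), \qquad i \in B_k
\end{aligned}
```

Setting λ_k = 1 for every nest recovers the plain multinomial logit, which is a useful sanity check when comparing the two models.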


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ricardoV94

ricardoV94 commented Nov 18, 2024

@drbenvincent I'm leaving some comments from your part of the blogpost, before I start on the second part:

  1. A single intercept doesn't make sense. I assume you meant B_0^i (one intercept per item), as you have in the model description below?
  2. In the model description, the line u_{i,t} = a = b doesn't make mathematical sense. You introduce a log of the price, which is clearly not equivalent to the original expression in price. Also, the brackets are strange. You are not multiplying the intercept by the price, I assume?
  3. The Multinomial model is very confident because total_sales is high. We obviously had more noise in our data, and that's why we had to switch to the DirichletMultinomial model instead, as the Multinomial cannot explain such large errors with these high total sales. May be worth mentioning?
  4. "So this is all great, but it's the kind of output that data scientists would enjoy." Is there irony in this sentence or missing a "not"?
  5. What-if scenario. Needs a bit more text explaining what results we can see from the 5 plots?
  6. I don't like the plot showing the market share before and after as the distance from the x=y line. This will never show anything interesting unless you remove an item that has a sizeable portion of the market share (which you would never do anyway). It's also mostly wasted white space on the plot. I would rather show the ratio of market share after vs. before as a bar plot, which will clearly show everything going up by the same %. Conversely: imagine you remove an item with a 1% market share and another item takes all of this market share (very interesting perfect cannibalization). It would go from x to x+1%, which would still look super boring on the plot you defined. The plot is not good for showing what you want.
  7. Prior for intercept should be zerosum, otherwise there's one too many parameters.
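
To illustrate point 3, here is a rough sketch using the standard variance formulas for the two likelihoods. It assumes the DirichletMultinomial is parameterized as alpha = concentration * p; the share, total sales, and concentration values are made up, not taken from the client data.

```python
import math


def multinomial_share_sd(p, n):
    """SD of an item's observed market share under Multinomial(n, p)."""
    return math.sqrt(p * (1 - p) / n)


def dirichlet_multinomial_share_sd(p, n, concentration):
    """SD of an item's observed share under a DirichletMultinomial.

    Assumes alpha = concentration * p, so `concentration` is the sum of alpha.
    The count variance is n * p * (1 - p) * (n + concentration) / (1 + concentration).
    """
    inflation = (n + concentration) / (1 + concentration)
    return math.sqrt(p * (1 - p) / n * inflation)


# Hypothetical values: a 20% share item, 100,000 total sales, modest concentration
p, total_sales, concentration = 0.2, 100_000, 50.0

sd_mn = multinomial_share_sd(p, total_sales)
sd_dm = dirichlet_multinomial_share_sd(p, total_sales, concentration)

print(f"Multinomial share SD:          {sd_mn:.5f}")
print(f"DirichletMultinomial share SD: {sd_dm:.5f}")
```

With high total sales the Multinomial SD shrinks like 1/sqrt(n), while the DirichletMultinomial SD stays bounded away from zero for a finite concentration, which is why it can absorb noisier data. As the concentration grows, the two coincide.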

Obligatory message: I think overall the blog is in a pretty nice shape!

@ricardoV94

I'm going to push a second NB that uses pre-generated data according to the NLM. I think this will streamline the blogpost, showing where it fails and why the NLM can address it. Not changing the original NB so we can compare, because git changes suck for NBs


2 participants