Lesson10 - Difference between the OLS results and sns.lmplot graph #745

wz2000xx · 2023-11-20T00:34:52Z

Hi,

In lesson 10, we fitted the relationship between child mortality, children per woman, and income group, and got the following results.

In my understanding, the intercept for the Lower-middle income group, for example, should be -22.8577 + 15.7559 = -7.1018. However, the graph created using sns.lmplot shows an intercept of around 6 for that income group, as shown below.

I don't understand why this is the case. This problem also exists for other income groups and categorical variables, such as species in the iris dataset.

Thank you very much!

RayNele · 2023-11-22T23:50:44Z

pretty sure the coefficient column in the table is the slope, not the intercept.

it's only the intercept for the row labelled intercept.

source

wz2000xx · 2023-11-23T00:04:12Z

pretty sure the coefficient column in the table is the slope, not the intercept.

it's only the intercept for the row labelled intercept.

source

Yes, but if I understand this correctly, the calculation for the child mortality would be:

child mortality = 16.3017*(children per woman) + 22.2875*(income group - low) + 15.7559*(income group - lower middle) - 5.337*(income group - upper middle) - 22.8577

and if we are calculating for the Lower-middle income group, for example, then the calculation above becomes:

child mortality = 16.3017*(children per woman) + 22.2875*(0) + 15.7559*(1) - 5.337*(0) - 22.8577
child mortality = 16.3017*(children per woman) + 15.7559 - 22.8577
child mortality = 16.3017*(children per woman) - 7.1018

and the intercept becomes -7.1018, which is not the case in the plot. Also, since the slope would be 16.3017 for every income group, why do the lines in the plot not have the same slope?
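The arithmetic above can be checked for every group with a short sketch. It only uses the coefficients quoted in this thread and assumes that "High" is the reference level absorbed into the intercept (the thread does not state this explicitly). The names `group_offsets`, `group_intercept`, and `predict` are made up for illustration.

```python
# Coefficients as quoted in this thread.
intercept = -22.8577
slope = 16.3017  # coefficient of children_per_woman

# Offset added to the intercept for each income group.
# ASSUMPTION: "High" is the reference level, so its offset is 0.
group_offsets = {
    "High": 0.0,
    "Low": 22.2875,
    "Lower middle": 15.7559,
    "Upper middle": -5.337,
}


def group_intercept(group):
    """Intercept of the fitted line for one income group."""
    return intercept + group_offsets[group]


def predict(children_per_woman, group):
    """Predicted child mortality under the additive (shared-slope) model."""
    return group_intercept(group) + slope * children_per_woman


for group in group_offsets:
    print(f"{group:>12}: intercept = {group_intercept(group):.4f}, slope = {slope}")
```

Under this reading, every group gets the same slope and only the intercept shifts, which is exactly the pattern the question describes.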

RayNele · 2023-11-23T02:12:44Z

Before I answer, I'm going to push back on one point: + 22.2875*(income group - low) doesn't make sense to me, because income group is a categorical variable and low is one of its categories, so 'income group - low' is not a continuous variable to which you can assign numbers such as 0. In other words, what does income group - low = 0 mean to you?

I'm also going to add the disclaimer that stats is way outside my field, but this is my understanding:

child_mortality = (lower-middle coef)(children_per_woman) - (intercept)?(children_per_woman_coef)
child_mortality(lower-middle) = 15.75(children_per_woman) - ?

I put a question mark there because I admit that I don't know exactly how this intercept value is calculated. So, to show you that at least the claim I made about the slope is true, I plotted two lines on the lmplot FacetGrid using the map_dataframe function:

fig.map_dataframe(lambda data, **kws: plt.axline((0, 0), slope=15.75,  color='green'))
fig.map_dataframe(lambda data, **kws: plt.axline((0, 0), slope=22.28, color='blue'))

which you will be able to see below:

you can see that the lines I plotted, blue for low, and green for lower-middle, are both parallel to their respective lines.

Again, I admit I am unsure; this is what I came up with from my existing understanding, and I'm happy to hear any corrections. As for the intercept, I found evidence that some estimation may be going on to derive the intercept value from those two numbers, but I'm not sure enough to make claims about that.

wz2000xx · 2023-11-23T03:31:30Z

Hi, thank you for your explanation.

However, I see it a bit differently. I understand that "income group - low" is a categorical variable and is not continuous. I am just saying that if the sample comes from the "low" group, then the "income group - low" variable would be 1, and otherwise 0. So the term "22.2875*(income group - low)" would be 22.2875 for samples in the "low" group and zero otherwise.

I said that the values are calculated as

child mortality = 16.3017*(children per woman) + 22.2875*(income group - low) + 15.7559*(income group - lower middle) - 5.337*(income group - upper middle) - 22.8577

with rather high certainty because I re-calculated the fitted values in this way and got identical results. I did that by first fitting the model and retrieving the parameters:

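import statsmodels.formula.api as smf

resuls = smf.ols('child_mortality ~ children_per_woman + income_group', world_data_2014).fit()
parameters = resuls.params  # positions used below: 0 = Intercept, 1-3 = income_group dummies, 4 = children_per_woman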

I then subset the dataset to include only the lower-middle group, calculated the predictions using the equation I assumed above, and compared them to the fitted values from the OLS results:

W2014_subset = world_data_2014[world_data_2014.income_group == "Lower middle"]
LW_child_mor = W2014_subset["children_per_woman"]*parameters.iloc[4]+parameters.iloc[2]*1+parameters.iloc[0]
LW_fitted = resuls.fittedvalues[world_data_2014.income_group == "Lower middle"]

compare_table = pd.DataFrame({'My_cal': LW_child_mor, 'Fitted Values': LW_fitted})
compare_table.head(10)

and got the following results:

Same for the low group:

W2014_subset2 = world_data_2014[world_data_2014.income_group == "Low"]
LW_child_mor2 = W2014_subset2["children_per_woman"]*parameters.iloc[4]+parameters.iloc[1]*1+parameters.iloc[0]
LW_fitted2 = resuls.fittedvalues[world_data_2014.income_group == "Low"]
compare_table = pd.DataFrame({'My_cal': LW_child_mor2, 'Fitted Values': LW_fitted2})
compare_table.head(10)

and got the following results:

Therefore, I think my assumed calculation equation is correct. And if that's the case, I don't understand why the graph plotted using sns.lmplot doesn't match the slope and intercept in the OLS results. Maybe it's because they use different underlying fitting mechanisms?

Lastly, I used the Markup tool on my MacBook to draw lines over the original blue and green lines on the graph plotted using sns.lmplot, and then moved them next to the lines you made. I don't think they are really parallel, as shown below:
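One way to probe the "different fitting mechanisms" idea: when lmplot is given a hue variable, seaborn draws a separate regression for each hue level, so each line gets its own slope and intercept. The formula child_mortality ~ children_per_woman + income_group instead fits one shared slope with a different intercept per group. The sketch below illustrates that difference on synthetic data (not the course dataset) using plain NumPy. All variable names and the true slopes of 10 and 20 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (NOT the course dataset): two groups with different true slopes.
x_a = rng.uniform(1, 7, 50)
x_b = rng.uniform(1, 7, 50)
y_a = 5 + 10 * x_a + rng.normal(0, 1, 50)
y_b = -3 + 20 * x_b + rng.normal(0, 1, 50)

# (1) One straight line per group, each fitted independently.
slope_a, icpt_a = np.polyfit(x_a, y_a, 1)
slope_b, icpt_b = np.polyfit(x_b, y_b, 1)

# (2) Additive model y ~ x + group: one shared slope, group-specific intercepts.
x = np.concatenate([x_a, x_b])
y = np.concatenate([y_a, y_b])
is_b = np.concatenate([np.zeros(50), np.ones(50)])  # 0/1 dummy for group b
X = np.column_stack([np.ones_like(x), is_b, x])
icpt, offset_b, shared_slope = np.linalg.lstsq(X, y, rcond=None)[0]

print("separate fits, slopes:", slope_a, slope_b)
print("additive model, shared slope:", shared_slope)
```

The shared slope lands between the two per-group slopes, so a plot of per-group lines and a table from the additive model would not be expected to agree on either slopes or intercepts.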

original plot:

green line:

blue line:

RayNele · 2023-11-23T04:06:09Z

haha yeah you got me there. I'm not sure why lmplot deviates from your calculations. Maybe the OLS fit is different from the one used in lmplot. tried to find documentation on this but came up blank. great question! maybe someone more experienced has a better answer.


RayNele · 2023-11-23T15:55:15Z

Ya unfortunately there's no way to access the underlying stats.

However, your suggested formula still doesn't really make sense to me since as you mentioned, it would result in all four lines having the same slope. I'm not saying you're wrong, since you've checked your work and it seems correct, but I just can't understand how the mortality-child# relationship could be the same across the four conditions.
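A possible way to reconcile the two views: the lines share a slope only because the formula contains no interaction term. Adding a product column between the predictor and the group dummy lets each group have its own slope, and the result then matches a line fitted to that group alone. The sketch below shows this on synthetic data (not the course dataset) with plain NumPy, with all names and numbers chosen for illustration. In statsmodels formula syntax the corresponding model would presumably be child_mortality ~ children_per_woman * income_group, which has not been run against the lesson data here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (NOT the course dataset): group b has a steeper true slope.
n = 40
x = rng.uniform(1, 7, 2 * n)
is_b = np.repeat([0.0, 1.0], n)  # 0/1 dummy: first n rows are group a
y = 5 + 10 * x + is_b * (-8 + 10 * x) + rng.normal(0, 1, 2 * n)

# Design matrix for y ~ x * group: intercept, group dummy, x, and x-by-group product.
X = np.column_stack([np.ones_like(x), is_b, x, x * is_b])
b0, b_group, b_x, b_inter = np.linalg.lstsq(X, y, rcond=None)[0]

slope_a = b_x            # slope in the reference group a
slope_b = b_x + b_inter  # slope in group b
icpt_b = b0 + b_group    # intercept in group b

# Compare with a straight line fitted to group b alone.
slope_b_alone, icpt_b_alone = np.polyfit(x[is_b == 1], y[is_b == 1], 1)

print("interaction model, group b:", slope_b, icpt_b)
print("group b fitted alone:      ", slope_b_alone, icpt_b_alone)
```

With the interaction column included, the per-group slope and intercept agree with the stand-alone fit up to numerical precision, which is the behaviour the per-group lines in the plot suggest.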

wz2000xx · 2023-11-27T00:03:27Z

Hi, sorry I forgot to reply, I was reading it on my phone on my way to a lecture....Sorry!

Yes, I understand what you said. I will try to figure it out and do more research. Thank you anyway and have a great night!
