PanelOLS: Different std. errors from Stata when clustering by the same variable in linearmodels #477
You need to change the
Hi bashtage, thanks for your quick reply. I follow that. For example,
The standard errors from the If I change it to Could you please help me with it? How can I get both robust and clustered results? I appreciate your time.
What is `formula`? The definition of "robust" depends on whether entity effects are included. Clustered std. errors are already robust to heteroskedasticity.
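A small NumPy sketch of why clustered standard errors are already heteroskedasticity-robust: the cluster-robust "meat" sums outer products of within-cluster scores, so when every observation is its own cluster it collapses to White's HC0 estimator. The function names and the simulated data are hypothetical illustrations, not linearmodels internals, and no small-sample scaling is applied.

```python
import numpy as np


def cluster_robust_cov(X: np.ndarray, resid: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """One-way cluster-robust sandwich covariance, with no small-sample scaling.

    V = (X'X)^{-1} [sum_g (X_g' e_g)(X_g' e_g)'] (X'X)^{-1}
    """
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]  # sum of x_i * e_i within cluster g
        meat += np.outer(score, score)
    return bread @ meat @ bread


def white_cov(X: np.ndarray, resid: np.ndarray) -> np.ndarray:
    """HC0 heteroskedasticity-robust covariance: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}."""
    bread = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X
    return bread @ meat @ bread


# Simulated regression with heteroskedastic errors (hypothetical data).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# With one observation per cluster, the clustered estimator equals White's HC0.
print(np.allclose(cluster_robust_cov(X, resid, np.arange(n)), white_cov(X, resid)))
```

Coarser clusters only widen the set of correlations the estimator tolerates; they never reintroduce a constant-variance assumption.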
Thanks for that.
What Stata command are you using?
What you have looks correct to me.
You could also use
and the results should be the same.
Thank you, Kevin, for all your efforts. I still get different results. I noticed that the std. errors from Stata are robust (please have a look at the screenshots below). Sorry for the messy output below; the variables are shown in different orders. Please look at the last line of pic 1 (black background) and the first line of pic 2 (white background); they are the same variable.
What happens when you take the ratio of the parameter variances from Stata to those from linearmodels? Stata makes a lot of magic small-sample adjustments. If this ratio is the same for all parameters, that indicates a scalar adjustment. I did a quick check and this looks to be the issue. What does changing the value of
I just did a check, and most of the std. errors from Stata are 70% of the std. errors from linearmodels. For `debiased`, I did the check and found nothing affecting the results.
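The ratio check suggested in this thread can be made systematic: square the ratio of standard errors parameter by parameter and test whether it is constant. A constant ratio points to a scalar (small-sample) adjustment rather than a different estimator. The helper names and the standard errors below are hypothetical placeholders, not numbers from this issue.

```python
import numpy as np


def variance_ratios(se_a, se_b):
    """Element-wise ratio of parameter variances, se_a**2 / se_b**2."""
    se_a = np.asarray(se_a, dtype=float)
    se_b = np.asarray(se_b, dtype=float)
    return (se_a / se_b) ** 2


def is_scalar_adjustment(se_a, se_b, rtol=1e-3):
    """True if every parameter's variance ratio equals the same constant (up to rtol)."""
    ratios = variance_ratios(se_a, se_b)
    return bool(np.allclose(ratios, ratios[0], rtol=rtol))


# Hypothetical standard errors for three parameters (illustration only).
se_linearmodels = np.array([0.10, 0.25, 0.04])
se_stata = se_linearmodels * 0.7  # an exact scalar rescaling

print(variance_ratios(se_stata, se_linearmodels))       # every entry is about 0.49
print(is_scalar_adjustment(se_stata, se_linearmodels))
```

If the ratios vary across parameters, the two programs are computing different covariance estimators, and a degrees-of-freedom tweak alone will not reconcile them.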
How large are your clusters and how many do you have? The ratio of the variances looks a lot like 2. If you use xtreg in Stata, do you get the same as reg? In my experience, the adjustments can differ across the different estimators in Stata.
On Wed, Oct 12, 2022, 19:35 yuz0101 wrote:
I just did a check, and most of the std. errors from Sata is 70% of the
std. errors from linearmodels. For the debiased, I did the check and
found nothing affecting the results.
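The "ratio looks a lot like 2" hint can be checked with arithmetic. Stata's `regress ..., vce(cluster ...)` scales the cluster-robust variance by G/(G-1) × (N-1)/(N-K), with G clusters, N observations and K parameters; other Stata estimators such as `xtreg` may count degrees of freedom differently. The function names below are hypothetical; the sketch only shows how large the G/(G-1) term gets when there are very few clusters.

```python
import math


def stata_regress_cluster_factor(n_obs: int, n_params: int, n_clusters: int) -> float:
    """Variance scaling used by Stata's regress with vce(cluster): G/(G-1) * (N-1)/(N-K)."""
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    return (n_clusters / (n_clusters - 1)) * ((n_obs - 1) / (n_obs - n_params))


def group_factor(n_clusters: int) -> float:
    """Only the G/(G-1) part of the adjustment."""
    return n_clusters / (n_clusters - 1)


# Clustering on a dummy variable leaves just G = 2 clusters.
print(group_factor(2))                  # 2.0: the variance doubles
print(math.sqrt(group_factor(2)))       # about 1.414 on the standard errors
print(math.sqrt(1 / group_factor(2)))   # about 0.707: the smaller SE is ~71% of the larger
# With many clusters the same term is close to 1.
print(group_factor(500))                # about 1.002
```

A variance ratio of 2, equivalently one set of standard errors being about 70% of the other, is therefore what a G/(G-1) term alone produces when G = 2.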
Hi Kevin, you are right. Massive thanks for your time. The std. errors in my model are clustered by only one dummy variable. However, when I tried that with xtreg, it produced errors, mainly due to the use of clusters. Then I tried clustering by id, which worked completely in both linearmodels and Stata (using a command of

Q: Is it reasonable to cluster the std. errors by a dummy variable rather than by a variable that defines many clusters?
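The earlier question of how many clusters there are, and how large they are, can be answered before fitting anything. The pandas sketch below uses a hypothetical panel with columns `id`, `year` and `dummy` (and a hypothetical helper `cluster_summary`); a 0/1 dummy can never define more than two clusters, and with G = 2 any G/(G-1) correction equals 2.

```python
import pandas as pd


def cluster_summary(df: pd.DataFrame, col: str) -> dict:
    """Number of clusters and smallest/largest cluster size for a candidate cluster column."""
    sizes = df.groupby(col).size()
    return {
        "n_clusters": int(sizes.size),
        "min_size": int(sizes.min()),
        "max_size": int(sizes.max()),
    }


# Hypothetical panel: 50 entities ("id") observed over 4 years, plus a 0/1 "dummy".
panel = pd.DataFrame(
    {
        "id": [i for i in range(50) for _ in range(4)],
        "year": list(range(2000, 2004)) * 50,
    }
)
panel["dummy"] = (panel["id"] % 2 == 0).astype(int)

print(cluster_summary(panel, "dummy"))  # 2 clusters of 100 observations each
print(cluster_summary(panel, "id"))     # 50 clusters of 4 observations each
```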
Hi Kevin, adding my issue here because it is most likely related. By the way, thanks for all your work on linearmodels (and the awesome documentation!); it was the missing piece in my Python workflow.

I'm preparing notes for my students in which I was hoping to replicate Mitch Petersen's results using his sample dataset, which he obtained with Stata using the following command:

For reference, his Stata code is here, and the sample data and associated estimations are here. His sample dataset contains 500 firms but only 10 years, so the time dimension is small.

I'm able to replicate his results using statsmodels, but not with linearmodels. More specifically, I get the same results when clustering by firm, but not when clustering by year or by both firm and year. Are you aware of a small-sample adjustment that is implemented in statsmodels but not in linearmodels? Here is the code to replicate the difference.

With statsmodels:

```python
import pandas as pd
import statsmodels.formula.api as smf
from linearmodels import PanelOLS

# Petersen's test data: firm id, year, regressor x, outcome y (whitespace-delimited)
df = pd.read_table(
    "http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt",
    names=["firmid", "year", "x", "y"],
    sep=r"\s+",  # replaces delim_whitespace=True, which is deprecated in recent pandas
)

# Pooled OLS with standard errors clustered by year
print(
    smf.ols(formula="y ~ x", data=df)
    .fit(cov_type="cluster", cov_kwds={"groups": df["year"]}, use_t=True)
    .summary()
)
```

With linearmodels:

```python
# PanelOLS needs an (entity, time) MultiIndex
df = df.set_index(["firmid", "year"])
mod = PanelOLS.from_formula("y ~ 1 + x", df)
# Cluster by time only
res = mod.fit(cov_type="clustered", cluster_entity=False, cluster_time=True)
print(res.summary)
```
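You probably need `group_debias=True`:

```python
mod = PanelOLS.from_formula("y ~ 1 + x", df)
# group_debias=True adds a small-sample adjustment for the number of clusters
res = mod.fit(cov_type="clustered", cluster_entity=False, cluster_time=True, group_debias=True)
print(res.summary)
```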
Thanks, that's what I was missing! I get the same results now.
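A back-of-the-envelope check that plausibly explains why clustering by firm matched without `group_debias` while clustering by year did not: statsmodels' default cluster correction includes a G/(G-1) term, and that term is negligible with 500 firms but not with 10 years. The sketch assumes the remaining gap is only this term; the (N-1)/(N-K) part is close to 1 either way with thousands of observations and two parameters.

```python
import math

# Cluster counts from Petersen's sample as described in this thread: 500 firms, 10 years.
factors = {label: g / (g - 1) for label, g in [("firm", 500), ("year", 10)]}

for label, variance_factor in factors.items():
    se_factor = math.sqrt(variance_factor)  # effect of G/(G-1) on standard errors
    print(f"cluster by {label}: variance x{variance_factor:.4f}, std. error x{se_factor:.4f}")
```

With 500 firms the correction moves standard errors by about 0.1%, which is invisible at typical display precision; with 10 years it moves them by about 5%.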
Hi there, it seems that only one `cov_type` can be used at a time. While comparing linearmodels and Stata results for a model clustered by a variable, I found that Stata's std. errors are robust-adjusted while those from linearmodels are not (all coefficients are the same).
How can I get the same results as Stata using linearmodels?