Why are the cut-offs used for Bayes factors and p-values so different?
I am trying to understand the Bayes factor (BF). I believe it is like a likelihood ratio of two hypotheses. So if the BF is 5, it means H1 is 5 times more likely than H0. A value of 3-10 indicates moderate evidence, while >10 indicates strong evidence.
However, for the p-value, 0.05 is traditionally taken as the cut-off. At this p-value, the H1/H0 likelihood ratio should be about 95/5, or 19.
So why is a cut-off of >3 taken for the BF while a cut-off of >19 is taken for p-values? These values are nowhere near each other.
hypothesis-testing bayesian p-value bayes-factors
asked Apr 25 at 3:42 by rnso; edited 11 hours ago by amoeba
I am uncomfortable with saying "if BF is $5$, it means $H_1$ is $5$ times more likely than $H_0$". The Bayes factor may be a marginal likelihood ratio, but it is not a probability ratio or odds ratio, and it needs to be combined with a prior to be useful.
– Henry, Apr 25 at 10:17
If we do not have any particular prior information, then what can we say about the meaning of the BF?
– rnso, Apr 25 at 11:26
Certainly, one has "some" prior information even when claiming to have no particular prior information. Namely, in that case it is reasonable to assign equal probabilities to each hypothesis, according to the principle of indifference. That is a simple example of a so-called non-informative prior (admittedly a misnomer).
– dnqxt, Apr 25 at 18:20
In this case, will a BF of 5 indicate that one hypothesis is 5x more likely?
– rnso, Apr 25 at 18:36
Yes, but this problem is much more complicated than it might seem and goes into the area of model selection in statistics. You've been warned :))
– dnqxt, Apr 25 at 18:43
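Henry's point, that a Bayes factor has to be combined with prior odds before it says anything about how probable each hypothesis is, can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: the function name `posterior_prob_h1` and the example prior probabilities are assumptions, not something taken from the thread.

```python
def posterior_prob_h1(bf10: float, prior_h1: float) -> float:
    """Posterior probability of H1 from a Bayes factor and a prior.

    bf10 is the Bayes factor in favour of H1 over H0, and prior_h1 is
    the prior probability of H1 (hypothetical inputs for illustration).
    """
    prior_odds = prior_h1 / (1.0 - prior_h1)
    posterior_odds = bf10 * prior_odds  # posterior odds = BF x prior odds
    return posterior_odds / (1.0 + posterior_odds)


# Same Bayes factor, different priors, different conclusions.
for prior in (0.5, 0.1, 0.01):
    post = posterior_prob_h1(5.0, prior)
    print(f"prior P(H1) = {prior:<4}  posterior P(H1) = {post:.3f}")
```

With equal prior probabilities, a BF of 5 gives posterior odds of 5 to 1, that is, a probability of about 0.83 for H1. With a prior probability of 0.1 for H1, the same BF still leaves H1 less probable than H0.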
3 Answers
A few things:
- The BF gives you evidence in favor of a hypothesis, while a frequentist hypothesis test gives you evidence against a (null) hypothesis. So it's kind of "apples to oranges."
- These two procedures, beyond the difference in interpretation, may lead to different decisions. For example, a BF might reject while a frequentist hypothesis test doesn't, or vice versa. This problem is often referred to as the Jeffreys-Lindley paradox. There have been many posts on this site about this; see e.g. here, and here.
- "At this P value, H1/H0 likelihood should be 95/5 or 19." No, this isn't true because, roughly, $p(y \mid H_1) \neq 1 - p(y \mid H_0)$. Computing a p-value and performing a frequentist test does not, at a minimum, require you to have any idea about $p(y \mid H_1)$. Also, p-values are often integrals/sums of densities/pmfs, while a BF doesn't integrate over the data sample space.
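To see numerically why p = 0.05 does not translate into odds of 19 to 1, and how the Jeffreys-Lindley paradox arises, here is a minimal sketch under stated assumptions: a two-sided z-test of $H_0: \theta = 0$ with known variance $\sigma^2$, and a $N(0, \tau^2)$ prior on $\theta$ under $H_1$. Neither this model nor the names `bf10` and `r` (with $r = n\tau^2/\sigma^2$) come from the answer; they are chosen because the Bayes factor then has the closed form $(1+r)^{-1/2}\exp\{z^2 r / (2(1+r))\}$.

```python
import math


def bf10(z: float, r: float) -> float:
    """Bayes factor for H1 (theta ~ N(0, tau^2)) against H0 (theta = 0).

    z is the usual z-statistic; r = n * tau^2 / sigma^2 is an assumed
    ratio of prior variance to the sampling variance of the mean.
    """
    return math.exp(0.5 * z * z * r / (1.0 + r)) / math.sqrt(1.0 + r)


z = 1.96  # two-sided p-value of about 0.05

# Larger r corresponds to a larger sample (or a more diffuse prior).
for r in (0.1, 1.0, z * z - 1.0, 10.0, 100.0, 10_000.0):
    print(f"r = {r:>9.2f}   BF10 = {bf10(z, r):.3f}")
```

Under these assumptions the Bayes factor at z = 1.96 never exceeds about 2.1 (the maximum is at $r = z^2 - 1$), far from 19. For large `r` it drops below 1, so the same p-value then corresponds to evidence in favor of $H_0$, which is the Jeffreys-Lindley paradox.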
Taylor is saying the threshold for evidence against one hypothesis ($\text{H}_0$) can't be directly compared to the threshold of evidence for another hypothesis ($\text{H}_1$), not even approximately. When you stop believing in a null effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (\text{belief in } \text{H}_1)$.
– Frans Rodenburg, Apr 25 at 7:27
Maybe this can be clarifying: en.wikipedia.org/wiki/Misunderstandings_of_p-values The frequentist $p$-value is not a measure of evidence for anything.
– Frans Rodenburg, Apr 25 at 7:33
Sorry, last comment: The reason you can't see it as evidence in favor of $\text{H}_1$ is that it is the chance of observing this large an effect size if $\text{H}_0$ were true. If $\text{H}_0$ is indeed true, the $p$-value should be uniformly random, so its value carries no information about the probability of $\text{H}_1$. This subtlety in interpretation is, by the way, one of the reasons $p$-values see so much misuse.
– Frans Rodenburg, Apr 25 at 7:44
@benxyzzy: the distribution of a $p$-value is only uniform under the null hypothesis, not under the alternative, where it is heavily skewed towards zero.
– Xi'an, Apr 25 at 12:23
@benxyzzy To add to others: The point of using a $p$-value is that under the null hypothesis it is uniformly random, so if you get a very small $p$-value, it hints that maybe it wasn't uniformly random, so maybe the null hypothesis wasn't true.
– JiK, Apr 25 at 13:50
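The claim in these comments, that p-values are uniformly distributed under the null hypothesis but pile up near zero under the alternative, is easy to check by simulation. The following is a sketch under assumptions that are not from the thread: a two-sided z-test on n = 25 draws with known unit variance, and a true mean of 0.5 under the alternative. The helper names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is reproducible
STD_NORMAL = NormalDist()


def z_test_p_value(true_mean: float, n: int = 25) -> float:
    """Two-sided p-value for H0: mean = 0, from n draws of N(true_mean, 1)."""
    sample = [random.gauss(true_mean, 1.0) for _ in range(n)]
    z = fmean(sample) * n ** 0.5
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def share_below(p_values, cutoff=0.05):
    """Fraction of p-values that fall below the cutoff."""
    return sum(p < cutoff for p in p_values) / len(p_values)


p_null = [z_test_p_value(0.0) for _ in range(5000)]
p_alt = [z_test_p_value(0.5) for _ in range(5000)]

print(f"share of p < 0.05 under H0: {share_below(p_null):.3f}")
print(f"share of p < 0.05 under H1: {share_below(p_alt):.3f}")
```

Under the null, roughly 5% of the simulated p-values should fall below 0.05, as uniformity implies. Under this particular alternative the share should be around 70%, which is the power of the test for the assumed effect size and sample size.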
The Bayes factor $B_{01}$ can be turned into a probability under equal weights as
$$P_{01}=\frac{1}{1+\frac{1}{B_{01}}}$$
but this does not make them comparable with a $p$-value since
- $P_{01}$ is a probability in the parameter space, not in the sampling space
- its value and range depend on the choice of the prior measure; they are thus relative rather than absolute (and Taylor's mention of the Lindley-Jeffreys paradox is appropriate at this stage)
- both $B_{01}$ and $P_{01}$ contain a penalty for complexity (Occam's razor) by integrating out over the parameter space

If you wish to consider a Bayesian equivalent to the $p$-value, the posterior predictive $p$-value (Meng, 1994) should be investigated
$$Q_{01}=\mathbb{P}(B_{01}(X)\le B_{01}(x^\text{obs}))$$
where $x^\text{obs}$ denotes the observation and $X$ is distributed from the posterior predictive
$$X\sim \int_\Theta f(x|\theta)\, \pi(\theta|x^\text{obs})\,\text{d}\theta$$
but this does not imply that the same "default" criteria for rejection and significance should apply to this object.
Using your formula, P for a BF of 3 and 10 comes out to be 0.75 and 0.91, respectively. Why should we accept these as moderate evidence when for the p-value we keep a cut-off of 0.95?
– rnso, Apr 26 at 7:44
Why is $0.95$ relevant in this framework? Or at all? Deciding when large is large enough depends on your utility function.
– Xi'an, 2 days ago
The formula looks simpler as P = B/(B+1)
– rnso, 2 days ago
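For reference, the equal-weights conversion discussed in this exchange follows directly from Bayes' theorem. The notation here is assumed rather than taken from the answer: $\pi_0$ and $\pi_1$ are the prior probabilities of the two hypotheses, $m_0(x)$ and $m_1(x)$ are the marginal likelihoods, and $B_{01} = m_0(x)/m_1(x)$.

$$P_{01}=\frac{\pi_0\, m_0(x)}{\pi_0\, m_0(x)+\pi_1\, m_1(x)}=\frac{B_{01}}{B_{01}+\pi_1/\pi_0}\;\overset{\pi_0=\pi_1}{=}\;\frac{B_{01}}{B_{01}+1}=\frac{1}{1+\frac{1}{B_{01}}}$$

With $B_{01}=3$ this gives $3/4=0.75$, and with $B_{01}=10$ it gives $10/11\approx 0.91$, matching the values quoted in the comments. With unequal prior weights the middle expression applies instead, so the same Bayes factor maps to a different probability.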
Some of your confusion might stem from taking the number 95/5 directly from the fact that the p-value is 0.05. Is this what you are doing? I do not believe this is correct. The p-value for a t-test, for example, reflects the chance of getting the observed difference between means, or a more extreme difference, if the null hypothesis is in fact true. If you get a p-value of 0.02, you say 'ah, there is only a 2% chance of getting a difference like this, or a greater difference, if the null is true. That seems very improbable, so I propose that the null is not true!'. These numbers are just not the same thing that goes into the Bayes factor, which is the ratio of the marginal likelihoods of the data under each competing hypothesis (it equals the ratio of posterior probabilities only when the two hypotheses have equal prior probabilities). These quantities are not computed in the same way as the p-value, and so thinking of 95/5 as being like posterior probabilities that would give a BF of 19 is not correct.
As a side note, I would strongly suggest guarding against thinking of different BF values as meaning particular things. These assignments are completely arbitrary, just like the .05 significance level. Problems such as p-hacking will occur just as readily with Bayes factors if people start to believe that only particular numbers warrant consideration. Try to understand them for what they are, which is something like relative probabilities, and use your own judgment to determine whether you find a given BF convincing evidence or not.
Answer beginning "A few things:": answered Apr 25 at 4:15 by Taylor; edited Apr 25 at 6:28 by Xi'an
2
$begingroup$
Taylor is saying the threshold for evidence against one hypothesis ($textH_0$) can't be directly compared to the threshold of evidence for another hypothesis ($textH_1$), also not approximately. When you stop believing in a null-effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (textbelief in H_1)$
$endgroup$
– Frans Rodenburg
Apr 25 at 7:27
1
$begingroup$
Maybe this can be clarifying: en.wikipedia.org/wiki/Misunderstandings_of_p-values The frequentist $p$-value is not a measure of evidence for anything.
$endgroup$
– Frans Rodenburg
Apr 25 at 7:33
2
$begingroup$
Sorry, last comment: The reason you can't see it as evidence in favor of $textH_1$ is that it is the chance of observing this large an effect size if $textH_0$ were true. If $textH_0$ is indeed true, the $p$-value should be uniformly random, so its value has no meaning on the probability of $textH_1$. This subtlety in interpretation is by the way one of the reasons $p$-values see so much misuse.
$endgroup$
– Frans Rodenburg
Apr 25 at 7:44
1
$begingroup$
@benxyzzy: the distribution of a $p$-value is only uniform under the null hypothesis, not under the alternative where it is heavily skewed towards zero.
$endgroup$
– Xi'an
Apr 25 at 12:23
1
$begingroup$
@benxyzzy To add to others: The point of using a $p$-value is that under null hypothesis it is uniformly random, so if you get a very small $p$-value, it hints that maybe it wasn't uniformly random so maybe the null hypothesis wasn't true.
$endgroup$
– JiK
Apr 25 at 13:50
|
show 6 more comments
2
$begingroup$
Taylor is saying the threshold for evidence against one hypothesis ($textH_0$) can't be directly compared to the threshold of evidence for another hypothesis ($textH_1$), also not approximately. When you stop believing in a null-effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (textbelief in H_1)$
$endgroup$
– Frans Rodenburg
Apr 25 at 7:27
1
$begingroup$
Maybe this can be clarifying: en.wikipedia.org/wiki/Misunderstandings_of_p-values The frequentist $p$-value is not a measure of evidence for anything.
$endgroup$
– Frans Rodenburg
Apr 25 at 7:33
2
$begingroup$
Sorry, last comment: The reason you can't see it as evidence in favor of $textH_1$ is that it is the chance of observing this large an effect size if $textH_0$ were true. If $textH_0$ is indeed true, the $p$-value should be uniformly random, so its value has no meaning on the probability of $textH_1$. This subtlety in interpretation is by the way one of the reasons $p$-values see so much misuse.
$endgroup$
– Frans Rodenburg
Apr 25 at 7:44
1
$begingroup$
@benxyzzy: the distribution of a $p$-value is only uniform under the null hypothesis, not under the alternative where it is heavily skewed towards zero.
$endgroup$
– Xi'an
Apr 25 at 12:23
1
$begingroup$
@benxyzzy To add to others: The point of using a $p$-value is that under null hypothesis it is uniformly random, so if you get a very small $p$-value, it hints that maybe it wasn't uniformly random so maybe the null hypothesis wasn't true.
$endgroup$
– JiK
Apr 25 at 13:50
2
2
$begingroup$
Taylor is saying the threshold for evidence against one hypothesis ($textH_0$) can't be directly compared to the threshold of evidence for another hypothesis ($textH_1$), also not approximately. When you stop believing in a null-effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (textbelief in H_1)$
$endgroup$
– Frans Rodenburg
Apr 25 at 7:27
$begingroup$
Taylor is saying the threshold for evidence against one hypothesis ($textH_0$) can't be directly compared to the threshold of evidence for another hypothesis ($textH_1$), also not approximately. When you stop believing in a null-effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (textbelief in H_1)$
$endgroup$
– Frans Rodenburg
Apr 25 at 7:27
1
1
$begingroup$
Maybe this can be clarifying: en.wikipedia.org/wiki/Misunderstandings_of_p-values The frequentist $p$-value is not a measure of evidence for anything.
$endgroup$
– Frans Rodenburg
Apr 25 at 7:33
$begingroup$
Maybe this can be clarifying: en.wikipedia.org/wiki/Misunderstandings_of_p-values The frequentist $p$-value is not a measure of evidence for anything.
$endgroup$
– Frans Rodenburg
Apr 25 at 7:33
2
2
$begingroup$
Sorry, last comment: The reason you can't see it as evidence in favor of $textH_1$ is that it is the chance of observing this large an effect size if $textH_0$ were true. If $textH_0$ is indeed true, the $p$-value should be uniformly random, so its value has no meaning on the probability of $textH_1$. This subtlety in interpretation is by the way one of the reasons $p$-values see so much misuse.
$endgroup$
– Frans Rodenburg
Apr 25 at 7:44
$begingroup$
Sorry, last comment: The reason you can't see it as evidence in favor of $textH_1$ is that it is the chance of observing this large an effect size if $textH_0$ were true. If $textH_0$ is indeed true, the $p$-value should be uniformly random, so its value has no meaning on the probability of $textH_1$. This subtlety in interpretation is by the way one of the reasons $p$-values see so much misuse.
$endgroup$
– Frans Rodenburg
Apr 25 at 7:44
1
1
$begingroup$
@benxyzzy: the distribution of a $p$-value is only uniform under the null hypothesis, not under the alternative where it is heavily skewed towards zero.
$endgroup$
– Xi'an
Apr 25 at 12:23
$begingroup$
@benxyzzy: the distribution of a $p$-value is only uniform under the null hypothesis, not under the alternative where it is heavily skewed towards zero.
$endgroup$
– Xi'an
Apr 25 at 12:23
1
1
$begingroup$
@benxyzzy To add to others: The point of using a $p$-value is that under null hypothesis it is uniformly random, so if you get a very small $p$-value, it hints that maybe it wasn't uniformly random so maybe the null hypothesis wasn't true.
$endgroup$
– JiK
Apr 25 at 13:50
$begingroup$
@benxyzzy To add to others: The point of using a $p$-value is that under null hypothesis it is uniformly random, so if you get a very small $p$-value, it hints that maybe it wasn't uniformly random so maybe the null hypothesis wasn't true.
$endgroup$
– JiK
Apr 25 at 13:50
|
show 6 more comments
$begingroup$
The Bayes factor $B_{01}$ can be turned into a probability under equal weights as
$$P_{01}=\frac{1}{1+\frac{1}{B_{01}}}$$
but this does not make them comparable with a $p$-value since
- $P_{01}$ is a probability in the parameter space, not in the sampling space
- its value and range depend on the choice of the prior measure, they are thus relative rather than absolute (and Taylor's mention of the Lindley-Jeffreys paradox is appropriate at this stage)
- both $B_{01}$ and $P_{01}$ contain a penalty for complexity (Occam's razor) by integrating out over the parameter space

If you wish to consider a Bayesian equivalent to the $p$-value, the posterior predictive $p$-value (Meng, 1994) should be investigated
$$Q_{01}=\mathbb{P}(B_{01}(X)\le B_{01}(x^\text{obs}))$$
where $x^\text{obs}$ denotes the observation and $X$ is distributed from the posterior predictive
$$X\sim \int_\Theta f(x|\theta)\,\pi(\theta|x^\text{obs})\,\text{d}\theta$$
but this does not imply that the same "default" criteria for rejection and significance should apply to this object.
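
For concreteness, the conversion from a Bayes factor to a probability under equal prior weights can be sketched in a few lines of Python. This is only an illustration: the function name `posterior_prob_equal_weights` is hypothetical, and the sketch covers nothing beyond the equal-weights formula for $P_{01}$.

```python
def posterior_prob_equal_weights(b01):
    """Posterior probability of H0 implied by the Bayes factor B01,
    assuming equal prior weights on the two hypotheses."""
    if b01 < 0:
        raise ValueError("a Bayes factor cannot be negative")
    # Algebraically the same as 1 / (1 + 1 / B01), but also defined at B01 = 0.
    return b01 / (1.0 + b01)


for b in (1, 3, 10, 19):
    print(f"B01 = {b:>2}: P01 = {posterior_prob_equal_weights(b):.3f}")
```

A Bayes factor of 1 maps to 0.5, and it takes a Bayes factor of 19 to reach 0.95, which only has this reading because the two hypotheses were given equal prior weight.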
$endgroup$
edited Apr 25 at 7:53
answered Apr 25 at 6:40
Xi'an
$begingroup$
Using your formula, $P$ for Bayes factors of 3 and 10 comes out to 0.75 and 0.91, respectively. Why should we accept these as moderate evidence when for $p$-values we keep a cut-off of 0.95?
$endgroup$
– rnso
Apr 26 at 7:44
$begingroup$
Why is $0.95$ relevant in this framework? or at all? Deciding when large is large enough depends on your utility function.
$endgroup$
– Xi'an
2 days ago
$begingroup$
The formula looks simpler as $P = B/(B+1)$
$endgroup$
– rnso
2 days ago
$begingroup$
Some of your confusion might stem from taking the number 95/5 directly from the fact that the p value is 0.05 - is this what you are doing? I do not believe this is correct. The p value for a t-test, for example, reflects the chance of getting the observed difference between means or a more extreme difference if the null hypothesis is in fact true. If you get a p value of 0.02, you say 'ah, there is only a 2% chance of getting a difference like this, or a greater difference, if the null is true. That seems very improbable, so I propose that the null is not true!'. These numbers are just not the same thing that goes into the Bayes factor, which is the ratio of the marginal likelihoods of the data under each competing hypothesis (it equals the ratio of their posterior probabilities only when the hypotheses are given equal prior probability). These quantities are not computed in the same way as the p-value, and so thinking of 95/5 as being like posterior probabilities that would give a BF of 19 is not correct.
As a side note, I would suggest strongly guarding against thinking of different BF values as meaning particular things. These assignments are completely arbitrary, just like the .05 significance level. Problems such as p-hacking will occur just as readily with Bayes Factors if people start to believe that only particular numbers warrant consideration. Try to understand them for what they are, which are something like relative probabilities, and use your own sense to determine whether you find a BF number convincing evidence or not.
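
To see numerically why 95/5 does not translate into a Bayes factor of 19, here is a minimal sketch under assumptions that are not part of this answer: a z-test of a point null $\theta = 0$ with known $\sigma$, and a $N(0, \tau^2)$ prior on $\theta$ under the alternative. In that setup the sample mean is $N(0, \sigma^2/n)$ under the null and $N(0, \tau^2 + \sigma^2/n)$ under the alternative, so the Bayes factor has a closed form. The function name and the default values of `tau` and `sigma` are hypothetical choices for illustration.

```python
import math


def bayes_factor_01(z, n, tau=1.0, sigma=1.0):
    """Bayes factor for H0: theta = 0 against H1: theta ~ N(0, tau^2),
    given the z statistic of a sample mean with known sigma (assumed model)."""
    r = n * tau**2 / sigma**2  # prior variance relative to sampling variance
    # Ratio of the N(0, sigma^2/n) and N(0, tau^2 + sigma^2/n) densities
    # evaluated at the observed sample mean.
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z**2 * r / (1.0 + r))


z_obs = 1.96  # two-sided p-value of about 0.05
for n in (10, 100, 10_000, 1_000_000):
    b01 = bayes_factor_01(z_obs, n)
    print(f"n = {n:>9,}: B01 = {b01:10.3f}, B10 = {1.0 / b01:.4f}")
```

With the p-value held at about 0.05, the Bayes factor in this sketch changes with the sample size and, for large samples, favours the null. Under these assumptions the evidence for the alternative never comes close to 19, which illustrates that the two numbers measure different things.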
$endgroup$
answered Apr 25 at 19:21
Jamie
$begingroup$
I am uncomfortable with saying "if BF is $5$, it means $H_1$ is $5$ times more likely than $H_0$". The Bayes factor may be a marginal likelihood ratio, but it is not a probability ratio or odds ratio, and needs to be combined with a prior to be useful
$endgroup$
– Henry
Apr 25 at 10:17
$begingroup$
If we do not have any particular prior information, then what can we say about the meaning of the BF?
$endgroup$
– rnso
Apr 25 at 11:26
$begingroup$
Certainly, one has "some" prior information even when claiming to have no particular prior information. In that case it is reasonable to assign equal probabilities to each hypothesis, according to the principle of indifference. That is a simple example of a so-called non-informative prior (admittedly a misnomer).
$endgroup$
– dnqxt
Apr 25 at 18:20
$begingroup$
In this case, will a BF of 5 indicate that one hypothesis is 5x more likely?
$endgroup$
– rnso
Apr 25 at 18:36
$begingroup$
Yes, but this problem is much more complicated than it might seem and goes into the area of model selection in statistics. You've been warned :))
$endgroup$
– dnqxt
Apr 25 at 18:43
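To make the role of the prior in this exchange explicit, here is a short worked equation. It relies on the standard identity that posterior odds equal the Bayes factor times the prior odds; the prior probabilities used ($0.5$ and $0.1$) are illustrative assumptions, not values from the thread.

$$\frac{P(H_1\mid x)}{P(H_0\mid x)} = B_{10}\times\frac{P(H_1)}{P(H_0)}$$

With $B_{10}=5$ and equal priors, $P(H_1)=P(H_0)=0.5$, the prior odds are $1$, so
$$\frac{P(H_1\mid x)}{P(H_0\mid x)} = 5\times 1 = 5, \qquad P(H_1\mid x)=\frac{5}{6}\approx 0.83.$$

With the same $B_{10}=5$ but $P(H_1)=0.1$, the prior odds are $1/9$, so
$$\frac{P(H_1\mid x)}{P(H_0\mid x)} = 5\times\frac{1}{9} = \frac{5}{9}, \qquad P(H_1\mid x)=\frac{5}{14}\approx 0.36.$$

So "5 times more likely" is a statement about posterior odds only in the equal-prior case; with other priors the same Bayes factor leads to different posterior probabilities.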