

Why are the cut-offs used for Bayes factors and p-values so different?


I am trying to understand Bayes factors (BF). I believe they are like a likelihood ratio of two hypotheses. So if the BF is 5, it means H1 is 5 times more likely than H0, and a value of 3-10 indicates moderate evidence, while >10 indicates strong evidence.

However, for p-values, 0.05 is traditionally taken as the cut-off. At this p-value, the H1/H0 likelihood ratio should be about 95/5, or 19.

So why is a cut-off of >3 used for BF, while a cut-off of >19 is used for p-values? These values are not anywhere close to each other.
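For concreteness, a Bayes factor can be computed in closed form in a simple coin-flip setting. This sketch is my own illustration, not from the question: H0 fixes the success probability at 0.5, while H1 gives it a uniform prior, so the marginal likelihood under H1 integrates to 1/(n+1).

```python
from math import comb

def bf10_binomial(k, n):
    """Bayes factor for H1 over H0 given k successes in n trials.

    H0: theta = 0.5 exactly; H1: theta ~ Uniform(0, 1).
    Each marginal likelihood is P(data | hypothesis), with the
    parameter integrated out under H1.
    """
    m0 = comb(n, k) * 0.5 ** n   # P(k | H0), point null
    m1 = 1.0 / (n + 1)           # P(k | H1): binomial integrated over a flat prior
    return m1 / m0

# 16 heads in 20 flips gives a BF10 of roughly 10 ("strong" on the scale
# quoted above), while 10 heads in 20 gives a BF below 1 (data favor H0).
```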










  • I am uncomfortable with saying "if BF is $5$, it means $H_1$ is $5$ times more likely than $H_0$". The Bayes factor may be a marginal likelihood ratio, but it is not a probability ratio or odds ratio, and needs to be combined with a prior to be useful. – Henry, Apr 25 at 10:17

  • If we do not have any particular prior information, then what can we say about the meaning of the BF? – rnso, Apr 25 at 11:26

  • Certainly, one has "some" prior information even when saying that there is no particular prior info. Namely, in that case it's reasonable to assign equal probabilities to each hypothesis according to the principle of indifference. That is a simple example of a so-called non-informative prior (admittedly a misnomer). – dnqxt, Apr 25 at 18:20

  • In this case, will a BF of 5 indicate one hypothesis to be 5x more likely? – rnso, Apr 25 at 18:36

  • Yes, but this problem is much more complicated than it might seem and goes into the area of model selection in statistics. You've been warned :)) – dnqxt, Apr 25 at 18:43

















asked Apr 25 at 3:42 by rnso; last edited by amoeba







3 Answers


















A few things:



The BF gives you evidence in favor of a hypothesis, while a frequentist hypothesis test gives you evidence against a (null) hypothesis. So it's kind of "apples to oranges."



These two procedures, despite the difference in interpretations, may lead to different decisions. For example, a BF might reject while a frequentist hypothesis test doesn't, or vice versa. This problem is often referred to as the Jeffreys-Lindley paradox. There have been many posts on this site about this; see e.g. here and here.



"At this P value, H1/H0 likelihood should be 95/5 or 19." No, this isn't true because, roughly, $p(y \mid H_1) \neq 1 - p(y \mid H_0)$. Computing a p-value and performing a frequentist test, at a minimum, does not require you to have any idea about $p(y \mid H_1)$. Also, p-values are often integrals/sums of densities/pmfs, while a BF doesn't integrate over the data sample space.
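The divergence between the two procedures can be seen numerically in a standard normal-mean setting. This is my own sketch, assuming a N(0, 1) prior on the mean under $H_1$, not a computation from the answer itself: hold the z-statistic at 1.96, so the two-sided p-value stays near 0.05, and let the sample size grow.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def bf01_point_null(z, n):
    """BF for H0: mu = 0 against H1: mu ~ N(0, 1), with n N(mu, 1) observations.

    Under H0 the sample mean xbar ~ N(0, 1/n); under H1, marginally,
    xbar ~ N(0, 1 + 1/n). The BF is the ratio of these two densities at xbar.
    """
    xbar = z / math.sqrt(n)  # xbar chosen so the z-statistic (hence p-value) is fixed
    return normal_pdf(xbar, 1.0 / n) / normal_pdf(xbar, 1.0 + 1.0 / n)

# With z = 1.96 (p ~ 0.05) the frequentist test rejects H0 at every n,
# yet BF01 grows with n and eventually favors H0 strongly: the
# Jeffreys-Lindley paradox in miniature.
```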






  • Taylor is saying the threshold for evidence against one hypothesis ($H_0$) can't be directly compared to the threshold of evidence for another hypothesis ($H_1$), not even approximately. When you stop believing in a null effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (\text{belief in } H_1)$. – Frans Rodenburg, Apr 25 at 7:27

  • Maybe this can be clarifying: en.wikipedia.org/wiki/Misunderstandings_of_p-values The frequentist $p$-value is not a measure of evidence for anything. – Frans Rodenburg, Apr 25 at 7:33

  • Sorry, last comment: The reason you can't see it as evidence in favor of $H_1$ is that it is the chance of observing this large an effect size if $H_0$ were true. If $H_0$ is indeed true, the $p$-value should be uniformly random, so its value has no bearing on the probability of $H_1$. This subtlety in interpretation is, by the way, one of the reasons $p$-values see so much misuse. – Frans Rodenburg, Apr 25 at 7:44

  • @benxyzzy: the distribution of a $p$-value is only uniform under the null hypothesis, not under the alternative, where it is heavily skewed towards zero. – Xi'an, Apr 25 at 12:23

  • @benxyzzy To add to others: The point of using a $p$-value is that under the null hypothesis it is uniformly random, so if you get a very small $p$-value, it hints that maybe it wasn't uniformly random, and so maybe the null hypothesis wasn't true. – JiK, Apr 25 at 13:50
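A quick simulation makes the uniform-under-the-null point concrete. This is my own sketch using a one-observation z-test of $H_0: \mu = 0$; nothing in it comes from the thread itself:

```python
import math
import random

random.seed(1)

def p_value(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def sim_p_values(mu, n_rep=20_000):
    # One N(mu, 1) draw per replication is itself the z-statistic for H0: mu = 0.
    return [p_value(random.gauss(mu, 1)) for _ in range(n_rep)]

under_null = sim_p_values(0.0)   # H0 true: p-values ~ Uniform(0, 1)
under_alt = sim_p_values(2.0)    # alternative true: p-values pile up near zero

frac_null = sum(p < 0.1 for p in under_null) / len(under_null)
frac_alt = sum(p < 0.1 for p in under_alt) / len(under_alt)
# frac_null comes out close to 0.1, as a uniform distribution requires;
# frac_alt is far larger, showing the skew towards zero under the alternative.
```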


















The Bayes factor $B_{01}$ can be turned into a probability under equal weights as
$$P_{01}=\frac{1}{1+\frac{1}{B_{01}}}$$
but this does not make them comparable with a $p$-value since

  1. $P_{01}$ is a probability in the parameter space, not in the sampling space

  2. its value and range depend on the choice of the prior measure; they are thus relative rather than absolute (and Taylor's mention of the Lindley-Jeffreys paradox is appropriate at this stage)

  3. both $B_{01}$ and $P_{01}$ contain a penalty for complexity (Occam's razor) by integrating out over the parameter space

If you wish to consider a Bayesian equivalent to the $p$-value, the posterior predictive $p$-value (Meng, 1994) should be investigated:
$$Q_{01}=\mathbb{P}(B_{01}(X)\le B_{01}(x^{\text{obs}}))$$
where $x^{\text{obs}}$ denotes the observation and $X$ is distributed from the posterior predictive
$$X\sim \int_\Theta f(x|\theta)\,\pi(\theta|x^{\text{obs}})\,\text{d}\theta,$$
but this does not imply that the same "default" criteria for rejection and significance should apply to this object.
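Under equal prior weights, the conversion $P_{01} = B_{01}/(1+B_{01})$ is a one-liner. A minimal sketch of mine, restating the formula above:

```python
def bf_to_posterior_prob(bf):
    """Posterior probability of the favored hypothesis under equal prior odds.

    P = B / (1 + B), algebraically the same as 1 / (1 + 1/B).
    """
    return bf / (1.0 + bf)

# BF = 3 gives P = 0.75 and BF = 10 gives P ~ 0.91, which is why a BF
# threshold of 3 looks nothing like a 0.95-style probability cut-off.
```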






  • Using your formula, $P$ for BF of 3 and 10 comes out to be 0.75 and 0.91, respectively. Why should we accept these as moderate evidence, when for the $p$-value we keep a cut-off of 0.95? – rnso, Apr 26 at 7:44

  • Why is $0.95$ relevant in this framework, or at all? Deciding when large is large enough depends on your utility function. – Xi'an, 2 days ago

  • The formula looks simpler as $P = B/(B+1)$. – rnso, 2 days ago



















Some of your confusion might stem from taking the number 95/5 directly from the fact that the p-value is 0.05 - is this what you are doing? I do not believe this is correct. The p-value for a t-test, for example, reflects the chance of getting the observed difference between means, or a more extreme difference, if the null hypothesis is in fact true. If you get a p-value of 0.02, you say 'ah, there is only a 2% chance of getting a difference like this, or a greater difference, if the null is true. That seems very improbable, so I propose that the null is not true!'. These numbers are just not the same thing that goes into the Bayes factor, which is the ratio of the marginal likelihoods of the competing hypotheses. Those marginal likelihoods are not computed in the same way as the p-value, so thinking of 95/5 as being like posterior probabilities that would give a BF of 19 is not correct.
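The tail-probability logic in the paragraph above can be mimicked with a plain z-statistic. This is a sketch of mine, using the normal tail rather than the t distribution to stay within the standard library:

```python
import math

def two_sided_p(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1): the chance of a statistic at least
    this extreme *if the null hypothesis is true* -- not P(H1 | data)."""
    return math.erfc(abs(z) / math.sqrt(2))

# z = 1.96 gives p ~ 0.05; nothing in this calculation involves H1 at all,
# so reading the 0.05 cut-off as a 95/5 likelihood ratio has no basis.
```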



As a side note, I would suggest strongly guarding against thinking of different BF values as meaning particular things. These assignments are completely arbitrary, just like the .05 significance level. Problems such as p-hacking will occur just as readily with Bayes Factors if people start to believe that only particular numbers warrant consideration. Try to understand them for what they are, which are something like relative probabilities, and use your own sense to determine whether you find a BF number convincing evidence or not.






share|cite|improve this answer









$endgroup$













    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "65"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f404933%2fwhy-are-the-cut-offs-used-for-bayes-factors-and-p-values-so-different%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    10












    $begingroup$

    A few things:



    The BF gives you evidence in favor of a hypothesis, while a frequentist hypothesis test gives you evidence against a (null) hypothesis. So it's kind of "apples to oranges."



    These two procedures, despite the difference in interpretations, may lead to different decisions. For example, a BF might reject while a frequentist hypothesis test doesn't, or vice versa. This problem is often referred to as the Jeffreys-Lindley's paradox. There have been many posts on this site about this; see e.g. here, and here.



    "At this P value, H1/H0 likelihood should be 95/5 or 19." No, this isn't true because, roughly $p(y mid H_1) neq 1- p(y mid H_0)$. Computing a p-value and performing a frequentist test, at a minimum, does not require you to have any idea about $p(y mid H_1)$. Also, p-values are often integrals/sums of densities/pmfs, while a BF doesn't integrate over the data sample space.






    share|cite|improve this answer











    $endgroup$








    • 2




      $begingroup$
      Taylor is saying the threshold for evidence against one hypothesis ($textH_0$) can't be directly compared to the threshold of evidence for another hypothesis ($textH_1$), also not approximately. When you stop believing in a null-effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (textbelief in H_1)$
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:27







    • 1




      $begingroup$
      Maybe this can be clarifying: en.wikipedia.org/wiki/Misunderstandings_of_p-values The frequentist $p$-value is not a measure of evidence for anything.
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:33






    • 2




      $begingroup$
      Sorry, last comment: The reason you can't see it as evidence in favor of $textH_1$ is that it is the chance of observing this large an effect size if $textH_0$ were true. If $textH_0$ is indeed true, the $p$-value should be uniformly random, so its value has no meaning on the probability of $textH_1$. This subtlety in interpretation is by the way one of the reasons $p$-values see so much misuse.
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:44






    • 1




      $begingroup$
      @benxyzzy: the distribution of a $p$-value is only uniform under the null hypothesis, not under the alternative where it is heavily skewed towards zero.
      $endgroup$
      – Xi'an
      Apr 25 at 12:23






    • 1




      $begingroup$
      @benxyzzy To add to others: The point of using a $p$-value is that under null hypothesis it is uniformly random, so if you get a very small $p$-value, it hints that maybe it wasn't uniformly random so maybe the null hypothesis wasn't true.
      $endgroup$
      – JiK
      Apr 25 at 13:50















    10












    $begingroup$

    A few things:



    The BF gives you evidence in favor of a hypothesis, while a frequentist hypothesis test gives you evidence against a (null) hypothesis. So it's kind of "apples to oranges."



    These two procedures, despite the difference in interpretations, may lead to different decisions. For example, a BF might reject while a frequentist hypothesis test doesn't, or vice versa. This problem is often referred to as the Jeffreys-Lindley's paradox. There have been many posts on this site about this; see e.g. here, and here.



    "At this P value, H1/H0 likelihood should be 95/5 or 19." No, this isn't true because, roughly $p(y mid H_1) neq 1- p(y mid H_0)$. Computing a p-value and performing a frequentist test, at a minimum, does not require you to have any idea about $p(y mid H_1)$. Also, p-values are often integrals/sums of densities/pmfs, while a BF doesn't integrate over the data sample space.






    share|cite|improve this answer











    $endgroup$








    • 2




      $begingroup$
      Taylor is saying the threshold for evidence against one hypothesis ($textH_0$) can't be directly compared to the threshold of evidence for another hypothesis ($textH_1$), also not approximately. When you stop believing in a null-effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (textbelief in H_1)$
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:27







    • 1




      $begingroup$
      Maybe this can be clarifying: en.wikipedia.org/wiki/Misunderstandings_of_p-values The frequentist $p$-value is not a measure of evidence for anything.
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:33






    • 2




      $begingroup$
      Sorry, last comment: The reason you can't see it as evidence in favor of $textH_1$ is that it is the chance of observing this large an effect size if $textH_0$ were true. If $textH_0$ is indeed true, the $p$-value should be uniformly random, so its value has no meaning on the probability of $textH_1$. This subtlety in interpretation is by the way one of the reasons $p$-values see so much misuse.
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:44






    • 1




      $begingroup$
      @benxyzzy: the distribution of a $p$-value is only uniform under the null hypothesis, not under the alternative where it is heavily skewed towards zero.
      $endgroup$
      – Xi'an
      Apr 25 at 12:23






    • 1




      $begingroup$
      @benxyzzy To add to others: The point of using a $p$-value is that under null hypothesis it is uniformly random, so if you get a very small $p$-value, it hints that maybe it wasn't uniformly random so maybe the null hypothesis wasn't true.
      $endgroup$
      – JiK
      Apr 25 at 13:50













    10












    10








    10





    $begingroup$

    A few things:



    The BF gives you evidence in favor of a hypothesis, while a frequentist hypothesis test gives you evidence against a (null) hypothesis. So it's kind of "apples to oranges."



    These two procedures, despite the difference in interpretations, may lead to different decisions. For example, a BF might reject while a frequentist hypothesis test doesn't, or vice versa. This problem is often referred to as the Jeffreys-Lindley's paradox. There have been many posts on this site about this; see e.g. here, and here.



    "At this P value, H1/H0 likelihood should be 95/5 or 19." No, this isn't true because, roughly $p(y mid H_1) neq 1- p(y mid H_0)$. Computing a p-value and performing a frequentist test, at a minimum, does not require you to have any idea about $p(y mid H_1)$. Also, p-values are often integrals/sums of densities/pmfs, while a BF doesn't integrate over the data sample space.






    share|cite|improve this answer











    $endgroup$



    A few things:



    The BF gives you evidence in favor of a hypothesis, while a frequentist hypothesis test gives you evidence against a (null) hypothesis. So it's kind of "apples to oranges."



    These two procedures, despite the difference in interpretations, may lead to different decisions. For example, a BF might reject while a frequentist hypothesis test doesn't, or vice versa. This problem is often referred to as the Jeffreys-Lindley's paradox. There have been many posts on this site about this; see e.g. here, and here.



    "At this P value, H1/H0 likelihood should be 95/5 or 19." No, this isn't true because, roughly $p(y mid H_1) neq 1- p(y mid H_0)$. Computing a p-value and performing a frequentist test, at a minimum, does not require you to have any idea about $p(y mid H_1)$. Also, p-values are often integrals/sums of densities/pmfs, while a BF doesn't integrate over the data sample space.







    share|cite|improve this answer














    share|cite|improve this answer



    share|cite|improve this answer








    edited Apr 25 at 6:28









    Xi'an

    60k897370




    60k897370










    answered Apr 25 at 4:15









    TaylorTaylor

    12.9k22147




    12.9k22147







    • 2




      $begingroup$
      Taylor is saying the threshold for evidence against one hypothesis ($textH_0$) can't be directly compared to the threshold of evidence for another hypothesis ($textH_1$), also not approximately. When you stop believing in a null-effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (textbelief in H_1)$
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:27







    • 1




      $begingroup$
      Maybe this can be clarifying: en.wikipedia.org/wiki/Misunderstandings_of_p-values The frequentist $p$-value is not a measure of evidence for anything.
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:33






    • 2




      $begingroup$
      Sorry, last comment: The reason you can't see it as evidence in favor of $textH_1$ is that it is the chance of observing this large an effect size if $textH_0$ were true. If $textH_0$ is indeed true, the $p$-value should be uniformly random, so its value has no meaning on the probability of $textH_1$. This subtlety in interpretation is by the way one of the reasons $p$-values see so much misuse.
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:44






    • 1




      $begingroup$
      @benxyzzy: the distribution of a $p$-value is only uniform under the null hypothesis, not under the alternative where it is heavily skewed towards zero.
      $endgroup$
      – Xi'an
      Apr 25 at 12:23






    • 1




      $begingroup$
      @benxyzzy To add to others: The point of using a $p$-value is that under null hypothesis it is uniformly random, so if you get a very small $p$-value, it hints that maybe it wasn't uniformly random so maybe the null hypothesis wasn't true.
      $endgroup$
      – JiK
      Apr 25 at 13:50












    • 2




      $begingroup$
      Taylor is saying the threshold for evidence against one hypothesis ($textH_0$) can't be directly compared to the threshold of evidence for another hypothesis ($textH_1$), also not approximately. When you stop believing in a null-effect need not relate to when you start believing in an alternative. This is exactly why the $p$-value shouldn't be interpreted as $1 - (textbelief in H_1)$
      $endgroup$
      – Frans Rodenburg
      Apr 25 at 7:27







The Bayes factor $B_{01}$ can be turned into a probability under equal weights as
$$P_{01}=\frac{1}{1+\frac{1}{B_{01}}}$$
but this does not make them comparable with a $p$-value, since

1. $P_{01}$ is a probability in the parameter space, not in the sampling space;

2. its value and range depend on the choice of the prior measure, so they are relative rather than absolute (Taylor's mention of the Jeffreys-Lindley paradox is appropriate at this stage);

3. both $B_{01}$ and $P_{01}$ contain a penalty for complexity (Occam's razor) by integrating out over the parameter space.

If you wish to consider a Bayesian equivalent to the $p$-value, the posterior predictive $p$-value (Meng, 1994) should be investigated:
$$Q_{01}=\mathbb{P}(B_{01}(X)\le B_{01}(x^{\text{obs}}))$$
where $x^{\text{obs}}$ denotes the observation and $X$ is distributed from the posterior predictive
$$X\sim \int_{\Theta} f(x\mid\theta)\,\pi(\theta\mid x^{\text{obs}})\,\text{d}\theta,$$
but this does not imply that the same "default" criteria for rejection and significance should apply to this object.
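As a numeric check of the conversion above (a minimal sketch of my own, assuming equal prior weights on the two hypotheses), note that $P_{01} = 1/(1 + 1/B_{01})$ is algebraically the same as $B_{01}/(B_{01}+1)$:

```python
def bf_to_prob(b01):
    """Probability of H0 implied by the Bayes factor B01 under
    equal prior weights: P01 = 1 / (1 + 1/B01) = B01 / (B01 + 1)."""
    return 1.0 / (1.0 + 1.0 / b01)

for b01 in (3, 10, 19):
    # The two forms of the formula agree exactly.
    assert abs(bf_to_prob(b01) - b01 / (b01 + 1)) < 1e-12
    print(b01, round(bf_to_prob(b01), 3))
# A BF of 3 maps to 0.75, 10 to 0.909, and 19 to 0.95; per the points
# above, these are still not comparable to 1 minus a p-value.
```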






Using your formula, $P$ for BFs of 3 and 10 comes out to 0.75 and 0.91, respectively. Why should we accept these as moderate evidence, when for the $p$-value we keep a cut-off of 0.95?
– rnso
Apr 26 at 7:44











Why is $0.95$ relevant in this framework? Or at all? Deciding when large is large enough depends on your utility function.
– Xi'an
2 days ago










The formula looks simpler as $P = B/(B+1)$.
– rnso
2 days ago
















edited Apr 25 at 7:53
answered Apr 25 at 6:40
Xi'an











Some of your confusion might stem from taking the ratio 95/5 directly from the fact that the $p$-value cut-off is 0.05. Is this what you are doing? I do not believe this is correct. The $p$-value for a t-test, for example, reflects the chance of getting the observed difference between means, or a more extreme difference, if the null hypothesis is in fact true. If you get a $p$-value of 0.02, you say "ah, there is only a 2% chance of getting a difference like this, or a greater one, if the null is true. That seems very improbable, so I propose that the null is not true!". These numbers are simply not the same thing that goes into the Bayes factor, which is the ratio of the marginal likelihoods of the data under the two competing hypotheses (the factor by which the data turn prior odds into posterior odds). These quantities are not computed in the same way as a $p$-value, so thinking of 95/5 as posterior odds that would give a BF of 19 is not correct.

As a side note, I would strongly guard against attaching fixed meanings to particular BF values. Those labels are just as arbitrary as the 0.05 significance level. Problems such as p-hacking will occur just as readily with Bayes factors if people come to believe that only particular numbers warrant consideration. Try to understand them for what they are, something like relative evidence, and use your own judgement to decide whether a given BF is convincing.
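The distinction can be made concrete with a small sketch (my own illustration, not from the answer): a Bayes factor multiplies prior odds into posterior odds, so the posterior probability it implies depends on the prior, whereas a $p$-value threshold involves no prior at all.

```python
def posterior_prob_h1(bf_10, prior_h1=0.5):
    """Posterior P(H1 | data) implied by the Bayes factor
    BF10 = m(x | H1) / m(x | H0) and a prior probability for H1,
    via: posterior odds = BF10 * prior odds."""
    prior_odds = prior_h1 / (1.0 - prior_h1)
    posterior_odds = bf_10 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# The same BF of 19 gives very different posterior probabilities
# depending on the prior weight placed on H1, so "BF = 19" is not
# the same statement as "95% in favour of H1".
for prior in (0.5, 0.1):
    print(prior, round(posterior_prob_h1(19, prior), 3))
```

With equal prior weights the posterior probability is 0.95, but with a 10% prior on $\text{H}_1$ the same Bayes factor yields only about 0.68.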






answered Apr 25 at 19:21
Jamie


























