Statistical analysis applied to methods coming out of Machine Learning [on hold]


Most of the recent famous methods coming out of machine learning are supervised learning methods: decision trees, random forests, deep learning, SVMs.



The more traditional supervised learning methods, like linear and logistic regression, with or without regularization, have a long history of analysis of their nuances (e.g., assumptions for reliable use such as normality, confidence intervals, hypothesis tests, optimal estimators).



Though the traditional stats models and the more modern ML ones come out of different disciplines (statistics is theoretically associated with mathematics departments and practically with agronomy, medicine, social science, and econometrics; machine learning comes out of computer science, with applications in vision, NLP, and AI), they have the same ends.



The ML models, wildly successful as they are, seem to have very little theoretical support.



In contrast, linear regression admits a p-value for each variable, an F-test for the entire fit, and the classic five assumptions. I've never seen such analysis for the more complicated ML methods.
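
To make the asymmetry concrete, here is a minimal sketch (assuming statsmodels and scikit-learn; the synthetic data and names are purely illustrative):

    # Classical inference for OLS vs. the diagnostics a random forest offers,
    # fit to the same synthetic data.
    import numpy as np
    import statsmodels.api as sm
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=200)  # column 2 is pure noise

    # OLS: finite-sample inference under the classical assumptions.
    ols = sm.OLS(y, sm.add_constant(X)).fit()
    print(ols.pvalues)               # per-coefficient t-test p-values
    print(ols.fvalue, ols.f_pvalue)  # F-test for the overall fit

    # Random forest: no coefficients, hence nothing analogous to these tests;
    # the built-in diagnostic is a relative importance score with no null
    # distribution attached.
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    print(rf.feature_importances_)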



There doesn't seem to be a treatment of machine learning models with the rigor of analysis applied to statistical models (see http://www.fharrell.com/post/stat-ml/).



Is there any attempt to apply classic statistical analysis techniques to assessing the newer ML regression models?










Tags: regression machine-learning references

asked Apr 21 at 16:26 by Mitch (edited Apr 21 at 18:28)



put on hold as unclear what you're asking by Sycorax, Michael Chernick, usεr11852, Frans Rodenburg, COOLSerdash Apr 22 at 8:18















  • There are four standard regression assumptions. The author mentions "no or low multicollinearity," which is not an assumption for regression, although people commonly and incorrectly say it is. I would re-evaluate that reference. – LSC, Apr 21 at 17:12






  • I am not well-versed enough to answer that, but in contrast to the asymptotic theory that usually pertains to statistics, I would say that ML attacks a lot of its methods through generalisation bounds (one standard example is sketched after these comments). – usεr11852, Apr 21 at 17:23










  • I think I've added some detail to address the close voters. Surely part of the cause of the lack of clarity on my part is lack of knowledge. My motivation for this question is that I feel the success of the complicated methods of ML is offset by a lack of statistical rigor (and, dually, the great rigor of statistics is offset by a lack of progress on newer, more successful methods). Or is it just that, historically, methods were devised first and justifications and analysis came much later, and that is just as much the case for random forests now as it was for logistic regression in the thirties? – Mitch, Apr 21 at 18:39






  • I've read your edit and comments, but it's not clear what properties of machine learning you want proved. It seems you wish to reason by analogy about $p$-values in some manner, but what would that mean for a random forest model? A $p$-value for a regression coefficient tests the hypothesis that the coefficient is not statistically different from zero. Random forests don't estimate a coefficient for each variable, and it's not clear what hypothesis you are interested in testing (a rough, informal analogue is sketched after these comments). "Machine learning" usually cares more about making good predictions, which is why @usεr11852 mentions generalization bounds. – Sycorax, Apr 21 at 18:51










  • We have this related thread, which might be of interest: stats.stackexchange.com/questions/321851/… – Sycorax, yesterday
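
A standard example of the generalisation bounds mentioned above (textbook learning theory, not taken from this thread): for a finite hypothesis class $\mathcal{H}$ and an i.i.d. sample of size $n$, with probability at least $1 - \delta$ every $h \in \mathcal{H}$ satisfies

$$R(h) \;\le\; \widehat{R}(h) + \sqrt{\frac{\ln\lvert\mathcal{H}\rvert + \ln(2/\delta)}{2n}},$$

where $R(h)$ is the true risk and $\widehat{R}(h)$ the empirical risk on the sample. Guarantees of this form stand in for the distributional assumptions of classical regression.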



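A rough, informal analogue of the per-variable testing Sycorax's comment asks about is permutation importance: the drop in a random forest's predictive score when one column is shuffled. It is a descriptive diagnostic, not a formal hypothesis test. A minimal sketch, assuming scikit-learn (the data and names are made up):

    # Permutation importance: a descriptive stand-in for per-variable testing.
    # It yields score drops, not p-values. Hypothetical data for illustration.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] + rng.normal(size=200)  # only column 0 carries signal

    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    res = permutation_importance(rf, X, y, n_repeats=30, random_state=0)
    print(res.importances_mean)  # mean score drop per shuffled column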














1 Answer



















I guess the main part of an answer depends on what, precisely, you mean by "classical statistical analysis," but if we interpret it broadly to mean applying theorems and results from probability and statistics, then we can come up with a good bibliography.



Three references off the top of my head:



  • Hastie et al., The Elements of Statistical Learning

  • Bishop, Pattern Recognition and Machine Learning

  • Murphy, Machine Learning: A Probabilistic Perspective


Aside: it's worth remarking that the difference between machine learning and statistics has more to do with marketing than with any underlying mathematical principles.



For example, random forests were first proposed by Leo Breiman, who was a statistics professor at the University of California, Berkeley.






answered Apr 21 at 16:53 by Sycorax (edited Apr 21 at 17:20)












  • Thank you for those references. I am well aware of the first two. I suppose I wasn't clear about what I don't think has been done for DL, SVM, and RF: for example, what kind of distribution assumptions are necessary for robust results. – Mitch, Apr 21 at 17:00






  • Perhaps you could edit your question to clarify what you want to know in specific terms and why the resources you've consulted don't answer your question. – Sycorax, Apr 21 at 17:02







  • To answer your question: it's rarely necessary to assume a specific distribution (e.g. normal) for the data. The whole point of these advanced methods is that they're robust to a wide variety of input data types, so in a sense they are "generic" to a very large class of problems. For example, the chapter on random forests in ESL demonstrates that they have some nice properties without positing a particular distribution for the inputs. Probably the biggest assumption is that the input data are iid, with independence of inputs being the biggest potential problem. – Sycorax, Apr 21 at 17:12







  • This is a very accurate point: a lot of the "difference" is marketing a hot "new field," and many of the "ML" methods are statistical methods with the appropriate brakes disabled or ignored. – LSC, Apr 21 at 17:18
















