With the popularity of statistical NLP methods and probabilistic models of natural language, I have found that the distinction below is muddled in the minds of some linguists.

Probability is a theoretical branch of mathematics that deals with predicting the likelihood of *future* events. It starts from a model and its assumptions, and is therefore useful for working out the consequences of mathematical definitions.

Statistics involves the analysis of the frequency of *past* events. It is data-driven, and therefore an applied branch of mathematics, useful for drawing conclusions about events from accumulated observations of them.
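
To make the two definitions above concrete, here is a minimal sketch in Python. It assumes a toy unigram language model in which words are treated as independent; the corpus, the function names `estimate_unigram_model` and `sequence_probability`, and the independence assumption are all illustrative choices of mine, not anything specific to a real NLP system. The first function is the statistics step (past data in, estimated model out), the second is the probability step (model in, prediction about an unseen event out).

```python
from collections import Counter
from fractions import Fraction


def estimate_unigram_model(tokens):
    """Statistics: estimate word probabilities from observed (past) data
    using relative frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: Fraction(count, total) for word, count in counts.items()}


def sequence_probability(model, tokens):
    """Probability: given a model, compute the likelihood of a (future)
    word sequence, assuming words are drawn independently."""
    prob = Fraction(1)
    for word in tokens:
        # Words never seen in the data get probability 0 in this toy model.
        prob *= model.get(word, Fraction(0))
    return prob


# Statistics step: the data is known, the model is what we estimate.
corpus = "the cat sat on the mat".split()
model = estimate_unigram_model(corpus)

# Probability step: the model is taken as given, the event is what we predict.
p_the_cat = sequence_probability(model, ["the", "cat"])

print(model["the"])  # 1/3, since "the" is 2 of the 6 tokens
print(p_the_cat)     # 1/18, i.e. 1/3 * 1/6
```

The direction of inference is the point here: the same numbers appear in both functions, but one goes from observations to a model and the other from a model to a statement about events not yet observed.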

Some people (linguists) further confuse, in the above discussion, the distinction between *data* and *algorithms* in modern NLP. This is a whole other topic, also related to the Norvig-Chomsky debate I mentioned in a previous blog post. In a later post I will explain some further common misconceptions.