Probability is a theoretical branch of mathematics that deals with predicting the likelihood of *future* events. It starts from mathematical definitions and is therefore useful for evaluating their consequences.
Statistics involves the analysis of the frequency of *past* events. It is data-driven, and therefore an applied branch of mathematics, useful for analyzing events based on cumulative observed data about them.
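To make the contrast concrete, here is a small Python sketch of my own. The coin-flip setup, the function names `binomial_pmf` and `estimate_p`, and the observed flips are all made up for illustration. The first half reasons from an assumed model to the likelihood of an outcome (probability); the second half reasons from observed data back to a model parameter (statistics).

```python
from math import comb
from typing import Sequence


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips, given P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def estimate_p(flips: Sequence[int]) -> float:
    """Relative-frequency estimate of P(heads) from observed flips (1 = heads)."""
    return sum(flips) / len(flips)


# Probability: the model is given (a fair coin, by assumption),
# and we derive how likely a future outcome is.
prob_7_of_10 = binomial_pmf(7, 10, 0.5)

# Statistics: the data is given (hypothetical past flips),
# and we estimate the parameter of the model that produced it.
observed = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
p_hat = estimate_p(observed)

print(f"P(7 heads in 10 fair flips) = {prob_7_of_10:.4f}")
print(f"Estimated P(heads) from data = {p_hat:.2f}")
```

The same ten flips show up on both sides, but the direction of reasoning is reversed: in the first case the coin's bias is a definition, in the second it is the unknown being estimated.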
Some people (linguists) further muddle the above discussion by confusing data with algorithms in modern NLP. That is a whole other topic, also related to the Norvig-Chomsky debate I mentioned in a previous blog post. In a later post I will explain some further common misconceptions.