In information theory, the Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable. It quantifies the information contained in a message, usually in bits or bits/symbol, and is the minimum message length necessary to communicate information. This also represents an absolute limit on the best possible lossless compression of any communication: treating a message as a series of symbols, the shortest number of bits necessary to transmit the message is the Shannon entropy in bits/symbol multiplied by the number of symbols in the original message.

A fair coin has an entropy of one bit. However, if the coin is not fair, then the uncertainty is lower (if asked to bet on the next outcome, we would bet preferentially on the most frequent result), and thus the Shannon entropy is lower. A long string of repeating characters has an entropy of 0, since every character is predictable, while the entropy of English text is between 1.0 and 1.5 bits per letter.

The concept was introduced by Claude E. Shannon in his 1948 paper "A Mathematical Theory of Communication". Equivalently, the Shannon entropy is a measure of the average information content the recipient is missing when he does not know the value of the random variable.

The information entropy of a discrete random variable X that can take on the possible values x_1, ..., x_n is

H(X) = E[I(X)] = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i),    (1)

where I(X) is the information content or self-information of X (which is itself a random variable), p(x_i) = Pr(X = x_i) is the probability mass function of X, and b is the base of the logarithm (b = 2 gives entropy in bits).
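As a concrete illustration of Eq. (1), here is a minimal Python sketch; the function name shannon_entropy and the example distributions are our own choices for illustration, not part of the original text:

```python
import math

def shannon_entropy(pmf, base=2.0):
    """Shannon entropy H(X) = -sum_i p(x_i) * log_b p(x_i) of a discrete pmf.

    Terms with p = 0 are skipped, using the convention 0 * log 0 = 0.
    """
    return sum(-p * math.log(p, base) for p in pmf if p > 0)

# A fair coin carries a full 1 bit per toss ...
print(shannon_entropy([0.5, 0.5]))   # 1.0
# ... while a biased coin carries less ...
print(shannon_entropy([0.9, 0.1]))   # ~0.469
# ... and a certain outcome carries no information at all.
print(shannon_entropy([1.0]))        # 0.0
```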
Information entropy is characterised by these desiderata:

- The measure should be continuous: changing the value of one of the probabilities by a very small amount should only change the entropy by a small amount.
- The measure should be unchanged if the outcomes x_i are re-ordered.
- If all the outcomes are equally likely, then entropy should be maximal (uncertainty is highest when all possible events are equiprobable); in this case, the entropy increases with the number of outcomes.
- The amount of entropy should be the same independently of how the process is regarded as being divided into parts.

This last functional relationship characterizes the entropy of a system with sub-systems: it demands that the entropy of a system can be calculated from the entropies of its sub-systems if we know how the sub-systems interact with each other. Assume that we have an ensemble of n elements with a uniform distribution on them. If we mentally divide this ensemble into k boxes (sub-systems) with b_i elements in each, the entropy can be calculated as the sum of the individual entropies of the boxes, weighted by the probability of finding oneself in that particular box, plus the entropy of the system of boxes. For positive integers b_i where b_1 + ... + b_k = n,

H_n(1/n, ..., 1/n) = H_k(b_1/n, ..., b_k/n) + \sum_{i=1}^{k} (b_i/n) H_{b_i}(1/b_i, ..., 1/b_i).

Any definition of entropy satisfying these assumptions has the form

H = -K \sum_{i=1}^{n} p_i \log p_i,

where K is a constant corresponding to a choice of measurement units. This implies that the entropy of a certain outcome is zero: if some p_i = 1, every term of the sum vanishes, since 1 \cdot \log 1 = 0.

For the uninitiated, it is hard to develop a feel for the totally abstract expression in Eq. (1), which could be a big turn-off, so it helps to see it at work as a measure of uncertainty (see further below), in particular its additivity characteristic for uncertainty. For example, consider appending to each value of the first of two dice the possible values of the second: this yields 6 x 6 = 36 equally likely outcomes, and the uncertainty of playing with two dice is obtained by adding the uncertainties of the two individual dice. Now return to the case of playing with one die only (the first one): its six equally likely faces carry an uncertainty of log 6.

In the case of a non-uniform probability mass function (or distribution in the case of a continuous random variable), we let I(x_i) = -\log p(x_i) be the self-information of the outcome x_i: the higher the uncertainty or the surprise, i.e. the lower the probability of an outcome, the more information observing it delivers. The expected value of this self-information is used as the definition of the information entropy, in agreement with Eq. (1).

In this setting, "information entropy" and "uncertainty" can be used interchangeably. Consider tossing a coin with known, not necessarily fair, probabilities of coming up heads or tails. The entropy of the unknown result of the next toss of the coin is maximised if the coin is fair (that is, if heads and tails both have equal probability 1/2). This is the situation of maximum uncertainty, as it is most difficult to predict the outcome of the next toss; the result of each toss of the coin then delivers a full 1 bit of information. However, if we know the coin is not fair, but comes up heads or tails with probabilities p and q, then there is less uncertainty: every time, one side is more likely to come up than the other.
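To make the coin example concrete, here is a small sketch (our own illustration, not from the original text) of the binary entropy -p \log_2 p - q \log_2 q with q = 1 - p, which peaks at exactly 1 bit for a fair coin:

```python
import math

def binary_entropy(p):
    """Entropy in bits of a coin that comes up heads with probability p."""
    q = 1.0 - p
    return sum(-x * math.log2(x) for x in (p, q) if x > 0)

for p in (0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  ->  H = {binary_entropy(p):.3f} bits")
# p = 0.5  ->  H = 1.000 bits  (maximum uncertainty: a fair coin)
# p = 0.7  ->  H = 0.881 bits
# p = 0.9  ->  H = 0.469 bits
# p = 1.0  ->  H = 0.000 bits  (a certain outcome carries no information)
```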
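Similarly, the additivity property from the dice discussion above can be checked numerically; this sketch (again our own, treating each die as a uniform distribution over six values) shows that the uncertainties of independent dice simply add:

```python
import math

def uniform_entropy(n):
    """Entropy in bits of a uniform distribution over n equally likely outcomes."""
    return math.log2(n)

one_die = uniform_entropy(6)        # ~2.585 bits for a single fair die
two_dice = uniform_entropy(6 * 6)   # 36 equally likely (first, second) pairs
print(two_dice, one_die + one_die)  # both ~5.170 bits: log 36 = log 6 + log 6
```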
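Finally, the grouping (sub-system) relationship stated earlier, H_n(1/n, ..., 1/n) = H_k(b_1/n, ..., b_k/n) + \sum_i (b_i/n) H_{b_i}(1/b_i, ..., 1/b_i), can also be verified directly; the box sizes below are an arbitrary partition of our own choosing:

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a discrete probability mass function."""
    return sum(-p * math.log2(p) for p in pmf if p > 0)

n = 12
boxes = [3, 4, 5]  # b_1 + b_2 + b_3 = n, an arbitrary partition for illustration

# Left side: uniform distribution on all n elements.
lhs = entropy([1 / n] * n)
# Right side: entropy of the box choice plus the weighted entropies within boxes.
rhs = entropy([b / n for b in boxes]) + sum(
    (b / n) * entropy([1 / b] * b) for b in boxes
)
print(lhs, rhs)  # both ~3.585 bits (up to floating-point rounding)
```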