### Bayesian Programming

Probability as an extension of logic

Introduction

# A need for a new modeling methodology

The existence of a systematic and generic method to build models is a sine qua non requirement for the success of a modeling and computing paradigm. This is why algorithms are taught in the basic course of computer science giving students the elementary and necessary methods to develop classical programs.

We propose Bayesian Programming as this generic methodology to build subjective probabilistic models. It is very simple even if it is atypical and a bit worrisome at the beginning.

The purpose of Chapters 2 to 11 of the book Bayesian Programming is to present this new modeling methodology.

The presentation is intended for the general public and does not suppose any prerequisites other than a basic foundation in mathematics.

Its purpose is to introduce the fundamental concepts, to present the novelty and interest of the approach, and to initiate the reader to the subtle art of Bayesian modeling. Numerous simple examples of applications are presented in different fields.

It is divided in two parts, Chapters 2 to 6 which present the principles of Bayesian programming and Chapters 7 to 11 which offer a cookbook for the good practice of probabilistic modeling.

## Chapter 2 — Basic Concepts:

The purpose of this chapter is to gently introduce the basic concepts of Bayesian Programming.

We start with a simple example of Bayesian spam filtering, which helps to eliminate junk e-mails. Commercially available software is based on a similar approach.

The problem is very easy to formulate. We want to classify texts (e-mail) into one of two categories, either nonspam or spam. The only information we can use to classify the e-mails is their content: a set of words.

The classifier should furthermore be able to adapt to its user and to learn from experience. Starting from an initial standard setting, the classifier should modify its internal parameters when the user disagrees with its own decision. It will hence adapt to the user’s criteria to categorize nonspam and spam. It will improve its results as it encounters increasingly classified e-mails.

The classifier uses an N words dictionary. Each e-mail will be classified according to the presence or absence of each of the words.

## Chapter 3 — Incompleteness and Uncertainty:

The goal of this chapter is twofold: (i) to present the concept of incompleteness and (ii) to demonstrate how incompleteness is a source of uncertainty.

## Chapter 4 — Description = Specification + Identification:

In this chapter, we come back to the fundamental notion of description. A description is a probabilistic model of a given phenomenon. It is obtained after two phases of development:

1. A Specification phase where the programmer expresses in probabilistic terms his own knowledge about the modeled phenomenon.

2. An Identification phase where this starting probabilistic canvas is refined by learning from data.

Descriptions are the basic elements that are used, combined, composed, manipulated, computed, compiled, and questioned in different ways to build Bayesian programs.

## Chapter 5 — The Importance of Conditional Independence:

The goal of this chapter is both to explain the notion of Conditional Independence and to demonstrate its importance in actually solving and computing complex problems.

## Chapter 6 — Bayesian Program = Description + Question:

In the two previous chapters, as an example, we built a description (Bayesian model) of a water treatment center. In this chapter, we use this description to solve different problems: prediction of the output, choice of the best control strategy, and diagnosis of failures. This shows that multiple questions may be asked with the same description to solve very different problems. This clear separation between the model and its use is a very important feature of Bayesian Programming.

Chapters 2 to 6 present the concept of the Bayesian program. Chapters 7 to 11 are used to show how to combine elementary Bayesian programs to build more complex ones. Some analogies are stressed between this probabilistic mechanism and the corresponding algorithmic ones, for instance the use of subroutines or conditional and case operators.

## Chapter 7 — Information Fusion:

The most common application of Bayesian technics is to merge sensor information to estimate the state of a given phenomenon.

The situation is always the same: you want information about a given phenomenon; this phenomenon influences sensors that you can read and from these readings you try to estimate the phenomenon.

Usually the readings are neither completely informative about the phenomenon, nor completely consistent with one another. Consequently, you are compelled to a probabilistic approach and the question you want to address is what is the state knowing the readings.

A very common difficulty is the profusion of sensors which leads to a very high dimensionality state space for the joint distribution. A very common solution to break this curse of dimensionality is to make the very strong assumption that, knowing the phenomenon, the sensors may be considered to provide independent readings. Knowing the common cause, the different consequences are considered independent.

However, this hypothesis is often caricatured. In this chapter we present this basic approach but, also, different ways to relax this “naive” hypothesis.

## Chapter 8 — Bayesian Programming with Coherence Variables:

What does “equality” mean for Bayesian variables?

Two different calculi may lead to the same result. It is the case if you try to compute the same thing with two different methods in a “consistent” or “coherent” calculus system. You can impose it as a constraint of your model by specifying that a given equation should be respected. Solving the equation then consists in finding the conditions on the two terms of the equation in order to make them “equal.” It can finally be used as a programming notion when you “assign” the result of a calculus to a given variable in order to use it in a subsequent calculus. However, for all these fundamental notions of logic, mathematics and computing the results of the calculus are always values either Boolean, numeric, or symbolic.

In probabilistic computing, the basic objects that are manipulated are not values but rather probability distributions on variables. In this context, the “equality” has a different meaning as it should say that two variables have the same probability distribution. To realize this, we introduce in this chapter the notion of a “coherence variable” linking two variables of any nature.

A coherence variable is a Boolean variable. If the coherence variable is equal to 1 (or “true”) it imposes that the two variables are “coherent” which means that they should share the same probability distribution knowing the same premisses.

## Chapter 9 — Bayesian Programming Subroutines:

The purpose of this chapter is to exhibit a first mean to combine descriptions with one another in order to incrementally build more and more sophisticated probabilistic models. This is obtained by including in the decomposition, calls to Bayesian subroutines. We show that, as in standard programming, it is possible to use existing probabilistic models to build more complex ones and to further structure the definition of complex descriptions, as some reusability of a previously defined model is possible.

## Chapter 10 — Bayesian Programming Conditional Statement:

The purpose of this chapter is to introduce probabilistic branching statements. We will start by describing the probabilistic “if-then-else” statement which, as in standard programming, can naturally be extended to a probabilistic “case” statement. From an inference point of view, the probabilistic if-then-else statement is simply the integration over the probability distribution on a binary variable representing the truth value of the condition used in the classical “if” statement. The main difference with the classical approach is that a Bayesian program will explore both branches when the truth value of the condition is given by a probability distribution. This allows us to mix behaviors and to recognize models.

## Chapter 11 — Bayesian Programming Iteration:

In this chapter we propose to define a description with distributions indexed by integers. By setting the range of indexes, we define the number of distributions used in the description. This way we define generic descriptions only depending on the range of the indexes, just as we fixed the number of iterations in a “for” loop.

In pursuing this idea, we can revisit the notion of the filter, where each new evidence is incorporated into the result of a previous inference. If the index represents successive time intervals, we can then use these techniques to study time sequences and use the Markov assumption to simplify the description. The approach is useful for implementing dynamic Bayesian networks with the Bayesian programming formalism.