# Probabilistic Programming 101

This class will introduce a concept of probabilistic programming — a new declarative paradigm meant to represent and solve a wide class of problems involving uncertainty and lack of perfect knowledge.

Be warned! This class assumes that student knows at least basics of the probability theory, e.g. what a random variable is, how to use chain rule and Bayes theorem. A short (and a bit dense) introduction can be read here.

## Tools

In this class we will use ProbLog language, which is based on Prolog. If you haven't heard of this language, read this short and gentle introduction and later do the more comprehensive tutorial in your free time.

For now, you should now that you can use ProbLog in two different ways:

1. from command line: `problog <path to file with model>`

## First Model

We will start gently, by creating a fairly reasonable model of an academic teacher assessing an essay question.

First of all, there is a very high chance (let's say 90%), that the handwriting will be illegible. This is so called probabilistic fact and can be expressed in a one line of ProbLog code.

`0.9::illegible_handwriting.`

Notice that name of the fact starts with a small letter — it's important! The other thing is that every statement ends with a dot `.`.

Let's return to the problem… the only fair way to assess this type of answer is to throw a coin. Let's define a coin toss as another probabilistic fact.

`0.5::heads.`

And now we will write a first rule, that says: 'If the handwriting is illegible and I get heads, the student pass the exam'.

`pass_exam :- illegible_handwriting, heads.`

Here `:-` means 'if' and `,` means 'and' (student pass exam if handwriting is illegible and head is tossed).

Of course there is always a chance, that the handwriting will be nice and teacher will be able to comprehend the answer. Then everything depends on the student's intelligence. We will add it to model by introducing one fact and one more rule.

```0.4::student_knows_the_answer.

`\+` represents here the logical negation.

The last (but not the least) element of the model is query definition — what do we would like to know? In this case we are interested in probability of passing the exam.

`query(pass_exam).`

Now put all these lines into one file (e.g. test.pbl), and run:

`# problog test.pbl`

…or paste it in the web ide and press 'Evaluate'.

One way or another you should get a probability of the desired event.

### Bayesian Network The model we have written can be represented as a simple bayesian network as in the image on the right (just ignore constants `c1`, etc.). If you want to generate a bayesian network from a ProbLog model, you can do it using `problog bn` command, e.g.

`# problog bn -o test.dot --format dot test.pbl`

will generate a dot file, and:

`# dot -Tpng test.dot -o test.png`

will render the network as a png file.

In the previous model we have calculated what is the probability of passing the exam by a generic student. But what if we want to calculate the same for a more specific case… let's say that the student knows the answer. In Bayesian statistics we can represent it as evidence — something we know.

In ProbLog we can add evidences by writing simply:

`evidence(student_knows_the_answer, true).`

Add this line to the model a re-evaulate it. What is the impact of knowledge on the probability of passing the exam. Isn't it motivating?

Try to add different evidences, e.g. that student doesn't know the answer (use `false` instead of `true`).

## Students Live In Groups

Our model seems to be pretty realistic and shiny but has one major drawback — it models assessment of only one student, while in reality most of the students live in herds (or so called `groups`) and teacher has to assess one work per student.

The naive approach would be to duplicate facts and rules for every student, e.g.

```0.9::joan_has_illegible_writing.
0.9::maxwell_has_illegible_writing.

query(joan_pass_exam).
query(maxwell_pass_exam).```

It could work, but it certainly wouldn't be a manageable solution for more than three students… In order to model bigger herds we will use power of first-order logic!

### First-Order Logic To The Rescue

First things first — we need to redefine our probabilistic facts.

```0.9::has_illegible_writing(maxwell).
0.9::has_illegible_writing(joan).

The facts are predicates now and therefore have arguments. It's neat, but in comparison to our previous approach, it doesn't save too much space… But let's focus on the rules — we would like to say: “Student passes the exam if he/she has illegible handwriting and when tossed the coin for him/her, we got heads. In ProbLog (and Prolog) an unspecified object is represented by a variable, which starts with a big letter.

```pass_exam(Student) :- heads(Student), has_illegible_writing(Student).

Good advice: if you use negation (`\+`) together with variables in the rule body, be sure that the variable has been already used in the same rule body in a positive context, e.g. don't write

`\+ has_illegible_writing(Student), knows_the_answer(Student)`,

because variable `Student` is used here first in the negative context. If you want to know details, learn more about Prolog ;)

Try to run the following model and check what is the probability of passing the exam by Joan. Try to add some evidence to the model.

### Non-Deterministic Rules Save The Day

Still, we have some redundancy in the code. Our probabilistic facts are simply duplicated and every programmer should instantly think of a way to get rid of the duplicates. In order to do that we should first extract somewhere the list of students. We will use plain deterministic (probability equal 100% can be omitted) facts.

`student(maxwell). student(joan).`

Then we will rewrite our original probabilistic facts as probabilistic rules:

`0.9::has_illegible_writing(Name) :- student(Name).`

Do the same with the other fact and check the new model. Isn't it neat?

## Levels of illegibility

Okay. We haven't be fair — there is no way to simply classify all the handwriting in only two classes. There is at least third one: 'partly_legible'. In order to model that we will need to introduce a second argument to our probabilistic fact.

```0.5::handwriting(Name, illegible) :- student(Name).
0.4::handwriting(Name, partly_legible) :- student(Name).
0.1::handwriting(Name, legible) :- student(Name).```

We also have have to add a new rule, that takes care of what happens when teacher is able to read only a part of the answer.

```0.3::pass_exam(Student) :- handwriting(Student, partly_legible), \+ knows_the_answer(Student).

Update rest of the rules accordingly and check if the model still works. What is the chance of passing the exam by Maxwell, who knows the answer but his handwriting is only partly legible?

### Last Fixes

There is one problem with the approach we've used with `handwriting` predicate. Have you noticed? All three facts `handwriting(Student, illegible), handwriting(Student, partly_legible), handwriting(Student, legible)` can be true at the same time. In other words, our model has a flaw we need to fix!

There are to ways to take care of that:

1. (not recommended) we add additional rules that say that only one value can be true.
2. (recommended) use so called annotated disjunctions, which will be explained briefly.

If we have three facts that only one of them can be true at the time, we can write them as:

`fact1; fact2; fact3.`

The same we can due with a rule:

`fact1; fact2; fact3 :- rule_body.`

Which in our case translates to:

```0.5::handwriting(Name, illegible);
0.4::handwriting(Name, partly_legible);
0.1::handwriting(Name, legible ):- student(Name).```

Try to use that instead of the previous notation. Is there any difference?

## First Assignment

Based on the first ProbLog model you've just finished, you have to model a classical problem of burglary vs earthquake. The story is rather simple:

You live in the Los Angeles, beautiful city where earthquakes are not uncommon. You've installed a new alarm system in your house that was meant to protect you from burglaries. Unfortunately alarm is too sensitive and sometimes it get startled by an earthquake. Because of that you've set the alarm to not call the police. Instead of that your neighbors Joan and Maxwell are calling you as soon as they hear the alarm (if they are around). Unfortunately both Joan and Maxwell love pranks and sometimes call you only for fun.

Question: what is the probability of burglary, if you are called by Joan only and you didn't receive call from Maxwell? How much probable is that it was only an earthquake?

### Priors

In order to create the model, you will need some prior probabilities. So:

• probability of a burglary in your neighborhood is rather high and equal `0.7`.
• heavy earthquakes are very rare, probability of such a earthquake equals `0.01`.
• mild earthquakes are not uncommon, probability of a mild earthquake equals `0.19`.
• your alarm's sensitivity is specified by five probabilities:
• there is `0.90` probability that alarm start when there both burglary and heavy earthquake occur at the same time
• `0.85` when there both burglary and mild earthquake occur at the same time
• `0.80` during a burglary when there is no earthquake
• `0.30` during a heavy earthquake without any burglars
• `0.10` during a mild eathhquake without any burglars
• if there is an alarm going on, neighbor will call you with `0.8` probability
• there is small probability (`0.1`) that neighbor calls you only to prank you 