Introduction

In this blog post, I will explain the concept of a Random Variable from a software developer’s perspective.

Definitions

Before we dive in, I want to define a few terms for those of you who are new to probability and statistics. Feel free to skip this section if you think you are too cool for this.

Experiment

An experiment is an experiment… What would you do if you were asked to test whether a coin is fair? I would flip the coin 1000 times and see if the counts of heads and tails are roughly equal. If heads comes up only 300 times and tails comes up 700 times, you can probably conclude that the coin is not fair!

So what is an experiment? In the context of the example described above, flipping the coin 1000 times is an Experiment.

Trial

A trial is one coin flip. The coin-flip experiment above consists of 1000 trials!
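To make the experiment/trial split concrete, here is a minimal sketch that simulates it. The names `flip` and `runExperiment` and the `pHead` parameter are made up for illustration, and `Math.random` stands in for a real coin.

```typescript
// Hypothetical helper: one trial of a simulated coin that lands on
// head with probability pHead.
function flip(pHead: number): "H" | "T" {
  return Math.random() < pHead ? "H" : "T";
}

// The experiment: repeat the trial `trials` times and tally the results.
function runExperiment(trials: number, pHead: number): { heads: number; tails: number } {
  let heads = 0;
  for (let i = 0; i < trials; i++) {
    if (flip(pHead) === "H") {
      heads++;
    }
  }
  return { heads, tails: trials - heads };
}

// 1000 trials with a (simulated) fair coin; the counts vary from run to run.
const experimentResult = runExperiment(1000, 0.5);
console.log(experimentResult);
```

With a fair coin the two counts should land near 500 each, but they will rarely be exactly equal.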

Outcome

An outcome is one possible result of a trial. In the coin-flip experiment, the outcome of each trial is either head or tail.

Sample space

A sample space is the set of all possible outcomes of the experiment, where each outcome of the experiment is the full sequence of trial results. Assume we perform only 3 trials for our coin-flip experiment. The sample space will be { HHH, HHT, HTH, HTT, THH, THT, TTH, TTT }, where H denotes head and T denotes tail.
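If you would rather generate that set than write it out by hand, something like the sketch below should work. The function name `sampleSpaceFor` is my own invention, and outcomes are represented as plain strings such as "HTH".

```typescript
type Flip = "H" | "T";
const flips: Flip[] = ["H", "T"];

// Build every sequence of `trials` flips by extending each shorter
// sequence with one more "H" or "T".
function sampleSpaceFor(trials: number): string[] {
  let sequences: string[] = [""];
  for (let i = 0; i < trials; i++) {
    const extended: string[] = [];
    for (const prefix of sequences) {
      for (const flip of flips) {
        extended.push(prefix + flip);
      }
    }
    sequences = extended;
  }
  return sequences;
}

const threeFlips = sampleSpaceFor(3);
console.log(threeFlips.join(", "));
// prints "HHH, HHT, HTH, HTT, THH, THT, TTH, TTT"
```

Each extra trial doubles the size of the sample space, so n trials give 2^n outcomes.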

Cool stuff

So, I learnt about a thing called a Random Variable this week in an introductory statistics course I am taking at the moment. According to the lecture notes, a random variable is both a function and a variable. As a function, it maps each outcome in the sample space to a value. As a variable, it can take on any of those values.

Let’s go back to the coin-flip example. We can define a Random Variable X to be a function that takes an outcome of the experiment and returns the count of heads.

e.g.

X(HHH) = 3
X(HTH) = 2
X(THH) = 2
X(TTT) = 0
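In code, X is just an ordinary function. This is a minimal sketch that assumes an outcome is written as a string of "H" and "T" characters, which is my own choice of representation.

```typescript
// X maps an outcome such as "HTH" to the number of heads in it.
function X(outcome: string): number {
  return outcome.split("").filter((flip) => flip === "H").length;
}

console.log(X("HHH")); // 3
console.log(X("HTH")); // 2
console.log(X("THH")); // 2
console.log(X("TTT")); // 0
```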

Now, let’s define another term called Range. The Range of a Random Variable is the set of all possible outputs of the function. Let’s go back to our coin-flip experiment that consists of only three trials. The Range of the Random Variable X we defined above is { 0, 1, 2, 3 }. It cannot be larger than three since there are only 3 trials, and it cannot be negative because a count of heads can never drop below zero (and because monkeys like bananas).
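One way to check the Range is to apply the function to every outcome in the sample space and keep the distinct results. The sketch below does that. The names `countHeads` and `rangeOfX` are hypothetical, and the sample space is hard-coded as strings.

```typescript
const sampleSpace: string[] = ["HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT"];

// The random variable from the text: number of heads in an outcome.
const countHeads = (outcome: string): number =>
  outcome.split("").filter((flip) => flip === "H").length;

// Apply X to every outcome, drop duplicate values, then sort ascending.
const values = sampleSpace.map(countHeads);
const rangeOfX = values
  .filter((value, index) => values.indexOf(value) === index)
  .sort((a, b) => a - b);

console.log(rangeOfX); // [0, 1, 2, 3]
```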

If you understand static typing, the following snippet may be helpful. If you don’t, TOO BAD!

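// One trial lands on either side of the coin.
type Outcome = "Head" | "Tail";
// One outcome of the three-trial experiment.
type ExperimentOutcome = [Outcome, Outcome, Outcome];
// Possible head counts in three trials.
type Range = 0 | 1 | 2 | 3;
// A random variable maps an experiment outcome to a value in its range.
type MyRandomVar = (outcome: ExperimentOutcome) => Range;
const sampleSpace: ExperimentOutcome[] = [ /* ... */ ];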

What if you want to select the subset of the sample space on which our random variable takes a particular value?

For example, how do you select the subset of the sample space that contains only the outcomes with exactly two heads?

You can just do

sampleSpace.filter((outcome) => X(outcome) === 2)

Isn’t it neat?
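If you want to actually run the filter, here is a self-contained sketch. It assumes outcomes are three-element tuples of "Head" and "Tail", and the names `sides` and `exactlyTwoHeads` are mine rather than anything from the lecture notes.

```typescript
type Outcome = "Head" | "Tail";
type ExperimentOutcome = [Outcome, Outcome, Outcome];

// Enumerate all 2 * 2 * 2 = 8 outcomes of the three-trial experiment.
const sides: Outcome[] = ["Head", "Tail"];
const sampleSpace: ExperimentOutcome[] = [];
for (const a of sides) {
  for (const b of sides) {
    for (const c of sides) {
      sampleSpace.push([a, b, c]);
    }
  }
}

// The random variable X: number of heads in an experiment outcome.
const X = (outcome: ExperimentOutcome): number =>
  outcome.filter((side) => side === "Head").length;

// Keep only the outcomes where X takes the value 2.
const exactlyTwoHeads = sampleSpace.filter((outcome) => X(outcome) === 2);

// Abbreviate each outcome to its initials for printing.
console.log(exactlyTwoHeads.map((o) => o.map((s) => s.charAt(0)).join("")));
// [ 'HHT', 'HTH', 'THH' ]
```

A subset of the sample space picked out this way is what statisticians call an event, which is why the filter argument reads better as an outcome than as an event.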

Thanks for reading LOL