“Intelligence is not just about pattern recognition and function approximation. It’s about modeling the world.” — Josh Tenenbaum, NeurIPS 2021
A graph is denoted \(G=(V,E)\), where \(V\) is the set of nodes and \(E\) is the set of edges.
A path is a sequence of nodes in which each pair of consecutive nodes is connected by an edge.
A simple path does not revisit nodes, whereas a general path may.
\[\begin{align} anc(v) &= \{u : \exists \text{ a path from } u \text{ to } v \} \\ dec(v) &= \{u : \exists \text{ a path from } v \text{ to } u \}. \end{align}\]
Ancestors and descendants include indirect relationships established through one or more directed edges.
A cycle is a path where the starting and ending nodes are the same, and no intermediate nodes are repeated.
A DAG (directed acyclic graph) is a directed graph with no cycles.
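To make these traversal definitions concrete, here is a minimal Python sketch that computes \(dec(v)\) and \(anc(v)\) by depth-first search; the edge-list representation, function names, and toy graph are illustrative assumptions, not notation from these notes.

```python
# Minimal sketch (assumed edge-list representation): ancestors and
# descendants of a node in a DAG via iterative depth-first search.
from collections import defaultdict

def descendants(edges, v):
    """All u such that there is a directed path from v to u."""
    children = defaultdict(list)
    for a, b in edges:
        children[a].append(b)
    seen, stack = set(), [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def ancestors(edges, v):
    """All u such that there is a directed path from u to v."""
    return descendants([(b, a) for a, b in edges], v)  # same search on reversed edges

edges = [("X", "Y"), ("Y", "Z")]   # the chain X -> Y -> Z
print(ancestors(edges, "Z"))       # {'X', 'Y'}
print(descendants(edges, "X"))     # {'Y', 'Z'}
```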
A Directed Graphical Model (DGM) is a DAG in which
\[ X_v \bot X_u | X_{pa(v)} \text{ for } u \in anc(v) \setminus pa(v)\]
This reduces the complexity of the model (e.g., number of parameters) by leveraging conditional independence.
\[P(\boldsymbol{X}) = \prod_{v \in V} P(X_v | X_{pa(v)})\]
This factorization reflects the dependency structure encoded in the graph, allowing for efficient representation and inference of the joint distribution.
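As a sketch of how this factorization is used in practice, the snippet below evaluates the joint of a tiny two-node network \(A \to B\) from its conditional probability tables (CPTs); the dictionary encoding and all numbers are made-up assumptions.

```python
# Hedged sketch: evaluating P(X) = prod_v P(X_v | X_{pa(v)}) on the toy
# network A -> B. CPTs map (value of v, tuple of parent values) -> probability.
parents = {"A": [], "B": ["A"]}
cpt = {
    "A": {(1, ()): 0.3, (0, ()): 0.7},         # P(A)
    "B": {(1, (1,)): 0.9, (0, (1,)): 0.1,      # P(B | A=1)
          (1, (0,)): 0.2, (0, (0,)): 0.8},     # P(B | A=0)
}

def joint(assignment):
    p = 1.0
    for v, pa in parents.items():
        pa_vals = tuple(assignment[u] for u in pa)
        p *= cpt[v][(assignment[v], pa_vals)]
    return p

print(joint({"A": 1, "B": 0}))  # 0.3 * 0.1 = 0.03
```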
Consider time series data where the observation at time \(t\) depends on all past observations \(1, \dots, t-1\):
\[ P(X_{1:T}) = P(X_1) \prod_{t=2}^{T} P(X_t | X_{1:t-1})\]
First-order Markov chain:
\[ P(X_{1:T}) = P(X_1) \prod_{t=2}^{T} P(X_t | X_{t-1})\]
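A quick numeric sketch of the first-order factorization, with a made-up two-state initial distribution and transition matrix:

```python
import numpy as np

# Assumed toy numbers: pi is P(X_1), A[i, j] is P(X_t = j | X_{t-1} = i).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

def chain_prob(states):
    """P(X_{1:T}) = P(X_1) * prod_t P(X_t | X_{t-1})."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

print(chain_prob([0, 0, 1, 1]))  # 0.6 * 0.7 * 0.3 * 0.6 = 0.0756
```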
First-order Markov chain on the latent state.
Second-order Markov chain on the latent state.
Exercise: construct a DGM on these variables. Think about which variables are the “causes” and which are the “outcomes.”
Difficulty of the class and Intelligence affect your Grade. Intelligence affects the SAT score. The Letter is determined by the Grade.
\[p(D, I, G, S, L) = p(I) p(D) p(G|D,I) p(S|I) p(L|G)\]
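Here is a hedged sketch of this factorization with made-up CPT values (in the spirit of the classic student-network example, but the numbers are assumptions, not values from these notes); the sanity check confirms the factorized joint sums to 1.

```python
# Illustrative CPTs for the student network: D, I -> G, I -> S, G -> L.
p_I = {"high": 0.3, "low": 0.7}
p_D = {"hard": 0.4, "easy": 0.6}
p_G = {("hard", "high"): {"A": 0.5,  "B": 0.5},    # P(G | D, I)
       ("hard", "low"):  {"A": 0.05, "B": 0.95},
       ("easy", "high"): {"A": 0.9,  "B": 0.1},
       ("easy", "low"):  {"A": 0.3,  "B": 0.7}}
p_S = {"high": {"good": 0.8, "bad": 0.2},          # P(S | I)
       "low":  {"good": 0.1, "bad": 0.9}}
p_L = {"A": {"strong": 0.9, "weak": 0.1},          # P(L | G)
       "B": {"strong": 0.4, "weak": 0.6}}

def joint(d, i, g, s, l):
    return p_I[i] * p_D[d] * p_G[(d, i)][g] * p_S[i][s] * p_L[g][l]

total = sum(joint(d, i, g, s, l)
            for d in p_D for i in p_I for g in ("A", "B")
            for s in ("good", "bad") for l in ("strong", "weak"))
print(round(total, 10))  # 1.0 -- the factorized joint is a valid distribution
```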
Let \(G = (V, E)\) be a DAG and \(A, B, C \subset V\). We want to determine if \(A \bot_G B | C\).
The Bayes ball algorithm answers this question: we bounce a ball along the edges of the graph, and \(A \bot_G B | C\) holds exactly when the ball cannot travel from any node in \(A\) to any node in \(B\). The rules are determined by three canonical structures.
Chain (pipe): \(X \to Y \to Z\). Suppose we observe \(Y\): \(Y \in C\). Then, is \(X \bot Z | Y\)?
\[\begin{align} p(x, z | y) &= \frac{p(x, y, z)}{p(y)} \\ &= \frac{p(x) p(y | x) p(z | y)}{p(y)} \\ &= \frac{p(x, y)}{p(y)} p(z | y) \\ &= p(x | y) p(z | y). \end{align}\]
Therefore, if \(Y \in C\), the ball cannot bounce from \(X\) to \(Z\).
Consider a path \(D \to G \to L\) or \(I \to G \to L\). The letter depends on the grade, and if the grade is observed, the values of \(I\) and \(D\) are irrelevant (a good grade will result in a good letter).
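We can also verify the chain derivation numerically. The sketch below builds a random binary chain \(X \to Y \to Z\) and checks that \(p(x, z | y)\) factors for every \(y\); all distributions here are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
px = rng.dirichlet([1, 1])             # P(X)
py_x = rng.dirichlet([1, 1], size=2)   # P(Y | X), one row per x
pz_y = rng.dirichlet([1, 1], size=2)   # P(Z | Y), one row per y

joint = np.einsum("x,xy,yz->xyz", px, py_x, pz_y)   # p(x, y, z)
for y in range(2):
    pxz = joint[:, y, :] / joint[:, y, :].sum()     # p(x, z | y)
    assert np.allclose(pxz, np.outer(pxz.sum(1), pxz.sum(0)))
print("chain: X is independent of Z given Y")
```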
Tent (fork): \(X \leftarrow Y \rightarrow Z\). Suppose \(Y\) is observed: \(Y \in C\). Then, is \(X \bot Z | Y\)?
\[\begin{align} p(x, z | y) &= \frac{p(y) p(x | y) p(z | y)}{p(y)} \\ &= p(x | y) p(z | y). \end{align}\]
So, YES! If we observe the root node, it separates the children and the ball cannot bounce from \(X\) to \(Z\).
The values taken by the SAT score and the Grade are independent given that we observe Intelligence.
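The same style of numeric check works for the fork, with \(Y\) as the root (again, arbitrary made-up distributions): conditioning on \(Y\) separates the children.

```python
import numpy as np

rng = np.random.default_rng(1)
py = rng.dirichlet([1, 1])             # P(Y)
px_y = rng.dirichlet([1, 1], size=2)   # P(X | Y), one row per y
pz_y = rng.dirichlet([1, 1], size=2)   # P(Z | Y), one row per y

joint = np.einsum("y,yx,yz->xyz", py, px_y, pz_y)   # p(x, y, z)
for y in range(2):
    pxz = joint[:, y, :] / joint[:, y, :].sum()     # p(x, z | y)
    assert np.allclose(pxz, np.outer(pxz.sum(1), pxz.sum(0)))
print("fork: X is independent of Z given Y")
```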
Collider (v-structure): \(X \to Y \leftarrow Z\). Suppose \(Y\) is observed: \(Y \in C\). Then, is \(X \bot Z | Y\)?
\[\begin{align} p(x, z | y) &= \frac{p(x) p(z) p(y | x, z)}{p(y)} \\ &= \frac{p(x, z) p(y | x, z)}{p(y)} \end{align}\]
If \(Y\) is observed, the ball can bounce from \(X\) to \(Z\) (and vice versa) via \(Y\).
Note that marginally, \(X \bot Z\)! This holds because \(p(x, z) = p(x) p(z)\).
Therefore, the ball cannot bounce from \(X\) to \(Z\) if \(Y\) is unobserved.
This is referred to as explaining away or Berkson’s paradox, where the observed value can be due to either of the parent nodes.
Consider \(I \to G \leftarrow D\). Clearly intelligence of the student and difficulty of the class are unrelated (marginally independent). But if we observe \(G=A\), then is it because the student is intelligent or because the course is easy? It can be explained away by either.
This conditional dependence arises because conditioning on \(Y\) introduces a dependency between \(X\) and \(Z\), even though they are marginally independent. This is a key property of colliders in DGMs.
For example, let \(G=A\). Then, if \(I\) is known, it will change our belief about \(D\), and vice versa. If \(D=\text{hard}\) along with \(G=A\), then we will update our belief so that \(P(I=\text{high})\) is high.
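To put numbers on this, the sketch below restates the illustrative CPTs from earlier (so the snippet is self-contained) and computes the posterior over \(I\) by enumeration; the values are assumptions, but the direction of the belief update is the point.

```python
# Explaining away on the collider I -> G <- D, with the same made-up CPTs.
p_I = {"high": 0.3, "low": 0.7}
p_D = {"hard": 0.4, "easy": 0.6}
p_GA = {("hard", "high"): 0.5, ("hard", "low"): 0.05,   # P(G = A | D, I)
        ("easy", "high"): 0.9, ("easy", "low"): 0.3}

def post_I_high(d_obs=None):
    """P(I = high | G = A [, D = d_obs]) by enumeration."""
    ds = [d_obs] if d_obs else list(p_D)
    num = sum(p_I["high"] * p_D[d] * p_GA[(d, "high")] for d in ds)
    den = sum(p_I[i] * p_D[d] * p_GA[(d, i)] for i in p_I for d in ds)
    return num / den

print(round(post_I_high(), 3))        # ~0.613: an A alone suggests intelligence
print(round(post_I_high("hard"), 3))  # ~0.811: a hard class strengthens that belief
print(round(post_I_high("easy"), 3))  # ~0.562: an easy class explains the A away
```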
Can the ball bounce from \(I\) to \(D\) and vice versa? Is \(I \bot D | L\)?
Essentially, observing \(L\) unblocks \(G\) as a collider, so the ball can bounce from \(I\) to \(D\).
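Finally, here is a hedged sketch of a d-separation check that applies these blocking rules directly: it enumerates every undirected path between two nodes and tests whether each is blocked by the observed set. This is exponential in general (a real Bayes ball implementation does a linear-time traversal instead), and all names and the toy edge list are assumptions.

```python
from collections import defaultdict

edges = [("D", "G"), ("I", "G"), ("I", "S"), ("G", "L")]  # the student network

def descendants(edges, v):
    ch = defaultdict(set)
    for a, b in edges:
        ch[a].add(b)
    out, stack = set(), [v]
    while stack:
        for c in ch[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def blocked(path, edges, C):
    """Is this undirected path blocked by the observed set C?"""
    E = set(edges)
    for w, v, u in zip(path, path[1:], path[2:]):
        if (w, v) in E and (u, v) in E:                 # collider w -> v <- u
            if v not in C and not (descendants(edges, v) & C):
                return True                             # no observed descendant: blocked
        elif v in C:                                    # chain or fork with v observed
            return True
    return False

def d_separated(a, b, C, edges):
    nb = defaultdict(set)
    for x, y in edges:
        nb[x].add(y)
        nb[y].add(x)
    def paths(path):                                    # all simple undirected paths a ~> b
        if path[-1] == b:
            yield path
        for n in nb[path[-1]] - set(path):
            yield from paths(path + [n])
    return all(blocked(p, edges, C) for p in paths([a]))

print(d_separated("I", "D", set(), edges))   # True: the collider at G blocks
print(d_separated("I", "D", {"L"}, edges))   # False: observing L unblocks G
```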