Skip to content

Example: Longest Common Subsequence

Roberto Fronteddu edited this page Feb 18, 2025 · 12 revisions
  • Given a sequence X={$x_1$,...,$x_m$} we say that Z={$z_1$,...,$z_k$} is a subsequence of X if there is a strictly increasing sequence {$i_1$,...,$i_k$} of indices of X such that $X_{i_j}$ = $Z_j$ for each j=1...k.

  • Given two sequences X and Y={$y_1$,...,$y_n$}, Z is a Common Subsequence if it is a subsequence of both.

  • In the Longest Common Subsequence problem we are given X and Y and are tasked to find the Longest Common Subsequence.

  • To formulate the problem, we observe that if Z is the LCS of X and Y then:

    1. If $x_m$ == $y_n$ then $z_k$ == $x_m$ == $y_n$ and $Z_{k-1}$ is the LCS of $X_{m-1}$ and $Y_{n-1}$.

The proof is divided in two parts.

  • If $z_k$ was different than $x_m$ then we could append $x_m$ to Z obtaining a LCS longer than Z, but that is a contradiction because Z is the LCS.

  • If $Z_{k-1}$ was not the LCS of $X_{m}$ and $Y_{n-1}$, then we could find a common subsequence longer than k-1, but the length of the LCS Z was k so that is not possible.

    1. If $x_{m}$ != $y_{n}$ and $x_{m}$ != $z_{k}$, then Z is LCS of $X_{m-1}$ and $Y_{n}$
    2. Symmetrically, If $x_{m}$ != $y_{n}$ and $y_{n}$ != $z_{k}$, then Z is LCS of $X_{m}$ and $Y_{n-1}$

To prove this, we can observe that if Z doesn't contain $x_{m}$ but is a LCS of X and Y, then Z must be a LCS of $X_{m-1}$ and $Y_{n}$ since there can't be a longer sequence without $x_{m}$.

  • From these three points we see that we must examine one of two problems:

    • If of $x_{m}$ != $y_{n}$ we must find the LCS of $X_{m-1}$ and $Y_{n-1}$, appending $x_{m}$ to that yields the LCS of X and Y
    • If $x_{m}$ != $y_{n}$ we must pick the longest LCS in $X_{m}$ $Y_{n-1}$ and $X_{m-1}$ $Y_{n}$
  • This formulation shows that we can get the final solution by using solutions of smaller problems.

  • Let c[i,j] be the length of the LCS of $X_{i}$ and $Y_{j}$:

    • if i == 0 or j == 0 then one of the two sequences is empty thus the len of the LCS is 0
    • if i,j>0 and $x_{i}$ == $y_{j}$ then c[i, j] = c[i-1,j-1] + 1
    • if i,j>0 and $x_{i}$ != $y_{j}$ then c[i, j] = Max(c[i-1,j], c[i,j-1]

Template:

// X = 1..m
// Y = 1..n
LCS(X,Y)
    m = X.len
    n = Y.len
    c[0..m, 0..n]
    b[1..m, 1..n] // we don't really need this

    for i = 1 to m
        for j = 1 to n
            if X[i] == Y[j]
                c[i,j] = 1 + c[i-1, j-1]
                // b[i, j] "up-left"       
            else if c[i-1, j] >= c[i, j-1]
                c[i, j] = c[i-1,j]
                // b[i, j] "up"  
            else
                c[i, j] = c[i,j-1]
                 // b[i, j] "left"              
PrintLCS_1(b, X, i, j)
    if b[i,j] == up-left
        PrintLCS_1(b, X, i-1, j-1)
        print X[i]
    else if b[i,j] == "up"
        PrintLCS_1(b, X, i-1, j)
    else
        PrintLCS_1(b, X, i, j-1)
PrintLCS_2(c, X, i, j)
    if i == 0 || j == 0 return
    if c[i, j] >= c[i-1,j] && c[i, j] >= c[i,j-1]
        // up left
        PrintLCS_2(c, X, i-1, j-1)
        print X[i]
    else if c[i-1,j] >= c[i,j-1]
        // up
        PrintLCS_2(c, X, i-1, j)
    else
        // left
        PrintLCS_2(c, X, i, j-1)
Clone this wiki locally