Longest Common Subsequence

The Longest Common Subsequence (LCS) problem is a classic computer science problem used to find the longest subsequence common to all sequences in a set of sequences (often just two sequences). A subsequence is a sequence derived from another sequence where some elements might be deleted without changing the order of the remaining elements.

Formal Definition

Given two sequences, X = x_1, x_2, ..., x_m and Y = y_1, y_2, ..., y_n, find a maximum-length common subsequence of X and Y.

Properties

The LCS problem is a typical example of a problem that can be solved using dynamic programming.
LCS is useful in data comparison applications, such as diff tools, bioinformatics for DNA sequencing, and version control systems.

Dynamic Programming Approach

The solution involves creating a 2D array dp where dp[i][j] represents the length of the LCS of the subarrays X[0:i] and Y[0:j].

Recurrence Relation

If x_i == y_j: $dp[i][j] = dp[i-1][j-1] + 1$
Otherwise: $dp[i][j] = \max (dp[i-1][j], dp[i][j-1])$

Time Complexity

The time complexity of this approach is $O(m \times n)$ , where $m$ and $n$ are the lengths of the two sequences. This is because every cell in the $m \times n$ table needs to be filled exactly once.

Space Complexity

The space complexity is also $O(m \times n)$ due to the 2D array used for storing intermediate results. However, it can be optimized to $O(\min(m, n))$ by using only the last and current row of the table.

Implementation

def LCS(X, Y):
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    
    return dp[m][n]

Applications

Version control systems (like Git): Used to reconcile different versions of files.
Bioinformatics: Used to compare DNA and protein sequences.
Data comparison: To find similarities between different sets of data.

Formal Definition​

Properties​

Dynamic Programming Approach​

Recurrence Relation​

Time Complexity​

Space Complexity​

Implementation​

Applications​