Longest Common Subsequence
The Longest Common Subsequence (LCS) problem is a classic computer science problem used to find the longest subsequence common to all sequences in a set of sequences (often just two sequences). A subsequence is a sequence derived from another sequence where some elements might be deleted without changing the order of the remaining elements.
Formal Definition
Given two sequences, X = x_1, x_2, ..., x_m and Y = y_1, y_2, ..., y_n, find a maximum-length common subsequence of X and Y.
Properties
- The LCS problem is a typical example of a problem that can be solved using dynamic programming.
- LCS is useful in data comparison applications, such as diff tools, bioinformatics for DNA sequencing, and version control systems.
Dynamic Programming Approach
The solution involves creating a 2D array dp where dp[i][j] represents the length of the LCS of the subarrays X[0:i] and Y[0:j].
Recurrence Relation
- If
x_i == y_j: - Otherwise:
Time Complexity
The time complexity of this approach is , where and are the lengths of the two sequences. This is because every cell in the table needs to be filled exactly once.
Space Complexity
The space complexity is also due to the 2D array used for storing intermediate results. However, it can be optimized to by using only the last and current row of the table.
Implementation
def LCS(X, Y):
m, n = len(X), len(Y)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if X[i-1] == Y[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
return dp[m][n]
Applications
- Version control systems (like Git): Used to reconcile different versions of files.
- Bioinformatics: Used to compare DNA and protein sequences.
- Data comparison: To find similarities between different sets of data.