Post

KMP(Knuth-Morris-Pratt) Algorithm

The KMP (Knuth-Morris-Pratt) algorithm is a fast string search algorithm.

Referenced from 『프로그래밍 대회에서 배우는 알고리즘 문제 해결 전략 세트2』 p.657

프로그래밍 대회에서 배우는 알고리즘 문제 해결 전략 세트

1. Overview

Let’s suppose there are two strings, A and B. The length of A is N and the length of B is M. We can find out where B is included in A using the KMP algorithm. The time complexity of this algorithm is \(O(N + M)\).

For example, if A is “abacabacabad” and B is “abacabad”, we can determine where B is included within A ( i=5 ) in \(O(\text{length of A + length of B})\) time complexity.


2. The Definition of the pi Array

Let’s define a substring of string B from index i to j as B[i…j].

pi[i] represents the maximum length of the string that can simultaneously serve as both a prefix and a suffix of the substring B[0…i].

The picture below will help you understand this definition.

definition

Let’s temporarily set aside how to compute pi and assume that we have already constructed the pi array.


3. The Main Process of KMP

Let’s suppose there are two strings A( = “abacabacabad”) and B( = “abacabad”).

In \(O(N^2)\) time complexity, you may write this code to find out where B is included in A.

1
2
3
4
5
6
7
8
9
10
vector<int> sample(string A, string B){
  int N=A.size();
  int M=B.size();
  for(int i=0;i<=N-M;i++){
    bool possible=true;
    for(int j=0;j<M;j++)
      if(A[i+j]!=B[j]) possible=false;
    if(possible) printf("B is include within index: %d\n",i);
  }
}

However, we don’t need to iterate from i=0 to i=N. What does this mean? Please refer to the explanation below.

KMP

Assuming we already have the pi array, when A[q] and B[q] do not match at i=0, the next step isn’t i=i+1 but rather i=p (where p = q - pi[q-1]). This transition is perfectly valid and crucial to understand. ( Here, pi[q-1] represents the maximum length of the string that can simultaneously serve as both a prefix and a suffix of “abacaba”. )

Furthermore, since A[p…(q-1)] = B[0…(p-q-1)], we can start comparing from A[q] and B[p-q].

Code: O(N)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
void kmp(string &A, string &B){
  int N=A.size();
  int M=B.size();
  vector<int> whereBIsIncluded;
  vector<int> pi = getPartialMatch(B); //Let's postpone the discussion on how this function is implemented for now.
  
  int begin=0, matched=0;
  while(begin<=N-M){
    if(matched < M && A[begin+matched]==B[matched]){
      ++matched;
      if(matched==M) whereBIsIncluded.push_back(begin);
    } else{
      if(matched==0) ++begin;
      else {
        begin += matched - pi[matched-1];
        matched = pi[matched-1];
      }
    }
  }
}


4. How to construct the pi array?

When constructing the pi array, we can utilize the same principle used in the previous explanation. We compute the pi[] array by leveraging the concept of self-matching appearances while searching for itself.

Code: O(M)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
vector<int> getPartialMatch(const string &B) {
    int M = B.size();
    vector<int> pi(M, 0);
    int begin = 1, matched = 0;
    while (begin + matched < M) {
        if (B[begin + matched] == B[matched]) {
            ++matched;
            pi[begin + matched - 1] = matched;
        } else {
            if (matched == 0) ++begin;
            else {
                begin += matched - pi[matched - 1];
                matched = pi[matched - 1];
            }
        }
    }
    return pi;
}

5. Another Code: O(N+M)

This is an alternative implementation of KMP. The basic principle remains the same. It operates in O(N+M) time complexity.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
vector<int> getPartialMatch(string &B){
  int M=(int)B.size(), matched=0;
  vector<int> pi(M, 0);

  for(int i=1;i<M;i++){
    while(matched>0 && B[matched]!=B[i]) matched=pi[matched-1];
    if(B[matched]==B[i]) pi[i]=++matched;
  }
  return pi;
}

void kmp(string &A, string &B){
  int matched=0;
  int N = (int)A.size(), M = (int)B.size();
  vector<int> pi=getPartialMatch(B);
  vector<int> whereBIsIncluded;
  
  for(int i=0;i<N;i++){
    while(matched>0 && A[i]!=B[matched])
      matched=pi[matched-1];
    if(A[i]==B[matched]){
      ++matched;
      if(matched==M){
        whereBIsIncluded.push_back(i-matched+1);
        matched=pi[matched-1];
      }
    }
  }
}

Next Post

If you understand what is kmp and how to construct it. I recommend read Z-algorithm

This post is licensed under CC BY 4.0 by the author.