Subsequence of length 3

dragonzurfer
3 min readMay 25, 2017

what is a subsequence?

Subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements. example: ACE is a subsequence of the sequence ABCDEF since you can delete B,D,F. But AEC is not a subsequence since the order is changed after deletion.

How many sub-sequences are there for a string of length N?

C(N,3)

What we need to do

Given a string T and another string S of length 3. We want to find the number of sub-sequences of length 3 from T which match the string S.

A Naive solution

One method would be to find all 2^N sub-sequences and check all the one of length 3 matches with S. This would be bad . Since 2^N is a huge number as N increases. Lets think of better naive solution.

As we know that we only have to find the sub-sequences of length 3(assume S=”ABC” you can substitute A,B,C with any letter of your choice). A naive code for checking if T[i] = ‘A’, T[j] = ‘B’, T[k] = ‘C’ subject to the condition that 1 <= i < j < k <= N

ans = 0
for i in xrange(1, n+1) :
if T[i] == ‘A’ :
for j in xrange(i+1, n+1) :
if T[j] == ‘O’ :
for k in xrange(j+1, n+1) : #1
if T[k] == ‘L’ : #2
ans += 1 #3

Complexity of the above code is O(n³) .

Can we do better

Main observation is that the sub-sequence is only of length three. how can we make use of this?

Now let us take the string T=”AACBCBCC” and our S=”ABC”

Now lets go to the first index of B that is 4. And isolate ourself. Now the question is how many ABC can we make using this B?.

If you count carefully you’ll get it as 6. What if I told you there are 2 A’s to the left and 3 C’s to the right of this B at index 4. We can simply multiply the number of A’s to the left and number of C’s to the right.

Reduction

So the problem has now reduced to looking at all the B’s indices and multiplying the number of A’s and C’s to the right and left respectively and then summing them up. This is out answer.

Time Complexity

We need to traverse our string for indices of B. This would take us O(N) time. And find the number of A’s and C’s to the left and right would take as another O(N) time. So the overall complexity is O(number of occurrences of B*N) ,in the case where all indices are B O(N²) . Yay !. Can we do better?

The traversing time for B cannot be reduced. But the time required to find number of A’s to the left and B’s to the right can be done as an O(1) operation.

We simply maintain a prefix array which keeps note of the number of A’s and a suffix array which keeps note of the number of C’s. We can then instantly find the number of C’s and A’s at any index i.

Hence the overall time complexity is reduced to O(N). Hope this was mind blowing :p

#include <bits/stdc++.h>
using namespace std;
#define mp make_pair
#define pb push_back
#define vll vector<ll>
#define F first
#define S second
#define pll pair<ll,ll>
#define FOR1(i,a) for(i=0;i<=a;i++)
#define FOR2(i,a,b) for(i=a;i<=b;i++)
#define endl '\n'
#define clr(a) memset(a,0,sizeof(a))
#define all(x) x.begin(),x.end()
typedef long long ll;
int main()
{
ll t;cin>>t;
while(t--)
{
ll n,i,j,k;
string s;cin>>s;n=s.length();
vector<ll>prefix(n,0),suffix(n,0);

if(s[0]=='A')
prefix[0]=1;
FOR2(i,1,n-1)
if(s[i]=='A')
prefix[i]=prefix[i-1]+1;
else
prefix[i]=prefix[i-1];
if(s[n-1]=='C')
suffix[n-1]=1;
for(i=n-2;i>=0;i--)
if(s[i]=='C')
suffix[i]=suffix[i+1]+1;
else
suffix[i]=suffix[i+1];
ll cnt=0;
FOR2(i,1,n-2)
if(s[i]=='B')
cnt+=(prefix[i-1]*suffix[i+1]);
cout<<cnt<<endl;
}
return 0;
}

--

--