Sunday, September 23, 2012

[DirectI Online test] Number of substrings containing atmost 2 different characters

Given an input string, find the number of substrings which contains atmost two different characters.
E.g - aabc Ans: 8 

Approach: 

Lets start with a simple approach,

The most naive solution would be to iterate over all the possible substrings of the array 
(~ n ^ 2) and then validate if the substrings have atmost two unique characters.
The complexity will go over to O(n^3), proof is left to the reader.

Can we do it better, indeed yes,





An important observation is that we should break when we have the distinct characters greater than 2 and we should not be computing the further substrings with the same prefix.
Also we need not find the distinct characters in each substring, rather we can just compute the largest length substring with a starting index with atmost 2 different characters, and length of the substring found is number of substrings which can be formed staring for a given index.
For e.g. aabbac
For the 0th index the aabba is the longest substring with atmost 2 different characters,
and indeed the number of strings which can be formed starting with 0th index is 5
which are a, aa, aab, aabb, aabba.
The above approach can be implemented by summation of the largest substrings of atmost 2 characters possible with each index element
The complexity boils down to O(n^2)


Can we still do better ?
A little pen and paper work will make us realize that there are lot of computations which are done repeatedly.

Once we know that the largest substring starting at index i is substring (i,j) that contains at most 2 elements.
Now to figure out the largest substring for the index i+1, should we again compute from i + 1 to j, but infact we just computed in previous iteration so why not store the result and use it back. This technique is often termed as memorisation in the computing terms.

So the algorithm looks like
Keep track of the characters and their count till the distinct characters are less than 2
At each successive iteration we need to remove the last previous index from the tracking list.
Return the result as the sum of the largest substrings formed at each index.
The running time complexity reduces to O(n)
Below is a implementation of the above logic in C

int isInArray(int a[2][2], int n, int k) {
   if(n > 2) return -1;
   if(n == 1) {
        if(a[0][0] == k) {
           a[0][1]++;     
           return 1;

        } else {
           a[1][0] = k;
           a[1][1] = 1;
           return 2;
        }
   } else if(n == 2) {
       if(a[0][0] == k) {a[0][1]++; return 2;}
       if(a[1][0] == k) {a[1][1]++; return 2;}

       return -1;
   }
   return -1;
}
int removeFromArray(int a[2][2], int k) {
    if(a[0][0] == k) {
        a[0][1]--;
        if(a[0][1] <= 0) {
           a[0][0] = a[1][0];
           a[0][1] = a[1][1];

           return 1;
        }
        return 2;
    } else if(a[1][0] == k) {
        a[1][1]--;
        if(a[1][1] <= 0) {
           return 1;
        }
        return 2;
    }
    return 2;        

}
int numberOfSubstrings ( string A ) {
    int len = A.length();
    int res = 0, j = 1, c = 1, a[2][2];
    a[0][0] = A[0]; a[0][1] = 1; 
    for(int i=0;i<len;i++) {
        for (;j<len; j++) {

           c = isInArray(a, c, A[j]);
           if(c == -1) break;  
        }
        c = removeFromArray(a,A[i]);
        res = (res + j-i);
    }
    return res;
}

5 comments :

  1. This Java method will count the number of substrings:

    private static int subStringCounter(String source) {
    if (source == null) {
    throw new NullPointerException("Invalid Input");
    }
    // Given String (as a char array)
    char[] charArray = source.toCharArray();

    int arrLength = charArray.length;

    // Initial Sub_String count, which is equal to the no of chars in the
    // given string
    int ssCount = arrLength;

    // DIfferent char count, which must be <= 2 in any sub string.
    int diffCharCnt;

    switch (arrLength) {
    // For input strings with length less than or equal to 1, sub string
    // count will be 0
    case 0:
    System.out.println("ss Count = " + 0);
    return 0;
    case 1:
    System.out.println("ss Count = " + 0);
    return 0;
    // For input strings with length equal to 2, sub string
    // count will always be 2
    case 2:
    System.out.println("ss Count = " + ssCount);
    return 2;
    default:
    for (int i = 0; i < arrLength; i++) {
    char c = charArray[i];

    diffCharCnt = 1;
    int j = i + 1;

    // ssCount++;

    // This loop will iterate from the next char
    while (j < charArray.length) {

    if (c != charArray[j]) {
    diffCharCnt++;
    }

    // If diffCharCnt > 2, we are breaking the inner loop, as
    // this
    // will not be considered as a valid sub string
    if (diffCharCnt > 2) {
    break;
    }

    ssCount++;

    // The sub string should not be same as the given string. So
    // the
    // sub string containing the first char of the given string
    // should not contain the last char of that string.
    if (++j == charArray.length - 1 && i == 0) {
    break;
    }

    }
    }
    System.out.println("ss Count = " + ssCount);
    return ssCount;
    }

    }

    ReplyDelete
  2. @Aviral: Please ignore my earlier post. It was wrong. Here I am posting a new solution with an explanation in my next post :

    private static int subStringCounter(String source) {

    if (source == null) {
    throw new NullPointerException("Invalid Input");
    }
    // Given String (as a char array)
    char[] charArray = source.toCharArray();

    int arrLength = charArray.length;

    // Initial Sub_String count, which is equal to the no of chars in the
    // given string
    int ssCount = arrLength;

    // DIfferent char count, which must be <= 2 in any sub string.
    int diffCharCnt;

    switch (arrLength) {
    // For input strings with length less than or equal to 1, sub string
    // count will be 0
    case 0:
    System.out.println("ss Count = " + 0);
    return 0;
    case 1:
    System.out.println("ss Count = " + 0);
    return 0;
    // For input strings with length equal to 2, sub string
    // count will always be 2
    case 2:
    System.out.println("ss Count = " + ssCount);
    return 2;
    default:
    int innerLoopLimit = 0;
    char[] uniqueChars = new char[2];
    for (int i = 0; i < arrLength; i++) {
    char c = charArray[i];
    uniqueChars[0] = c;
    uniqueChars[1] = c;

    diffCharCnt = 0;
    int j = i + 1;

    if (i == 0) {
    innerLoopLimit = arrLength - 1;
    } else {
    innerLoopLimit = arrLength;
    }

    while (j < innerLoopLimit) {
    diffCharCnt = 0;
    if (uniqueChars[0] == uniqueChars[1]) {
    if (uniqueChars[0] != charArray[j]) {
    uniqueChars[0] = charArray[j];
    }
    ssCount++;
    j++;
    continue;
    }
    if (uniqueChars[0] != charArray[j]) {
    diffCharCnt++;
    }
    if (uniqueChars[1] != charArray[j]) {
    diffCharCnt++;
    }
    if (diffCharCnt >= 2) {
    break;
    }
    ssCount++;
    j++;

    }
    }
    System.out.println("ss Count = " + ssCount);
    return ssCount;
    }

    }

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. My Approach:

    1. For Strings with length less than or equal to 1, substring count will be 0.
    2. For strings with length equal to 2, substring count will be 2.
    3. Below is the explanation for strings with length > 2:
    4. Initial sub string count will be equal to the length of the string, where each sub string is of length one. (For string “aabc”, it will be 4.)
    5. Created one char array from the given string (“charArray”).
    6. Start iterating through this array,

    For(int i = 0 ; i < arrLength ; i++ )

    7. A char array with length 2 is being used (“UniqueChars”), which will contain the two different chars in any sub string.

    Initially both the elements of this array will contain the ith element. (So, initially both the elements will have the same value).

    8. Start iterating the char array (“charArray”) starting from the (i + 1)th element.

    The first element that does not match with the elements of the char array (“UniqueChars”) will occupy the 0th position in that array. Now this char array (“UniqueChars”) has two different chars.

    If the next char matches with any of the chars in that array, these three chars will form one sub string. Again if next to next char matches with any of the chars in that array, these four chars will form one sub string. Like this we have to count the sub strings till we find a char that does not match with any of the chars in that 2 char array, in which case we have to break out from the inner loop and continue with the next ith value from step 6.

    ReplyDelete