Saturday, December 18, 2010

Subsequence again !!!

You are given a text string S, your task is to determine how many times a particular string T appears as a sub-sequence of the given string S i.e number of sub-sequences of the string S forming the string T.

18 comments :

  1. char s[100000],sub[100000];
    scanf("%s,%s",s,sub);
    l1=strlen(s);
    l2=strlen(sub);
    int count=0;
    for(int i=0;i<l1-l2+1;i++){
    if(!strncmp(s+i,sub,l2)){
    count++;
    }
    }
    printf("%d",count);


    here i m counting the subsequence even if it collides with the previous subsequence.

    If we have to find seperate subsequence count only, then we can increase the value of i by l2 every time a subsequence is found.

    ReplyDelete
  2. @jainendra tarun ..... there is a difference b/w the sub-sequence and sub-string... a sub-sequence of a string is not necessarily continuous.....

    Anyways.... for the number of sub-string you have other optimized algo like KMP, rabin karp e.t.c

    ReplyDelete
  3. oh i had no idea about it...thanx a lot sir.

    ReplyDelete
  4. The idea is to split the problem into smaller ones, i.e. find all occurrences
    of all the substrings of our pattern. If the pattern is `abca', then we will
    find all the occurrences of `a'. `ab', `abc', `abca'.

    Imagine at a certain point inside the text-string, you read an `a'. What would
    you do?
    1 increment the counter associated with the string `a' by one.
    2 increment the coutner associated with the string `abca' by the value
    associated with the string `abc'.

    That's all.

    Regards,
    Matteo

    ReplyDelete
  5. @landimatte ... a pseudo code can explain ur approach better ... u r using the term substrings(remember it is subsequence)...
    Regards

    ReplyDelete
  6. Ops, I forgot to say that we count all the occurrences of all the /substrings/
    of our pattern.

    pattern: `abca'
    substrings: `a', `ab', `abc', `abca'

    Initialize all these counters to 0, and start to process the text-string
    `abbccca':

    - read `a' at position 0: counter[`a'] += 1
    - read `b' at position 1: counter[`b'] += 1; we need also to increment
    counter[`ab'] because the key ends with a `b'. The amount of the increment is
    equal to the counter value associated with the key (`ab') without the read
    char (`b'), i.e. counter[`a']. Then counter[`ab'] += counter[`a'] ->
    counter[`ab'] = 1. Imagine you have read 100 times an `a', and you read
    a `b': how many times could you create the sub-sequence `ab'? 100 times:
    first read `a' with `b', second read `a' with `b' and so forth.
    - read `b' at position 2: counter[`b'] += 1; counter[`ab'] += counter[`a']
    - read `c' at position 3: counter[`c'] += 1; counter[`abc'] += counter[`ab']
    ...

    Hope it's clear.

    However, here is a Python script that implements the algorithm:
    http://pastebin.com/875xYhqw

    Regards,
    Matteo

    ReplyDelete
  7. @landimatte.... a brilliant solution a nice code too !!!

    ReplyDelete
  8. pd[i][j] = number os subsequences of s[1,...,i] forming t[1,..,j]

    pd[i][j] = pd[i-1][j] + pd[i-1][j-1], s[i] == t[j]
    pd[i-1][j], otherwise

    ReplyDelete
  9. @imavalon.... u misinterpreted the question with lcs ... can u elaborate a little ..

    ReplyDelete
  10. pd[i][j] = number os subsequences of s[1,...,i] forming t[1,..,j]

    we have two alternatives: or s[i] = t[j] holds or it does not.
    if s[i] is different of t[j], then the number of subsequences of s[1,...,i] forming t[1,..,j] is the same as the number of subsequences of s[1,...,i-1] forming t[1,..,j] (pd[i-1][j]).
    if s[i] = t[j] then additionally we can append the element t[j] to each subsequence from s[1,...,i-1] that forms t[1,...,j-1] (pd[i-1][j-1]), obtaining new subsequences not yet counted. thus:
    pd[i][j] = pd[i-1][j] + pd[i-1][j-1], if s[i] == t[j]
    pd[i][j] = pd[i-1][j], otherwise

    ReplyDelete
  11. @imavalon .. correct approach .. landimatte has provided a more lucid explanation for the same thing and you have provided a more cleaner code .... Cheers !!!

    ReplyDelete
  12. I think imavalon your solution is wrong .. imagine the example ladimatte sent - count abca in abbccca; i am getting the following table:

    j 1 2 3 4
    ------------------
    1 | 1 0 0 0
    2 | 1 1 0 0
    3 | 1 1 0 0
    4 | 1 1 0 0
    5 | 1 1 0 0
    6 | 1 1 0 0
    7 | 1 1 0 0
    i

    ReplyDelete
  13. my fault, sorry :-) The table should be looking that way:

    j 1 2 3 4
    ------------------
    1 | 1 0 0 0
    2 | 1 1 0 0
    3 | 1 2 0 0
    4 | 1 2 2 0
    5 | 1 2 4 0
    6 | 1 2 6 0
    7 | 1 2 6 6
    i

    ReplyDelete
  14. And all these solutions can be optimized to use just O(t) memory, letting time at O(st)

    ReplyDelete
  15. can u explain ur approach ?

    Regards

    ReplyDelete
  16. its easy to implement it with O(t) memory:
    s = sequence (indexed from 1 to |s|)
    t = subsequence (indexed from 1 to |t|)

    int pd[|t|];
    pd[0] = 1;
    for(int i = 1; i <= |s|; i++){
    for(int j = min(i, |t|); j >= 1; j--){
    if(s[i] == t[j]){
    pd[j] = pd[j]+pd[j-1];
    }
    }
    }

    ReplyDelete