Saturday, December 18, 2010

Subsequence again !!!

You are given a text string S, your task is to determine how many times a particular string T appears as a sub-sequence of the given string S i.e number of sub-sequences of the string S forming the string T.

1. char s,sub;
scanf("%s,%s",s,sub);
l1=strlen(s);
l2=strlen(sub);
int count=0;
for(int i=0;i<l1-l2+1;i++){
if(!strncmp(s+i,sub,l2)){
count++;
}
}
printf("%d",count);

here i m counting the subsequence even if it collides with the previous subsequence.

If we have to find seperate subsequence count only, then we can increase the value of i by l2 every time a subsequence is found.

2. @jainendra tarun ..... there is a difference b/w the sub-sequence and sub-string... a sub-sequence of a string is not necessarily continuous.....

Anyways.... for the number of sub-string you have other optimized algo like KMP, rabin karp e.t.c

3. 4. The idea is to split the problem into smaller ones, i.e. find all occurrences
of all the substrings of our pattern. If the pattern is `abca', then we will
find all the occurrences of `a'. `ab', `abc', `abca'.

Imagine at a certain point inside the text-string, you read an `a'. What would
you do?
1 increment the counter associated with the string `a' by one.
2 increment the coutner associated with the string `abca' by the value
associated with the string `abc'.

That's all.

Regards,
Matteo

5. @landimatte ... a pseudo code can explain ur approach better ... u r using the term substrings(remember it is subsequence)...
Regards

6. Ops, I forgot to say that we count all the occurrences of all the /substrings/
of our pattern.

pattern: `abca'
substrings: `a', `ab', `abc', `abca'

Initialize all these counters to 0, and start to process the text-string
`abbccca':

- read `a' at position 0: counter[`a'] += 1
- read `b' at position 1: counter[`b'] += 1; we need also to increment
counter[`ab'] because the key ends with a `b'. The amount of the increment is
equal to the counter value associated with the key (`ab') without the read
char (`b'), i.e. counter[`a']. Then counter[`ab'] += counter[`a'] ->
counter[`ab'] = 1. Imagine you have read 100 times an `a', and you read
a `b': how many times could you create the sub-sequence `ab'? 100 times:
first read `a' with `b', second read `a' with `b' and so forth.
- read `b' at position 2: counter[`b'] += 1; counter[`ab'] += counter[`a']
- read `c' at position 3: counter[`c'] += 1; counter[`abc'] += counter[`ab']
...

Hope it's clear.

However, here is a Python script that implements the algorithm:
http://pastebin.com/875xYhqw

Regards,
Matteo

7. @landimatte.... a brilliant solution a nice code too !!!

8. @aviral thanks!

Regards,
Matteo

9. pd[i][j] = number os subsequences of s[1,...,i] forming t[1,..,j]

pd[i][j] = pd[i-1][j] + pd[i-1][j-1], s[i] == t[j]
pd[i-1][j], otherwise

10. @imavalon.... u misinterpreted the question with lcs ... can u elaborate a little ..

11. pd[i][j] = number os subsequences of s[1,...,i] forming t[1,..,j]

we have two alternatives: or s[i] = t[j] holds or it does not.
if s[i] is different of t[j], then the number of subsequences of s[1,...,i] forming t[1,..,j] is the same as the number of subsequences of s[1,...,i-1] forming t[1,..,j] (pd[i-1][j]).
if s[i] = t[j] then additionally we can append the element t[j] to each subsequence from s[1,...,i-1] that forms t[1,...,j-1] (pd[i-1][j-1]), obtaining new subsequences not yet counted. thus:
pd[i][j] = pd[i-1][j] + pd[i-1][j-1], if s[i] == t[j]
pd[i][j] = pd[i-1][j], otherwise

12. what do you think?

13. @imavalon .. correct approach .. landimatte has provided a more lucid explanation for the same thing and you have provided a more cleaner code .... Cheers !!!

14. I think imavalon your solution is wrong .. imagine the example ladimatte sent - count abca in abbccca; i am getting the following table:

j 1 2 3 4
------------------
1 | 1 0 0 0
2 | 1 1 0 0
3 | 1 1 0 0
4 | 1 1 0 0
5 | 1 1 0 0
6 | 1 1 0 0
7 | 1 1 0 0
i

15. my fault, sorry :-) The table should be looking that way:

j 1 2 3 4
------------------
1 | 1 0 0 0
2 | 1 1 0 0
3 | 1 2 0 0
4 | 1 2 2 0
5 | 1 2 4 0
6 | 1 2 6 0
7 | 1 2 6 6
i

16. And all these solutions can be optimized to use just O(t) memory, letting time at O(st)

17. can u explain ur approach ?

Regards

18. its easy to implement it with O(t) memory:
s = sequence (indexed from 1 to |s|)
t = subsequence (indexed from 1 to |t|)

int pd[|t|];
pd = 1;
for(int i = 1; i <= |s|; i++){
for(int j = min(i, |t|); j >= 1; j--){
if(s[i] == t[j]){
pd[j] = pd[j]+pd[j-1];
}
}
}