You are given a text string S, your task is to determine how many times a particular string T appears as a sub-sequence of the given string S i.e number of sub-sequences of the string S forming the string T.

Subscribe to:
Post Comments
(
Atom
)

char s[100000],sub[100000];

ReplyDeletescanf("%s,%s",s,sub);

l1=strlen(s);

l2=strlen(sub);

int count=0;

for(int i=0;i<l1-l2+1;i++){

if(!strncmp(s+i,sub,l2)){

count++;

}

}

printf("%d",count);

here i m counting the subsequence even if it collides with the previous subsequence.

If we have to find seperate subsequence count only, then we can increase the value of i by l2 every time a subsequence is found.

@jainendra tarun ..... there is a difference b/w the sub-sequence and sub-string... a sub-sequence of a string is not necessarily continuous.....

ReplyDeleteAnyways.... for the number of sub-string you have other optimized algo like KMP, rabin karp e.t.c

oh i had no idea about it...thanx a lot sir.

ReplyDeleteThe idea is to split the problem into smaller ones, i.e. find all occurrences

ReplyDeleteof all the substrings of our pattern. If the pattern is `abca', then we will

find all the occurrences of `a'. `ab', `abc', `abca'.

Imagine at a certain point inside the text-string, you read an `a'. What would

you do?

1 increment the counter associated with the string `a' by one.

2 increment the coutner associated with the string `abca' by the value

associated with the string `abc'.

That's all.

Regards,

Matteo

@landimatte ... a pseudo code can explain ur approach better ... u r using the term substrings(remember it is subsequence)...

ReplyDeleteRegards

Ops, I forgot to say that we count all the occurrences of all the /substrings/

ReplyDeleteof our pattern.

pattern: `abca'

substrings: `a', `ab', `abc', `abca'

Initialize all these counters to 0, and start to process the text-string

`abbccca':

- read `a' at position 0: counter[`a'] += 1

- read `b' at position 1: counter[`b'] += 1; we need also to increment

counter[`ab'] because the key ends with a `b'. The amount of the increment is

equal to the counter value associated with the key (`ab') without the read

char (`b'), i.e. counter[`a']. Then counter[`ab'] += counter[`a'] ->

counter[`ab'] = 1. Imagine you have read 100 times an `a', and you read

a `b': how many times could you create the sub-sequence `ab'? 100 times:

first read `a' with `b', second read `a' with `b' and so forth.

- read `b' at position 2: counter[`b'] += 1; counter[`ab'] += counter[`a']

- read `c' at position 3: counter[`c'] += 1; counter[`abc'] += counter[`ab']

...

Hope it's clear.

However, here is a Python script that implements the algorithm:

http://pastebin.com/875xYhqw

Regards,

Matteo

@landimatte.... a brilliant solution a nice code too !!!

ReplyDelete@aviral thanks!

ReplyDeleteRegards,

Matteo

pd[i][j] = number os subsequences of s[1,...,i] forming t[1,..,j]

ReplyDeletepd[i][j] = pd[i-1][j] + pd[i-1][j-1], s[i] == t[j]

pd[i-1][j], otherwise

@imavalon.... u misinterpreted the question with lcs ... can u elaborate a little ..

ReplyDeletepd[i][j] = number os subsequences of s[1,...,i] forming t[1,..,j]

ReplyDeletewe have two alternatives: or s[i] = t[j] holds or it does not.

if s[i] is different of t[j], then the number of subsequences of s[1,...,i] forming t[1,..,j] is the same as the number of subsequences of s[1,...,i-1] forming t[1,..,j] (pd[i-1][j]).

if s[i] = t[j] then additionally we can append the element t[j] to each subsequence from s[1,...,i-1] that forms t[1,...,j-1] (pd[i-1][j-1]), obtaining new subsequences not yet counted. thus:

pd[i][j] = pd[i-1][j] + pd[i-1][j-1], if s[i] == t[j]

pd[i][j] = pd[i-1][j], otherwise

what do you think?

ReplyDelete@imavalon .. correct approach .. landimatte has provided a more lucid explanation for the same thing and you have provided a more cleaner code .... Cheers !!!

ReplyDeleteI think imavalon your solution is wrong .. imagine the example ladimatte sent - count abca in abbccca; i am getting the following table:

ReplyDeletej 1 2 3 4

------------------

1 | 1 0 0 0

2 | 1 1 0 0

3 | 1 1 0 0

4 | 1 1 0 0

5 | 1 1 0 0

6 | 1 1 0 0

7 | 1 1 0 0

i

my fault, sorry :-) The table should be looking that way:

ReplyDeletej 1 2 3 4

------------------

1 | 1 0 0 0

2 | 1 1 0 0

3 | 1 2 0 0

4 | 1 2 2 0

5 | 1 2 4 0

6 | 1 2 6 0

7 | 1 2 6 6

i

And all these solutions can be optimized to use just O(t) memory, letting time at O(st)

ReplyDeletecan u explain ur approach ?

ReplyDeleteRegards

its easy to implement it with O(t) memory:

ReplyDeletes = sequence (indexed from 1 to |s|)

t = subsequence (indexed from 1 to |t|)

int pd[|t|];

pd[0] = 1;

for(int i = 1; i <= |s|; i++){

for(int j = min(i, |t|); j >= 1; j--){

if(s[i] == t[j]){

pd[j] = pd[j]+pd[j-1];

}

}

}