Given a dictionary, on which the below operations are required

1. Update the dictionary

2. Find a word in a dictionary.

What are the possible options?

One of the most intuitive approach is to store the strings in a sorted manner.

Lets have a look at the complexity of the operations

If average length of the string in the dictionary is m and the number of entries in dictionary is n,

To build the sorted set of entries is O(m * n * log n)

and the complexity to find a word will be O(m * logn)

For large number of queries the algorithm is pathetically slow.

How can we improve this?

Can we have a deterministic algorithm, with the maximum complexity being dependent on the number of characters in the word and not on the number of entries.

Lets try to visualize how we search in the dictionary for a particular word.

Assuming the word to searched is 'program'

We first search if the word exist p, then inside p if r exists and so on...

So irrespective of the thickness of the dictionary we just searched based on some predefined marking of the characters which helps to search in the order of the length of the string.

Can we use the same concept in the programming?

If we build a tree of which each node has 26 child, each representing one of the characters, then we can uniquely identify each of the string starting from the root and following each of the characters.

The below figure speaks more than the thousand words,

Dont fall into the trap that only leaf nodes are the terminal nodes, infact each node can act as a terminal node.

So essentially the complexity to add a word or to query a word takes O(m), where m is the length of the string and is independent on the number of entries in the set!

1. Update the dictionary

2. Find a word in a dictionary.

What are the possible options?

One of the most intuitive approach is to store the strings in a sorted manner.

Lets have a look at the complexity of the operations

If average length of the string in the dictionary is m and the number of entries in dictionary is n,

To build the sorted set of entries is O(m * n * log n)

and the complexity to find a word will be O(m * logn)

For large number of queries the algorithm is pathetically slow.

How can we improve this?

Can we have a deterministic algorithm, with the maximum complexity being dependent on the number of characters in the word and not on the number of entries.

Lets try to visualize how we search in the dictionary for a particular word.

Assuming the word to searched is 'program'

We first search if the word exist p, then inside p if r exists and so on...

So irrespective of the thickness of the dictionary we just searched based on some predefined marking of the characters which helps to search in the order of the length of the string.

Can we use the same concept in the programming?

If we build a tree of which each node has 26 child, each representing one of the characters, then we can uniquely identify each of the string starting from the root and following each of the characters.

The below figure speaks more than the thousand words,

Dont fall into the trap that only leaf nodes are the terminal nodes, infact each node can act as a terminal node.

So essentially the complexity to add a word or to query a word takes O(m), where m is the length of the string and is independent on the number of entries in the set!

## No comments :

## Post a Comment