views:

186

answers:

5

Looking for an algorithm to find the length of longest sequence of blanks in a given string examining as few characters as possible?

Hint : Your program should become faster as the length of the sequence of blanks increases.

I know the solution which is O(n).. But looking for more optimal solution

+6  A: 

But in the worst case (when all characters are blank) you have to examine every character. So it can't be better than O(n) in complexity.

Rationale: assume the whole string is blank, you haven't examined N characters and your algorithms outputs n. Then if any non-examined character is not blank, your answer would be wrong. So for this particular input you have to examine the whole string.

Petar Minchev
(-1) That is wrong: you don't have to examine each character.
Pavel Shved
If the string contains only blank characters, please tell me how you will not examine every character.
Petar Minchev
With "at least" you meant "at most", I guess :)
Thomas
I am talking about complexity.
Petar Minchev
@Thomas, yes I will edit it.
Petar Minchev
@Pavel Shved: care to propose an algorithm that never examines each character? If not, please remove your vote.
IVlad
@IVlad, any algorithm doesn't need to examine last `X` characters if `str[length-X]` is non-blank, and it already found a sequence of `X` blanks long. It is not "never" but you really *have to* examine each character.
Pavel Shved
@Pavel Shved: yes, and I posted that optimization myself. However, there are inputs for which you DO need to examine each character, and this post talks about the worst case.
IVlad
@IVlad, please, check the revision history of the answer.
Pavel Shved
@Petar, there, I made your post deserving an upvote ;-)
Pavel Shved
@Pavel, Thanks:)
Petar Minchev
+7  A: 

You won't be able to find a solution which is a smaller complexity than O(n) because you need to pass through every character in the worst case with an input string that has at most 0 or 1 consecutive whitespace, or is completely whitespace.

You can do some optimizations though, but it'll still be considered O(n).

For example:

Let M be the current longest match so far as you go through your list. Also assume you can access input elements in O(1), for example you have an array as input.

When you see a non-whitespace you can skip M elements if the current + M is non whitespace. Surely no whitespace longer than M can be contained inside.

And when you see a whitepsace character, if current + M-1 is not whitespace you know you don't have the longest runs o you can skip in that case as well.

Brian R. Bondy
But if the character at *current* + *M* is a white space character, you’ll need to check the character from *current* to *current* + *M* anyway.
Gumbo
@Gumbo: You wouldn't do the skip in that case. This optimization is assuming you have constant access to any element. Like if your input is an array.
Brian R. Bondy
@Brian R. Bondy: Ah yes, you’re right.
Gumbo
@Gumbo: yes, but only in that case. The usual case is that _current + M_ is not a whitespace.
Vlad
and as you said... it doesn't change the complexity, just optimize it a little bit in several cases.
Dani
@Dani: yes but the OP asked to find a more optimal solution knowing the solution is O(n). So I think he knows it will stay O(n) already.
Brian R. Bondy
A: 

What ever you do, the worst case will always be o(n) - if those blanks are on the last part of the string... (or the last "checked" part of the string).

Dani
+1  A: 

The obvious idea: you can jump by K+1 places (where K is the current longest space sequence) and scan back if you found a space.

This way you have something about (n + n/M)/2 = n(M+1)/2M positions checked.

Edit:
Another idea would be to apply a kind of binary search. This is like follows: for a given k you make a procedure that checks whether there is a sequence of spaces with length >= k. This can be achieved in O(n/k) steps. Then, you try to find the maximal k with binary search.

Edit:
During the consequent searches, you can utilize the knowledge that the sequence of some length k already exist, and start skipping at k from the very beginning.

Vlad
and if it's on the last part of the string.. K will be 1 till then and it's n checks again.
Dani
@Dani: yes, the optimization is only in average.
Vlad
What's the point of doing binary search? That will make it `O(n log n)` won't it? As far as I can tell the procedure you're talking about isn't O(n / k), it's O(n).
IVlad
@IVlad: O(n) in average or in the worst case? :)
Vlad
@Vlad: it depends on the input and maybe the optimizations you use, but I would think it's O(n) on average and O(n / k) in the best case. I don't think binary search is an improvement really.
IVlad
@IVlad: I mean, average on different inputs. Of course, worst case requires looking through the whole sequence. But if I understand you correctly, you claim that the average case (average on all inputs, assuming that they are evenly distributed) is still O(n), so maybe you can somehow justify that claim?
Vlad
@Vlad: my point is that the algorithm to find a substring with k zeroes, combined with a binary search, gives you O(n log n) worst case time and O((n / k) log n) (or something like that) best case (and maybe average case - I can't prove that it's not n / k on average, maybe you can prove that it is?), which is still always worse than a linear algorithm with all the optimizations everyone proposed.
IVlad
+2  A: 

There's no way to make it faster than O(N) in the worst case. However, here are a few optimizations, assuming 0-based indexing.

  1. If you already have a complete sequence of L blanks (by complete I mean a sequence that is not a subsequence of a larger sequence), and L is at least as large as half the size of your string, you can stop.
  2. If you have a complete sequence of L blanks, once you hit a space at position i check if the character at position i + L is also a space. If it is, continue scanning from position i forwards as you might find a larger sequence - however, if you encounter a non-space until position i + L, then you can skip directly to i + L + 1. If it isn't a space, there's no way you can build a larger sequence starting at i, so scan forwards starting from i + L + 1.
  3. If you have a complete sequence of blanks of length L, and you are at position i and you have k positions left to examine, and k <= L, you can stop your search, as obviously there's no way you'll be able to find anything better anymore.

To prove that you can't make it faster than O(N), consider a string that contains no spaces. You will have to access each character once, so it's O(N). Same with a string that contains nothing but spaces.

IVlad