Algorithm to find lenth of longest sequence of blanks in a given string

+6 A:

But in the worst case (when all characters are blank) you have to examine every character. So it can't be better than O(n) in complexity.

Rationale: assume the whole string is blank, you haven't examined N characters and your algorithms outputs n. Then if any non-examined character is not blank, your answer would be wrong. So for this particular input you have to examine the whole string.

Petar Minchev 2010-04-11 12:09:44

(-1) That is wrong: you don't have to examine each character.

Pavel Shved 2010-04-11 12:21:24

If the string contains only blank characters, please tell me how you will not examine every character.

Petar Minchev 2010-04-11 12:27:23

With "at least" you meant "at most", I guess :)

Thomas 2010-04-11 12:27:33

I am talking about complexity.

Petar Minchev 2010-04-11 12:27:47

@Thomas, yes I will edit it.

Petar Minchev 2010-04-11 12:28:09

@Pavel Shved: care to propose an algorithm that never examines each character? If not, please remove your vote.

IVlad 2010-04-11 12:28:57

@IVlad, any algorithm doesn't need to examine last `X` characters if `str[length-X]` is non-blank, and it already found a sequence of `X` blanks long. It is not "never" but you really *have to* examine each character.

Pavel Shved 2010-04-11 13:06:33

@Pavel Shved: yes, and I posted that optimization myself. However, there are inputs for which you DO need to examine each character, and this post talks about the worst case.

IVlad 2010-04-11 13:08:34

@IVlad, please, check the revision history of the answer.

Pavel Shved 2010-04-11 13:15:44

@Petar, there, I made your post deserving an upvote ;-)

Pavel Shved 2010-04-11 13:19:40

@Pavel, Thanks:)

Petar Minchev 2010-04-11 13:20:40

+7 A:

You won't be able to find a solution which is a smaller complexity than O(n) because you need to pass through every character in the worst case with an input string that has at most 0 or 1 consecutive whitespace, or is completely whitespace.

You can do some optimizations though, but it'll still be considered O(n).

For example:

Let M be the current longest match so far as you go through your list. Also assume you can access input elements in O(1), for example you have an array as input.

When you see a non-whitespace you can skip M elements if the current + M is non whitespace. Surely no whitespace longer than M can be contained inside.

And when you see a whitepsace character, if current + M-1 is not whitespace you know you don't have the longest runs o you can skip in that case as well.

Brian R. Bondy 2010-04-11 12:11:10

But if the character at *current* + *M* is a white space character, you’ll need to check the character from *current* to *current* + *M* anyway.

Gumbo 2010-04-11 12:13:46

@Gumbo: You wouldn't do the skip in that case. This optimization is assuming you have constant access to any element. Like if your input is an array.

Brian R. Bondy 2010-04-11 12:15:28

@Brian R. Bondy: Ah yes, you’re right.

Gumbo 2010-04-11 12:16:33

@Gumbo: yes, but only in that case. The usual case is that _current + M_ is not a whitespace.

Vlad 2010-04-11 12:17:58

and as you said... it doesn't change the complexity, just optimize it a little bit in several cases.

Dani 2010-04-11 12:18:18

@Dani: yes but the OP asked to find a more optimal solution knowing the solution is O(n). So I think he knows it will stay O(n) already.

Brian R. Bondy 2010-04-11 12:22:26

A:

What ever you do, the worst case will always be o(n) - if those blanks are on the last part of the string... (or the last "checked" part of the string).

Dani 2010-04-11 12:13:43

+1 A:

The obvious idea: you can jump by K+1 places (where K is the current longest space sequence) and scan back if you found a space.

This way you have something about (n + n/M)/2 = n(M+1)/2M positions checked.

Edit:
Another idea would be to apply a kind of binary search. This is like follows: for a given k you make a procedure that checks whether there is a sequence of spaces with length >= k. This can be achieved in O(n/k) steps. Then, you try to find the maximal k with binary search.

Edit:
During the consequent searches, you can utilize the knowledge that the sequence of some length k already exist, and start skipping at k from the very beginning.

Vlad 2010-04-11 12:13:50

and if it's on the last part of the string.. K will be 1 till then and it's n checks again.

Dani 2010-04-11 12:17:19

@Dani: yes, the optimization is only in average.

Vlad 2010-04-11 12:21:49

What's the point of doing binary search? That will make it `O(n log n)` won't it? As far as I can tell the procedure you're talking about isn't O(n / k), it's O(n).

IVlad 2010-04-11 12:31:59

@IVlad: O(n) in average or in the worst case? :)

Vlad 2010-04-11 12:33:18

@Vlad: it depends on the input and maybe the optimizations you use, but I would think it's O(n) on average and O(n / k) in the best case. I don't think binary search is an improvement really.

IVlad 2010-04-11 12:36:17

@IVlad: I mean, average on different inputs. Of course, worst case requires looking through the whole sequence. But if I understand you correctly, you claim that the average case (average on all inputs, assuming that they are evenly distributed) is still O(n), so maybe you can somehow justify that claim?

Vlad 2010-04-11 12:50:59

@Vlad: my point is that the algorithm to find a substring with k zeroes, combined with a binary search, gives you O(n log n) worst case time and O((n / k) log n) (or something like that) best case (and maybe average case - I can't prove that it's not n / k on average, maybe you can prove that it is?), which is still always worse than a linear algorithm with all the optimizations everyone proposed.

IVlad 2010-04-11 12:58:19

+2 A:

There's no way to make it faster than O(N) in the worst case. However, here are a few optimizations, assuming 0-based indexing.

If you already have a complete sequence of L blanks (by complete I mean a sequence that is not a subsequence of a larger sequence), and L is at least as large as half the size of your string, you can stop.
If you have a complete sequence of L blanks, once you hit a space at position i check if the character at position i + L is also a space. If it is, continue scanning from position i forwards as you might find a larger sequence - however, if you encounter a non-space until position i + L, then you can skip directly to i + L + 1. If it isn't a space, there's no way you can build a larger sequence starting at i, so scan forwards starting from i + L + 1.
If you have a complete sequence of blanks of length L, and you are at position i and you have k positions left to examine, and k <= L, you can stop your search, as obviously there's no way you'll be able to find anything better anymore.

To prove that you can't make it faster than O(N), consider a string that contains no spaces. You will have to access each character once, so it's O(N). Same with a string that contains nothing but spaces.

IVlad 2010-04-11 12:26:36

ansaurus

tags:

views:

answers:

Algorithm to find lenth of longest sequence of blanks in a given string

related questions