Hello,

This is a bit of a daft question, but out of curiosity, would it be possible to split a string on commas, apply a function to each piece, and then rejoin the pieces with commas, all in one statement in C++?

This is what I have so far:

string dostuff(const string& a) {
  return string("Foo");
}

int main() {
  string s("a,b,c,d,e,f");

  vector<string> foobar(100);
  transform(boost::make_token_iterator<string>(s.begin(), s.end(), boost::char_separator<char>(",")),
            boost::make_token_iterator<string>(s.end(), s.end(), boost::char_separator<char>(",")),
            foobar.begin(),
            boost::bind(&dostuff, _1));
  string result = boost::algorithm::join(foobar, ",");
}

So this would result in turning "a,b,c,d,e,f" into "Foo,Foo,Foo,Foo,Foo,Foo"

I realise this is OTT, but was just looking to expand my boost wizardry.

+1  A: 

First, note that your program writes "Foo,Foo,Foo,Foo,Foo,Foo,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,," to your result string -- as already mentioned in comments, you wanted to use back_inserter there.
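The difference is easy to see even without Boost. Below is a minimal sketch, assuming the tokens are already sitting in a vector; `join`, `presized` and `appended` are names made up for this illustration, and `dostuff` is the function from the question.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

std::string dostuff(const std::string&) { return "Foo"; }

// Hypothetical helper standing in for boost::algorithm::join.
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;
        out += parts[i];
    }
    return out;
}

// Mirrors the question: the destination already holds 100 empty strings and
// transform only overwrites the first tokens.size() of them.
// Assumes tokens.size() <= 100, otherwise it writes past the end.
std::string presized(const std::vector<std::string>& tokens) {
    std::vector<std::string> out(100);
    std::transform(tokens.begin(), tokens.end(), out.begin(), dostuff);
    return join(out, ",");
}

// With back_inserter the destination grows by exactly one element per token.
std::string appended(const std::vector<std::string>& tokens) {
    std::vector<std::string> out;
    std::transform(tokens.begin(), tokens.end(), std::back_inserter(out), dostuff);
    return join(out, ",");
}
```

For the six tokens a to f, `appended` returns "Foo,Foo,Foo,Foo,Foo,Foo", while `presized` returns the same text followed by 94 commas, one for each of the untouched empty strings.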

As for the answer, whenever a single value results from a range, I look at std::accumulate (since that is the C++ version of fold/reduce):

#include <string>
#include <iostream>
#include <numeric>
#include <functional>
#include <boost/tokenizer.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/bind.hpp>
std::string dostuff(const std::string& a) {
  return std::string("Foo");
}
int main() {
  std::string s("a,b,c,d,e,f");
  // Seed the fold with the first transformed token, then append ",<token>"
  // for each remaining token, so no trailing comma is produced.
  std::string result =
    std::accumulate(
     ++boost::make_token_iterator<std::string>(s.begin(), s.end(), boost::char_separator<char>(",")),
       boost::make_token_iterator<std::string>(s.end(), s.end(), boost::char_separator<char>(",")),
       dostuff(*boost::make_token_iterator<std::string>(s.begin(), s.end(), boost::char_separator<char>(","))),
       boost::bind(std::plus<std::string>(), _1,
         boost::bind(std::plus<std::string>(), ",",
            boost::bind(dostuff, _2)))); // or lambda, for slightly better readability
  std::cout << result << '\n';
}

Except now it's way over the top and repeats make_token_iterator twice. I guess boost.range wins.
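For comparison, here is roughly what the lambda variant mentioned in the code comment could look like. This is only a sketch: it assumes C++11, and to stay free of Boost it folds over an already split vector instead of token iterators. The name `join_transformed` is made up here.

```cpp
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

std::string dostuff(const std::string&) { return "Foo"; }

// Hypothetical helper: applies dostuff to every token and joins with commas.
std::string join_transformed(const std::vector<std::string>& tokens) {
    if (tokens.empty()) return std::string();  // nothing to seed the fold with
    // Seed with the first transformed token, then fold in ",<token>" for the
    // rest, so no trailing comma is produced.
    return std::accumulate(
        std::next(tokens.begin()), tokens.end(), dostuff(tokens.front()),
        [](const std::string& acc, const std::string& tok) {
            return acc + "," + dostuff(tok);
        });
}
```

The empty-input check is an addition: the bind version above dereferences the first token unconditionally, which is only safe when the input contains at least one token.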

Cubbi
That produces `foo,foo,foo,foo,foo,foo,` rather than `foo,foo,foo,foo,foo,foo` (Note the extra comma on the end)
Billy ONeal
@Billy ONeal: Oh.. true. I guess it's up to boost.range adepts to answer with a one-liner.
Cubbi
Fixed, but made utterly unreadable.
Cubbi
+1  A: 
void dostuff(string& a) {
    a = "Foo";
}

int main()
{
    string s("a,b,c,d,e,f");
    vector<string> tmp;
    s = boost::join(
          (
            boost::for_each(
              boost::split(tmp, s, boost::is_any_of(",")),
              dostuff
            ),
            tmp
          ),
          ","
        );

    return 0;
}

Unfortunately I can't eliminate mentioning tmp twice. Maybe I'll think of something later.

ybungalobill
I took the liberty of reformatting the code, and now that I understand what happens, I was wondering whether this really works. What bothers me is the first argument of `join`: you not only use `tmp` twice, you also use it on both sides of a comma. I know it's modified on only one side, but I was wondering if a bug could result from the fact that a comma does not define a sequence point.
Matthieu M.
It's a comma operator, so it **does** define a sequence point.
ybungalobill
@ybungalobill: my mistake then! Still I am peeved that `boost::split` forces us to actually mention `tmp` twice, but yours is the most readable answer with "approved" libraries :)
Matthieu M.
Using a comma operator is kind of cheating. Yes, this is technically still only one statement, but it acts as two statements.
Chris Dodd
@Matthieu M.: It's not the `split`. The problem is that `for_each` should return the input range, then we could rewrite it with 'vector<string>()' as a temporary. But guess, what does `for_each` return? Can't guess? The predicate!
ybungalobill
@ybungalobill: I guess this was done in the name of "compatibility" with `std::for_each`. I must have been quite tired yesterday...
Matthieu M.
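The sequencing point raised in the comments above can be illustrated in isolation. This is a minimal sketch with made-up helpers (`fill_with_foo`, `total_length`, `demo`), not the Boost calls from the answer: the extra pair of parentheses turns the comma into the comma operator, whose left operand is fully evaluated before the right one is read.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the for_each call: overwrites every element.
void fill_with_foo(std::vector<std::string>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = "Foo";
}

// Hypothetical stand-in for join: just consumes the vector.
std::size_t total_length(const std::vector<std::string>& v) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < v.size(); ++i) n += v[i].size();
    return n;
}

std::size_t demo() {
    std::vector<std::string> tmp(3, "x");
    // The inner parentheses make this ONE argument built with the comma
    // operator: fill_with_foo(tmp) completes before tmp is passed along.
    return total_length((fill_with_foo(tmp), tmp));
}
```

Here `demo()` sees the three elements already overwritten and returns 9. Commas that merely separate function arguments give no such ordering guarantee.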
A: 

Okay, I guess it's possible, but please please don't really do this in production code.

Much better would be something like

#include <algorithm>
#include <string>

std::string MakeCommaEdFoo(const std::string& input)
{
    // The number of fields is one more than the number of commas.
    std::size_t commas = std::count(input.begin(), input.end(), ',');
    std::string output("foo");
    output.reserve((commas+1)*4-1);
    // Append one ",foo" per comma; starting at 0 keeps the last field.
    for(std::size_t idx = 0; idx < commas; ++idx)
        output.append(",foo");
    return output;
}

Not only will it perform better, it is also much easier for the next guy to read and understand.
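The same plain-loop style also covers the general case where the function actually looks at each field. The following is a hedged sketch, not part of the original answer: `transform_fields` is a made-up name, and `dostuff` is the function from the question.

```cpp
#include <string>

std::string dostuff(const std::string&) { return "Foo"; }

// Hypothetical helper: applies dostuff to every comma-separated field of
// input and rejoins the results with commas, in a single pass.
std::string transform_fields(const std::string& input) {
    std::string output;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type comma = input.find(',', start);
        // When no comma is left, comma is npos and substr clamps the
        // length to the end of the string.
        output += dostuff(input.substr(start, comma - start));
        if (comma == std::string::npos) break;
        output += ',';
        start = comma + 1;
    }
    return output;
}
```

Note one behavioural difference: this loop keeps empty fields (so "a,,b" yields three results), whereas `boost::char_separator` drops empty tokens by default.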

Billy ONeal
That's what I think when I see scripting languages.
ybungalobill
@Billy: are you sure that actually counting the commas first is more efficient? In the general case (of actually applying a function to each isolated string) I am afraid it's more of a hindrance.
Matthieu M.
@Matthieu: Actually it should make no difference. It will perform better because A. you don't incur construction of token iterators, and B. (More importantly) you're only working with one buffer at a time. That will result in less cache pressure because only one of the two strings (the source and the one under construction) need be "near the processor" at any one point. It's also better than adding the comma anyway for the last item and removing it later (for reasons that should be obvious)
Billy ONeal
@Billy: your cache-pressure argument assumes that you're not actually reading the input string. Even if the question is simplistic, the OP's function takes a string as input.
Matthieu M.
+1  A: 

I am actually working on a library to allow writing code in a more readable fashion than iterators alone... don't know if I'll ever finish the project though, seems dead projects tend to accumulate on my computer...

Anyway, the main reproach I have here is obviously the use of iterators. I tend to think of iterators as low-level implementation details; when coding, you rarely want to use them at all.

So, let's assume that we have a proper library:

struct DoStuff { std::string operator()(std::string const&); };

int main(int argc, char* argv[])
{
  std::string const reference = "a,b,c,d,e,f";

  std::string const result = boost::join(
    view::transform(
      view::split(reference, ","),
      DoStuff()
    ),
    ","
  );
}

The idea of a view is to be a light wrapper around another container:

  • from the user's point of view, it behaves like a container (minus the operations that actually modify the container structure)
  • from the implementation's point of view, it's a lightweight object containing as little data as possible: the value is ephemeral here, and only lives as long as the iterator lives.
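To make the two points above concrete, here is a deliberately tiny sketch of a transform view. It is not the library described in this answer: `TransformView`, `join` and `demo` are hypothetical, the view is restricted to a `std::vector<std::string>` source, and `DoStuff` is given a trivial body for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal view over a std::vector<std::string>.
// It stores only a pointer to the source and the function object.
template <typename Func>
class TransformView {
public:
    class iterator {
    public:
        iterator(std::vector<std::string>::const_iterator pos, const Func* func)
            : pos_(pos), func_(func) {}
        // The transformed value is computed on demand and returned by value;
        // nothing is cached inside the view.
        std::string operator*() const { return (*func_)(*pos_); }
        iterator& operator++() { ++pos_; return *this; }
        bool operator!=(const iterator& other) const { return pos_ != other.pos_; }
    private:
        std::vector<std::string>::const_iterator pos_;
        const Func* func_;
    };

    TransformView(const std::vector<std::string>& source, Func func)
        : source_(&source), func_(func) {}
    iterator begin() const { return iterator(source_->begin(), &func_); }
    iterator end() const { return iterator(source_->end(), &func_); }

private:
    const std::vector<std::string>* source_;
    Func func_;
};

struct DoStuff {
    std::string operator()(const std::string&) const { return "Foo"; }
};

// Hypothetical join that only needs begin()/end() on its argument.
template <typename Range>
std::string join(const Range& range, const std::string& sep) {
    std::string out;
    bool first = true;
    for (typename Range::iterator it = range.begin(); it != range.end(); ++it) {
        if (!first) out += sep;
        out += *it;
        first = false;
    }
    return out;
}

std::string demo(const std::vector<std::string>& tokens) {
    return join(TransformView<DoStuff>(tokens, DoStuff()), ",");
}
```

The view never copies the source; it must not outlive the vector it points to. The split side is left out here, since how to do that generally is exactly the open question.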

I already have the transform part working, I am wondering how the split could work (generally), but I think I'll get into it ;)

Matthieu M.
Sounds interesting, if it doesn't wind up dead could you notify me?
Georg Fritzsche
@Georg: I've noted down your name. Perhaps we could also get James McNellis to share his iterators collection ;)
Matthieu M.