I want to have a structure token that has start/end pairs for position, sentence, and paragraph information. I also want the members to be accessible in two different ways: as a start/end pair and individually. Given:

struct token {
  struct start_end {
    int start;
    int end;
  };

  start_end pos;
  start_end sent;
  start_end para;

  typedef start_end token::*start_end_ptr;
};

I can write a function, say distance(), that computes the distance between any of the three start/end pairs like:

int distance( token const &i, token const &j, token::start_end_ptr mbr ) {
  return (j.*mbr).start - (i.*mbr).end;
}

and call it like:

  token i, j;
  int d = distance( i, j, &token::pos );

that will return the distance of the pos pair. But I can also pass &token::sent or &token::para and it does what I want. Hence, the function is flexible.

However, now I also want to write a function, say max(), that computes the maximum value of all the pos.start or all the pos.end or all the sent.start, etc.

If I add:

  typedef int token::start_end::*int_ptr;

I can write the function like:

int max( list<token> const &l, token::int_ptr p ) {
  int m = numeric_limits<int>::min();
  for ( list<token>::const_iterator i = l.begin(); i != l.end(); ++i ) {
    int n = (*i).pos.*p; // NOT WHAT I WANT: It hard-codes 'pos'
    if ( n > m )
      m = n;
  }
  return m;
}

and call it like:

  list<token> l;
  l.push_back( i );
  l.push_back( j );
  int m = max( l, &token::start_end::start );

However, as indicated in the comment above, I do not want to hard-code pos. I want the flexibility of accessing the start or end of any of pos, sent, or para, with the choice passed as a parameter to max().

I've tried several things to get this to work (tried using unions, anonymous unions, etc.) but I can't come up with a data structure that allows the flexibility both ways while having each value stored only once.

Any ideas how to organize the token struct so I can have what I want?


Attempt at clarification

Given a struct of pairs of integers, I want to be able to "slice" the data in two distinct ways:

  1. By passing a pointer-to-member of a particular start/end pair so that the called function operates on any pair without knowing which pair. The caller decides which pair.
  2. By passing a pointer-to-member of a particular int (i.e., only one int of any pair) so that the called function operates on any int without knowing either which int or which pair said int is from. The caller decides which int of which pair.

Another example for the latter would be to sum, say, all para.end or all sent.start.

Also, and importantly: for #2 above, I'd ideally like to pass only a single pointer-to-member to reduce the burden on the caller. Hence, me trying to figure something out using unions.

For #2, the struct would be optimally laid out like:

struct token2 {
  int pos_start;
  int pos_end;
  int sent_start;
  int sent_end;
  int para_start;
  int para_end;
};

The trick is to have token and token2 overlaid somehow with a union, but it's not apparent if/how that can be done while still satisfying both access requirements.
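
To make requirement #2 concrete, here is a minimal sketch of what the flat token2 layout would allow, assuming the six ints are direct members exactly as listed above. The name max_of is hypothetical, chosen only to avoid clashing with std::max.

```cpp
#include <limits>
#include <list>

// Flat layout from the question: every int is a direct member.
struct token2 {
  int pos_start;
  int pos_end;
  int sent_start;
  int sent_end;
  int para_start;
  int para_end;
};

// max_of is a hypothetical name. A single pointer-to-member selects
// which int is scanned; the function knows nothing about pairs.
int max_of( std::list<token2> const &l, int token2::*p ) {
  int m = std::numeric_limits<int>::min();
  for ( std::list<token2>::const_iterator i = l.begin(); i != l.end(); ++i ) {
    int n = (*i).*p;
    if ( n > m )
      m = n;
  }
  return m;
}
```

A call would look like max_of( l, &token2::sent_start ). The cost is that pos, sent, and para no longer exist as start/end pairs, so requirement #1 is lost with this layout.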

+2  A: 

Just a try.

int max( list<token> const &l,
         token::int_ptr p,
         token::start_end_ptr mbr ) {
  int m = numeric_limits<int>::min();
  for ( list<token>::const_iterator i = l.begin(); i != l.end(); ++i ) {
    int n = ((*i).*mbr).*p;
    if ( n > m )
      m = n;
  }
  return m;
}
baol
Well, yes, this works, but, according to my clarification above, an initially unstated goal (sorry) was to be able to do this by passing only a single pointer-to-member.
Paul J. Lucas
This compiles fine on g++ 4.3.
baol
A: 

Take a look at the boost::bind or boost::lambda libraries. Or if you can use a compiler with C++0x support you might want to use some of the newer features instead of manually binding the member attributes. And then you can use the algorithms provided in the STL...

Anyway, this may do what you want (I did not take the time to compile it, so it might not compile as written):

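int max( list<token> const &l, token::start_end_ptr mbr, token::int_ptr p ) {
  // The pair selector is named mbr so it does not clash with the running maximum m.
  int m = numeric_limits<int>::min();
  for ( list<token>::const_iterator i = l.begin(); i != l.end(); ++i ) {
    int n = ((*i).*mbr).*p; // select the pair first, then the int inside it
    if ( n > m )
      m = n;
  }
  return m;
}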
int main() {
   list<token> tks;
   int x = max( tks, &token::pos, &token::start_end::start );
}

Note that this does not buy much real flexibility: you are binding the algorithm to the types token, token::start_end and int...

C++0x:

list <token> tks;
int the_max = 0;
for_each( tks.begin(), tks.end(), 
      [&the_max]( token const & t ) { the_max = max( the_max, t.pos.start ); } );
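
As a rough sketch of the "use the STL algorithms" suggestion, the two member pointers can be captured by a lambda and handed to std::max_element. This assumes a C++11 compiler and the token struct from the question; max_member is a hypothetical name, and the empty-list result simply mirrors the hand-written loop.

```cpp
#include <algorithm>
#include <limits>
#include <list>

// The token struct from the question, with both typedefs as members.
struct token {
  struct start_end {
    int start;
    int end;
  };

  start_end pos;
  start_end sent;
  start_end para;

  typedef start_end token::*start_end_ptr;
  typedef int start_end::*int_ptr;
};

// max_member is a hypothetical name. The lambda compares two tokens by
// the int that the pair of member pointers selects.
int max_member( std::list<token> const &l,
                token::start_end_ptr mbr, token::int_ptr p ) {
  std::list<token>::const_iterator it = std::max_element(
      l.begin(), l.end(),
      [mbr, p]( token const &a, token const &b ) {
        return (a.*mbr).*p < (b.*mbr).*p;
      } );
  // Same convention as the loop version: an empty list yields INT_MIN.
  return it == l.end() ? std::numeric_limits<int>::min() : ((*it).*mbr).*p;
}
```

It is called the same way as the loop version, for example max_member( tks, &token::sent, &token::start_end::end ).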
David Rodríguez - dribeas
A: 
struct start_end {
    int x;
    int y;
};
struct pairs {
    struct start_end a;
    struct start_end b;
};

So is the idea to slice the data to operate on the x's or the y's dynamically?

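 int distance(start_end const &m, start_end const &n, int member_offset){
     // member_offset counts ints from the start of the struct: 0 selects x,
     // 1 selects y. This assumes start_end is exactly two contiguous ints
     // with no padding, which is not strictly guaranteed to be portable.
     int val_a = *(reinterpret_cast<int const *>(&m) + member_offset);
     int val_b = *(reinterpret_cast<int const *>(&n) + member_offset);
     return val_b - val_a;
}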
mikelong
Un-nesting the start_end struct doesn't change anything. Also, the use of the "struct" in "struct start_end a;" is unnecessary: this is C++, not C.
Paul J. Lucas
+1  A: 

I'm building upon the answer baol gave:

If we add a token_reference struct and some global (ick!) variables we can have this:

struct token_reference
{
    token::start_end_ptr start_end_ptr;
    token::int_ptr int_ptr;
};

token_reference pos_start =  { &token::pos, &token::start_end::start };
token_reference pos_end =    { &token::pos, &token::start_end::end };
token_reference sent_start = { &token::sent, &token::start_end::start };
token_reference sent_end =   { &token::sent, &token::start_end::end };
token_reference para_start = { &token::para, &token::start_end::start };
token_reference para_end =   { &token::para, &token::start_end::end };

int max( std::list<token> const &l, token_reference const &ref ) {
    // baol's max() takes the int_ptr first, then the start_end_ptr.
    return max(l,ref.int_ptr,ref.start_end_ptr);
}

called like this:

std::list<token> aList;
int value = max(aList,pos_start);

you get a function taking a list and one more parameter.
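
If the global variables are a concern, one option is to make the six descriptors static data members of token. The following is only a sketch under that assumption: the nested struct ref and the function max_of are hypothetical names, and the loop is written out so the example stands alone.

```cpp
#include <limits>
#include <list>

struct token {
  struct start_end {
    int start;
    int end;
  };

  start_end pos;
  start_end sent;
  start_end para;

  typedef start_end token::*start_end_ptr;
  typedef int start_end::*int_ptr;

  // Hypothetical helper: bundles both pointers-to-member so that the
  // caller only has to pass a single argument.
  struct ref {
    start_end_ptr pair;   // which start/end pair
    int_ptr       which;  // which int inside that pair
  };

  static ref const pos_start, pos_end;
  static ref const sent_start, sent_end;
  static ref const para_start, para_end;
};

// Definitions of the static members (these belong in one source file).
token::ref const token::pos_start  = { &token::pos,  &token::start_end::start };
token::ref const token::pos_end    = { &token::pos,  &token::start_end::end };
token::ref const token::sent_start = { &token::sent, &token::start_end::start };
token::ref const token::sent_end   = { &token::sent, &token::start_end::end };
token::ref const token::para_start = { &token::para, &token::start_end::start };
token::ref const token::para_end   = { &token::para, &token::start_end::end };

// max_of is a hypothetical name, used to avoid clashing with std::max.
int max_of( std::list<token> const &l, token::ref const &r ) {
  int m = std::numeric_limits<int>::min();
  for ( std::list<token>::const_iterator i = l.begin(); i != l.end(); ++i ) {
    int n = ((*i).*(r.pair)).*(r.which);
    if ( n > m )
      m = n;
  }
  return m;
}
```

The call then reads max_of( aList, token::sent_end ), and each value in token is still stored only once.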

quamrana
They don't have to be truly global: they can be static data members of token. That aside, the solution isn't bad.
Paul J. Lucas
But: is there no clever way to do this with unions (anonymous or not) inside the token struct?
Paul J. Lucas