Hello everyone :)

I've got a bit of a problem. Essentially, I need to store a large list of whitelisted entries inside my program, and I'd like to include such a list directly -- I don't want to have to distribute other libraries and such, and I don't want to embed the strings into a Win32 resource, for a bunch of reasons I don't want to go into right now.

I simply included my big whitelist in my .cpp file, and was presented with this error:

1>ServicesWhitelist.cpp(2807): fatal error C1091: compiler limit: string exceeds 65535 bytes in length

The string itself is about twice the limit VC++ allows. What's the best way to include such a large literal in a program?

EDIT:

I'm storing the string like this:

const std::wstring servicesWhitelist
(
 L".NETFRAMEWORK|"
 L"_IOMEGA_ACTIVE_DISK_SERVICE_|"
 L"{6080A529-897E-4629-A488-ABA0C29B635E}|"
 L"{834170A7-AF3B-4D34-A757-E05EB29EE96D}|"
 L"{85CCB53B-23D8-4E73-B1B7-9DDB71827D9B}|"
 L"{95808DC4-FA4A-4C74-92FE-5B863F82066B}|"
 L"{A7447300-8075-4B0D-83F1-3D75C8EBC623}|"
 L"{D31A0762-0CEB-444E-ACFF-B049A1F6FE91}|"
 L"{E2B953A6-195A-44F9-9BA3-3D5F4E32BB55}|"
 L"{EDA5F5D3-9E0F-4F4D-8A13-1D1CF469C9CC}|"
 L"2WIREPCP|"
//About 3800 more lines
);

EDIT2: It's used at runtime in a way similar to this:

static const boost::wregex servicesWhitelistRegex(servicesWhitelist);
std::wstring service;
//code to populate service
if (!boost::regex_match(service, servicesWhitelistRegex))
{
 //Do something to print service
}
+2  A: 

If it's only about twice the limit the obvious solution would seem to be to store 2 (or 3) such strings. :) I'm sure your code that reads them at runtime can deal with that easily enough.
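A minimal sketch of that idea, assuming the limit applies to each concatenated literal separately. The names `whitelistPart1`, `whitelistPart2`, and `buildServicesWhitelist` are made up for illustration, and only a few sample entries from the question are shown:

```cpp
#include <string>

// Each part is its own literal, so each one is checked against the
// compiler limit separately. The split point is arbitrary as long as it
// falls between two entries.
const wchar_t* const whitelistPart1 =
    L".NETFRAMEWORK|"
    L"_IOMEGA_ACTIVE_DISK_SERVICE_|";

const wchar_t* const whitelistPart2 =
    L"{6080A529-897E-4629-A488-ABA0C29B635E}|"
    L"2WIREPCP";

// Hypothetical helper: concatenates the parts once at runtime.
std::wstring buildServicesWhitelist() {
    std::wstring combined(whitelistPart1);
    combined += whitelistPart2;
    return combined;
}
```

The result of `buildServicesWhitelist()` could then be passed to the regex constructor in place of the single big literal.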

EDIT: Do you need to use a regex for some reason? Could you break up the big strings into a list of individual tokens and do a simple string comparison?

Evgeny
+5  A: 

How about an array? (You only need a comma often enough to keep each element under the limit; adjacent literals with no comma between them are still concatenated into one element.)

const std::wstring servicesWhitelist[] = {
 L".NETFRAMEWORK|",
 L"_IOMEGA_ACTIVE_DISK_SERVICE_|",
 L"{6080A529-897E-4629-A488-ABA0C29B635E}|",
 L"{834170A7-AF3B-4D34-A757-E05EB29EE96D}|",
 L"{85CCB53B-23D8-4E73-B1B7-9DDB71827D9B}|",
 L"{95808DC4-FA4A-4C74-92FE-5B863F82066B}|",
 L"{A7447300-8075-4B0D-83F1-3D75C8EBC623}|",
 L"{D31A0762-0CEB-444E-ACFF-B049A1F6FE91}|",
 L"{E2B953A6-195A-44F9-9BA3-3D5F4E32BB55}|",
 L"{EDA5F5D3-9E0F-4F4D-8A13-1D1CF469C9CC}|",
 L"2WIREPCP|",
...
};

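You could use the statement below to get the combined string. std::accumulate is declared in <numeric>, and the initial value has to be a std::wstring, because a narrow "" will not compile against wide-string elements.

std::accumulate(servicesWhitelist, servicesWhitelist+sizeof(servicesWhitelist)/sizeof(servicesWhitelist[0]), std::wstring())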
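If the entries are stored without the trailing `|`, the pattern can be assembled from the array at startup. The sketch below is only an illustration: it uses `std::wregex` instead of `boost::wregex` so that it depends on the standard library alone, it assumes every entry is meant to match literally (so characters such as `.` and `{` are escaped), and `escapeForRegex`, `buildPattern`, `matchesWhitelist`, and `sampleEntries` are hypothetical names.

```cpp
#include <iterator>
#include <numeric>
#include <regex>
#include <string>

// Sample entries only; the real list is elided in the question.
const wchar_t* const sampleEntries[] = {
    L".NETFRAMEWORK",
    L"{6080A529-897E-4629-A488-ABA0C29B635E}",
    L"2WIREPCP",
};

// Hypothetical helper: backslash-escape regex metacharacters so that an
// entry is matched character for character.
std::wstring escapeForRegex(const std::wstring& entry) {
    static const std::wstring meta = L"\\^$.|?*+()[]{}";
    std::wstring out;
    for (wchar_t c : entry) {
        if (meta.find(c) != std::wstring::npos) out += L'\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: join the escaped entries with '|' between them.
std::wstring buildPattern(const wchar_t* const* first,
                          const wchar_t* const* last) {
    return std::accumulate(first, last, std::wstring(),
        [](const std::wstring& acc, const wchar_t* entry) {
            return acc.empty() ? escapeForRegex(entry)
                               : acc + L'|' + escapeForRegex(entry);
        });
}

// True if the whole service name equals one of the entries.
bool matchesWhitelist(const std::wstring& service) {
    static const std::wregex pattern(
        buildPattern(std::begin(sampleEntries), std::end(sampleEntries)));
    return std::regex_match(service, pattern);
}
```

Without the escaping step, `.NETFRAMEWORK` would also match names such as `XNETFRAMEWORK`, since an unescaped `.` matches any character.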
Sameer
A: 

Your problem could be stripped down to (in Python):

whitelist_services = { ".NETFRAMEWORK", "_IOMEGA_ACTIVE_DISK_SERVICE_" }
if service in whitelist_services:
   print service, "is a whitelisted service"

A direct translation to C++ would be:

// g++ *.cc -std=c++0x && ./a.out
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_set>

namespace {
  // std::wstring, not const wchar_t*: a set of raw pointers would hash and
  // compare addresses rather than the characters they point to.
  typedef std::wstring str_t;
  const str_t servicesWhitelist[] = {
    L".NETFRAMEWORK",
    L"_IOMEGA_ACTIVE_DISK_SERVICE_",
  };
  const std::size_t N = sizeof(servicesWhitelist) / sizeof(*servicesWhitelist);

  // If you need to look up many services, a hash table gives O(1)
  // average-time searches. Otherwise std::find() on the array, O(N),
  // or std::binary_search() on a sorted array, O(log N), might be
  // sufficient.
  const std::unordered_set<str_t> services
    (servicesWhitelist, servicesWhitelist + N);
}

int main() {
  const str_t service = L".NETFRAMEWORK";
  if (services.find(service) != services.end())
    std::wcout << service << L" is a whitelisted service" << std::endl;
}
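The sorted-array alternative mentioned in the comment could look roughly like this. It is a sketch with sample entries only; `sortedWhitelist` and `isWhitelisted` are illustrative names, and the array is assumed to be kept in the order that `std::wstring`'s `operator<` produces.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Must stay sorted for std::binary_search to give correct answers.
// In this ordering '.' sorts before digits, and digits before '_'.
const std::wstring sortedWhitelist[] = {
    L".NETFRAMEWORK",
    L"2WIREPCP",
    L"_IOMEGA_ACTIVE_DISK_SERVICE_",
};

// O(log N) exact-match lookup, with no regex and no hash table.
bool isWhitelisted(const std::wstring& service) {
    return std::binary_search(std::begin(sortedWhitelist),
                              std::end(sortedWhitelist),
                              service);
}
```

Checking `std::is_sorted` over the array once at startup, for example in a debug build, is a cheap guard against entries being added out of order.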
J.F. Sebastian
1. That's nice for Python, but Python is not my target language. Sorry. 2. This seems to be a copy of Sameer's answer....
Billy ONeal
@Billy ONeal: 1. I used Python as pseudo-code (a succinct illustration that you don't need regexes to solve your problem). 2. The essence of the answer is to drop the regex and use one of the approaches shown. Sameer's answer still has the regex at its root.
J.F. Sebastian