Hi, I am trying to import this:

http://en.wikipedia.org/wiki/List_of_countries_by_continent_%28data_file%29

which is in a format like this:

AS AF AFG 004 Afghanistan, Islamic Republic of
EU AX ALA 248 Åland Islands
EU AL ALB 008 Albania, Republic of
AF DZ DZA 012 Algeria, People's Democratic Republic of
OC AS ASM 016 American Samoa
EU AD AND 020 Andorra, Principality of
AF AO AGO 024 Angola, Republic of
NA AI AIA 660 Anguilla

If I do

<? explode(" ", $data); ?>

that works fine apart from countries with more than one word.

How can I split it so I get the first 4 bits of data (the chars/ints), with the 5th bit of data being whatever remains?

This is in PHP.

Thank you

+11  A: 

The explode function takes an optional limit parameter. Change your function call to:

<?php explode(" ", $data, 5); ?>

and the country name, spaces included, will be the last element of the array.
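
As a quick check of the limit behaviour, here is a minimal sketch using two lines in the question's format. The variable names ($lines, $rows) are only illustrative and not from the original code.

```php
<?php
// Two sample lines in the question's format; the second has a multi-word name.
$lines = [
    "NA AI AIA 660 Anguilla",
    "AF DZ DZA 012 Algeria, People's Democratic Republic of",
];

$rows = [];
foreach ($lines as $line) {
    // With a limit of 5, explode() stops splitting after the fourth space,
    // so the fifth element keeps the rest of the line, spaces included.
    $rows[] = explode(" ", $line, 5);
}

print_r($rows[1]);
```

This relies on every line having exactly four space-separated fields before the name, which holds for the sample data shown in the question.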

scompt.com
You beat me by four seconds…I think you deserve to get the top answer, not me.
htw
+1 but don't use short open tags
kemp
this won't work correctly if the country is Czech Republic
nik
@kemp: Good call, I just copied the OP's code. I've changed it in my answer. @nik: why not?
scompt.com
@scompt: some countries have a space in their name, so data[4] will give Czech instead of Czech Republic. What do you say?
nik
@nik: That's the whole point of using the `limit` parameter. As long as there are 4 pieces of data before the country name, passing `5` to `explode` will stick "Czech Republic" in `data[4]`.
scompt.com
Whoops! I thought there was a description after the country name. I totally misunderstood this.
nik
A: 

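You can use preg_match and your text will be in $match[5].

<?php
$str = 'AS AF AFG 004 Afghanistan, Islamic Republic of';
// $match[1] to $match[4] hold the codes; $match[5] holds the country name.
// The pattern must not end with a space, or the last word of the name is dropped.
$chars = preg_match('/([A-Z]*)\ ([A-Z]*)\ ([A-Z]*)\ ([0-9]*)\ (.*)/', $str, $match);
print_r($match);
?>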
falkon
There is no need to escape the space.
Gumbo
Also, in this pattern everything other than the spaces is optional (it matches a string of just 5 spaces too). It might not be an issue in this case, but being as specific as possible helps avoid unexpected results.
kemp
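
To illustrate the point about being specific, here is a sketch of a stricter pattern. The field widths (2, 2, 3 and 3 characters) are an assumption read off the sample data, and the variable names are illustrative.

```php
<?php
// Anchored at both ends, with fixed-width code fields and a non-empty name.
// The widths are assumed from the sample lines in the question.
$pattern = '/^([A-Z]{2}) ([A-Z]{2}) ([A-Z]{3}) ([0-9]{3}) (.+)$/';

$str = 'AS AF AFG 004 Afghanistan, Islamic Republic of';

// A well-formed line matches, and the full name ends up in $match[5].
$matched = preg_match($pattern, $str, $match);

// A string of just five spaces no longer matches.
$rejected = preg_match($pattern, '     ', $ignored);

var_dump($matched, $match[5], $rejected);
```

Checking the return value of preg_match before reading $match makes malformed lines easy to skip or log.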
+3  A: 

Using unpack:

$format = "A2cont/x/A2alpha2/x/A3alpha3/x/A3num/x/a*eng";
$line = "AS AF AFG 004 Afghanistan, Islamic Republic of";
$ar = unpack($format, $line);

It produces:

array (
  'cont' => 'AS',
  'alpha2' => 'AF',
  'alpha3' => 'AFG',
  'num' => '004',
  'eng' => 'Afghanistan, Islamic Republic of',
)

This has the advantage of producing an associative array (note the names before the slashes in the format string), and of raising a warning if the line is too short for the format.

Matthew Flaschen
+1 for showing me the use of the unpack function :)
Max
A: 

Maybe sscanf can also do what you need:

<?php
// in my example I loaded the data into an array line by line,
// dropping the trailing newline of each line
$lines = file('sscanf_data.txt', FILE_IGNORE_NEW_LINES);

foreach ($lines as $line) {
    $data = array();
    // define the format of the input string and assign the
    // extracted data to an associative array; %[^\n] reads
    // the rest of the line, spaces included
    sscanf($line, "%s %s %s %s %[^\n]",
        $data['col_1'],
        $data['col_2'],
        $data['col_3'],
        $data['col_4'],
        $data['col_5']);

    // dump array contents
    print_r($data);
}

Output:

Array
(
    [col_1] => AS
    [col_2] => AF
    [col_3] => AFG
    [col_4] => 004
    [col_5] => Afghanistan, Islamic Republic of
)
...

The good thing is that, if you store the data in an associative array, you already have field-value pairs ready for inserting into the DB.
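
As a rough sketch of that last point, the array keys can double as column names and named placeholders. The table name countries and the $pdo connection mentioned in the comment are hypothetical and not part of the original answer.

```php
<?php
// One parsed row, shaped like the $data array built in the loop.
$data = [
    'col_1' => 'AS',
    'col_2' => 'AF',
    'col_3' => 'AFG',
    'col_4' => '004',
    'col_5' => 'Afghanistan, Islamic Republic of',
];

// Use the keys as column names and as named placeholders.
$columns = array_keys($data);
$placeholders = array_map(function ($column) {
    return ':' . $column;
}, $columns);

// "countries" is a hypothetical table name.
$sql = sprintf(
    'INSERT INTO countries (%s) VALUES (%s)',
    implode(', ', $columns),
    implode(', ', $placeholders)
);

echo $sql, "\n";

// Assuming an existing PDO connection in $pdo, the row could then be stored with:
// $pdo->prepare($sql)->execute($data);
```

Keeping the values out of the SQL string and binding them through placeholders means names containing quotes, such as "Algeria, People's Democratic Republic of", need no manual escaping.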

Max