Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin.
Does anyone know of a good reference manual for Pig Latin? I'm looking for something that covers all of the language's syntax and command descriptions. Unfortunately, the page on the Pig wiki is broken.
...
Hi,
This looks like homework, but please be assured that it isn't. It's just an exercise in the book we use in our C++ course; I'm trying to read ahead on pointers.
The exercise tells me to split a sentence into tokens, convert each of them into Pig Latin, and then display them.
pig latin here is basically ...
Assume I have the following input in Pig:
some
And I would like to convert that into:
s
so
som
some
I've not (yet) found a way to iterate over a chararray in Pig Latin. I have found the TOKENIZE function, but that splits on word boundaries.
So can Pig Latin do this, or does it require a Java class?
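For illustration, here is a minimal plain-Java sketch of the prefix expansion described above ("some" becomes s, so, som, some). The class and method names (`Prefixes`, `prefixes`) are hypothetical and not taken from the question. It shows only the string logic; wrapping it in a Pig UDF (for example, an `EvalFunc` that returns a bag of chararrays) is assumed and not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: only the prefix logic, not a Pig UDF.
final class Prefixes {

    // Returns every non-empty prefix of the input, shortest first.
    static List<String> prefixes(String s) {
        List<String> out = new ArrayList<>();
        for (int end = 1; end <= s.length(); end++) {
            out.add(s.substring(0, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints one prefix per line for the sample input from the question.
        for (String p : prefixes("some")) {
            System.out.println(p);
        }
    }
}
```

An empty input yields an empty list, so a caller would not need a special case for it.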
...
Hi,
I'm looking to take a shortcut on formatting/style for Pig Latin (hadoop-ay).
Does anyone know where I can find a style guide?
-daniel
...
Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:
REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
log = LOAD '/data/logs' USING SequenceFileLoader AS (...)
Is there also a library out there that w...
I have a User Defined Function (UDF) written in Java that parses lines in a log file and returns information back to Pig, so that Pig can do all the processing.
It looks something like this:
public abstract class Foo extends EvalFunc<Tuple> {
    public Foo() {
        super();
    }
    public Tuple exec(Tuple input) throws IOException {
...
I have a Pig program where I am trying to compute the minimum center between two bags. In order for it to work, I found I need to COGROUP the bags into a single dataset. The entire operation takes a long time. I want to either open one of the bags from disk within the UDF, or to be able to pass another relation into the UDF without ne...
We are using Pig 0.6 to process some data. One of the columns of our data is a space-separated list of ids (such as: 35 521 225). We are trying to map each of those ids using another file that contains 2 columns of mappings, like this (column 1 is our data, column 2 is a third party's data):
35 6009
521 21599
225 51991
12 6129
We wrote a UD...
Hi
My background - 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on Map-Reduce and GFS.
I understand that-
Pig's language Pig Latin is a shift from (suits the way programmers think) SQL-like declarative style of programming and Hive's query language closely
...
I'm brand-spanking new to Java and I made this little translator for Pig Latin.
package stringmanipulation;
public class PigLatinConverter {
    public String Convert(String word){
        int position = 0;
        if (!IsVowel(word.charAt(0))) {
            for (int i= 0; i < word.length(); i++) {
                if (IsVowel(word.charA...
I'm trying to combine Hadoop, Pig and Cassandra to be able to work on data stored in Cassandra by means of simple Pig queries. Problem is I can't get Pig to create Map/Reduce jobs that actually work with the CassandraStorage.
What I did is I copied the storage-conf.xml file from one of my cluster machines on top of the one in contrib/pi...
I have the following scenario-
Pig version used 0.70
Sample HDFS directory structure:
/user/training/test/20100810/<data files>
/user/training/test/20100811/<data files>
/user/training/test/20100812/<data files>
/user/training/test/20100813/<data files>
/user/training/test/20100814/<data files>
As you can see in the paths listed abo...
Last night I was messing around with Pig Latin using arrays and found that I could not reverse the process. How would I shift the phrase, remove the chars "a" and "y" from the end of the word, and return the original word in the phrase?
For instance if I entered "piggy" it would come out as "iggypay" shifting the word piggy so "p" is a...
I'm working on a Pig script (my first) that loads a large text file. For each record in that text file, the content of one field needs to be sent off to a RESTful service for processing. Nothing needs to be evaluated or filtered. Capture data, send it off and the script doesn't need anything back.
I'm assuming that a UDF is required for...
I have a Pig script--currently running in local mode--that processes a huge file containing a list of categories:
/root/level1/level2/level3
/root/level1/level2/level3/level4
...
I need to insert each of these into an existing database by calling a stored procedure. Because I'm new to Pig and the UDF interface is a little daunting, I'...