piglatin

Reference manual for Apache Pig Latin

Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. Does anyone know of a good reference manual for Pig Latin? I'm looking for something that includes all the syntax and command descriptions for the language. Unfortunately, the wiki page on the Pig wiki is broken. ...

tokenizing and converting to pig latin

Hi, this looks like homework, but please be assured that it isn't. It's just an exercise in the book we use in our C++ course; I'm trying to read ahead on pointers. The exercise in the book tells me to split a sentence into tokens, convert each of them into pig latin, and then display them. pig latin here is basically ...

Splitting input into substrings in PIG (Hadoop)

Assume I have the following input in Pig: some. And I would like to convert that into: s so som some. I've not (yet) found a way to iterate over a chararray in Pig Latin. I have found the TOKENIZE function, but that splits on word boundaries. So can Pig Latin do this, or is this something that requires a Java class? ...
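For reference, the prefix expansion this question describes (some becoming s, so, som, some) is a short loop outside of Pig. The sketch below is plain Java with no Pig dependency; the class and method names (`Prefixes`, `of`) are hypothetical and chosen only for illustration. Inside Pig, logic like this would presumably have to be wrapped in a UDF, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: lists every leading substring of a word.
class Prefixes {
    static List<String> of(String word) {
        List<String> result = new ArrayList<>();
        // substring(0, end) yields the first `end` characters of the word.
        for (int end = 1; end <= word.length(); end++) {
            result.add(word.substring(0, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints one prefix per line: s, so, som, some.
        for (String prefix : of("some")) {
            System.out.println(prefix);
        }
    }
}
```

An empty input yields an empty list, so no special case is needed for blank fields.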

Hadoop pig latin style guide?

Hi, I'm looking to take a shortcut on formatting/style for Pig Latin (hadoop-ay). Does anyone know where I can find a style guide? -daniel ...

Storing data to SequenceFile from Apache Pig

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:

    REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;
    DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
    log = LOAD '/data/logs' USING SequenceFileLoader AS (...)

Is there also a library out there that w...

Does throwing an exception in an EvalFunc pig UDF skip just that line, or stop completely?

I have a User Defined Function (UDF) written in Java to parse lines in a log file and return information back to Pig, so it can do all the processing. It looks something like this:

    public abstract class Foo extends EvalFunc<Tuple> {
        public Foo() {
            super();
        }

        public Tuple exec(Tuple input) throws IOException {
            ...

How can I load a file into a DataBag from within a Yahoo PigLatin UDF?

I have a Pig program where I am trying to compute the minimum center between two bags. In order for it to work, I found I need to COGROUP the bags into a single dataset. The entire operation takes a long time. I want to either open one of the bags from disk within the UDF, or to be able to pass another relation into the UDF without ne...

Pass a relation to a PIG UDF when using FOREACH on another relation?

We are using Pig 0.6 to process some data. One of the columns of our data is a space-separated list of ids (such as: 35 521 225). We are trying to map one of those ids to another file that contains 2 columns of mappings, like so (column 1 is our data, column 2 is a third party's data):

    35   6009
    521  21599
    225  51991
    12   6129

We wrote a UD...

Difference between Pig and Hive? Why have both?

Hi. My background: 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's papers on Map-Reduce and GFS. I understand that Pig's language, Pig Latin, is a shift away from the SQL-like declarative style of programming (it suits the way programmers think), and Hive's query language closely ...

Can I improve this Pig-Latin converter?

I'm brand-spanking new to Java and I made this little translator for PigLatin.

    package stringmanipulation;

    public class PigLatinConverter {
        public String Convert(String word){
            int position = 0;
            if (!IsVowel(word.charAt(0))) {
                for (int i = 0; i < word.length(); i++) {
                    if (IsVowel(word.charA...

Bundling jars when submitting map/reduce jobs via Pig?

I'm trying to combine Hadoop, Pig and Cassandra to be able to work on data stored in Cassandra by means of simple Pig queries. Problem is I can't get Pig to create Map/Reduce jobs that actually work with the CassandraStorage. What I did is I copied the storage-conf.xml file from one of my cluster machines on top of the one in contrib/pi...

Pig Latin: Load multiple files from a date range (part of the directory structure)

I have the following scenario. Pig version used: 0.70. Sample HDFS directory structure:

    /user/training/test/20100810/<data files>
    /user/training/test/20100811/<data files>
    /user/training/test/20100812/<data files>
    /user/training/test/20100813/<data files>
    /user/training/test/20100814/<data files>

As you can see in the paths listed abo...

Piglatin using Arrays

Last night I was messing around with Piglatin using arrays and found out I could not reverse the process. How would I shift the phrase, take out the characters "a" and "y" at the end of the word, and return the original word in the phrase? For instance, if I entered "piggy" it would come out as "iggypay", shifting the word piggy so "p" is a...
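The reversal asked about here can be sketched for the simple rule the example shows, where the first letter moves to the end and "ay" is appended (piggy becoming iggypay). The sketch is in Java and works on a String rather than a char array; the class and method names are hypothetical, and it assumes exactly one leading letter was moved. Variants that move whole consonant clusters, or that treat vowel-initial words differently, cannot be undone this way without extra information.

```java
// Hypothetical helper: undoes "move the first letter to the end, then append ay".
class PigLatinReverser {
    static String decode(String encoded) {
        // Anything that is not at least one letter plus "ay" is returned unchanged.
        if (encoded.length() < 3 || !encoded.endsWith("ay")) {
            return encoded;
        }
        // Drop the trailing "ay": "iggypay" -> "iggyp".
        String body = encoded.substring(0, encoded.length() - 2);
        // Move the last letter back to the front: "iggyp" -> "piggy".
        int last = body.length() - 1;
        return body.charAt(last) + body.substring(0, last);
    }

    public static void main(String[] args) {
        System.out.println(decode("iggypay")); // prints "piggy"
    }
}
```

For a whole phrase, the same method could be applied to each word after splitting on spaces.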

Call RESTful service in Pig script

I'm working on a Pig script (my first) that loads a large text file. For each record in that text file, the content of one field needs to be sent off to a RESTful service for processing. Nothing needs to be evaluated or filtered. Capture data, send it off and the script doesn't need anything back. I'm assuming that a UDF is required for...

Pig's Stream Through PHP

I have a Pig script--currently running in local mode--that processes a huge file containing a list of categories:

    /root/level1/level2/level3
    /root/level1/level2/level3/level4
    ...

I need to insert each of these into an existing database by calling a stored procedure. Because I'm new to Pig and the UDF interface is a little daunting, I'...