Here is my code.

import urllib2
import urllib
import json
from BeautifulSoup import BeautifulSoup

class parser:
    """
    This class uses the Beautiful Soup library to scrape the information from
    the HTML source code from Google Translate.

    It also offers a way to consume the AJAX result of the translation, however
    encoding on Windows won't work well right now so it's recommended to use
    the scraping method.
    """

    def fromHtml(self, text, languageFrom, languageTo):
        """
        Returns translated text that is scraped from Google Translate's HTML
        source code.
        """
        langCode={
            "arabic":"ar", "bulgarian":"bg", "chinese":"zh-CN",
            "croatian":"hr", "czech":"cs", "danish":"da", "dutch":"nl",
            "english":"en", "finnish":"fi", "french":"fr", "german":"de",
            "greek":"el", "hindi":"hi", "italian":"it", "japanese":"ja",
            "korean":"ko", "norwegian":"no", "polish":"pl", "portuguese":"pt",
            "romanian":"ro", "russian":"ru", "spanish":"es", "swedish":"sv" }

        urllib.FancyURLopener.version = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008070400 SUSE/3.0.1-0.1 Firefox/3.0.1"

        try:
            postParameters = urllib.urlencode({"langpair":"%s|%s" %(langCode[languageFrom.lower()],langCode[languageTo.lower()]), "text":text,"ie":"UTF8", "oe":"UTF8"})
        except KeyError, error:
            print "Currently we do not support %s" %(error.args[0])
            return

        page = urllib.urlopen("http://translate.google.com/translate_t", postParameters)
        content = page.read()
        page.close()

        htmlSource = BeautifulSoup(content)
        translation = htmlSource.find('span', title=text )
        return translation.renderContents()


    def fromAjaxService(self, text, languageFrom, languageTo):
        """
        Returns a simple string translating the text from "languageFrom" to
        "LanguageTo" using Google Translate AJAX Service.
        """
        LANG={
            "arabic":"ar", "bulgarian":"bg", "chinese":"zh-CN",
            "croatian":"hr", "czech":"cs", "danish":"da", "dutch":"nl",
            "english":"en", "finnish":"fi", "french":"fr", "german":"de",
            "greek":"el", "hindi":"hi", "italian":"it", "japanese":"ja",
            "korean":"ko", "norwegian":"no", "polish":"pl", "portuguese":"pt",
            "romanian":"ro", "russian":"ru", "spanish":"es", "swedish":"sv" }

        base_url='http://ajax.googleapis.com/ajax/services/language/translate?'
        langpair='%s|%s'%(LANG.get(languageFrom.lower(),languageFrom),
                          LANG.get(languageTo.lower(),languageTo))
        params=urllib.urlencode( (('v',1.0),
                           ('q',text.encode('utf-8')),
                           ('langpair',langpair),) )
        url=base_url+params
        content=urllib2.urlopen(url).read()
        try: trans_dict=json.loads(content)
        except AttributeError:
            try: trans_dict=json.load(content)
            except AttributeError: trans_dict=json.read(content)
        return trans_dict['responseData']['translatedText']

Now, in another module called TestingGrounds.py, I want to try out both methods, but I get the following error:

from Parser import parser

print parser.fromHtml("Hello my lady!", "English", "Italian")

Traceback (most recent call last):
  File "C:\Users\Sergio.Tapia\Documents\NetBeansProjects\BabylonPython\src\TestingGrounds.py", line 3, in <module>
    print parser.fromHtml("Hello my lady!", "English", "Italian")
TypeError: unbound method fromHtml() must be called with parser instance as first argument (got str instance instead)

+1  A: 

You need to call the method on an instance of the parser class, not on the class itself.

from Parser import parser

print parser().fromHtml("Hello my lady!", "English", "Italian")

or

from Parser import parser

p = parser()
p.fromHtml(...)
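
To see the difference without touching the network, here is a minimal sketch that uses a stand-in parser class. The method body is invented for illustration and is not the asker's scraping code; it is written so that it runs under both Python 2 and Python 3.

```python
class parser(object):
    def fromHtml(self, text, languageFrom, languageTo):
        # Stand-in body (assumption): the real method would query Google Translate.
        return "%s (%s -> %s)" % (text, languageFrom, languageTo)

# Calling through the class: the string lands where self is expected,
# so Python raises TypeError (the exact message differs between 2 and 3).
try:
    parser.fromHtml("Hello my lady!", "English", "Italian")
    class_call_failed = False
except TypeError:
    class_call_failed = True

# Calling through an instance: self is bound automatically.
result = parser().fromHtml("Hello my lady!", "English", "Italian")
```

The only difference between the two calls is the pair of parentheses after parser, which creates the instance that fills the self parameter.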

Alternatively, you could make fromHtml a staticmethod:

class parser(object):   # you should probably use new-style classes
    ...
    @staticmethod
    def fromHtml(text, languageFrom, languageTo):   # note: no self argument
        ...

which you could then use like:

from Parser import parser

print parser.fromHtml(...)
W_P
What would be the most recommended way to do it? To create a new variable object of the class and then invoke the method you need? I'm new to Python, but familiar with C#. In C# I'd create a variable to hold the parser class then use the methods. Thanks!
Serg
After edit: Ah, great! I didn't know Python could instantiate an object like that in the middle of a line. That's great.
Serg
It really depends on your use case. Will the `parser` class need separate instances, each with its own set of data? If so, you might want to keep `fromHtml` as an instance method. If you decide to make it a static method, _don't_ give it a `self` argument, as @Xorlev shows. Either way, neither form is more verbose than the other in Python (in this case at least): an instance method is called with `parser().fromHtml()`, and a static method with `parser.fromHtml()`.
W_P
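
As a rough illustration of the "separate instances, each with different data" case mentioned above, here is a sketch in which each instance remembers its own language pair. The names PairParser and describe are hypothetical and do not appear in the original code; the method only formats a string instead of translating.

```python
class PairParser(object):
    def __init__(self, languageFrom, languageTo):
        # Per-instance data: every object keeps its own language pair.
        self.languageFrom = languageFrom
        self.languageTo = languageTo

    def describe(self, text):
        # Needs self, so an instance method is the natural fit here.
        return "%s: %s -> %s" % (text, self.languageFrom, self.languageTo)

en_it = PairParser("English", "Italian")
en_fr = PairParser("English", "French")

first = en_it.describe("Hello")
second = en_fr.describe("Hello")
```

If a class carries no state like this, as seems to be the case for the parser in the question, a static method or even a plain module-level function would probably do.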
+1  A: 

If you want to use fromHtml() as a static method, which is useful if you don't need to access any data members of parser, you'll need to do this (cut for brevity):

class parser:
    @staticmethod
    def fromHtml(text, languageFrom, languageTo):
         # etc.

Or, if you want the method to receive the class itself as its first argument (conventionally named cls), make it a classmethod:

class parser:
    @classmethod
    def fromHtml(cls, text, languageFrom, languageTo):
         # etc.

Either way, you can call it as parser.fromHtml() or parser().fromHtml(); the classmethod version additionally gets access to the class through cls.
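
A small sketch of how the two decorators differ, under the assumption that the language table is moved to a class attribute. The method names pair and codePair are made up for this example and the table is shortened to two entries.

```python
class parser(object):
    # Assumption: a shortened language table stored on the class.
    langCode = {"english": "en", "italian": "it"}

    @staticmethod
    def pair(languageFrom, languageTo):
        # No self or cls: a plain function living in the class namespace.
        return "%s|%s" % (languageFrom, languageTo)

    @classmethod
    def codePair(cls, languageFrom, languageTo):
        # cls is the class itself, so class attributes are reachable.
        return "%s|%s" % (cls.langCode[languageFrom.lower()],
                          cls.langCode[languageTo.lower()])

# Both kinds of method can be called on the class or on an instance.
static_on_class = parser.pair("en", "it")
static_on_instance = parser().pair("en", "it")
class_on_class = parser.codePair("English", "Italian")
class_on_instance = parser().codePair("English", "Italian")
```

The practical difference is what the method can see: the static method only has its arguments, while the class method can also read class-level data such as langCode.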

Looking at your code, I should think you'd only need a static method.

Xorlev