We need to automate a database update that happens a couple of times a month.

My current use case is as follows.

Currently we manually diff the input data against the previous month's input. If it is XML, we use MS xmldiff. If it is PDF, the comparison is done entirely by manual verification :-(

Once the changes are found, we update the database through a form interface (again manually). The changes can be creating new entries, updating existing entries, or removing older entries. Since the overall process is manual and time consuming, not to mention extremely boring and frustrating for one person, we are looking for ways to automate as much of it as possible.
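To make the diff step concrete, here is a minimal sketch of how two monthly XML inputs could be compared and the differences sorted into the three kinds of change listed above. It uses only Python's standard library. The `entry` element name, the `id` key attribute, and the sample data are assumptions made up for illustration; the real input will have its own structure.

```python
import xml.etree.ElementTree as ET


def load_records(xml_text, record_tag="entry", key_attr="id"):
    """Map each record's key attribute to a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    records = {}
    for elem in root.iter(record_tag):
        fields = {child.tag: (child.text or "").strip() for child in elem}
        records[elem.get(key_attr)] = fields
    return records


def diff_records(old, new):
    """Classify record keys into created, updated, and removed."""
    created = {k: new[k] for k in new.keys() - old.keys()}
    removed = sorted(old.keys() - new.keys())
    updated = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return created, updated, removed


# Hypothetical sample data standing in for last month's and this month's input.
previous = """<data>
  <entry id="1"><name>Alpha</name><price>10</price></entry>
  <entry id="2"><name>Beta</name><price>20</price></entry>
</data>"""

current = """<data>
  <entry id="2"><name>Beta</name><price>25</price></entry>
  <entry id="3"><name>Gamma</name><price>30</price></entry>
</data>"""

created, updated, removed = diff_records(load_records(previous), load_records(current))
print("create:", created)
print("update:", updated)
print("remove:", removed)
```

Comparing records by a stable key rather than by line position means that reordered entries are not reported as changes, which is usually what a database update needs.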

We are currently thinking of implementing the whole thing in a scripting language (specifically Python). But we don't have anyone with experience in scripting languages, so we would basically have to learn as we go.

The questions we would like answered before we jump in are:

1) Is going with a scripting language the correct approach? We thought a scripting language would be better since there are multiple areas to cover (XML diff, database connectivity, creating XML, etc.). Are there any other alternatives or tools?

2) Is Python as good a choice as any other language? Based on what we googled, Python seems to be mature and supports all kinds of database connectivity through libraries. Are there any other alternatives we should investigate? (Again, no one here has written a single line of Python.)

3) Are there any good, free diff tools that work on PDF files? We are looking for something that can check whether the content of a specific table or heading has changed in a PDF and dump the output.

Just FYI, the database is MS Access.
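For the database side, a rough sketch of applying created, updated, and removed entries from a script might look like the following. It uses the standard-library `sqlite3` module with an in-memory database purely as a stand-in so the example runs anywhere. The `entries` table, its columns, and the sample rows are hypothetical. For MS Access, one would typically connect through the third-party `pyodbc` package and the Microsoft Access ODBC driver instead; that package also uses `?` placeholders, so the SQL statements should carry over largely unchanged, though that is an assumption to verify against the real database.

```python
import sqlite3


def apply_changes(conn, created, updated, removed):
    """Apply inserts, updates, and deletes, then commit them together."""
    cur = conn.cursor()
    for key, row in created.items():
        cur.execute(
            "INSERT INTO entries (id, name, price) VALUES (?, ?, ?)",
            (key, row["name"], row["price"]),
        )
    for key, row in updated.items():
        cur.execute(
            "UPDATE entries SET name = ?, price = ? WHERE id = ?",
            (row["name"], row["price"], key),
        )
    for key in removed:
        cur.execute("DELETE FROM entries WHERE id = ?", (key,))
    conn.commit()


# In-memory SQLite database as a stand-in for the real Access file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id TEXT PRIMARY KEY, name TEXT, price TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [("1", "Alpha", "10"), ("2", "Beta", "20")],
)

apply_changes(
    conn,
    created={"3": {"name": "Gamma", "price": "30"}},
    updated={"2": {"name": "Beta", "price": "25"}},
    removed=["1"],
)

rows = conn.execute("SELECT id, name, price FROM entries ORDER BY id").fetchall()
print(rows)
```

Passing values as parameters rather than building SQL strings by hand avoids quoting problems when the input data contains apostrophes or other special characters.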

Thanks for your time.

+1  A: 

I think Python is an easy language to learn, and in my opinion, if you have VBScript experience, you should be able to pick it up quickly.

I used BeautifulSoup for my XML/HTML parsing, which I found very easy to use. http://www.crummy.com/software/BeautifulSoup/documentation.html

For PDF stuff, you can take a look at the ReportLab toolkit (which I have not used): http://www.reportlab.org/

Thanks. We will try out the links. So a script-based approach is the best fit for this kind of scenario?