Hey. I'm taking a course titled Principles of Programming Languages, and I need to decide on a project to do this summer. Here is a short version of what the project needs to accomplish:

"The nature of the project is language processing. Writing a Scheme/Lisp processor is a project of this type. A compiler for a language like C or Pascal is also a potential project of this type. Some past students have done projects related to databases and processing SQL. Another possible project might relate to pattern matching and manipulating XML. Lisp, Pascal, and C usually result in the most straight forward projects."

I am very interested in web technologies and have some experience with PHP, MySQL, JavaScript, etc. I would like to do something web-oriented, but I'm having trouble coming up with ideas. I also want this to be a worthwhile project that could have some significance, instead of just doing the same thing as everyone else in class.

Any ideas? Thanks!

EDIT: I really like the idea of a LaTeX to XHTML/MathML translator, and I passed the idea to my instructor, who wrote back:

"I think the idea is interesting, my question (and yours) is whether it is appropriate.

I think of LaTeX as a low-level mark-up language. I'm wondering if converting this to XHTML or MathML is really a change in levels and complexity. I think you can make your point with a little more discussion and some examples. You might also think of some other mark-up constructs which made it easier to describe equations."

Any ideas on how to convince him this may be appropriate, or any extensions of this idea that could work for the goals of my project?

Thanks for all the responses so far!

A: 

Why not write some sort of interface that can be interpreted/compiled down to the appropriate web technology of the user's choice?

Or something like a Python to C compiler?

samoz
A Python-to-C compiler isn't that simple, considering Python is a dynamically typed language and C is not. It's theoretically possible, but not a summer-only project IMO. (PyPy did it, but they cheated by only handling RPython, a restricted subset of the language.)
musicfreak
+5  A: 

Hm, neat! Maybe:

1. A web-based language interpreter, e.g., a very simple assembly interpreter in JavaScript, or a PHP-based C interpreter (a PHP script reads C code and executes it in some sort of sandbox; obviously it would only be able to implement a small subset of the C language).

2. Maybe some automated way to transform PHP data structures (like PHP arrays) into SQL queries, and vice versa. That kind of thing has already been done, but you might be able to do something which (for example) takes an SQL query and creates the array data structure that would be needed to "hold" the information returned by the query. It could support complex things like JOINs and GROUP BYs.

3. Maybe a C-to-PHP compiler? (or a PHP-to-C compiler, to be able to run simple PHP code natively. Use this with any combination of languages)

edit:

4. Maybe a regex-to-C translator. That is, something that takes a regex and generates C code to match that pattern. Or something which takes a regex and converts it into an FSM which represents the "mathematical" translation of that expression. Or the opposite: something which takes an FSM for a regular language and generates the Perl-syntax regex for it.

5. Maybe an XML-to-PHP/MySQL translator, e.g., an XML file might contain information about a database and its fields, and then your program generates the SQL to create those tables, or the HTML/PHP code for the forms.
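
To make idea 5 a bit more concrete, here is a minimal sketch in Python (chosen only for brevity; the same approach would work in PHP). The XML layout, the `TYPE_MAP` table, and the function name `xml_to_create_statements` are all made up for this illustration and are not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema description; this XML layout is invented for the sketch.
SCHEMA_XML = """
<database name="blog">
  <table name="posts">
    <field name="id" type="int" primary="true"/>
    <field name="title" type="string"/>
    <field name="body" type="text"/>
  </table>
</database>
"""

# Assumed mapping from the made-up XML type names to MySQL column types.
TYPE_MAP = {"int": "INT", "string": "VARCHAR(255)", "text": "TEXT"}


def xml_to_create_statements(xml_text):
    """Return one CREATE TABLE statement per <table> element."""
    root = ET.fromstring(xml_text.strip())
    statements = []
    for table in root.findall("table"):
        columns = []
        for field in table.findall("field"):
            column = f"{field.get('name')} {TYPE_MAP[field.get('type')]}"
            if field.get("primary") == "true":
                column += " PRIMARY KEY"
            columns.append(column)
        statements.append(
            f"CREATE TABLE {table.get('name')} ({', '.join(columns)});"
        )
    return statements


for statement in xml_to_create_statements(SCHEMA_XML):
    print(statement)
```

A real project would also need to validate the input and quote identifiers; the sketch skips both.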

Best of luck!

rascher
Flying Spaghetti Monster?!
Skilldrick
Finite State Machine.
Andy Mikula
A: 

Writing a compiler for C or Pascal will likely take you months or years if you are not a compiler guru.

Write a simple web server. It will be fun and might prove useful as a simple and free solution. I once met a guy who said he did something like this and used it for simple customer sites. Yours could become a useful thing as well.

User
I don't think writing a web server will qualify as language processing.
Schnalle
A: 

Just something I thought of recently: write a Ruby interpreter in Lisp.

Svante
A: 

Something that can be interesting to work on is a regexp-to-automaton converter using Glushkov's algorithm. Here are some key features that can be implemented:

  • Syntactic analysis of the regexp
  • Transformation into an automaton using Glushkov's algorithm
  • Generating random phrases matching the regexp with that automaton / validating phrases
  • Exporting automata as XML

That's not a very long assignment, so you may be able to handle it in a few months.
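
As a rough idea of the size of the core, here is a sketch in Python of the Glushkov (position automaton) construction for a tiny regexp dialect: literals, `|`, `*`, and parentheses only. The function names `glushkov` and `matches` are invented for this sketch. Each literal in the pattern becomes one state, plus state 0 as the start state.

```python
def glushkov(pattern):
    """Build a position automaton for a regexp using literals, |, *, and ()."""
    symbols = [None]  # symbols[p] = character at position p; 0 is the start state
    follow = {}       # follow[p] = positions that may come directly after p
    pos = 0           # current index into the pattern

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        nullable, first, last = concatenation()
        while peek() == "|":
            pos += 1
            n2, f2, l2 = concatenation()
            nullable, first, last = nullable or n2, first | f2, last | l2
        return nullable, first, last

    def concatenation():
        nullable, first, last = True, set(), set()  # starts as the empty word
        while peek() is not None and peek() not in "|)":
            n2, f2, l2 = starred()
            for p in last:            # ends of the left part link to starts of the right
                follow[p] |= f2
            first = (first | f2) if nullable else first
            last = (last | l2) if n2 else l2
            nullable = nullable and n2
        return nullable, first, last

    def starred():
        nonlocal pos
        nullable, first, last = atom()
        while peek() == "*":
            pos += 1
            for p in last:            # loop back from the ends to the starts
                follow[p] |= first
            nullable = True
        return nullable, first, last

    def atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            result = alternation()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if ch is None or ch in "|)*":
            raise ValueError(f"unexpected {ch!r} at index {pos}")
        pos += 1
        symbols.append(ch)
        p = len(symbols) - 1
        follow[p] = set()
        return False, {p}, {p}

    nullable, first, last = alternation()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    follow[0] = first
    accepting = last | ({0} if nullable else set())
    return symbols, follow, accepting


def matches(automaton, text):
    """Validate a phrase by simulating the (nondeterministic) automaton."""
    symbols, follow, accepting = automaton
    current = {0}
    for ch in text:
        current = {q for p in current for q in follow[p] if symbols[q] == ch}
    return bool(current & accepting)


automaton = glushkov("(a|b)*abb")
print(matches(automaton, "babb"), matches(automaton, "ab"))
```

Random phrase generation and XML export would then just be walks over `follow` and `accepting`.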

Lau
+3  A: 

I'd stay away from PHP and MySQL for a project like this. Both are commercial platforms that have compromised a lot of core CS principles in order to gain market share and solve users' problems. Given what you've described, it sounds like the point of this project is to think about how programming languages are processed. JavaScript the language (not the browser API) might be a good choice here. Writing a processor/interpreter/compiler for JavaScript, or using JavaScript itself to write a processor/interpreter/compiler for another language, would meet the criteria for the assignment. Writing a JavaScript "minifier" that removes all unnecessary whitespace (for smaller file sizes) while maintaining the program's functionality is another possible project.

Alan Storm
Ooh, or he could write a code obfuscator. There seems to be a dearth of good open-source ones out there anyway. (Aside from the merits of such a program, that would be an interesting thing to write: a number of lexical "gotchas" would surely crop up.)
rascher
A: 

You can try to make a scripting language in the vein of nadvsh if you want to do something interesting, but it might be too removed from what your instructor is expecting of you.

New Adventure Shell (nadvsh)

Coding With Style
+2  A: 

You shouldn't view creating an implementation of a particular language as insignificant. Everyone probably wants to be a famous programmer, and not many people achieve it. This is a great opportunity to become familiar with very cool, uncommon languages (Lisp, APL, etc.). If this is your first time creating a compiler/interpreter, then it will also be a better choice to go with an existing language (so you can see what design elements are needed to create a successful language).

Significant ideas typically arise from necessity. People began using a language because they either needed it or it made the task they wanted to do a lot easier. I don't think you will find the answer or the motivation to start a project from scratch here. That being said, I've always thought it would be cool to have a language that uses processor-native byte code to create dynamic websites (without using something like CGI).

Patrick Gryciuk
A: 

If you want to process language, you can write a UIMA program. UIMA stands for Unstructured Information Management Architecture. It was developed by IBM at a cost of about 45 million dollars and is now available as open source. Basically, UIMA is a framework for analysing text documents to find patterns. It is made to find things where there is no order (finding needles in haystacks). It uses XML and C.

yan bellavance
A: 

The web is a rich area for doing work with languages. Take a look at a popular web framework like Ruby on Rails, and you'll find that much of its productivity comes from the fact that it implements a domain specific language well suited to web applications. Ruby just so happened to be a good language to implement such a language because of its dynamic nature, but the power comes from the language they created from it.

In your case, perhaps you could try designing your own domain specific language using a language that you are familiar with, such as PHP, to implement the essential core of a web framework:

  • routing URLs to pages
  • generating pages dynamically using a template (and maybe implement your own template syntax!)
  • connecting objects to underlying databases (object relational mapping)

If you are really ambitious, instead of building from an existing language, you could build your own language from the ground up (lexer, parser, code generator, etc) to do this.
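
To give a feel for the first two bullets, here is a minimal sketch in Python (picked only for brevity; the answer's suggestion of PHP would work just as well). The route syntax `/posts/<id>`, the `{{ name }}` template syntax, and the names `route`, `render`, and `dispatch` are all invented for this sketch and do not come from any real framework.

```python
import re

routes = []  # list of (compiled URL pattern, handler function)


def route(pattern):
    """Register a handler for a URL pattern such as '/posts/<id>'."""
    # Turn each <name> segment into a named regex group.
    # Other regex metacharacters in the pattern are not escaped in this sketch.
    regex = re.compile("^" + re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", pattern) + "$")

    def register(handler):
        routes.append((regex, handler))
        return handler

    return register


def render(template, context):
    """Replace each {{ name }} placeholder with the matching context value."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(context[m.group(1)]),
        template,
    )


def dispatch(path):
    """Call the first handler whose pattern matches the path."""
    for regex, handler in routes:
        match = regex.match(path)
        if match:
            return handler(**match.groupdict())
    return "404 Not Found"


@route("/posts/<id>")
def show_post(id):
    return render("<h1>Post {{ id }}</h1>", {"id": id})


print(dispatch("/posts/42"))
```

The language-processing part of such a project is in how far the route and template syntaxes go (loops, conditionals, escaping), which this sketch leaves out entirely.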

alanlcode
+2  A: 

I finished this course last semester :)

IMHO the best way to go is to build an expression evaluator. Build the simplest expression evaluator you can.

Then add these features in order, as many as you like:

1- Constant symbols, just placeholders for variables. Your evaluator should ask for their values after parsing the expression.

2- Imperative-style variables. Like variables in any imperative language, where the user can change the value of a symbol anywhere in the code.

3- Simple control statements. 'if-else' and a pretest while loop are the simplest to consider.

4- Arrays, if you want your expression evaluator to be really like a programming language. It would be interesting to add variable-dimension arrays to your 'language'. You have to build a generic mapping function for your arrays.

Now you have a real programming language. To be a useful one, you might add subroutines. So the list continues:

5- Subroutines. This is a little harder than the previous features, but it should not be impossible :)

6- Build a simple math library for your new language in your language itself! That is the fun part, in my opinion ;)

Sebesta's book is a good overview of famous imperative programming languages.
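
As a possible starting point, here is a minimal recursive-descent evaluator in Python covering the base case plus feature 1. It is only a sketch: the names `tokenize` and `evaluate` are made up, and instead of asking for symbol values interactively it takes them in an `env` dictionary, which is an assumption of this example.

```python
import re

# Numbers, identifiers, and single-character operators/parentheses.
TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"bad character at index {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, env=None):
    """Evaluate an arithmetic expression; env maps symbol names to values."""
    tokens = tokenize(text)
    env = env or {}
    i = 0

    def peek():
        return tokens[i] if i < len(tokens) else None

    def advance():
        nonlocal i
        i += 1
        return tokens[i - 1]

    def expr():  # expr := term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # term := factor (('*' | '/') factor)*
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():  # factor := number | symbol | '-' factor | '(' expr ')'
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        tok = advance()
        if tok == "(":
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok == "-":
            return -factor()
        if tok[0].isdigit():
            return float(tok)
        if tok[0].isalpha() or tok[0] == "_":
            return env[tok]  # raises KeyError for an unknown symbol
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("2 + 3 * (4 - 1)"))
print(evaluate("x * x - 1", {"x": 3}))
```

Features 2 to 5 usually mean switching from evaluating while parsing to building a tree first and then walking it, since loops and subroutines need to run the same code more than once.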

AraK
This is going to be pretty close to what I'm going to end up doing, thanks for the response!
Domenic
A: 

You can get ideas from this massive list.

drikoda
+4  A: 

Here's something I'd love: a PHP-based LaTeX-to-MathML translator. It wouldn't have to do everything, but if I could just cut-and-paste mathematical formulas written in valid LaTeX code into a window and have the script parse it and convert it into valid MathML, that'd be awesome.

Let me expand on this some more. The current state of scientific publishing on the web isn't great. Titles, headers, section numbers, tables, etc. can all be done in HTML, but for mathematical and chemical formulas which depend on precise two-dimensional formatting, scientific authors have only second-class options:

  • Publish their work in PDF format, which looks great but has a (comparatively) huge file size and doesn't do hyperlinking well, or
  • Use something like latex-to-html, which converts formulas into .gif files (or some similar image format), which are semantically meaningless and thus don't lend themselves to indexing or searching.

Moreover, neither of these options allows mathematical formulas to be generated programmatically, which would be helpful to the education community (think randomly generated online homework).

Publishing scientific work in MathML would solve all of these issues, but it has a few issues of its own, namely:

  1. It's really too verbose to code by hand. I mean, you can do it, but c'mon.
  2. The scientific community uses LaTeX for publishing, they're happy with it (for good reason), and they're not about to learn another mathematical markup language when they've got their own research and lesson-planning to do.
  3. Browser support for MathML is currently pretty limited. I know this, and I don't mean to stick my head in the sand about it.

In other words: scientific authors know LaTeX, they use it daily, and it's the de facto standard for authoring scientific content. MathML isn't and won't ever be the way math and science are authored, but it's the only semantically rich way to put hypertext mathematics on the web. Browser support for MathML is weak because nobody uses it; nobody uses it because it's too hard to write by hand. Now, maybe this is wishful thinking, but I have to believe that if it were only easier to write MathML, more scientists and mathematicians, especially the early-adopter types, would at least try it, and this would inspire browsers (especially open-source browsers) to improve their support, which would then lead to more authors using it, etc.

Here's where the translator comes in: Until the barrier-to-entry for MathML drops, it'll never be widely adopted. A simple LaTeX-to-MathML converter would take care of that. It would reduce the barrier-to-entry for MathML to near zero. If it leads to widespread use of and better support for MathML, it would be a major benefit to the scientific and education communities.
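
To show roughly what the core of such a translator involves, here is a sketch in Python (used only for brevity; the same structure would carry over to PHP). It handles a deliberately tiny, assumed subset of LaTeX math: single-letter identifiers, numbers, a few operators, `^`, `{...}` groups, and `\frac`. The function name `latex_to_mathml` is invented for this example.

```python
import re

# Commands, numbers, single letters, and a few operator/grouping characters.
# Characters outside this set are silently skipped by findall in this sketch.
TOKEN = re.compile(r"\s*(\\[A-Za-z]+|\d+\.?\d*|[A-Za-z]|[-+=*/()^{}])")

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def latex_to_mathml(source):
    """Translate a tiny subset of LaTeX math into presentation MathML."""
    tokens = TOKEN.findall(source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def row(stop):
        """Parse items until `stop` (or end of input) and wrap them in <mrow>."""
        items = []
        while peek() is not None and peek() != stop:
            items.append(item())
        return "<mrow>" + "".join(items) + "</mrow>"

    def item():
        """An atom, optionally followed by a single superscript."""
        base = atom()
        if peek() == "^":
            take()
            base = f"<msup>{base}{atom()}</msup>"
        return base

    def atom():
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        tok = take()
        if tok == "{":
            body = row("}")
            if peek() != "}":
                raise SyntaxError("expected '}'")
            take()
            return body
        if tok == r"\frac":
            # Numerator first, then denominator.
            return f"<mfrac>{atom()}{atom()}</mfrac>"
        if tok[0].isdigit():
            return f"<mn>{tok}</mn>"
        if tok.isalpha():
            return f"<mi>{tok}</mi>"
        if tok in "+-=*/()":
            return f"<mo>{tok}</mo>"
        raise SyntaxError(f"unsupported token {tok!r}")

    return f'<math xmlns="{MATHML_NS}">{row(None)}</math>'


print(latex_to_mathml(r"\frac{a}{b} = x^2 + 1"))
```

Even this small subset needs a tokenizer, a recursive parser, and a tree-shaped output, which may help with the question of whether the translation is a real change of level. Subscripts, Greek letters, roots, and macros are where most of the remaining work would be.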

Alex Basson
Oh man, there are so many good things that would happen with LaTeX-to-anything translators. I know that there are already LaTeX-to-HTML tools out there, but they're all Perl or shell scripts. Nothing that you could easily use as a PHP module or C module or something.
rascher
Although a really cool idea, it just didn't fit the project description. I will keep this in the back of my mind as a possible side project someday, though. Thank you for your answer.
Domenic
+1  A: 

In response to your edit, here are some LaTeX ideas:

  1. LaTeX-to-ASCII pretty printer, perhaps just for a small subset of TeX.
  2. LaTeX-to-Maple/Mathcad/Mathematica script, so that equations can be imported or edited or solved (I don't know if that already exists).
  3. JavaScript LaTeX translator. Basically, as you type, it does a translation from LaTeX to HTML/CSS/.gif/whatever, so you can see your math "live" as you type it, kinda like the Stack Overflow text editor.
  4. Perhaps some sort of LaTeX macros for expressing C code or something? Or how about this: often, C code is doing math: "det = (b*b - 4*a*c); det_sqrt = sqrt(det); etc." How about something which takes C (or Java or whatever) code that performs a series of arithmetic assignments and converts it into a nicely formatted, human-readable LaTeX list of equations (i.e., a \begin{eqnarray} block)?
  5. Or something that does the opposite: takes a listing of LaTeX computations or equations and generates C code which declares the requisite variables, gets the requisite user input, and performs the computations listed in your LaTeX?
rascher