theory

Theory of caching

Is there a unified theory of caching? That is, collections of theorems and algorithms for constructing caches and/or optimizing for them? The question is deliberately broad, because the results I am looking for are also broad. Formulas for maximum achievable speedup, metrics for caching algorithms, stuff like that. A university-level te...

How to write a simple database engine

I am interested in learning how a database engine works (i.e. the internals of it). I know most of the basic data structures taught in CS (trees, hash tables, lists, etc.) as well as a pretty good understanding of compiler theory (and have implemented a very simple interpreter) but I don't understand how to go about writing a database e...

Can a SHA-1 hash be purely numeric?

Is there any chance that a SHA-1 hash can be purely numeric, or does the algorithm ensure that there must be at least one alphabetical character? Edit: I'm representing it in base 16, as a string returned by PHP's sha1() function. ...
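
A minimal sketch of how one might reason about this, assuming the 40 hex characters of a digest behave like independent, uniformly distributed digits (a modelling assumption, not a property proven for SHA-1). The helper names `sha1_hex` and `is_purely_numeric` are hypothetical; the sketch uses Python's `hashlib` rather than PHP's sha1().

```python
import hashlib

HEX_DIGITS = 40  # a SHA-1 digest is 160 bits, i.e. 40 hexadecimal characters


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as a lowercase hex string."""
    return hashlib.sha1(data).hexdigest()


def is_purely_numeric(hex_digest: str) -> bool:
    """True if the digest uses only the characters 0-9 (no a-f)."""
    return all(ch in "0123456789" for ch in hex_digest)


# Under the uniformity assumption, each character is one of 0-9 with
# probability 10/16, so all 40 are numeric with probability (10/16)**40.
P_ALL_NUMERIC = (10 / 16) ** HEX_DIGITS

if __name__ == "__main__":
    digest = sha1_hex(b"abc")
    print(digest, is_purely_numeric(digest))
    print(f"estimated probability: {P_ALL_NUMERIC:.2e}")
```

Nothing in the hex encoding forces a letter to appear, so under this model a purely numeric digest is possible but rare.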

"Rebuttals" and "Comments" - Two DB-Tables or One?

I'm working on a project for a friend and I've come across a difficult decision. The project consists of essays, each of which can be challenged, and also commented on. The thing is this: only one person is able to challenge the essay, and then everybody else is locked out and can only comment. The rebuttals can only be two responses de...

Largest possible group of friends in common?

I'm trying to come up with the largest possible group of friends that would theoretically get along with each other, i.e., each person in the group should know at least 50% of the other people in the group. I'm trying to come up with an algorithm for this that doesn't take ridiculously long; Facebook's API/cross-server talk is pretty sl...
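
The condition stated above (each member knows at least 50% of the other members) can be written down as a checker. This is only a sketch of the validity test, not an algorithm for finding the largest group; the function name `is_valid_group` and the `knows` mapping (person to the set of people they know) are hypothetical.

```python
def is_valid_group(group, knows):
    """Check that every member knows at least half of the other members.

    group: iterable of person identifiers
    knows: dict mapping a person to the set of people that person knows
    """
    members = set(group)
    for person in members:
        others = members - {person}
        known = len(knows.get(person, set()) & others)
        # "at least 50%" written without floating point: known / others >= 1/2
        if 2 * known < len(others):
            return False
    return True


if __name__ == "__main__":
    knows = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": set()}
    print(is_valid_group(["a", "b", "c"], knows))
    print(is_valid_group(["a", "b", "c", "d"], knows))
```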

Given an RE, derive the largest substring match

I'm looking for a bit of code that will: given regular expression E, derive the longest string X such that for every S, X is a substring of S iff S will match E. Examples: E = "a", X = "a"; E = "^a$", X = "a"; E = "a(b|c)", X = "a"; E = "[ab]", X = "". Context: I want to match some regular expressions against a data store that only sup...

What is the most underrated or little known but useful algorithm?

I'm looking for the one algorithm or data structure which is so unknown yet useful that you think it's a horrible oversight by the computer science or programming community. If only we could all learn this one thing, a lot of good would be done to many future programs. The best one I can come up with is interpolation search, which only...
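
For readers who have not met the interpolation search named above, here is a minimal sketch. It assumes a sorted list of integers; the function name and return convention (index, or -1 when absent) are choices made for this illustration.

```python
def interpolation_search(values, target):
    """Return an index of `target` in the sorted integer list `values`, or -1."""
    lo, hi = 0, len(values) - 1
    while lo <= hi and values[lo] <= target <= values[hi]:
        if values[lo] == values[hi]:
            # All remaining values are equal; the loop condition already
            # guarantees they equal the target.
            return lo
        # Guess the position by linear interpolation between the endpoints.
        pos = lo + (target - values[lo]) * (hi - lo) // (values[hi] - values[lo])
        if values[pos] == target:
            return pos
        if values[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


if __name__ == "__main__":
    data = [10, 20, 30, 40, 50]
    print(interpolation_search(data, 40))
    print(interpolation_search(data, 35))
```

Unlike binary search, the probe position depends on the values at both ends of the range, so the guess lands close to the target when the keys are roughly evenly spaced.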

Why does Java have both checked and unchecked exceptions?

Possible Duplicate: When to choose checked and unchecked exceptions. Why does Java as a language have both checked and unchecked exceptions? What purpose do they serve? Note: I'm not asking when I should use them, or how to code them, but what they add to the language. ...

Should I verify an email address when a user registers on my website?

I have a membership site in beta right now... At the moment, when a user registers, it marks the account as unverified and sends them an email with a link to verify their account. The real reason for doing this is to make sure they entered their email address correctly. So I was contemplating removing the verification step to ...

High Level Coding Overview?

I'm going to spend the next few months getting acquainted with programming, but I'm not the type who learns best by "just doing it". I am only productive if I can understand things from a more structural, high level overview type context. Every one of you will no doubt want to just reply "Seriously dude, just start coding!" and I appre...

How would you code an anti plagiarism site?

First, please note that I am interested in how something like this would work, and am not intending to build it for a client etc., as I'm sure there may already be open source implementations. How do the algorithms that detect plagiarism in uploaded text work? Do they use regex to send all words to an index, strip out known words like...

How does a large MySQL database become efficient?

Do really large (say, 100 million rows) databases regularly get broken up? I'm totally unfamiliar with scaling -anything-, but thought that maybe one database could act as a key to larger chunks of the actual data, becoming an index that would say "Oh yeah, that query needs the 20,000,000-25,000,000 block" and sends the query into that b...
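
The routing idea described above (a small lookup that says which block a query belongs to) can be sketched as follows. The 5,000,000-row block size is an assumption read off the "20,000,000-25,000,000" example, and `block_for` is a hypothetical helper, not part of MySQL.

```python
BLOCK_SIZE = 5_000_000  # assumed block width, taken from the example range


def block_for(row_id, block_size=BLOCK_SIZE):
    """Return the half-open row-id range (start, end) of the block holding row_id."""
    start = (row_id // block_size) * block_size
    return start, start + block_size


if __name__ == "__main__":
    # A query for row 22,000,000 would be routed to the 20M-25M block.
    print(block_for(22_000_000))
```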

HTML and Compilers

This question is more discussion-oriented than a simple problem-specific question. Writing basic HTML is simple, but writing fast, light, standards-based, SEO best practices compliant, all-browsers-compatible HTML pages is hard and very time consuming. But why is it hard? In my opinion it is hard because of the hundreds of different r...

Why is multiplying cheaper than dividing?

I recently wrote a Vector3 class, and I submitted my normalize() function to a friend for review. He said it was good, but that I should multiply by the reciprocal where possible because "multiplying is cheaper than dividing" in CPU time. My question simply is, why is that? ...
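
The friend's suggestion amounts to replacing three divisions with one division and three multiplications. The sketch below shows only the shape of that change; the tuple-based `normalize` is a hypothetical stand-in for the author's class, and in Python itself the timing difference is unlikely to be measurable.

```python
import math


def normalize(v):
    """Return the unit vector of a 3-component vector given as a tuple (x, y, z)."""
    x, y, z = v
    length = math.sqrt(x * x + y * y + z * z)
    if length == 0.0:
        raise ValueError("cannot normalize the zero vector")
    inv = 1.0 / length  # one division ...
    return (x * inv, y * inv, z * inv)  # ... then three multiplications


if __name__ == "__main__":
    print(normalize((3.0, 0.0, 4.0)))
```

Multiplying by a precomputed reciprocal can differ from dividing in the last bit of the result, which is one reason compilers do not always make this substitution on their own.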

Retrieve maximal / minimal record

A rather complicated SQL query I was working on got me thinking about a limitation of (ANSI) SQL: Is there a way to retrieve a record that is maximal or minimal with respect to an arbitrary ordering? In other words: Given a query like this: SELECT * FROM mytable WHERE <various conditions> ORDER BY <order clause> is it possible to w...

Meaning of memory related terms?

While playing with memory profiling in Delphi (although it applies to any language), I've found some terms about used memory which I don't completely understand. Could someone explain (or refer to a good document or manual) the meaning of the following terms? Working set size, Pagefile used, Committed memory, Uncommitted memory, Private Usage...

Is finding the equivalence of two functions undecidable?

Is it impossible to know if two functions are equivalent? For example, if a compiler writer wants to determine whether two functions that the developer has written perform the same operation, what methods can he use to figure that out? Or what can we do to find out that two TMs are identical? Is there a way to normalize the machines? E...

Computer Science and Psychology

First off, my apologies for asking this fairly off-topic question. But in my experience, there are a lot of highly intelligent people on SO so I figured I might give it a shot. Please don't be too triggerhappy with the 'close'-button :-) Besides, I think there's some chance that this may be of general interest. I'm a Computer Science ma...

Good link or book for basics and theory of version control

Would like to really understand all of the fundamentals and theory of version control. Probably implementation agnostic, but if the book or resource uses something to practice with, that is fine. Was looking at the Pragmatic series. Is there something better or open source? ...

What is starvation?

In multitasking systems, some abnormal conditions prevent progress of executing processes or threads. I'll refer to both processes and threads simply as "processes". Two of these conditions are called dead-lock and live-lock. The former refers to processes which are blocking each other, thus preventing either from executing. The latter ...