First of all, to be able to optimize efficiently, you need to know what is taking time:
- is PHP doing too many calculations?
- do you have too many SQL queries?
- do you have SQL queries that take too long?
- where is your script spending its time?
With that information, you can then try to figure out:
- if you can reduce the number of SQL queries
- for instance, if you are running the exact same query over and over again, you are obviously wasting time
- another idea is to "regroup" queries, if that is possible; for instance, use only one query to get 10 rows, instead of 10 queries that each get one row back.
- if you can optimize queries that take too long
- either by using indexes -- which ones are useful generally depends on the joins and conditions you are using
- or by re-writing the queries, if they are "bad"
- about the optimization of SELECT statements, you can take a look at: 7.2. Optimizing SELECT and Other Statements
- if PHP is doing too many calculations, can you have it do fewer?
- maybe by not recalculating similar stuff again and again?
- or by using more efficient queries?
- if PHP is taking the time, and the SQL server is not overloaded, using parallelism (launching several calculations at the same time) might also help speed up the whole thing.
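As a minimal sketch of the "regrouping" idea: instead of N queries that each fetch one row, build a single query with an IN() clause. The table (a) and column (x) names here are only illustrative, and you would bind the placeholders with PDO or mysqli:

```php
<?php
// Instead of looping and running one query per id, build one query
// with an IN () clause -- a single round-trip instead of N.
// The table (a) and column (x) names are only illustrative.
function buildBatchQuery(array $ids)
{
    // one placeholder per id, to be bound with PDO or mysqli afterwards
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    return "select * from a where x in ($placeholders)";
}

// one round-trip for 3 rows instead of 3 round-trips
$sql = buildBatchQuery(array(12, 14, 17));
```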
Still: this is quite a specific question, and the answers will probably be pretty specific too -- which means more information might be necessary if you want more than a general answer...
Edit after your edits
As you only have simple queries, things might be a bit easier... Maybe.
- First of all: you need to identify the kinds of queries you are running.
- I'm guessing that, among all your queries, you can identify some "types" of queries.
- for instance: "select * from a where x = 12" and "select * from a where x = 14" are of the same type: same select, same table, same where clause -- only the value changes
- once you know which queries are used the most, you'll need to check whether they are optimized: using EXPLAIN will help
- (if needed, I'm sure some people will be able to help you understand its output, if you provide it alongside the schema of your DB (tables + indexes))
- If needed : create the right indexes -- that's kind of the hard/specific part ^^
- those most-used queries are also the ones for which reducing the number of queries might prove useful...
- when you're done with the often-used queries, it's time to move on to the queries that take too long; calling microtime from PHP will help you find out which ones those are
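A sketch of that microtime approach; here $runQuery stands in for whatever actually executes your SQL (mysqli_query, PDO::query, ...), and the dummy runner in the usage line exists only to make the example self-contained:

```php
<?php
// Wrap each query in a microtime(true) pair to measure how long it takes.
// $runQuery is a stand-in for whatever executes your SQL (mysqli, PDO, ...).
function timeQuery($runQuery, $sql)
{
    $start = microtime(true);
    $result = $runQuery($sql);
    $elapsed = microtime(true) - $start;   // seconds, as a float
    return array($result, $elapsed);
}

// usage with a dummy runner; in practice, log every query whose
// $elapsed is above some threshold, to find the slow ones
list($result, $elapsed) = timeQuery(
    function ($sql) { return strlen($sql); },
    'select * from a where x = 12'
);
```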
Before that, to find out whether it's PHP or MySQL that is working too hard, a simple way is to use the top command on Linux, or the Task Manager on Windows.
If PHP is eating 100% of CPU, you have your culprit. If MySQL is eating all CPU, you have your culprit too.
When you know which one of those is working too hard, that's a first step: you know what to optimize first.
I see from your portion of code that you are:
- going through 10,000 elements one by one -- it should be easy to split those into 2 or more slices
- using DOM and XPath, which might eat some CPU on the PHP-side
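For reference, that DOM + XPath pattern presumably looks something like this (shown here on a tiny inline document, since your actual code isn't included); doing this for 10,000 fetched pages is pure CPU work on the PHP side:

```php
<?php
// The DOM + XPath pattern, on a tiny inline document; parsing and
// querying 10,000 documents this way is CPU work that shows up in "top".
$doc = new DOMDocument();
$doc->loadHTML('<html><body><a href="http://example.com">link</a></body></html>');
$xpath = new DOMXPath($doc);
$links = $xpath->query('//a/@href');   // all href attributes in the page
```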
If you have a multi-core CPU, an idea (one I would try if I saw that PHP is eating lots of CPU) would be to parallelize.
For instance, you could have two instances of the PHP script running at the same time :
- one that will deal with the first half of the URLs
- the SQL query for this one will be like "select * from urls where id < 5000"
- and the other one that will deal with the second half of the URLs
- its query will be like "select * from urls where id >= 5000"
You will get a bit more concurrency on the network (probably not a problem) and on the database (a database knows how to deal with concurrency, and 2 scripts using it will generally not be too much), but you'll be able to process almost twice as many documents in the same time.
If you have 4 CPUs, splitting the urls list into 4 (or even more; find out by trial and error) parts would work too.
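A minimal sketch of that splitting, assuming ids run from 0 to 10,000 as in the example: a helper that computes one id-range query per worker, so each instance of the script only runs its own slice:

```php
<?php
// Split the id space [0, $total) into $workers contiguous slices,
// one query per PHP process. The urls table and the 10,000 total
// match the example above; everything else is illustrative.
function idRanges($total, $workers)
{
    $size = (int) ceil($total / $workers);
    $queries = array();
    for ($i = 0; $i < $workers; $i++) {
        $min = $i * $size;
        $max = min($total, $min + $size);
        $queries[] = "select * from urls where id >= $min and id < $max";
    }
    return $queries;
}

// two workers over 10,000 urls: each instance of the script would be
// launched with its slice number and run only its own query
$queries = idRanges(10000, 2);
```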