views: 101

answers: 3

I've got my website logging user activity to a .txt file. I want to show these results in my admin area, split across separate pages, so that page one shows results 1-50, and so on.

But the problem I have is that it's set out like this in the .txt file:

User: Admin IP Address: xx.xxx.xxx.xx Host Address: xxxxxxxxxxxxxxxx Date And Time: Monday 20th of September 2010 11:44:18 AM URL: http://colemansystems.psm2.co.uk/ Browser: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.62 Safari/534.3 Refering URL:

There's a gap of seven lines between each set of information. I was thinking about putting it into a MySQL table instead, but wouldn't that get very big over time?

Any help is appreciated!

+1  A: 

So what if it gets big? That's what databases are for.

Let me assure you that the [one-time] cost of setting up a database and a table will be much less than maintaining a homebrew (as in custom-format) data file in the long run.

What if you want to know what happened between 2010-09-25 and 2010-09-26 from the IP address 1.2.3.4? Are you going to write a function? (It's a single statement in SQL.) Are you going to scan the whole file? (A proper DBMS will just use its indices; MySQL will use at least one index.)
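
For illustration, that single statement might look like this, assuming a hypothetical `activity_log` table with `ip` and `logged_at` columns (the names are mine, not from the question):

```php
<?php
// Sketch only: table and column names (activity_log, ip, logged_at) are assumed.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

// "What happened between 2010-09-25 and 2010-09-26 from IP 1.2.3.4?"
$ip = '1.2.3.4';
$stmt = $mysqli->prepare(
    "SELECT * FROM activity_log
     WHERE ip = ?
       AND logged_at BETWEEN '2010-09-25 00:00:00' AND '2010-09-26 23:59:59'"
);
$stmt->bind_param('s', $ip);
$stmt->execute();

// get_result() needs the mysqlnd driver; bind_result() works everywhere.
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
    print_r($row);
}
```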

I'm half inclined to say "try them both and see how the DB approach wins in the long run" because the advantages are too numerous to list.

aib
Thanks a lot for your help. I may try the MySQL method now. I really appreciate your response!
hart1994
Good answer. The DB option also cuts down on a lot of overhead: you don't need to repeat all of the "User:", "Host Address:", etc. headings for each entry; you just set up a field for each of them and fill them with data.
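Something like this sketch, for example (all names are invented, and the parsing rules are an assumption based on the sample line in the question):

```php
<?php
// Hypothetical schema plus a one-off migration of the existing .txt file.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$mysqli->query("
    CREATE TABLE IF NOT EXISTS activity_log (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user      VARCHAR(64),
        ip        VARCHAR(45),
        host      VARCHAR(255),
        logged_at DATETIME,
        url       VARCHAR(255),
        browser   VARCHAR(255),
        referer   VARCHAR(255),
        INDEX (logged_at),
        INDEX (ip)
    )
");

$stmt = $mysqli->prepare(
    "INSERT INTO activity_log (user, ip, host, logged_at, url, browser, referer)
     VALUES (?, ?, ?, ?, ?, ?, ?)"
);
$stmt->bind_param('sssssss', $user, $ip, $host, $loggedAt, $url, $browser, $referer);

// 'activity.txt' is a placeholder; FILE_SKIP_EMPTY_LINES eats the 7-line gaps.
foreach (file('activity.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    if (!preg_match(
        '/^User: (.*?) IP Address: (.*?) Host Address: (.*?) ' .
        'Date And Time: (.*?) URL: (.*?) Browser: (.*?) Refering URL: ?(.*)$/',
        $line, $m
    )) {
        continue; // skip lines that don't match the expected format
    }
    list(, $user, $ip, $host, $date, $url, $browser, $referer) = $m;
    // "Monday 20th of September 2010 11:44:18 AM" -> DATETIME;
    // strtotime() copes once " of " is removed (the weekday matches the date anyway).
    $loggedAt = date('Y-m-d H:i:s', strtotime(str_replace(' of ', ' ', $date)));
    $stmt->execute();
}
```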
Lucanos
Lucanos, yes, that's exactly what is needed; the simpler the better! Thanks!
hart1994
A: 

A database table seems better suited for this if you will be querying it, rather than just watching the logs in some monitor (e.g. unix `tail`).
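
A DB table also gives you the "page one shows 1-50" listing from the question almost for free, e.g. (a sketch, reusing the assumed `activity_log` names from above):

```php
<?php
// Page 1 shows entries 1-50, page 2 shows 51-100, and so on.
$perPage = 50;
$page    = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$offset  = ($page - 1) * $perPage;

$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$result = $mysqli->query(sprintf(
    "SELECT * FROM activity_log ORDER BY logged_at DESC LIMIT %d OFFSET %d",
    $perPage, $offset
));
while ($row = $result->fetch_assoc()) {
    // render the row in the admin listing...
}
```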

It will get very big over time, which is why you prune it: delete older entries every now and then if you notice performance dropping. You can set up a script that runs a DELETE query on the table for all entries older than, say, one month, then schedule that script to run daily.
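
A minimal sketch of such a pruning script (names carried over from the assumed schema above), with an example crontab line for the daily run:

```php
<?php
// prune_logs.php - delete log entries older than one month.
// Example crontab entry, daily at 03:00 (the path is hypothetical):
//   0 3 * * * php /path/to/prune_logs.php
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$mysqli->query("DELETE FROM activity_log WHERE logged_at < NOW() - INTERVAL 1 MONTH");
printf("Pruned %d old entries\n", $mysqli->affected_rows);
```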

Another option would be to partition the table by date if you truly need to keep all the logs, even the old ones, but that's a bit more advanced.
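
For reference, range partitioning by date might look like this (MySQL 5.1+; a sketch with assumed names, not something you need right away):

```php
<?php
// MySQL requires the partitioning column to be part of every unique key,
// hence the composite primary key here.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$mysqli->query("
    CREATE TABLE activity_log_partitioned (
        id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
        logged_at DATETIME NOT NULL,
        -- ...the other columns from above...
        PRIMARY KEY (id, logged_at)
    )
    PARTITION BY RANGE (TO_DAYS(logged_at)) (
        PARTITION p201009 VALUES LESS THAN (TO_DAYS('2010-10-01')),
        PARTITION p201010 VALUES LESS THAN (TO_DAYS('2010-11-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    )
");
```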

Frankly, assuming you set up proper indexes, the row count should reach the tens of millions before you notice any degradation in performance.

Fanis
Is that called a cron job? If so, how do you make/set one up?
hart1994
@hart1994 on unix systems yes, a cron job. This looks like a thorough tutorial: http://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/ . You would be scheduling a CLI mode PHP script, one that you'd run in the terminal itself to test it.
Fanis
OK thanks. What is CLI? Sorry, I don't know loads about servers. Would I just type PHP in there, then?
hart1994
@hart1994 Oh sorry, yes, CLI is just Command Line Interface: basically PHP run on the command line rather than through a web browser. It's regular PHP code, but you should be careful with some environment variables, e.g. `$_SERVER`. For your needs of connecting to MySQL and running a query, I don't think you will run into any trouble. If you'd like to read more, http://articles.sitepoint.com/article/php-command-line-1 looks thorough.
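As a quick sanity check you can tell the two modes apart with `php_sapi_name()`; a guard like this (just a suggestion) stops the maintenance script being run through the web server:

```php
<?php
// Refuse to run outside the command line.
if (php_sapi_name() !== 'cli') {
    die("This script must be run from the command line.\n");
}
// ...connect to MySQL and run the DELETE query as above...
```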
Fanis
OK, thanks a lot for your help! I will take a look and run this now and again. So I just go into "Cron Jobs" in my cPanel?
hart1994
I expect so, yes. In fact I presume cPanel will have an easy way of adding jobs. I'm afraid I don't have access to a cPanel server to know more details.
Fanis
Ok, thanks for your help!
hart1994
A: 

Any modern database, including MySQL, will handle queries on a properly indexed table in reasonable time with several tens of millions of rows in it. That's what they're for.

My usual rule of thumb is that for any table up to 10,000 rows with straightforward queries, you don't even need to think too hard about data access paths or anything else. In the 10,000 to 1 million row range I would give it significant attention to ensure the table is properly indexed, and beyond a million rows more advanced management techniques may be required to keep performance acceptable (although for a simple log table this would be less of an issue).
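
One way to check you're in the "properly indexed" camp is EXPLAIN, e.g. (assuming the hypothetical `activity_log` table from earlier):

```php
<?php
// EXPLAIN shows whether a query will use an index (see the 'key' column).
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$result = $mysqli->query("EXPLAIN SELECT * FROM activity_log WHERE ip = '1.2.3.4'");
print_r($result->fetch_assoc());
```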

Cruachan
Oh thanks, that's what I'm doing now: migrating the logs to a MySQL database. It makes queries easier and faster.
hart1994