views: 160

answers: 6

Hi guys, I'm implementing IP banning functionality in a project of mine. First of all, I'd like to avoid .htaccess for this purpose because the CMS would probably reset it upon modifications, so I have to use a PHP send-header-and-die solution. Obviously every HTTP request will be checked.

Considering a high-traffic site, I have two solutions for storing the IP ban info:

1 - In a directory, let's say /bans/, I can create N files where N = the number of banned IPs, so:

/bans/23.23.23.23.ban

would ban 23.23.23.23. In this case, all I have to do from my script is check with file_exists(), for instance:

<?php
    if (file_exists("bans/" . $_SERVER['REMOTE_ADDR'] . ".ban")) {
        // Banned IP: refuse the request.
        header("HTTP/1.0 403 Forbidden");
        die();
    } else {
        // Continue surfing ....
    }
?>

2 - Use a MySQL table, let's say cms_bans, and execute a SELECT on every HTTP request to check whether the IP is in the ban list.
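For clarity, the check for option 2 would look roughly like this (just a sketch: I'm assuming an existing mysqli connection in $db and an indexed ip column in cms_bans):

<?php
    // Sketch only: assumes an existing mysqli connection in $db
    // and a cms_bans table with an indexed `ip` column.
    $ip = $_SERVER['REMOTE_ADDR'];
    $stmt = $db->prepare("SELECT 1 FROM cms_bans WHERE ip = ? LIMIT 1");
    $stmt->bind_param("s", $ip);
    $stmt->execute();
    $stmt->store_result();

    if ($stmt->num_rows > 0) {
        header("HTTP/1.0 403 Forbidden");
        die();
    }
    // Otherwise, continue serving the request.
?>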

Considering those two solutions, which one has less overhead (filesystem vs. MySQL :D), assuming MySQL query caching is disabled?

Please give only motivated answers, not just personal preferences.

Thanks

A: 

I would tend to think the file_exists() has less overhead since there's no remote connection to make and it can be cached by PHP. But if you've got a huge number of bans, and the db connection is already being made for some other part of the app, then the MySQL solution starts to look better and is certainly easier to manage. That said, I'm not a fan of either approach and would suggest moving the bans to the network layer, via your firewall/proxy/load balancer.

Alternatively, if you don't have a large number of bans and they don't change very often, you'll be better off storing the list directly as a PHP array, include()'ing that in your code, and then using in_array() to scan for the bans:

$banned = array(
    '1.2.3.4',
    '2.3.4.5',
    '3.4.5.6'
);
if (in_array($_SERVER['REMOTE_ADDR'], $banned)) { baninate(); }
Alex Howansky
I have a LOT of bans, I can't manage the firewall/proxy/etc. (shared hosting), and I have to manage ban timestamps (ban durations) too ... so the PHP solution, even if it's really sexy (it was my first try), could consume too much RAM :( And yes, I DO have a pre-made MySQL connection.
Simone Margaritelli
I think we would expect that someone asking this question and not micro-optimising would have a lot of bans they need to deal with.
Jivlain
Yeah, OK, given that info, it sounds like the MySQL option is your only choice. Unless maybe you have some other low-overhead connection already set up, like Memcache or MongoDB.
Alex Howansky
A: 

I would prefer the MySQL database. You can store additional data in the table, like banned_by, banned_on, ban_reason, and access_count.

In your code you only do a

SELECT COUNT(1) FROM banns WHERE ip = '23.23.23.23' 

Assuming you have an index on the ip column, this should be pretty fast. If someone hits a ban, you do an

 UPDATE banns SET access_count = access_count + 1 WHERE ip = '23.23.23.23'
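For illustration, a minimal PHP sketch of that flow (assuming an existing mysqli connection in $db; table and column names as above):

<?php
// Sketch only: assumes an existing mysqli connection in $db
// and an index on banns.ip.
$ip = $_SERVER['REMOTE_ADDR'];

$stmt = $db->prepare("SELECT COUNT(1) FROM banns WHERE ip = ?");
$stmt->bind_param("s", $ip);
$stmt->execute();
$stmt->bind_result($count);
$stmt->fetch();
$stmt->close();

if ($count > 0) {
    // Bump the counter for this banned IP, then refuse the request.
    $upd = $db->prepare("UPDATE banns SET access_count = access_count + 1 WHERE ip = ?");
    $upd->bind_param("s", $ip);
    $upd->execute();

    header("HTTP/1.0 403 Forbidden");
    die();
}
?>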
SchlaWiener
I don't need that info, I just need performance :)
Simone Margaritelli
@Simone - If you're **that** bothered about performance, the best way to find out for sure is to test it. Work out an average of how long it takes to add a ban, and check for a ban, using both methods. Then you'll know for sure.
chigley
@Simone I doubt you really need it.
Col. Shrapnel
A: 

Why stick with one over the other?

I would set up a MySQL-based ban table that I can expand easily. MySQL is a fast system, and it is a lot more flexible with the future in mind.

But you can also cache the results yourself in a file and read from that. That way, anyone who's familiar with MySQL can add bans directly; they don't have to know your special format in order to work with it (in the case of extensions to your ban system, and so on). The remaining question, then, is file permissions, so you'd just have to throw in a little extra code to handle that.

If they need the cache updated, give them the ability.
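A rough sketch of that hybrid idea (names are placeholders: a cms_bans table, an existing mysqli connection in $db, and a cache path picked just for illustration):

<?php
// Sketch only: MySQL stays the source of truth, while every request
// reads a plain PHP cache file instead of hitting the database.
$cacheFile = __DIR__ . '/cache/bans.php';

// Rebuild the cache file; call this whenever a ban is added or removed.
function rebuild_ban_cache(mysqli $db, $cacheFile)
{
    $ips = array();
    $result = $db->query("SELECT ip FROM cms_bans");
    while ($row = $result->fetch_assoc()) {
        $ips[] = $row['ip'];
    }
    file_put_contents($cacheFile, '<?php return ' . var_export($ips, true) . ';', LOCK_EX);
}

// On every request: read the cached array, no database round-trip needed.
$banned = file_exists($cacheFile) ? include $cacheFile : array();
if (in_array($_SERVER['REMOTE_ADDR'], $banned)) {
    header("HTTP/1.0 403 Forbidden");
    die();
}
?>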

Codeacula
Simpler and nicer idea :)
Simone Margaritelli
You can even further enhance it by implementing a memcached or other caching engine solution.
Codeacula
I have a caching solution I made some time ago; I could use that, or memcached if it's available on the server ... anyhow, let's wait for other answers ;)
Simone Margaritelli
A: 

You don't say which CMS you are using, but if it's open source or your own, why not modify it to use .htaccess for the bans?

Elija
A: 

I don't think this question is really performance related.
Such a simple key-value lookup will never be a bottleneck compared to the other parts of the application.
That's a mistake that is made very often: people tend to optimize the less resource-consuming parts of the site for no particular reason other than that it came to mind.

That's why the question itself should be motivated, not driven by personal preference alone.

Col. Shrapnel
I actually managed a site in which adding a `file_exists` call on every page load would bring the server to its knees. I still agree that it's very likely to be "less resource-consuming" than many other parts of the site, but you can't say for sure whether that's the case. Moreover, both I/O performance and network latency can become bottlenecks real quick.
MartinodF
@MartinodF Got any details on this knee situation? I have plenty of calls to similar functions (though is_readable() is the one I prefer) and have never seen any problem caused by them. As for network latency, keep in mind that memcache servers are often used to speed things up, and they're usually separate servers reached over the network. That makes me think a normal network is not the cause of the problem. Let me remind you again that it's not only these silly IP addresses we store in the database.
Col. Shrapnel
@Col. Shrapnel He's talking about shared hosting, so I wouldn't be sure how performant his I/O or MySQL server could be. I also don't know how many requests he's going to handle, or whether his application already requires a lot of disk access, and so on. I was managing a client's seriously underpowered site, where the hardware was so near its limit that changing PHP includes from absolute to relative paths affected performance. It's not an everyday situation, but I always like to look for the best solution, even for an apparently secondary problem like this one :)
MartinodF
@MartinodF Let me remind you yet again that it's not only these silly IP addresses we store in the database. If we have any database-related issues, they will affect the whole site, not only this silly IP lookup. So this poor database should be our concern, not the ban routine. I hope you'll be able to understand that.
Col. Shrapnel
A: 

As far as your original question goes, using file_exists() is quicker for small numbers of bans (roughly under 1000), whereas MySQL is quicker for larger numbers. The connection to the database is made only once and the answer is sent back only once, so the "bottleneck" of MySQL adds only a fixed, constant amount of time on top of the query itself. MySQL (and other database software) then scales extremely well, because each row has a constant byte width W, so to inspect row n it need only check bytes n·W through n·W + W.

In older file systems the operating system cannot make such an assumption, since files can be of variable length, so it would scan for the end-of-file marker. Newer file systems keep a database of every file (the "File Allocation Table") at the beginning of the partition, and the OS need merely search this. The problem is that the more files on the computer, the longer it takes to search this table. Also, fragmentation of the drive can make it harder to determine whether a file still exists. These little slowdowns don't equal the time it takes to connect to an SQL database... for small numbers of bans.

What would be an even better solution would be to have a text file containing one ban per line.

bans.txt:
23.23.23.23
192.168.1.42
200.200.200.200

Then you just use strpos($file_contents, $_SERVER['REMOTE_ADDR']). Note that the fewer lines of PHP you have, the quicker it will ultimately run, since PHP's C back-end is roughly 100x quicker than interpreted code. So in two lines we can file_get_contents() (dump the contents into RAM) and strpos() (search for a string within that RAM), and it's entirely handled by the C back-end, which iterates through quite quickly.
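A minimal sketch of that approach (assuming bans.txt holds one IP per line with \n line endings; the newline padding is an extra safeguard so that 23.23.23.2 can't match inside 23.23.23.23):

<?php
// Sketch only: load the whole ban list into RAM and do a single substring search.
$bans = "\n" . file_get_contents("bans.txt") . "\n";
if (strpos($bans, "\n" . $_SERVER['REMOTE_ADDR'] . "\n") !== false) {
    header("HTTP/1.0 403 Forbidden");
    die();
}
?>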

There are even quicker ways to do it, too, if you're willing to write your own database that keeps the bans listed numerically (allowing binary search).
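For instance, a hedged sketch of that idea (the bans.dat file and the helper function are made up here): convert each IP with ip2long(), keep the list sorted by that value, and binary-search it.

<?php
// Sketch only: bans.dat is assumed to hold one IP per line,
// kept pre-sorted by its ip2long() value whenever bans are added.
function ip_is_banned($ip, array $sortedIps)
{
    $needle = ip2long($ip);
    $lo = 0;
    $hi = count($sortedIps) - 1;
    while ($lo <= $hi) {
        $mid = ($lo + $hi) >> 1;
        if ($sortedIps[$mid] === $needle) {
            return true;
        } elseif ($sortedIps[$mid] < $needle) {
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    return false;
}

// Load the pre-sorted list and check the current visitor.
$sortedIps = array_map('ip2long', file('bans.dat', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));

if (ip_is_banned($_SERVER['REMOTE_ADDR'], $sortedIps)) {
    header("HTTP/1.0 403 Forbidden");
    die();
}
?>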

Although, as several people have already stated, this is not where any main bottleneck will occur on your server. Optimizing the "check for ban" portion of your website will net a 0.01% speed increase for your whole site. What you want to be really careful to optimize are loops that run >100 times, calls to remote servers, and queries that return many rows of a database to be parsed.

Also, don't write a function to do something for which PHP already has a built-in function. I spent a year manually parsing strings with hundreds of substr(strpos()) lines before I learned how to use preg_replace().

steven_desu