Hi all, I'm building an application that is a kind of registry. Think of a dictionary: you look up a word and it returns something if the word is found. This registry is going to store valuable information about companies, and some people could be tempted to fetch the complete listing. My application uses EJB 3.0 beans that respond to web service (WS) calls.

So I was thinking about allowing a maximum of 10 queries per IP address per day, storing the IP address and a counter in a table that would be emptied by a script every night.

Is it a good idea/practice to do so? If yes, how can I get the IP address on the EJB side? Is there a better way to prevent someone from getting all the data from my database? I've also thought about CAPTCHA, but I think it's a pain for the user, and sometimes they are difficult to read even for a real human.

Hope it's all clear, since English is not my native language...

Thanks Alain

+1  A: 

I'd say the limit of 10 queries per day per IP is not very good. Take into account that many people may share the same public IP.

Although it's not 100% accurate, you could check whether an unusual number of requests is coming from the same IP in a short period of time. If that alarm goes off, you show a CAPTCHA.
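A minimal sketch of that kind of alarm, assuming a sliding time window kept in memory. The class name `BurstDetector`, the method `recordAndCheck`, and the thresholds are hypothetical, not part of any framework:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: flags a client (e.g. an IP) that sends too many requests in a short window. */
class BurstDetector {
    private final int maxRequests;    // requests tolerated per window (assumed threshold)
    private final long windowMillis;  // window length in milliseconds
    private final Map<String, Deque<Long>> recent = new HashMap<>();

    BurstDetector(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Records one request and returns true if the caller should now be shown a CAPTCHA. */
    synchronized boolean recordAndCheck(String clientKey, long nowMillis) {
        Deque<Long> times = recent.computeIfAbsent(clientKey, k -> new ArrayDeque<>());
        // Drop timestamps that have fallen out of the window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() >= windowMillis) {
            times.pollFirst();
        }
        times.addLast(nowMillis);
        return times.size() > maxRequests;
    }
}
```

With `new BurstDetector(3, 1000)`, the fourth request from the same key within one second trips the alarm, and the alarm clears once the window has passed. A real deployment would also need to evict idle keys and cap the size of each queue.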

Claudio Redi
+1 for shared IP addresses. Not to mention someone could just pay to get a lot of IP addresses, or use bots, making it pretty much impossible to stop the "collection". Better to restrict the app to people who are trusted, or restrict _what_ they can see.
Longpoke
+1  A: 

An alternative is to put a unique request-based token in a hidden field of the form, store it in the session scope, and compare the two when the form is submitted. That would filter out the bots that don't maintain the session, and that is already quite a lot of them.

To go a step further, you could add a timestamp to the request-based token and then check that the form was not submitted faster than is reasonable, e.g. 5 seconds (roughly the fastest a normal human can fill in and submit the form). That would filter out more bots, which usually fill in and submit the form within a fraction of a second. Another advantage is that even a very smart bot is then forced to slow down instead of firing lots of requests in quick succession.
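A rough sketch of both checks, assuming the session is represented by a plain `Map` (in a servlet container this would be the `HttpSession` attributes). The class `FormTokenGuard`, its method names, and the attribute keys are made up for illustration:

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;

/** Hypothetical helper: one-time form token plus a minimum fill-in time. */
class FormTokenGuard {
    static final String TOKEN_KEY = "formToken";            // assumed session attribute names
    static final String ISSUED_KEY = "formTokenIssuedAt";
    private static final SecureRandom RANDOM = new SecureRandom();
    private final long minFillMillis;

    FormTokenGuard(long minFillMillis) {
        this.minFillMillis = minFillMillis;
    }

    /** Call when rendering the form; put the returned value in the hidden field. */
    String issue(Map<String, Object> session, long nowMillis) {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        session.put(TOKEN_KEY, token);
        session.put(ISSUED_KEY, Long.valueOf(nowMillis));
        return token;
    }

    /** Call on submit; false means no session, wrong token, replayed token, or too fast. */
    boolean looksHuman(Map<String, Object> session, String submittedToken, long nowMillis) {
        // Remove both values so that each token can be used only once.
        Object expected = session.remove(TOKEN_KEY);
        Object issuedAt = session.remove(ISSUED_KEY);
        if (expected == null || issuedAt == null || !expected.equals(submittedToken)) {
            return false;
        }
        long issued = (Long) issuedAt;
        return nowMillis - issued >= minFillMillis;
    }
}
```

Here the timestamp stays on the server next to the token, so the client cannot alter it. With `new FormTokenGuard(5000)`, a submit that arrives 100 ms after `issue` is rejected, while one that arrives 6 seconds later passes.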

I would at least not rely on the IP address. It is affected by too many external factors outside your control.

BalusC
But he wants to prevent a _targeted_ attack, in which case, the "collector" will probably be able to circumvent these countermeasures.
Longpoke
@Longpoke: It would not be easy to figure that out :) Do you know better ways then?
BalusC
Thanks a lot for your answer. But I just want to make sure I understand it correctly. I place a hidden field on my form with the timestamp in it, and I store the same value in the session. When the form is submitted, I compare the value of the submitted hidden field with the value in the session. And if the value is less than 5 or 10 seconds old, I could show a CAPTCHA to stop the bot right there. Can you please confirm that I understand your solution correctly? Thanks
Alain
You can also do so. That's another step further. I would, however, encrypt the timestamp with a request-based token as the cipher key and give the hidden field a meaningless name.
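One possible way to do that, sketched with the standard `javax.crypto` API. The choice of AES-GCM, the SHA-256 key derivation, and the class name `TimestampField` are my assumptions here, and the sketch also assumes the token used as the key stays in the session and is not sent to the client:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.OptionalLong;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: encrypts the form timestamp with a key derived from a per-request token. */
class TimestampField {
    private static final int IV_BYTES = 12;
    private static final int TAG_BYTES = 16;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Assumption: hash the token to get a 256-bit AES key.
    private static SecretKeySpec keyFrom(String token) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(token.getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(digest, "AES");
    }

    /** Returns the value to place in the hidden field: base64url(iv + ciphertext + tag). */
    static String encrypt(long timestampMillis, String token) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, keyFrom(token), new GCMParameterSpec(TAG_BYTES * 8, iv));
            byte[] plain = ByteBuffer.allocate(Long.BYTES).putLong(timestampMillis).array();
            byte[] ct = cipher.doFinal(plain);
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM not available", e);
        }
    }

    /** Returns the timestamp, or empty if the field was tampered with or the token does not match. */
    static OptionalLong decrypt(String fieldValue, String token) {
        try {
            byte[] in = Base64.getUrlDecoder().decode(fieldValue);
            if (in.length < IV_BYTES + TAG_BYTES) {
                return OptionalLong.empty();
            }
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, keyFrom(token),
                    new GCMParameterSpec(TAG_BYTES * 8, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            if (plain.length != Long.BYTES) {
                return OptionalLong.empty();
            }
            return OptionalLong.of(ByteBuffer.wrap(plain).getLong());
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return OptionalLong.empty();
        }
    }
}
```

GCM also authenticates the value, so a modified field or a wrong token makes `decrypt` return empty instead of a bogus timestamp.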
BalusC
@Longpoke: By "targeted" attack, do you mean that the attacker would concentrate on getting data from my site only, so he can script the bot to handle the timestamp? Or query my site every 10 seconds instead of every 5 ms? So that kind of "targeted" bot would not just be interested in harvesting email addresses to spam, but would try to get sensitive information, and this solution won't prevent that. So the only effective way would be the CAPTCHA in that case?
Alain
@Alain, the latter, yes. I don't believe there is any way to stop an attacker who is targeting you. The extreme would be that you detect that too many people (due to the attacker's IPs) are accessing the site in a consistent manner, so you start showing captchas globally. But then the attacker would just learn and slow down the attack until it's unnoticeable, to avoid the global captchas. I am in a situation similar to yours, and the only way I've solved it is by limiting access to trusted people. @BalusC: I don't see why it's so hard. If the attacker is trying to hide himself, he'll just mimic browsers.
Longpoke
btw SO gave me a captcha when I wrote that answer T_T
Longpoke
You can never stop targeted attacks. But you would surely prefer 1 HTTP request per 5 seconds to 100 per second.
BalusC