A test means you have a pass/fail threshold. For a performance test, that means too slow and you fail; fast enough and you pass. If you fail, you go back and do rework.
If you can't fail, then you're benchmarking, not actually testing.
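As a minimal sketch of that pass/fail idea -- assuming a hypothetical `process_record` function and using the 1000-calls-per-second figure from the question as the threshold:

```python
import time

def process_record(record):
    # Placeholder for the real piece of code under test.
    return sum(record)

def test_throughput_threshold():
    """Fail if the code can't run 1000 times in one second."""
    records = [list(range(100))] * 1000
    start = time.perf_counter()
    for record in records:
        process_record(record)
    elapsed = time.perf_counter() - start
    # The pass/fail threshold: anything slower than 1 second means rework.
    assert elapsed <= 1.0, f"Too slow: 1000 calls took {elapsed:.3f}s"
```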
When you talk about whether a "system is capable of running" something, you have to define "capable". You could use any of a large number of hardware performance benchmarks: Whetstone, Dhrystone, and the like are popular. If you have a database-intensive application, you might look at the TPC benchmarks. If you have a network-intensive application, netperf. For a GUI-intensive application, some kind of graphics benchmark.
Any of these gives you some kind of "capability" measurement. Pick one or more. They're all good. Equally debatable. Equally biased toward your competitor and away from you.
Once you've run the benchmark, you can then run your software and see what the system actually does.
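As a sketch of that step, here's one way to pair a benchmark number you've already collected for a machine with the throughput your code actually achieves on it. The `work` callable and the benchmark score are placeholders:

```python
import json
import platform
import time

def measure_tps(work, iterations=10_000):
    """Measure how many times per second `work` actually runs on this box."""
    start = time.perf_counter()
    for _ in range(iterations):
        work()
    elapsed = time.perf_counter() - start
    return iterations / elapsed

def record_observation(benchmark_score, work):
    """Pair a benchmark number for this machine with the measured throughput."""
    return {
        "host": platform.node(),
        "os": platform.platform(),
        "benchmark_score": benchmark_score,  # e.g. a Dhrystone or TPC result you ran separately
        "measured_tps": measure_tps(work),
    }

if __name__ == "__main__":
    obs = record_observation(benchmark_score=4200.0, work=lambda: sum(range(500)))
    print(json.dumps(obs, indent=2))
```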
You could -- if you gather enough data -- establish a correlation between the benchmark numbers and your performance numbers. You'll see all kinds of variation based on workload, hardware configuration, OS version, virtual machine, DB server, etc.
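A rough sketch of that correlation step, using hypothetical (benchmark score, measured TPS) pairs collected from several boxes:

```python
from math import sqrt

# Hypothetical observations from several boxes: (benchmark score, measured TPS).
observations = [
    (2100.0,  480.0),
    (3500.0,  910.0),
    (4200.0, 1150.0),
    (5100.0, 1320.0),
    (2600.0,  610.0),
]

def pearson(pairs):
    """Plain Pearson correlation between benchmark scores and measured throughput."""
    n = len(pairs)
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"benchmark vs. TPS correlation: {pearson(observations):.2f}")
```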
With enough data from enough boxes with enough different configurations, you will eventually be able to develop a performance model that says "given this hardware, software, tuning parameters and configuration, I expect my software to do [X] transactions per second." That's a solid definition of "capable".
Once you have that model, you can then compare your software against the capability number. Until you have a very complete model, you don't really know which systems are even capable of running the piece of code 1000 times per second.
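As an illustration only, here's a toy version of such a model: a least-squares fit of measured TPS against a single benchmark score, which then answers "is this box expected to be capable of 1000 per second?" A real model would fold in OS version, virtualization, DB server, tuning parameters, and so on.

```python
# Toy single-feature performance model: least-squares fit of measured TPS
# against a benchmark score, using the same kind of hypothetical data as above.

observations = [  # (benchmark score, measured TPS) -- hypothetical data
    (2100.0,  480.0),
    (3500.0,  910.0),
    (4200.0, 1150.0),
    (5100.0, 1320.0),
    (2600.0,  610.0),
]

def fit_linear(pairs):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(pairs)
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def is_capable(benchmark_score, required_tps=1000.0):
    """Is a box with this benchmark score expected to hit the required rate?"""
    slope, intercept = fit_linear(observations)
    predicted_tps = slope * benchmark_score + intercept
    return predicted_tps >= required_tps, predicted_tps

ok, predicted = is_capable(3900.0)
print(f"predicted {predicted:.0f} TPS -> {'capable' if ok else 'not capable'}")
```

The point isn't the regression itself; it's that "capable" becomes a number you can predict for a given configuration and then test against.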