Unless the classification state space you are trying to learn is extremely large, I would expect significant redundancy in a text-mining dataset of 10-100 billion records or training samples. As a rough guess, I doubt you would need much more than a 1-2% random sample to learn reliable classifiers that hold up well under cross-validation.
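To make that concrete, here is a minimal sketch of the subsample-then-validate idea in Python, assuming scikit-learn is available. The toy record stream, the sampling budget, and the hashing-vectorizer/logistic-regression pipeline are all stand-ins invented for illustration; the point is only that a single-pass reservoir sample keeps memory bounded regardless of corpus size, and that cross-validation on the sample tells you whether the subsample was enough.

```python
import random

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def reservoir_sample(records, k, seed=0):
    """Uniform random sample of k items from a stream in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)
            if j < k:
                sample[j] = rec
    return sample


def toy_stream(n):
    """Stand-in for streaming (text, label) pairs from a huge corpus; purely illustrative."""
    rng = random.Random(1)
    for _ in range(n):
        if rng.random() < 0.5:
            yield "cheap pills buy now limited offer", "spam"
        else:
            yield "meeting moved to thursday agenda attached", "ham"


# Sample roughly 1% of the (toy) stream, then see how a simple classifier
# holds up under cross-validation on that subsample alone.
sample = reservoir_sample(toy_stream(100_000), k=1_000)
texts, labels = map(list, zip(*sample))

model = make_pipeline(HashingVectorizer(n_features=2**18),
                      LogisticRegression(max_iter=1000))
scores = cross_val_score(model, texts, labels, cv=5)
print("5-fold CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

A hashing vectorizer is convenient in this setting because it avoids building a vocabulary over the full corpus, but any feature extractor that works from the subsample alone would do.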
A quick literature search turned up the following relevant papers. The Tsang paper claims O(n) training time in the number of training samples n, and an implementation is available as the LibCVM toolkit. The Wolfe paper describes a distributed EM approach built on MapReduce.
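As a rough illustration of why EM parallelizes so naturally, here is a toy single-machine sketch of the map/reduce decomposition: each "map" task computes partial sufficient statistics for its shard during the E-step, and a "reduce" step sums them before the M-step. This is not the Wolfe et al. algorithm or actual MapReduce code; the two-component 1-D Gaussian mixture, the shard layout, and the parameter values are all invented for the example.

```python
import math
import random

random.seed(0)
# Synthetic "shards": pretend each list lives on a different worker.
shards = [[random.gauss(0.0, 1.0) for _ in range(500)] +
          [random.gauss(5.0, 1.0) for _ in range(500)]
          for _ in range(4)]

# Mixture parameters: component weights, means, variances.
w, mu, var = [0.5, 0.5], [-1.0, 6.0], [1.0, 1.0]


def map_estep(shard, w, mu, var):
    """'Map' on one shard: partial sufficient statistics [N_k, sum r*x, sum r*x^2]."""
    stats = [[0.0, 0.0, 0.0] for _ in w]
    for x in shard:
        dens = [wk * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                for wk, m, v in zip(w, mu, var)]
        total = sum(dens)
        for k, d in enumerate(dens):
            r = d / total                      # responsibility of component k for x
            stats[k][0] += r
            stats[k][1] += r * x
            stats[k][2] += r * x * x
    return stats


def reduce_and_mstep(partials, n_total):
    """'Reduce': sum the shard statistics, then re-estimate weights, means, variances."""
    K = len(partials[0])
    agg = [[sum(p[k][i] for p in partials) for i in range(3)] for k in range(K)]
    w = [agg[k][0] / n_total for k in range(K)]
    mu = [agg[k][1] / agg[k][0] for k in range(K)]
    var = [agg[k][2] / agg[k][0] - mu[k] ** 2 for k in range(K)]
    return w, mu, var


n_total = sum(len(s) for s in shards)
for _ in range(20):
    partials = [map_estep(s, w, mu, var) for s in shards]    # the "map" phase
    w, mu, var = reduce_and_mstep(partials, n_total)          # the "reduce" + M-step
print("weights:", [round(x, 3) for x in w])
print("means:  ", [round(x, 3) for x in mu])
```

Only the per-shard statistics cross worker boundaries, so the communication cost per iteration is independent of the number of records in each shard.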
Lastly, there was a Large-Scale Machine Learning workshop at the NIPS 2009 conference that appears to have had many interesting and relevant presentations.
References
Ivor W. Tsang, James T. Kwok, Pak-Ming Cheung (2005). "Core Vector Machines: Fast SVM Training on Very Large Data Sets", Journal of Machine Learning Research, vol 6, pp 363–392.
Jason Wolfe, Aria Haghighi, Dan Klein (2008). "Fully Distributed EM for Very Large Datasets", Proceedings of the 25th International Conference on Machine Learning, pp 1184–1191.
Olivier Camp, Joaquim B. L. Filipe, Slimane Hammoudi and Mario Piattini (2005). "Mining Very Large Datasets with Support Vector Machine Algorithms", Enterprise Information Systems V, Springer Netherlands, pp 177–184.