views:

104

answers:

6

Hey all, I am learning to be a DBA, and the one thing I am missing is a good quantity of data to work with. Someone on IRC said that if you can't handle a few terabytes of data, then you are still not good enough.

My question is: is there a way I can get terabytes of data from somewhere to use for learning purposes? I am going to use it in Oracle.

I thought about collecting spam emails, but it would be a long shot to get a large quantity of data in a short time. Should I go for this? It would be helpful if someone could recommend a better solution. I just need a few terabytes of data to play with in the database.

Thanks.

+2  A: 

Why don't you define a small database schema with a few tables that have different data types, and write a few stored procedures that add random data to those tables? Writing that stuff will help you become a better DBA too.
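
For anyone who wants to try the random-data idea before writing stored procedures, here is a minimal sketch that does the generation outside the database instead: it writes rows with a number, a string, a date and a decimal column to CSV, which could then be bulk-loaded (for example with SQL*Loader). The column layout, the function names `random_row` and `write_rows`, and the value ranges are all made up for illustration and are not from the answer.

```python
import csv
import random
import string
import sys
from datetime import date, timedelta


def random_row(rng, row_id):
    """Build one row mixing common column types: integer, string, date, decimal."""
    name = "".join(rng.choices(string.ascii_uppercase, k=12))
    created = date(2000, 1, 1) + timedelta(days=rng.randrange(3650))
    amount = round(rng.uniform(0, 10_000), 2)
    return [row_id, name, created.isoformat(), amount]


def write_rows(out, n_rows, seed=0):
    """Write n_rows random rows as CSV to the file-like object `out`."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    writer = csv.writer(out)
    for row_id in range(1, n_rows + 1):
        writer.writerow(random_row(rng, row_id))


# Small demo: print three rows. Pass an open file and a much larger
# n_rows to produce a data file of whatever size you want to practise on.
write_rows(sys.stdout, 3)
```

Generating to a file first keeps the experiment cheap; the same loop could just as well be rewritten as a PL/SQL procedure doing the inserts directly, which is what the answer actually suggests.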

klausbyskov
You could use a tool like http://www.dominicgiles.com/datagenerator.html
Gary
Great comment. It would have been a great answer
borjab
+2  A: 

Maybe there are terabyte databases for learning purposes. But how would you distribute them? Over the internet? On hundreds of DVDs?

You can create big tables using queries against the data dictionary views. The Cartesian product of two or three of them will give you lots of combinations:

INSERT INTO Target_table
SELECT
ROWNUM                                  AS ID,
a1.object_name || '_' || a2.object_name AS name
FROM
all_objects a1,
all_objects a2
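
To see how quickly a self-join like this multiplies rows, the idea can be tried on a small scale without an Oracle instance. This is only a sketch using Python's built-in `sqlite3` module as a stand-in: the `src` table, its `obj…` names and the helper `cross_join_fill` are hypothetical, and SQLite's auto-assigned integer primary key takes the place of Oracle's `ROWNUM`.

```python
import sqlite3


def cross_join_fill(n_source_rows):
    """Fill a target table from a self cross join and return its row count."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for a dictionary view such as all_objects.
    conn.execute("CREATE TABLE src (name TEXT)")
    conn.executemany(
        "INSERT INTO src (name) VALUES (?)",
        [(f"obj{i}",) for i in range(n_source_rows)],
    )
    # The id column is assigned automatically by SQLite on insert.
    conn.execute("CREATE TABLE target_table (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "INSERT INTO target_table (name) "
        "SELECT a1.name || '_' || a2.name FROM src a1 CROSS JOIN src a2"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM target_table").fetchone()
    conn.close()
    return count


# Every source row is paired with every source row, so n rows become n * n.
print(cross_join_fill(100))
```

Joining a third copy of the source multiplies the count by n again, so it is worth limiting the source rows before running anything like this against a real database.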
borjab
+1 for "How do you distribute it"? Assuming I was getting my maximum download rate at all times, 24/7, it would take well over a month to download a terabyte. That's plenty of time for my ISP to ask what I'm doing with all that bandwidth.
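
A rough way to check the download estimate above for your own connection. The comment does not say what the download rate was, so the rates below are assumptions picked for illustration, a terabyte is taken as 10^12 bytes, and `download_days` is a made-up helper name.

```python
def download_days(size_bytes, rate_mbit_per_s):
    """Days needed to download size_bytes at a sustained rate in megabits per second."""
    bits = size_bytes * 8
    seconds = bits / (rate_mbit_per_s * 1_000_000)
    return seconds / 86_400  # seconds in a day


TERABYTE = 10**12  # decimal terabyte, in bytes

# Assumed example rates; substitute your own line speed.
for rate in (1, 3, 10, 100):
    print(f"{rate:>4} Mbit/s: {download_days(TERABYTE, rate):6.1f} days")
```

At a sustained 3 Mbit/s a terabyte takes about 31 days, so "well over a month" corresponds to a line slower than that.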
David Thornley
A: 

I don't know if it is in the terabyte range, but here's 100 million movie ratings.

Sinan Ünür
Thanks. I'll definitely use it.
A: 

There are several of these available for download on the Internet. Amazon, for example, has a Public Data Sets service that includes large data sets like US Census data (but these run on Amazon's cloud).

Google for "public data sets" and you will turn up lots of freely available databases. You can also download Wikipedia.

Ken Liu
Thanks! Wikipedia sounds good.
A: 

Terabyte level stuff is generally data warehouse range (or multi-media stuff which is quite specialised). Lots of business apps will be in the hundreds of gig, or even less.

You'd have a hard job finding multi-gig datasets on the Internet. The Stack Overflow data dump is available, but it is less than a gig. OpenStreetMap has a whole bunch of geographic data freely available that would go to several gigabytes (planet OSM is about 7.5 GB, but that is zipped XML, so the database size would be quite different).

Gary
At work here (for a cellphone provider) they have DBs in the terabytes, but I am not a DBA here :(. Thanks for your reply though. I'll see what I can use from it.
A: 

Someone on IRC said that if you can't handle a few terabytes of data, then you are still not good enough.

"Handle" meaning what, though?

If it means "Design a backup and restore strategy" then it is much, much more important to understand the internals of Oracle redo, undo, RMAN, and recovery. That is the place to start, and you can work with very small data sets to make sure that you have that understanding. Read the documentation, read articles by reputable people, practice, practice, practice.

If it means "Design an indexing strategy" then work on understanding indexes and the cost based optimiser. Again, data volume is not critical here but a solid understanding of the internals will take you a very long way.

In fact, whatever it means, it is way more important as a DBA to understand the Oracle architecture and its internal workings. Once you have those, you'll be way ahead of 90% of other DBAs out there, and working with terabytes will not be a challenge.

David Aldridge
@David Thanks. I'll get to it right away. Thank you again for your deeper insight.