I don't have any experience with XNA and 3D, but I will give you some advice for 2D games. I spent some time creating a tile engine in XNA at the beginning of this year and wondered the same thing. I think the short answer here is yes, combining your tiles into a larger sprite is a good idea if you're worried about performance. However, there is a much longer answer if you're interested.
In general, when it comes to performance optimizations, the answer is almost always, "don't do it." If you're sure you need to optimize for performance, the next answer is almost always, "don't do it yet." Finally, if you do attempt to optimize for performance, the most important thing you can do is use benchmarks to gather precise measurements of the performance before and after the changes. Without that, you don't know if you're succeeding!
More specific to 2D games: in my tile engine, performance got noticeably better the less often I switched textures. For example, let's say I have a grass tile and a gravel tile. If these are two separate textures in memory, and I draw a grass tile, then a gravel tile, then another grass tile to the screen, the GPU has to switch to the grass texture, swap it out for the gravel texture, then swap the grass texture back in to draw the last tile. This kills performance really quickly! The easiest way around it is a spritesheet: put your grass and gravel tiles into one texture, and just tell SpriteBatch to draw from a different area of that same texture each time.
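To make the spritesheet idea concrete, here is a minimal sketch, assuming XNA 4.0 (or MonoGame), square tiles, and a sheet laid out as a simple grid. The TileSheet class, its members, and the int[,] map layout are hypothetical names for illustration, not part of XNA.

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Hypothetical helper: maps a tile index to its region on one shared texture.
public sealed class TileSheet
{
    private readonly Texture2D sheet;  // single texture holding every tile image
    private readonly int tileSize;     // tiles are assumed square, in pixels
    private readonly int columns;      // number of tiles per row on the sheet

    public TileSheet(Texture2D sheet, int tileSize)
    {
        this.sheet = sheet;
        this.tileSize = tileSize;
        this.columns = sheet.Width / tileSize;
    }

    // Source rectangle on the sheet for a given tile index (0 = top-left tile).
    public Rectangle SourceFor(int tileIndex)
    {
        int col = tileIndex % columns;
        int row = tileIndex / columns;
        return new Rectangle(col * tileSize, row * tileSize, tileSize, tileSize);
    }

    // Draws a grid of tile indices. Call between spriteBatch.Begin() and End().
    public void DrawMap(SpriteBatch spriteBatch, int[,] map)
    {
        for (int y = 0; y < map.GetLength(0); y++)
        {
            for (int x = 0; x < map.GetLength(1); x++)
            {
                Rectangle dest = new Rectangle(x * tileSize, y * tileSize, tileSize, tileSize);
                // Same texture every call, so only the source rectangle changes.
                spriteBatch.Draw(sheet, dest, SourceFor(map[y, x]), Color.White);
            }
        }
    }
}
```

With grass at index 0 and gravel at index 1, a map like { { 0, 1, 0 } } draws grass, gravel, grass without ever changing the texture.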
Another thing to consider is how many tiles you're going to be drawing on the screen at once. I can't remember the exact numbers, but I was drawing thousands of tiles at once (in a zoomed-out view). I noticed that when I used bigger tiles and drew fewer of them, as you are suggesting in your question, performance also improved. This wasn't as big an improvement as what I described in the last paragraph though, and I would still encourage you to measure the performance changes resulting from different implementations. Also, if you're only drawing a few dozen or a few hundred tiles, it may not be worth trying to optimize that right now (see the second paragraph).
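If the tile layout rarely changes, one way to draw fewer, bigger sprites is to render the tiles once into an off-screen texture and then draw that single texture each frame. This is only a rough sketch of that idea, assuming XNA 4.0's RenderTarget2D API and a grid-shaped spritesheet; TileBaker, Bake, and the parameter names are made up for illustration. Treat it as something to measure, not a guaranteed win.

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Hypothetical helper: pre-renders a tile map into one large texture.
public static class TileBaker
{
    public static Texture2D Bake(GraphicsDevice device, SpriteBatch spriteBatch,
                                 Texture2D sheet, int[,] map, int tileSize)
    {
        int rows = map.GetLength(0);
        int cols = map.GetLength(1);
        int sheetColumns = sheet.Width / tileSize;

        // Off-screen surface big enough for the whole map.
        RenderTarget2D target = new RenderTarget2D(device, cols * tileSize, rows * tileSize);

        device.SetRenderTarget(target);
        device.Clear(Color.Transparent);

        spriteBatch.Begin();
        for (int y = 0; y < rows; y++)
        {
            for (int x = 0; x < cols; x++)
            {
                int index = map[y, x];
                Rectangle source = new Rectangle((index % sheetColumns) * tileSize,
                                                 (index / sheetColumns) * tileSize,
                                                 tileSize, tileSize);
                Rectangle dest = new Rectangle(x * tileSize, y * tileSize, tileSize, tileSize);
                spriteBatch.Draw(sheet, dest, source, Color.White);
            }
        }
        spriteBatch.End();

        // Switch back to drawing on the screen.
        device.SetRenderTarget(null);

        // In XNA 4.0 a RenderTarget2D is itself a Texture2D.
        return target;
    }
}
```

You would call Bake once (for example after loading content, or whenever the map changes) and then draw the returned texture with a single spriteBatch.Draw call per frame. Keep in mind that the maximum texture size depends on the hardware and graphics profile, so a very large map may need to be split into several baked chunks.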
Just so you know I'm not completely making this up, here's a link to a post from Shawn Hargreaves on texture swapping, spritesheets, etc. There are probably better posts on the XNA forums as well as Shawn Hargreaves' blog if you search on the topic.
http://forums.xna.com/forums/p/24254/131437.aspx#131437
Update:
Since you updated your question, let me update my post. I decided to just benchmark some samples to give you an idea of what the performance differences might be. In my Draw() function I have the following:
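    // Fragment from Game.Draw(). Assumes "using System.Diagnostics;" and these
    // fields on the Game class (types assumed): Texture2D tex; long numTicks; int draws;
    GraphicsDevice.Clear(Color.CornflowerBlue);

    Stopwatch sw = new Stopwatch();
    sw.Start();
    spriteBatch.Begin();
#if !DEBUG
    // Test 1: draw the whole image with a single Draw call.
    spriteBatch.Draw(tex,
        new Rectangle(0, 0, GraphicsDevice.Viewport.Width, GraphicsDevice.Viewport.Height),
        Color.White);
#else
    // Test 2: draw the same image as 128 x 72 tiles of 10 x 10 pixels each.
    for (int i = 0; i < 128; i++)
    {
        for (int j = 0; j < 72; j++)
        {
            Rectangle r = new Rectangle(i * 10, j * 10, 10, 10);
            spriteBatch.Draw(tex, r, r, Color.White);
        }
    }
#endif
    spriteBatch.End();
    sw.Stop();

    // Skip the first 60 draws so startup cost doesn't skew the average.
    if (draws > 60)
    {
        numTicks += sw.ElapsedTicks;
    }
    draws++;

    // Print the running average every 100 draws.
    if (draws % 100 == 0)
        Console.WriteLine("avg ticks: " + numTicks / (draws - 60));

    base.Draw(gameTime);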
Just drop the exclamation point in the "#if !DEBUG" directive to switch between the two tests. I skipped the first 60 draws because they included some initial setup (not really sure what) and were skewing the averages. I downloaded one 1280x720 image, and for the top test case I just drew it once. For the bottom test case I drew that same source image as a 128x72 grid of 10x10-pixel tiles (9,216 Draw calls per frame), like you asked about in your question. The numbers are Stopwatch ticks, and each line is the running average up to that point. Here are the results.
Drawing one image:
avg ticks: 68566
avg ticks: 73668
avg ticks: 82659
avg ticks: 81654
avg ticks: 81104
avg ticks: 84664
avg ticks: 86626
avg ticks: 88211
avg ticks: 87677
avg ticks: 86694
avg ticks: 86713
avg ticks: 88116
avg ticks: 89380
avg ticks: 92158
Drawing 128x72 tiles:
avg ticks: 7902592
avg ticks: 8052819
avg ticks: 8012696
avg ticks: 8008819
avg ticks: 7985545
avg ticks: 8028217
avg ticks: 8046837
avg ticks: 8291755
avg ticks: 8309384
avg ticks: 8336120
avg ticks: 8320260
avg ticks: 8322150
avg ticks: 8381845
avg ticks: 8364629
As you can see, the difference is about two orders of magnitude (roughly 100x), so it's pretty significant. This kind of thing is simple to test, and I'd recommend running your own benchmarks on your specific setup in case there's something I missed.
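One caveat when reading these numbers: Stopwatch ticks are not milliseconds, and their length depends on the machine's timer frequency. Here is a small sketch for converting tick averages into milliseconds and computing the ratio between the two tests. TickMath and its method names are hypothetical, and the two sample values are simply picked from the result lists above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for interpreting Stopwatch tick counts.
public static class TickMath
{
    // Converts a tick count to milliseconds for a given frequency (ticks per second).
    public static double ToMilliseconds(long ticks, long frequency)
    {
        return ticks * 1000.0 / frequency;
    }

    // Prints both averages in milliseconds on this machine, plus their ratio.
    public static void PrintSummary()
    {
        long oneImage = 86694;  // one of the single-image averages above
        long tiled = 8322150;   // one of the tiled averages above

        Console.WriteLine("one image: {0:F3} ms", ToMilliseconds(oneImage, Stopwatch.Frequency));
        Console.WriteLine("tiled:     {0:F3} ms", ToMilliseconds(tiled, Stopwatch.Frequency));
        Console.WriteLine("ratio:     {0:F1}x", (double)tiled / oneImage);
    }
}
```

Note that the millisecond values only make sense on the machine that recorded the ticks, since Stopwatch.Frequency differs between systems. The ratio does not depend on the frequency. Also keep in mind that a Stopwatch around Begin/End measures CPU time spent submitting the draws, which is not necessarily the same as the time the GPU spends rendering them.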