What I'm trying to do is run the same SQL select on many Oracle databases (at least a dozen), and display the output in a GridView.

I've hacked together something that works, but unfortunately it's very slow. I think it's exacerbated by the fact that at least one of the dozen databases will invariably be unreachable or otherwise in an error state.

As well as being slow, I can't help thinking it's not the best way of doing it, nor very '.NET'-like.

I've written something similar in the past as a simple loop in PHP that just connects to each db in turn, runs the SQL and writes another <tr>, and it works at least twice as fast for a given query. But I'm not really happy with that; I'd like to improve my knowledge!

I'm learning C# and ASP.NET so please excuse the horrible code :)

public void BindData(string mySQL)
    {
        OracleConnection myConnection;
        OracleDataAdapter TempDataAdapter;
        DataSet MainDataSet = new DataSet();
        DataTable MainDataTable = new DataTable();
        DataSet TempDataSet;
        DataTable TempDataTable;
        string connectionString = "";
        Label1.Visible = false;
        Label1.Text = "";

        foreach (ListItem li in CheckBoxList1.Items)
        {
            if (li.Selected)
            {
                connectionString = "Data Source=" + li.Text + "";
                connectionString += ";Persist Security Info=True;User ID=user;Password=pass;Unicode=True";
                myConnection = new OracleConnection(connectionString);
                try
                {
                    TempDataAdapter = new OracleDataAdapter(mySQL, myConnection);
                    TempDataSet = new DataSet();
                    TempDataTable = new DataTable();
                    TempDataAdapter.Fill(TempDataSet);
                    TempDataTable = TempDataSet.Tables[0].Copy();
                    /* If the main dataset is empty, create a table by cloning from temp dataset, otherwise
                     copy all rows to existing table.*/
                    if (MainDataSet.Tables.Count == 0)
                    {
                        MainDataSet.Tables.Add(TempDataTable);
                        MainDataTable = MainDataSet.Tables[0];
                    }
                    else
                    {
                        foreach (DataRow dr in TempDataTable.Rows)
                        {
                            MainDataTable.ImportRow(dr);
                        }
                    }
                }
                catch (OracleException e)
                {
                    Label1.Visible = true;
                    Label1.Text = Label1.Text + e.Message + " on " + li.Text + "<br>";

                }
                finally
                {
                    if (myConnection != null)
                    {
                        myConnection.Close();
                        myConnection = null;
                    }
                    TempDataSet = null;
                    TempDataAdapter = null;
                    TempDataTable = null;

                }
            }
        }
        GridView1.DataSourceID = String.Empty;
        if (MainDataSet.Tables.Count != 0)
        {
            GridView1.DataSource = MainDataSet;
            if (GridView1.DataSource != null)
            {
                GridView1.DataBind();
            }
        }
    }
    protected void Button1_Click(object sender, EventArgs e)
    {
        BindData(TextBox1.Text);
    }

Thanks!

UPDATE: The SQL code varies; for testing I have used very simple queries such as select sysdate from dual or select name from v$database. In eventual use it will be much more complicated. The idea is that I should be able to run pretty much anything, hence the BindData(TextBox1.Text).

UPDATE: The reason for connecting to many databases from the ASP.NET code, rather than using a stored proc on one or all dbs or replicating to one db, is twofold. Firstly, the dbs in question are frequently refreshed replicas of several similar production environments (typically development, testing and support for each client), so anything done to the actual dbs would have to be updated or redone regularly as they are reloaded anyway. Secondly, I don't know in advance what kind of query might be run; this form lets me just type e.g. select count(name) from dbusers against a dozen databases without having to first think about replicating the dbusers table to a master db.

+2  A: 

There could be a lot of factors causing it to be slow. What's the SQL statement being executed that's running slow?

If anyone reading this is using SQL Server, Scott Mitchell just wrote a nice article to help solve this in SQL Server: Running the Same Query Against Multiple Databases

Nathan Prather
The SQL code varies; for testing I have used very simple queries such as select sysdate from dual or select name from v$database.
Colin Pickard
Thanks for the article link; that is pretty interesting, and the principle carries across to Oracle.
Colin Pickard
(By which I mean you could code your own sp_MsForEachDb equivalent for Oracle)
Colin Pickard
+3  A: 

If you run the DataAdapter.Fill method on a DataTable object, the table will be updated with the results from the query. So instead of creating new DataTable and DataSet objects and then copying the DataRows manually, you can just fill the same table each time.

Try something like this (in untested C# code):

public void BindData(string mySQL)
{
  OracleConnection myConnection;
  // Empty connection string for now
  OracleDataAdapter MainDataAdapter = new OracleDataAdapter(mySQL, ""); 
  DataTable MainDataTable = new DataTable();
  string connectionString = "";
  Label1.Visible = false;
  Label1.Text = "";

  foreach (ListItem li in CheckBoxList1.Items)
  {
    if (li.Selected)
    {
      connectionString = "Data Source=" + li.Text + "";
      connectionString += ";Persist Security Info=True;User ID=user;Password=pass;Unicode=True";
      MainDataAdapter.SelectCommand.Connection.ConnectionString = connectionString;
      try
      {
        MainDataAdapter.Fill(MainDataTable);
      }
      catch (OracleException e)
      {
        Label1.Visible = true;
        Label1.Text = Label1.Text + e.Message + " on " + li.Text + "<br>";
      }
    }
  }
  GridView1.DataSourceID = String.Empty;
  GridView1.DataSource = MainDataTable;
  GridView1.DataBind();
}

I did the following changes:

  • Created one data adapter and assigned it a select command using your mySQL query
  • Gave the connection an empty connection string
  • Created a data table object and removed the data sets (you only need a DataSet if your query returns several tables)
  • Changed your loop to just set the connection string of the SelectCommand's connection (you may have to change this to replacing the SelectCommand with a new one)
  • Removed the connection.Close() calls. The DataAdapter does this automatically.

And that's it. If your databases are offline you will still experience slowdowns, but at least the code is simpler and faster since you don't have to copy all the rows between your tables.

One more thing: you can probably set a timeout for the connection in the connection string. Try lowering it so that unreachable databases fail faster.
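
For example (illustrative only - ODP.NET honours a "Connection Timeout" keyword, but the older System.Data.OracleClient provider may ignore it, so check the documentation for the provider you are actually using):

// Hypothetical data source name; the timeout value is in seconds and
// controls how long the provider waits before giving up on the connect.
string dataSource = "MYDB";
string connectionString =
    "Data Source=" + dataSource +
    ";Persist Security Info=True;User ID=user;Password=pass;Unicode=True" +
    ";Connection Timeout=5";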

Rune Grimstad
Very late update... your code is actually almost perfect! If you change MainDataAdapter.SelectCommand.ConnectionString = connectionString to MainDataAdapter.SelectCommand.Connection.ConnectionString = connectionString; it works nicely.
Colin Pickard
Oops. Good point! :-) Actually, I think a better approach would be to use MainDataAdapter.SelectCommand.Connection = new OracleConnection(connectionString);
Rune Grimstad
A: 

It sounds like you may be more interested in getting an answer to this more generic question: How can I execute a long-running task without hanging the UI (ASP or WinForms)?

The answer to that question is to use multiple threads. I would perform a long-running task like this on a separate thread and show the user a page with the current results (either refreshing automatically or with AJAX, etc.). You can even get fancy and create tasks for each available processor to get the most out of your machine (using something like the Parallel Extensions); however, this increases the complexity significantly and can be hard to get right.
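
Below is a minimal, untested sketch of that idea, assuming .NET 4's Parallel.ForEach (from the Parallel Extensions) and the same OracleConnection/OracleDataAdapter classes used in the question; QueryAll, its parameters and the error list are illustrative names, not an existing API:

// A sketch only: query each selected database on its own thread and merge
// the results into a single DataTable. Assumes the System.Data.OracleClient
// provider (swap the using for Oracle.DataAccess.Client if you use ODP.NET).
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;
using System.Data.OracleClient;
using System.Threading.Tasks;

public static class ParallelQuery
{
    public static DataTable QueryAll(string sql,
                                     IEnumerable<string> connectionStrings,
                                     out List<string> errors)
    {
        DataTable merged = null;
        object mergeLock = new object();            // DataTable is not thread-safe for writes
        var errorBag = new ConcurrentBag<string>(); // thread-safe error collection

        Parallel.ForEach(connectionStrings, connectionString =>
        {
            try
            {
                var local = new DataTable();
                using (var connection = new OracleConnection(connectionString))
                using (var adapter = new OracleDataAdapter(sql, connection))
                {
                    adapter.Fill(local); // Fill opens and closes the connection itself
                }

                lock (mergeLock)
                {
                    if (merged == null)
                        merged = local;      // first result defines the schema
                    else
                        merged.Merge(local); // later results are appended
                }
            }
            catch (Exception ex)
            {
                // Collect the error instead of aborting the whole run
                errorBag.Add(connectionString + ": " + ex.Message);
            }
        });

        errors = new List<string>(errorBag);
        return merged ?? new DataTable();
    }
}

BindData could then build the list of connection strings from the checked items, call QueryAll, show any errors in Label1 and bind the returned DataTable to the GridView.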

If you haven't worked with threads in .NET, a great tutorial can be found here (by the one and only Jon Skeet).

akmad
I think you misinterpreted the question. The code is slow partly because the servers may be offline and partly because he manually copies the data rows.
Rune Grimstad
You're right, the code is slow for multiple reasons. However, even if you made the multi-db code faster, you're not going to get a meaningful performance increase compared to the time spent waiting for a single timeout.
akmad
Thanks. Certainly, in addition to making the code faster, it would be beneficial to update the page with the current results - the databases differ in size and the host servers differ in performance, so the time for an identical query can vary a lot.
Colin Pickard
+1  A: 

Why not run a single stored procedure on one Oracle database, and have the sproc query the other databases? This is the proper way to work with linked databases.

FlySwat
+1  A: 

Why not use replication to do this... you know, one central database that pulls new data from the other databases, and just execute your queries over that set of data, which is never going to be down.

Jaime Febres
That's perhaps the best way to handle this. All those connections are points of failure. If you transform the data, you can also standardize it. Even if it's temporary, a local set is much more malleable.
Abyss Knight
Yeah, I considered this first. The only problem is that replicating the data means knowing in advance which data I will be querying (or copying everything!)
Colin Pickard