I have a group of Erlang nodes that replicate their data through Mnesia's "extra_db_nodes" setting. I need to upgrade hardware and software, so I have to detach some nodes as I work my way through the cluster node by node.

How does one remove a node and still preserve the data that was inserted?

[update] Removing nodes is as important as adding them: as a cluster grows over time, it must also be able to contract. Otherwise Mnesia keeps trying to send data to nonexistent nodes, filling up queues and keeping the network busy.

[final update] After poring over the Erlang/Mnesia source code, I was able to determine that it is not possible to completely disassociate nodes. While del_table_copy removes the linkage between tables, the removal is incomplete. I would close this question, but none of the close reasons fit.

A: 

I am new to Mnesia, so I have no direct answer, but this might be of some help:

How to add a node to an mnesia cluster?

Toby Hede
-1 for phishing for points. Adding was never the issue, and the other post never mentions removing nodes.
Richard
I was just figuring that this had been open for a while. I was reading up on Mnesia last night, noticed this post, and thought that *removing* a node might be similar to *adding* one. But you know, whatevs.
Toby Hede
A: 

If you have replicated the table (added table copies) on nodes other than the one you're removing, then you're already fine - just remove the node.

If you wanted to be slightly tidier you'd delete the table copies from the node you're about to remove first via mnesia:del_table_copy/2.
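A minimal sketch of that tidy-up step, assuming it runs on a node that stays in the cluster. The module name node_retire and its functions are hypothetical; the sketch only relies on mnesia:system_info/1, mnesia:table_info/2 and mnesia:del_table_copy/2. It skips any table whose only replica lives on the departing node, because deleting the last replica with del_table_copy/2 removes the table entirely.

```erlang
-module(node_retire).
-export([tables_on/1, drop_copies/1]).

%% All tables except schema that keep a replica on Node.
tables_on(Node) ->
    [Tab || Tab <- mnesia:system_info(tables),
            Tab =/= schema,
            lists:member(Node, replicas(Tab))].

%% Every node holding a replica of Tab, whatever its storage type.
replicas(Tab) ->
    mnesia:table_info(Tab, ram_copies)
        ++ mnesia:table_info(Tab, disc_copies)
        ++ mnesia:table_info(Tab, disc_only_copies).

%% Delete Node's replica of each table that is also replicated elsewhere.
%% Tables whose last replica is on Node are skipped, since deleting the
%% last replica would drop the table and its data.
drop_copies(Node) ->
    lists:map(
      fun(Tab) ->
              case replicas(Tab) -- [Node] of
                  [] -> {Tab, {skipped, last_replica}};
                  _Others -> {Tab, mnesia:del_table_copy(Tab, Node)}
              end
      end,
      tables_on(Node)).
```

Each result is {Tab, {atomic, ok}}, {Tab, {aborted, Reason}} or {Tab, {skipped, last_replica}}, so a skipped table tells you where to add a copy elsewhere (mnesia:add_table_copy/3) before the node goes away.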

Generally, Mnesia handles node loss gracefully and detects when a node rejoins: rebooted nodes obtain new table copies from the nodes that kept running, while nodes that didn't reboot are detected as a network partition event. Mnesia does not consume CPU or network traffic for nodes that have gone down. I think, though I haven't confirmed it in the source, that Mnesia won't automatically reconnect to nodes that have gone down; the node that went down is expected to restart Mnesia and reconnect.
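To watch this happen, a process can subscribe to Mnesia's system events with mnesia:subscribe(system); events then arrive as {mnesia_system_event, Event} messages, including {mnesia_up, Node}, {mnesia_down, Node} and {inconsistent_database, Context, Node} for partitions. The module node_watch below is a hypothetical sketch that just prints those events; it assumes Mnesia is already running on the local node.

```erlang
-module(node_watch).
-export([start/0, describe/1]).

%% Spawn a process that subscribes to Mnesia system events and prints them.
start() ->
    spawn(fun() ->
                  {ok, _} = mnesia:subscribe(system),
                  loop()
          end).

loop() ->
    receive
        {mnesia_system_event, Event} ->
            io:format("~s~n", [describe(Event)]),
            loop()
    end.

%% Turn the events relevant to node loss and rejoin into readable text.
describe({mnesia_up, Node}) ->
    fmt("mnesia started on ~w", [Node]);
describe({mnesia_down, Node}) ->
    fmt("mnesia stopped on ~w", [Node]);
describe({inconsistent_database, Context, Node}) ->
    fmt("inconsistent database with ~w (~w)", [Node, Context]);
describe(Other) ->
    fmt("other system event: ~w", [Other]).

fmt(Format, Args) ->
    lists:flatten(io_lib:format(Format, Args)).
```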

mnesia:add_table_copy/3, mnesia:move_table_copy/3 and mnesia:del_table_copy/2 are the functions you should look at for live schema management.
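For the hardware-upgrade case in the question, one way to combine these is to move every replica from the old machine to its replacement. This is a sketch under assumptions: the module node_migrate is hypothetical, To must already be part of the schema, and it relies on mnesia:move_table_copy/3 preserving the storage type of the moved replica.

```erlang
-module(node_migrate).
-export([migrate/2]).

%% Move every non-schema replica hosted on From over to To.
%% Returns [{Tab, {atomic, ok} | {aborted, Reason}}].
migrate(From, To) when From =/= To ->
    [{Tab, mnesia:move_table_copy(Tab, From, To)} || Tab <- tables_on(From)].

%% Tables (other than schema) with a replica of any storage type on Node.
tables_on(Node) ->
    [Tab || Tab <- mnesia:system_info(tables),
            Tab =/= schema,
            lists:member(Node, mnesia:table_info(Tab, ram_copies)
                             ++ mnesia:table_info(Tab, disc_copies)
                             ++ mnesia:table_info(Tab, disc_only_copies))].
```

Once migrate/2 reports {atomic, ok} for every table, the old node only holds the schema and can be stopped and dropped with mnesia:del_table_copy(schema, From).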

The extra_db_nodes parameter should only be used when initialising a new DB node - once a new node has a copy of the schema it doesn't need the extra_db_nodes parameter.
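A minimal sketch of that one-time join, run on the new node. It assumes the node was started distributed (-sname or -name) with the cluster's cookie, has no disc schema yet, and does not set extra_db_nodes in its config; the module node_join is hypothetical. The extra_db_nodes contact is made at runtime with mnesia:change_config/2, and the schema is then made disc-resident so later restarts find the cluster from the schema itself.

```erlang
-module(node_join).
-export([join/1]).

%% Existing is any node that is already a member of the Mnesia cluster.
join(Existing) ->
    ok = mnesia:start(),
    case mnesia:change_config(extra_db_nodes, [Existing]) of
        {ok, [_ | _]} ->
            %% Store the schema on disc locally, so the next start reads the
            %% cluster membership from the schema, not from extra_db_nodes.
            mnesia:change_table_copy_type(schema, node(), disc_copies);
        {ok, []} ->
            {error, {not_connected, Existing}};
        {error, Reason} ->
            {error, Reason}
    end.
```

After this, tables are replicated onto the node with mnesia:add_table_copy/3 as usual.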

archaelus
I'm on the fence about this answer. I like the general information; however, it's not current. The three methods you mention are not included in the R13B release, and a search of the R13A code did not reveal any similar methods.
Richard
Continuing my search of the source, I found some indication that mnesia_controller:add_list/2 is called when adding the extra node. There is a comment that suggests calling mnesia_recover:disconnect_nodes/1; however, that function does not exist anywhere and might simply be a typo, since mnesia_recover:disconnect/1 does exist.
Richard
I had written delete_table_copy instead of del_table_copy, but apart from that, those functions are present, documented, and current. You shouldn't have to disconnect nodes by hand; Mnesia handles node disconnection by itself. Just turn off the unwanted nodes, or use net_kernel:disconnect/1 to do it forcibly.
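If you do want to force it, a hedged sketch: stop Mnesia on the departing node first, so the remaining nodes see an orderly mnesia_down rather than a later partition, then drop the distribution connection with net_kernel:disconnect/1. The module node_kick is hypothetical.

```erlang
-module(node_kick).
-export([retire/1]).

%% Stop Mnesia on Node, then cut the distribution connection to it.
%% Returns {StopResult, DisconnectResult}; StopResult is stopped,
%% {error, Reason} or {badrpc, Reason} if Node cannot be reached.
retire(Node) when Node =/= node() ->
    Stopped = rpc:call(Node, mnesia, stop, []),
    Disconnected = net_kernel:disconnect(Node),
    {Stopped, Disconnected}.
```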
archaelus
del_table_copy does not remove the node from the extra_db_nodes list.
Richard
Sure, but why are they in the extra_db_nodes list in the first place? You only need extra_db_nodes while joining the cluster; after that it's more of a hindrance than anything else. You can use change_config to alter extra_db_nodes at runtime, but why do that? You shouldn't be specifying extra_db_nodes in normal operation, so your problem is not how to delete things from extra_db_nodes when removing nodes, but how to avoid using extra_db_nodes at any point after a new node joins a cluster.
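To see what each node actually believes, mnesia:system_info/1 exposes the persistent membership (db_nodes), the currently running members (running_db_nodes) and the configured extra_db_nodes separately. The module node_report is a hypothetical helper; stale_extra/0 lists extra_db_nodes entries that are not part of the schema's db_nodes, which is the leftover the question is worried about.

```erlang
-module(node_report).
-export([report/0, stale_extra/0]).

%% Snapshot of the node lists Mnesia keeps on this node.
report() ->
    #{db_nodes => mnesia:system_info(db_nodes),
      running_db_nodes => mnesia:system_info(running_db_nodes),
      extra_db_nodes => mnesia:system_info(extra_db_nodes)}.

%% extra_db_nodes entries that are not (or no longer) in db_nodes.
stale_extra() ->
    mnesia:system_info(extra_db_nodes) -- mnesia:system_info(db_nodes).
```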
archaelus
A:

I've certainly used this approach to do this, which supports the mnesia:del_table_copy/2 answer. See removeNode/1 below:

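-module(tool_bootstrap).

-export([bootstrapNewNode/1, closedownNode/0,
         finalBootstrap/0, removeNode/1]).

%% Application-specific record definitions (not shown). A header that lives
%% in this application is normally pulled in with -include, not -include_lib.
-include("records.hrl").

-include_lib("stdlib/include/qlc.hrl").

%% Run on an existing cluster member.
bootstrapNewNode(Node) ->
    %% Make the given node part of the family and start the cloud on it
    mnesia:change_config(extra_db_nodes, [Node]),
    %% Now make the other node set things up
    rpc:call(Node, tool_bootstrap, finalBootstrap, []).

%% Run on a node that stays in the cluster.
removeNode(Node) ->
    %% Stop the application and Mnesia on the departing node first:
    %% Mnesia must not be running on Node when its schema copy is deleted.
    rpc:call(Node, tool_bootstrap, closedownNode, []),
    %% Deleting Node's schema replica drops it from the schema's db_nodes;
    %% returns {atomic, ok} or {aborted, Reason}.
    mnesia:del_table_copy(schema, Node).

finalBootstrap() ->
    %% Code removed to actually copy over my tables etc...
    application:start(cloud).

closedownNode() ->
    application:stop(cloud),
    mnesia:stop().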
Alan Moore
While this code may have appeared to work, it does not clean up all of the data: del_table_copy does not remove the node from the extra_db_nodes list. In fact, there is no code in the source that completely removes the node.
Richard
Yes, you're right. I removed all of the code specific to my application for clarity...
Alan Moore
The source code I was referring to is in the Mnesia library.
Richard
Ah, yes. I should have read the original question. My solution certainly had the effect I wanted, namely that my (deleted) node no longer hosted a replicated copy of the data. But I never checked what extra_db_nodes was set to after the change...
Alan Moore