Hi all, I'm hoping someone can give me pointers as to where I'm going wrong clustering three servers with MySQL Cluster 7.1 using multiple management nodes.

Currently, the cluster works perfectly with one management node. This is the setup:

  1. First server runs only an instance of ndb_mgmd (192.168.66.114)
  2. Second server runs an instance of ndbd and mysqld (192.168.66.2)
  3. Third server runs an instance of ndbd and mysqld (192.168.66.113)
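
For completeness, each ndbd and mysqld reaches the management layer through its connect string. My understanding from the docs is that once a second management node exists, the connect string should list both management hosts, along these lines (a sketch assuming the default port 1186, not my literal setup):

ndbd --ndb-connectstring=192.168.66.114:1186,192.168.66.2:1186

and in my.ini on each SQL node:

[mysql_cluster]
ndb-connectstring=192.168.66.114:1186,192.168.66.2:1186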

I want to introduce a second management node into the cluster. I have exactly the same config.ini on both management servers. Here it is:

[NDBD DEFAULT]
NoOfReplicas=2

[MYSQLD DEFAULT]

[NDB_MGMD DEFAULT]
PortNumber=1186
datadir=c:/Progra~1/mysql-cluster-gpl-7.1.3-win32
LogDestination=FILE:filename=c:/Progra~1/mysql-cluster-gpl-7.1.3-win32/clusterlog.log


[TCP DEFAULT]

# Management Server
[NDB_MGMD]
Id=1
HostName=192.168.66.114
ArbitrationRank=1

[NDB_MGMD]
Id=6
HostName=192.168.66.2
ArbitrationRank=2

# Data Nodes
[NDBD]
Id=2
HostName=192.168.66.2
DataDir= D:/AppData/ndb-7.1.3

[NDBD]
Id=3
HostName=192.168.66.113
DataDir= D:/AppData/ndb-7.1.3

[MYSQLD]
Id=4
HostName=192.168.66.2

[MYSQLD]
Id=5
HostName=192.168.66.113
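
For reference, start commands along these lines match the config above (a sketch rather than my literal command lines; --ndb-nodeid pins which [NDB_MGMD] section each daemon claims, and --configdir is where the binary configuration cache goes):

On 192.168.66.114:
ndb_mgmd --config-file=c:/Progra~1/mysql-cluster-gpl-7.1.3-win32/config.ini --configdir=c:/Progra~1/mysql-cluster-gpl-7.1.3-win32 --ndb-nodeid=1

On 192.168.66.2:
ndb_mgmd --config-file=c:/Progra~1/mysql-cluster-gpl-7.1.3-win32/config.ini --configdir=c:/Progra~1/mysql-cluster-gpl-7.1.3-win32 --ndb-nodeid=6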

First, I start ndb_mgmd on the first management server and issue a show command in ndb_mgm there; I can see that it has started:

ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.66.2  (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0, Master)
id=3    @192.168.66.113  (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @192.168.66.114  (mysql-5.1.44 ndb-7.1.3)
id=6 (not connected, accepting connect from 192.168.66.2)

[mysqld(API)]   2 node(s)
id=4    @192.168.66.2  (mysql-5.1.44 ndb-7.1.3)
id=5    @192.168.66.113  (mysql-5.1.44 ndb-7.1.3)

ndb_mgm>

I have yet to start the second management instance on the second management server, so the following line (from the ndb_mgm output above) is perfectly OK:

id=6 (not connected, accepting connect from 192.168.66.2)

Then I go to the second management server (192.168.66.2) and start ndb_mgmd there. After starting it, I issue a show command against it:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from 192.168.66.2)
id=3 (not connected, accepting connect from 192.168.66.113)

[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from 192.168.66.114)
id=6    @192.168.66.2  (mysql-5.1.44 ndb-7.1.3)

[mysqld(API)]   2 node(s)
id=4 (not connected, accepting connect from 192.168.66.2)
id=5 (not connected, accepting connect from 192.168.66.113)

ndb_mgm>

Instead of listing both management nodes as connected, the second management node reports only itself as connected. Going back to the first management server at 192.168.66.114 still gives the same output as before starting the second ndb_mgmd, i.e. ONLY the management node at 192.168.66.114 is connected:

ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.66.2  (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0, Master)
id=3    @192.168.66.113  (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @192.168.66.114  (mysql-5.1.44 ndb-7.1.3)
id=6 (not connected, accepting connect from 192.168.66.2)

[mysqld(API)]   2 node(s)
id=4    @192.168.66.2  (mysql-5.1.44 ndb-7.1.3)
id=5    @192.168.66.113  (mysql-5.1.44 ndb-7.1.3)

ndb_mgm>
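
(In case it helps anyone reproduce this: each management daemon can also be queried explicitly by pointing ndb_mgm at it, assuming the default port 1186:)

ndb_mgm --ndb-connectstring=192.168.66.114:1186 -e show
ndb_mgm --ndb-connectstring=192.168.66.2:1186 -e show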

I've spent many hours now trying to figure out what's wrong, but to no avail. Please also take a look at the ndb_mgmd log of the first management server; the excerpt below was taken immediately after starting the second ndb_mgmd at 192.168.66.2:

2010-05-21 16:05:04 [MgmtSrvr] INFO     -- Reading cluster configuration from 'c:/Progra~1/mysql-cluster-gpl-7.1.3-win32/config.ini'
2010-05-21 16:05:04 [MgmtSrvr] WARNING  -- at line 45: Cluster configuration warning:
  arbitrator with id 6 and db node with id 2 on same host 192.168.66.2
  Running arbitrator on the same host as a database node may
  cause complete cluster shutdown in case of host failure.
2010-05-21 16:05:04 [MgmtSrvr] INFO     -- Config equal!
2010-05-21 16:05:04 [MgmtSrvr] INFO     -- Mgmt server state: nodeid 1 reserved for ip 192.168.66.114, m_reserved_nodes 1.
2010-05-21 16:05:04 [MgmtSrvr] INFO     -- Id: 1, Command port: *:1186
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- 127.0.0.1:3727: Connected!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- Sending CONFIG_CHECK_REQ to 1
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- Got CONFIG_CHECK_REQ from node: 1. Our generation: 1, other generation: 1, our state: 2, other state: 2, our checksum: 0xc7202738, other checksum: 0xc7202738
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- Send CONFIG_CHECK_CONF to node: 1
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- Got CONFIG_CHECK_CONF from node: 1
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- 192.168.66.113:51051: Connected!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- 192.168.66.2:65492: Connected!
2010-05-21 16:05:04 [MgmtSrvr] INFO     -- Node 1: Node 6 Connected
2010-05-21 16:05:04 [MgmtSrvr] INFO     -- Node 6 connected
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- Sending CONFIG_CHECK_REQ to 6
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- Got CONFIG_CHECK_CONF from node: 6
2010-05-21 16:05:04 [MgmtSrvr] INFO     -- Node 1: Node 3 Connected
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- 192.168.66.113:51051: Stopped!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- 192.168.66.113:51051: Disconnected!
2010-05-21 16:05:04 [MgmtSrvr] INFO     -- Node 1: Node 2 Connected
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- 192.168.66.2:65492: Stopped!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- 192.168.66.2:65492: Disconnected!
2010-05-21 16:05:05 [MgmtSrvr] INFO     -- Node 3: Prepare arbitrator node 1 [ticket=16800008ebadb656]
2010-05-21 16:05:05 [MgmtSrvr] INFO     -- Node 2: Started arbitrator node 1 [ticket=16800008ebadb656]

Personally, I find the following two lines from the above output interesting:

2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- 192.168.66.2:65492: Stopped!
2010-05-21 16:05:04 [MgmtSrvr] DEBUG    -- 192.168.66.2:65492: Disconnected!

There's no error message, though; it just says Stopped and Disconnected.

Can anyone figure out what's wrong with my setup? Any help would be much, much appreciated.

A: 

Guys, this one actually fixed itself. I don't know why, but later that day the second management node started connecting properly without any intervention on my part.

Boyan Georgiev