Problems Caused by a Read-Only File System on the Master Node in RocketMQ HA Mode

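Preface

A friend recently asked me about the problems that arise when the file system of the master node becomes read-only in a RocketMQ HA deployment. This article therefore analyzes, from the source code, how a read-only file system on the master affects message storage on the broker, HA master-slave synchronization, and the clients. The source version is 4.9.1, and the test environment is configured with synchronous flush and synchronous replication.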

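I. Storing messages on the broker

1. Writing to memory

When the master's file system becomes read-only, the outcome depends on whether the newest mappedFile of the commitlog still has free space. If it is not yet full, data sent by the producer is still written into memory normally. Once the newest mappedFile is full, the broker tries to create a new mappedFile. That fails because the file system is read-only, an exception for creating the mappedFile is reported, and the message cannot be written into memory. The file is created and mapped in the init method of MappedFile: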

private void init(final String fileName, final int fileSize) throws IOException {
    this.fileName = fileName;
    this.fileSize = fileSize;
    this.file = new File(fileName);
    this.fileFromOffset = Long.parseLong(this.file.getName());
    boolean ok = false;

    ensureDirOK(this.file.getParent());

    try {
        this.fileChannel = new RandomAccessFile(this.file, "rw").getChannel();
        this.mappedByteBuffer = this.fileChannel.map(MapMode.READ_WRITE, 0, fileSize);
        TOTAL_MAPPED_VIRTUAL_MEMORY.addAndGet(fileSize);
        TOTAL_MAPPED_FILES.incrementAndGet();
        ok = true;
    } catch (FileNotFoundException e) {
        log.error("Failed to create file " + this.fileName, e);
        throw e;
    } catch (IOException e) {
        log.error("Failed to map file " + this.fileName, e);
        throw e;
    } finally {
        if (!ok && this.fileChannel != null) {
            this.fileChannel.close();
        }
    }
}
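The behavior described above (appends succeed while the already-mapped file has room, then fail once a new file is needed) can be illustrated with a small toy model. This is only a sketch, not RocketMQ code: ReadOnlyAppendSketch, SegmentFactory and the 8-byte segment size are hypothetical names and values chosen for illustration, and the factory simply throws to imitate a read-only file system.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a log made of fixed-size segments (hypothetical, not RocketMQ code). */
public class ReadOnlyAppendSketch {

    /** Hypothetical stand-in for "create and map a new file on disk". */
    interface SegmentFactory {
        byte[] create(int size) throws IOException;
    }

    private final int segmentSize;
    private final SegmentFactory factory;
    private final List<byte[]> segments = new ArrayList<>();
    private int wrotePosition; // write position inside the newest segment

    ReadOnlyAppendSketch(int segmentSize, byte[] alreadyMapped, SegmentFactory factory) {
        this.segmentSize = segmentSize;
        this.factory = factory;
        this.segments.add(alreadyMapped); // mapped before the file system went read-only
    }

    /** Returns true if the message was copied into an in-memory segment. */
    boolean append(byte[] msg) {
        if (wrotePosition + msg.length > segmentSize) {
            try {
                // Needs a new segment; this is the step that fails when read-only.
                segments.add(factory.create(segmentSize));
                wrotePosition = 0;
            } catch (IOException e) {
                return false; // no new segment, so the message cannot be stored
            }
        }
        byte[] current = segments.get(segments.size() - 1);
        System.arraycopy(msg, 0, current, wrotePosition, msg.length);
        wrotePosition += msg.length;
        return true;
    }

    /** One 8-byte segment already exists; creating another one always fails. */
    static boolean[] demo() {
        ReadOnlyAppendSketch log = new ReadOnlyAppendSketch(8, new byte[8], size -> {
            throw new IOException("Read-only file system");
        });
        return new boolean[] {
            log.append(new byte[4]), // fits in the existing segment
            log.append(new byte[4]), // fills the existing segment
            log.append(new byte[4])  // would need a new segment
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

In this model the first two appends return true and the third returns false, which mirrors the two cases distinguished in the paragraph above.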

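2. Flushing to disk

Because the broker is configured with synchronous flush, the service that flushes data to disk is GroupCommitService. Once the data sent by the producer has been written into memory, the flush is executed next, and it logs the following error: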

GroupCommitService - Error occurred when force data to disk.
java.io.IOException: Read-only file system
at java.nio.MappedByteBuffer.force0(Native Method) ~[na:1.8.0_242]
at java.nio.MappedByteBuffer.force(MappedByteBuffer.java:203) ~[na:1.8.0_242]
at org.apache.rocketmq.store.MappedFile.flush(MappedFile.java:281) ~[rocketmq-store-4.7.0.jar:4.7.0]
at org.apache.rocketmq.store.MappedFileQueue.flush(MappedFileQueue.java:430) [rocketmq-store-4.7.0.jar:4.7.0]
at org.apache.rocketmq.store.CommitLog$GroupCommitService.doCommit(CommitLog.java:1432) [rocketmq-store-4.7.0.jar:4.7.0]
at org.apache.rocketmq.store.CommitLog$GroupCommitService.run(CommitLog.java:1459) [rocketmq-store-4.7.0.jar:4.7.0]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_242]

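The core flush method is shown below. Because the file system is read-only, the flush fails. However, the source code only logs the exception at ERROR level and then still updates flushedPosition, so in this situation the flush reports a normal result.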

/**
 * @return The current flushed position
 */
public int flush(final int flushLeastPages) {
    if (this.isAbleToFlush(flushLeastPages)) {
        if (this.hold()) {
            int value = getReadPosition();

            try {
                //We only append data to fileChannel or mappedByteBuffer, never both.
                if (writeBuffer != null || this.fileChannel.position() != 0) {
                    this.fileChannel.force(false);
                } else {
                    this.mappedByteBuffer.force();
                }
            } catch (Throwable e) {
                log.error("Error occurred when force data to disk.", e);
            }

            this.flushedPosition.set(value);
            this.release();
        } else {
            log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
            this.flushedPosition.set(getReadPosition());
        }
    }
    return this.getFlushedPosition();
}
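To make the consequence of that catch block concrete, here is a minimal sketch of the same "log and continue" pattern outside RocketMQ. SwallowedFlushSketch and its Forcer interface are hypothetical names; the failing force call is simulated by throwing an IOException, and the sketch assumes the caller judges success only by the flushed position.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a flush that logs the failure and still advances (hypothetical). */
public class SwallowedFlushSketch {

    /** Hypothetical stand-in for FileChannel.force / MappedByteBuffer.force. */
    interface Forcer {
        void force() throws IOException;
    }

    private final AtomicInteger wrotePosition = new AtomicInteger();
    private final AtomicInteger flushedPosition = new AtomicInteger();
    private final Forcer forcer;

    SwallowedFlushSketch(Forcer forcer) {
        this.forcer = forcer;
    }

    /** Pretend that some bytes were written into memory. */
    void write(int bytes) {
        wrotePosition.addAndGet(bytes);
    }

    /** Same shape as the method above: the exception is logged, not rethrown. */
    int flush() {
        int value = wrotePosition.get();
        try {
            forcer.force();
        } catch (Throwable e) {
            System.err.println("Error occurred when force data to disk. " + e.getMessage());
        }
        flushedPosition.set(value); // runs whether or not force() succeeded
        return flushedPosition.get();
    }

    /** Writes 128 bytes, then flushes against a forcer that always fails. */
    static int demo() {
        SwallowedFlushSketch file = new SwallowedFlushSketch(() -> {
            throw new IOException("Read-only file system");
        });
        file.write(128);
        return file.flush();
    }

    public static void main(String[] args) {
        System.out.println("flushed position = " + demo());
    }
}
```

Even though every force call fails, the flushed position catches up with the write position, so a caller that only compares offsets cannot tell that nothing reached the disk.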

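3. HA data synchronization

Let us first briefly review the HA data synchronization process. For a detailed analysis, see the author's earlier article "RocketMQ源码分析之主从数据复制" (RocketMQ source code analysis: master-slave data replication):

(1) The master starts and listens for slave connections

(2) The slave starts and establishes a connection to the master

(3) The slave sends the master the physical offset from which it wants to pull data

(4) The master packs the data starting at the physical offset requested by the slave and sends it to the slave

(5) The slave reads the data sent by the master and wakes up ReputMessageService to build the consumequeue

Here we focus on where the commitlog data that the master returns to the slave comes from. From the process above, WriteSocketService on the master is responsible for pushing commitlog data to the slave, so we start there. In its run method, the fragment below is the part to look at: it uses nextTransferFromWhere to fetch the data the slave wants and then sends it to the slave through transferData.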

SelectMappedBufferResult selectResult =
    HAConnection.this.haService.getDefaultMessageStore().getCommitLogData(this.nextTransferFromWhere);
if (selectResult != null) {
    int size = selectResult.getSize();
    if (size > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize()) {
        size = HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize();
    }

    long thisOffset = this.nextTransferFromWhere;
    this.nextTransferFromWhere += size;

    selectResult.getByteBuffer().limit(size);
    this.selectMappedBufferResult = selectResult;

    // Build Header
    this.byteBufferHeader.position(0);
    this.byteBufferHeader.limit(headerSize);
    this.byteBufferHeader.putLong(thisOffset);
    this.byteBufferHeader.putInt(size);
    this.byteBufferHeader.flip();

    this.lastWriteOver = this.transferData();
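The header built in this fragment is just a long followed by an int, that is, 12 bytes. The sketch below reproduces only that step with a plain ByteBuffer so the layout is easy to see. HaHeaderSketch and buildHeader are hypothetical names, and the batch size passed in the usage line is an illustrative value, not a claim about the broker's configuration.

```java
import java.nio.ByteBuffer;

/** Sketch of the transfer header: 8-byte physical offset + 4-byte body size (hypothetical class). */
public class HaHeaderSketch {

    static final int HEADER_SIZE = 8 + 4; // one long plus one int

    /**
     * Caps the body size at batchSize, as the fragment above does,
     * then writes the offset and the capped size into a fresh buffer.
     */
    static ByteBuffer buildHeader(long nextTransferFromWhere, int available, int batchSize) {
        int size = Math.min(available, batchSize);
        ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE);
        header.putLong(nextTransferFromWhere);
        header.putInt(size);
        header.flip(); // switch from writing to reading before it is sent
        return header;
    }

    public static void main(String[] args) {
        ByteBuffer header = buildHeader(1024L, 100_000, 32 * 1024);
        System.out.println("header bytes = " + header.remaining());
        System.out.println("offset = " + header.getLong() + ", size = " + header.getInt());
    }
}
```

The receiver can read the two fields back in the same order (getLong, then getInt) to learn which offset the following body starts at and how many bytes to expect.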

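The getCommitLogData method shows that HA synchronization reads its data from the master's memory (the buffer of the mappedFile), so data that was written into the master's memory normally can still be synchronized to the slave.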

// DefaultMessageStore
public SelectMappedBufferResult getCommitLogData(final long offset) {
    if (this.shutdown) {
        log.warn("message store has shutdown, so getPhyQueueData is forbidden");
        return null;
    }

    return this.commitLog.getData(offset);
}

// CommitLog
public SelectMappedBufferResult getData(final long offset) {
    return this.getData(offset, offset == 0);
}

public SelectMappedBufferResult getData(final long offset, final boolean returnFirstOnNotFound) {
    int mappedFileSize = this.defaultMessageStore.getMessageStoreConfig().getMappedFileSizeCommitLog();
    // Locate the in-memory mappedFile that covers this physical offset
    MappedFile mappedFile = this.mappedFileQueue.findMappedFileByOffset(offset, returnFirstOnNotFound);
    if (mappedFile != null) {
        // Position of the offset inside that file
        int pos = (int) (offset % mappedFileSize);
        SelectMappedBufferResult result = mappedFile.selectMappedBuffer(pos);
        return result;
    }

    return null;
}
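The offset arithmetic in getData can be tried in isolation. The following sketch assumes a file size of 1 GiB purely for illustration and assumes that each commitlog file is named after its starting offset, zero-padded to 20 digits (consistent with init parsing the file name as fileFromOffset). CommitLogOffsetSketch and its methods are hypothetical helpers, not RocketMQ APIs.

```java
/** Sketch of mapping a physical offset to a file and a position inside it (hypothetical class). */
public class CommitLogOffsetSketch {

    /** Starting offset of the file that contains the given physical offset. */
    static long fileFromOffset(long offset, int mappedFileSize) {
        return offset - (offset % mappedFileSize);
    }

    /** Position inside that file, the same expression as in getData above. */
    static int positionInFile(long offset, int mappedFileSize) {
        return (int) (offset % mappedFileSize);
    }

    /** Assumed naming scheme: the starting offset padded to 20 digits. */
    static String fileName(long fileFromOffset) {
        return String.format("%020d", fileFromOffset);
    }

    public static void main(String[] args) {
        int mappedFileSize = 1024 * 1024 * 1024; // 1 GiB, an illustrative value
        long offset = mappedFileSize + 500L;     // 500 bytes into the second file
        long start = fileFromOffset(offset, mappedFileSize);
        System.out.println("file = " + fileName(start)
            + ", position = " + positionInFile(offset, mappedFileSize));
    }
}
```

Nothing in this lookup touches the disk: once the mappedFile is found in the queue, the data is served from its buffer, which is why synchronization keeps working while the file system is read-only.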

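After the master's file system recovers, restarting the master loses the data that existed only in memory, which means the master now lags behind the slave. As a result, the master's log shows "Slave fall behind master: XXX bytes", and the value is negative.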
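A worked example makes the negative value easier to read. The sketch below assumes the logged figure is the master's maximum commitlog offset minus the highest offset already held by the slave. Both that formula and the numbers are assumptions for illustration; they are not taken from the logs in this article.

```java
/** Sketch of why the reported lag turns negative after the master restarts (hypothetical class). */
public class SlaveDiffSketch {

    /** Assumed formula: master's max commitlog offset minus the slave's offset. */
    static long slaveFallBehind(long masterMaxOffset, long slaveOffset) {
        return masterMaxOffset - slaveOffset;
    }

    public static void main(String[] args) {
        // Before the restart: 1500 bytes in the master's memory, all replicated.
        System.out.println(slaveFallBehind(1500L, 1500L));
        // After the restart: only 1000 bytes had reached the master's disk,
        // while the slave still holds all 1500 bytes.
        System.out.println(slaveFallBehind(1000L, 1500L));
    }
}
```

Under these assumptions the second call yields a negative number, meaning the slave holds data that the restarted master no longer has.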

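4. Building the consumequeue

Two services in RocketMQ deal with the consumequeue: ReputMessageService builds it, and FlushConsumeQueueService flushes it to disk. When the master's file system is read-only and data has been written into memory normally, ReputMessageService builds the consumequeue entry for that message. To do so, it first computes the position in the consumequeue file where the entry should be written. If the file for that position already exists, appendMessage is executed. The appendMessage method is as follows: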

public boolean appendMessage(final byte[] data) {
    int currentPos = this.wrotePosition.get();

    if ((currentPos + data.length) <= this.fileSize) {
        try {
            this.fileChannel.position(currentPos);
            this.fileChannel.write(ByteBuffer.wrap(data));
        } catch (Throwable e) {
            log.error("Error occurred when append message to mappedFile.", e);
        }
        this.wrotePosition.addAndGet(data.length);
        return true;
    }

    return false;
}
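For readers unfamiliar with what is being appended here, the sketch below shows the data passed to appendMessage and the position computation mentioned above. It assumes the fixed 20-byte consumequeue entry layout of RocketMQ 4.x (8-byte commitlog offset, 4-byte message size, 8-byte tags hash code). ConsumeQueueEntrySketch and its method names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Sketch of a fixed-size consumequeue entry and its expected position (hypothetical class). */
public class ConsumeQueueEntrySketch {

    // Assumed entry size: 8 (commitlog offset) + 4 (message size) + 8 (tags hash code)
    static final int CQ_STORE_UNIT_SIZE = 20;

    /** Encodes one entry into the byte array that would be handed to appendMessage. */
    static byte[] encode(long commitLogOffset, int msgSize, long tagsCode) {
        ByteBuffer buf = ByteBuffer.allocate(CQ_STORE_UNIT_SIZE);
        buf.putLong(commitLogOffset);
        buf.putInt(msgSize);
        buf.putLong(tagsCode);
        return buf.array();
    }

    /** Byte position at which the entry with the given queue offset is expected. */
    static long expectedLogicOffset(long queueOffset) {
        return queueOffset * CQ_STORE_UNIT_SIZE;
    }

    public static void main(String[] args) {
        byte[] entry = encode(4096L, 256, "TagA".hashCode());
        System.out.println("entry length = " + entry.length);
        System.out.println("entry #3 is expected at byte " + expectedLogicOffset(3));
    }
}
```

Because every entry has the same size, the expected write position follows directly from the queue offset, which is how the broker decides whether the target file already exists or has to be created.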

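Because the file system is read-only, the call this.fileChannel.write(ByteBuffer.wrap(data)); inside appendMessage throws an exception. The method catches it, still advances wrotePosition, and returns true, and the following error appears in the log: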

ReputMessageService - Error occurred when append message to mappedFile.
java.io.IOException: Read-only file system
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_242]
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[na:1.8.0_242]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_242]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_242]
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[na:1.8.0_242]
at org.apache.rocketmq.store.MappedFile.appendMessage(MappedFile.java:240) ~[rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.ConsumeQueue.putMessagePositionInfo(ConsumeQueue.java:474) [rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.ConsumeQueue.putMessagePositionInfoWrapper(ConsumeQueue.java:398) [rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.DefaultMessageStore.putMessagePositionInfo(DefaultMessageStore.java:1478) [rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.DefaultMessageStore$CommitLogDispatcherBuildConsumeQueue.dispatch(DefaultMessageStore.java:1538) [rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.DefaultMessageStore.doDispatch(DefaultMessageStore.java:1472) [rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.DefaultMessageStore$ReputMessageService.doReput(DefaultMessageStore.java:1910) [rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.DefaultMessageStore$ReputMessageService.run(DefaultMessageStore.java:1968) [rocketmq-store-4.9.1.jar:4.9.1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_242]

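If, however, the file has to be created at this point, the same file-creation exception occurs.

When the consumequeue has data in memory that has not yet been flushed, FlushConsumeQueueService is responsible for flushing it promptly. Because the file system is read-only at this point, the following error appears in the log: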

ERROR FlushConsumeQueueService - Error occurred when force data to disk.
java.io.IOException: Read-only file system
at sun.nio.ch.FileDispatcherImpl.force0(Native Method) ~[na:1.8.0_242]
at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:80) ~[na:1.8.0_242]
at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:388) ~[na:1.8.0_242]
at org.apache.rocketmq.store.MappedFile.flush(MappedFile.java:285) ~[rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.MappedFileQueue.flush(MappedFileQueue.java:430) [rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.ConsumeQueue.flush(ConsumeQueue.java:325) [rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.DefaultMessageStore$FlushConsumeQueueService.doFlush(DefaultMessageStore.java:1806) [rocketmq-store-4.9.1.jar:4.9.1]
at org.apache.rocketmq.store.DefaultMessageStore$FlushConsumeQueueService.run(DefaultMessageStore.java:1832) [rocketmq-store-4.9.1.jar:4.9.1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_242]

The core code that reports the error is shown below. The failing call is this.fileChannel.force(false);:

/**
 * @return The current flushed position
 */
public int flush(final int flushLeastPages) {
    if (this.isAbleToFlush(flushLeastPages)) {
        if (this.hold()) {
            int value = getReadPosition();

            try {
                //We only append data to fileChannel or mappedByteBuffer, never both.
                if (writeBuffer != null || this.fileChannel.position() != 0) {
                    this.fileChannel.force(false);
                } else {
                    this.mappedByteBuffer.force();
                }
            } catch (Throwable e) {
                log.error("Error occurred when force data to disk.", e);
            }

            this.flushedPosition.set(value);
            this.release();
        } else {
            log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
            this.flushedPosition.set(getReadPosition());
        }
    }
    return this.getFlushedPosition();
}

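II. Clients

1. Producer

After the master's file system becomes read-only, the producer keeps sending messages. If the newest mappedFile of the commitlog on the master is not yet full, the data is written into memory normally. Combined with the earlier analysis of flushing and HA data synchronization (the flush failure is only logged, and the slave still receives the data from the master's memory), the producer receives the SEND_OK status.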
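The reasoning behind SEND_OK can be condensed into a simplified decision function. This is a sketch under stated assumptions, not the broker's actual code path: it assumes that, with synchronous flush and synchronous replication, a send counts as successful once both the flushed position and the slave's acknowledged offset have reached the end of the message. SendStatusSketch, its Status enum and resolve are hypothetical names, with the status names borrowed from the client's SendStatus values.

```java
/** Simplified model of how a send result is decided (hypothetical, not broker code). */
public class SendStatusSketch {

    enum Status { SEND_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT }

    /**
     * nextOffset is the commitlog offset just past the message.
     * flushedWhere is the position the master believes is on disk.
     * slaveAckOffset is the highest offset the slave has acknowledged.
     */
    static Status resolve(long flushedWhere, long slaveAckOffset, long nextOffset) {
        if (flushedWhere < nextOffset) {
            return Status.FLUSH_DISK_TIMEOUT;
        }
        if (slaveAckOffset < nextOffset) {
            return Status.FLUSH_SLAVE_TIMEOUT;
        }
        return Status.SEND_OK;
    }

    public static void main(String[] args) {
        // Read-only master: the flush failed, but the flushed position advanced
        // anyway, and the slave received the bytes from the master's memory.
        System.out.println(resolve(200L, 200L, 200L));
    }
}
```

Because the failed flush still moves the flushed position forward, neither check can detect the read-only file system, and the producer sees a normal result.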

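2. Consumer

After the master's file system becomes read-only, if the master process is killed, the consumer consumes data from the slave.

About the author

Sun Xi is an engineer in the Open Source Software Support Group of the Information Technology Department at China Minsheng Bank. He is currently mainly responsible for RocketMQ source code research and tool development.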