Server Development: Daemon and Keepalive


第七根弦, April 9, 2015, UC

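For a business project I needed to study database proxies, and while studying how MySQL Proxy is implemented I analysed and summarised several of its features. This article explains how MySQL Proxy implements its daemon and keepalive features.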

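MySQL Proxy is one implementation of a database proxy: it sits in the communication path between a MySQL server and MySQL clients. Because MySQL Proxy speaks the MySQL network protocol, it can be used without modification with any client that is compatible with that protocol and with MySQL. In its most basic configuration, MySQL Proxy simply places itself between the server and the clients, passing queries from a client to the server and returning the server's responses to the appropriate client. In more advanced configurations, MySQL Proxy can monitor and alter the communication between clients and server. Query interception lets you add profiling commands as needed, and the injected commands can be controlled with scripts written in Lua.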

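This article does not discuss the merits of MySQL Proxy as a database proxy, either in features or in practice. Instead it focuses on two features of its source code: daemon and keepalive.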

When MySQL Proxy is started from the command line, two options are commonly used: --daemon and --keepalive. Their help text reads:

--daemon      Start in daemon-mode
--keepalive   try to restart the proxy if it crashed

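Taken literally, keepalive keeps the process alive, and daemon refers to a daemon process. But how exactly is a daemon defined? APUE (Advanced Programming in the UNIX Environment) defines it as follows: daemons are processes that live for a long time. They are often started when the system is bootstrapped and terminate only when the system is shut down. Because they don't have a controlling terminal, we say that they run in the background.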

【Daemon Implementation】

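First, the basic principles of implementing a daemon. There is a set of basic rules for writing a daemon, whose purpose is to prevent unwanted interactions (for example, with a terminal). The rules are: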

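1. Call umask to set the file mode creation mask to 0. Reason: an inherited mask could deny certain permissions on the files the daemon creates.
2. Call fork and have the parent exit. Reasons: first, the shell that started the daemon then believes the command has finished; second, the child is guaranteed not to be a process group leader, which is a prerequisite for calling setsid.
3. Call setsid to create a new session. Effect: the calling process (a) becomes the leader of a new session, (b) becomes the leader of a new process group, and (c) has no controlling terminal. On System V-based systems, forking a second time at this point and letting the parent exit ensures that the daemon is not a session leader, so it cannot acquire a controlling terminal by later opening a terminal device. This does not make rule 6 unnecessary, since descriptors 0, 1 and 2 are left untouched.
4. Change the current working directory to the root directory. Reason: a working directory on a mounted file system would prevent that file system from being unmounted.
5. Close file descriptors that are no longer needed. Reason: the daemon should not keep holding descriptors inherited from its parent.
6. Some daemons open /dev/null so that they have file descriptors 0, 1 and 2. Reason: any library routine that reads standard input or writes standard output or standard error then has no effect, instead of touching a terminal the daemon is no longer attached to.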
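To make the rules concrete, here is a minimal sketch that applies them in order. The function name daemonize_sketch is hypothetical (it is not MySQL Proxy code), and it deliberately leaves rule 5 to the caller, on the assumption that only the program itself knows which inherited descriptors it still needs.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 in the daemon process, -1 on error.
 * The original process and the intermediate session leader never return. */
int daemonize_sketch(void)
{
    pid_t pid;
    int fd;

    umask(0);                          /* rule 1: clear the file mode creation mask */

    pid = fork();                      /* rule 2: fork, the parent exits */
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(EXIT_SUCCESS);

    if (setsid() == -1)                /* rule 3: new session, no controlling terminal */
        return -1;

    signal(SIGHUP, SIG_IGN);           /* as MySQL Proxy does before its second fork */
    pid = fork();                      /* System V style: give up session leadership */
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(EXIT_SUCCESS);

    if (chdir("/") != 0)               /* rule 4: do not keep a file system busy */
        return -1;

    /* rule 5 (closing inherited descriptors) is left to the caller */

    fd = open("/dev/null", O_RDWR);    /* rule 6: fds 0, 1 and 2 point at /dev/null */
    if (fd < 0)
        return -1;
    if (dup2(fd, STDIN_FILENO) < 0 || dup2(fd, STDOUT_FILENO) < 0 ||
        dup2(fd, STDERR_FILENO) < 0)
        return -1;
    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}
```

A program would call it early in main(), for example `if (daemonize_sketch() != 0) return 1;`, before opening its listening sockets.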

With these rules in mind, compare them with the code in MySQL Proxy:

/**
 * start the app in the background 
 * 
 * UNIX-version
 */
void chassis_unix_daemonize(void) {
#ifdef _WIN32
    g_assert_not_reached(); /* shouldn't be tried to be called on win32 */
#else
#ifdef SIGTTOU
    signal(SIGTTOU, SIG_IGN);
#endif
#ifdef SIGTTIN
    signal(SIGTTIN, SIG_IGN);
#endif
#ifdef SIGTSTP
    signal(SIGTSTP, SIG_IGN);
#endif
    if (fork() != 0) exit(0);
     
    if (setsid() == -1) exit(0);
 
    signal(SIGHUP, SIG_IGN);
 
    if (fork() != 0) exit(0);
     
    chdir("/");
     
    umask(0);
#endif
}

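Several points can be drawn from this code:

Some of the ordering is mandatory (the fork before setsid), and some is not (umask is called last here).
The implementation forks twice, following the System V practice.
Between setsid and the second fork it inserts signal handling that sets SIGHUP to SIG_IGN.
This function does not apply rules 5 and 6: no inherited descriptors are closed, and descriptors 0, 1 and 2 are not redirected to /dev/null.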

None of the six rules above mentions signal handling, so what is the SIGHUP handling for? Again, APUE explains:

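If a disconnect is detected by the terminal interface, this signal is sent to the controlling process (session leader) associated with the controlling terminal. The signal is generated under this condition only if the terminal's CLOCAL flag is not set.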

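Unlike the signals normally generated by the terminal (interrupt, quit and suspend), which are always delivered to the foreground process group, SIGHUP can be sent to a session leader running in the background. The default action of SIGHUP is to terminate the process. Because a daemon has no controlling terminal and should normally never receive this signal, SIGHUP is commonly used to tell a daemon to reread its configuration files.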

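So the reason for the signal handling here is that between setsid and the second fork the child is still a session leader, and a SIGHUP would terminate it by default; setting SIGHUP to SIG_IGN prevents that. A SIG_IGN disposition is also inherited across fork, so the grandchild that becomes the actual daemon keeps ignoring SIGHUP as well.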
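The inheritance point is easy to check. The hypothetical helper below (child_survives_sighup is an illustrative name, not MySQL Proxy code) ignores SIGHUP, forks, and has the child raise SIGHUP at itself; the child can only reach its normal exit if the ignored disposition was inherited.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: after signal(SIGHUP, SIG_IGN), does a forked child
 * survive a SIGHUP? Returns 1 if it does, 0 if it was killed, -1 on error. */
int child_survives_sighup(void)
{
    void (*old)(int) = signal(SIGHUP, SIG_IGN); /* the call made before the 2nd fork */
    pid_t pid = fork();
    int status;

    if (pid == 0) {
        raise(SIGHUP);         /* the default action would terminate the child */
        _exit(EXIT_SUCCESS);   /* only reached because SIGHUP is ignored */
    }
    signal(SIGHUP, old);       /* restore the caller's disposition */
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? 1 : 0;
}
```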

At this point a daemon-mode process is up and running.

【Keepalive Implementation】

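Next, the keepalive implementation. Put simply, MySQL Proxy's server model is one daemon parent process plus one worker child process (which may in turn start n worker threads). Keepalive requires the daemon process to restart the worker child whenever it finds that the child has terminated abnormally.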

First, the code in the daemon process. Its main responsibilities are:

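Fork a worker child and block in waitpid (wait4 where available) to collect its exit status. If the child exits normally, i.e. by calling exit (WIFEXITED), the daemon returns as well and passes the child's exit code back through child_exit_status, whatever that code is; this is also what happens when a forwarded SIGINT or SIGTERM makes the child shut down cleanly. If the child is killed by a signal (WIFSIGNALED), the daemon waits 2 seconds and then starts a new child (the log message says "waiting 3min", but the code sleeps for time_towait = 2 seconds). SIGSTOP is left to its default behaviour; the WIFSTOPPED branch is empty.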
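Install chassis_unix_signal_forward() as the handler for SIGINT/SIGTERM/SIGHUP/SIGUSR1/SIGUSR2 sent to the daemon, so that these signals are forwarded with kill(0, sig) to every process in the daemon's process group (in practice, to the child). The handler first sets the daemon's own disposition for that signal to SIG_IGN, so the forwarded signal does not come back to the daemon and start a loop.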
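The forwarding trick in the second point can be tried on its own before reading the full source. In this hypothetical demo (forward_to_group and forward_demo are illustrative names), an "angel" process moves into its own process group, so that kill(0, ...) cannot reach whoever called the demo, forks a worker, and forwards a SIGTERM it receives; it reports which signal killed the worker.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same idea as chassis_unix_signal_forward(): ignore the signal here first,
 * then send it to every process in our process group. */
static void forward_to_group(int sig)
{
    signal(sig, SIG_IGN);
    kill(0, sig);
}

/* Hypothetical demo: returns the number of the signal that killed the
 * worker (SIGTERM expected); -1, -2 or 100..102 indicate an error. */
int forward_demo(void)
{
    int status;
    pid_t angel = fork();

    if (angel < 0)
        return -1;
    if (angel == 0) {
        if (setpgid(0, 0) != 0)        /* new group: kill(0, ...) stays local */
            _exit(100);
        signal(SIGTERM, SIG_DFL);      /* the worker must inherit the default action */
        pid_t worker = fork();
        if (worker < 0)
            _exit(101);
        if (worker == 0) {
            for (;;)
                pause();               /* terminated by the forwarded SIGTERM */
        }
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = forward_to_group;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGTERM, &sa, NULL);
        raise(SIGTERM);                /* stands in for "kill -TERM <angel pid>" */
        if (waitpid(worker, &status, 0) != worker || !WIFSIGNALED(status))
            _exit(102);
        _exit(WTERMSIG(status));       /* the angel itself survived */
    }
    if (waitpid(angel, &status, 0) != angel || !WIFEXITED(status))
        return -2;
    return WEXITSTATUS(status);
}
```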

/**
 * forward the signal to the process group, but not us
 */
static void chassis_unix_signal_forward(int sig) {
#ifdef _WIN32
    g_assert_not_reached(); /* shouldn't be tried to be called on win32 */
#else
    signal(sig, SIG_IGN); /* we don't want to create a loop here */
 
    kill(0, sig);
#endif
}
 
/**
 * keep the ourself alive 
 *
 * if we or the child gets a SIGTERM, we quit too
 * on everything else we restart it
 */
int chassis_unix_proc_keepalive(int *child_exit_status) {
#ifdef _WIN32
    g_assert_not_reached(); /* shouldn't be tried to be called on win32 */
    return 0; /* for VC++, to silence a warning */
#else
    int nprocs = 0;
    pid_t child_pid = -1;
 
    /* we ignore SIGINT and SIGTERM and just let it be forwarded to the child instead
     * as we want to collect its PID before we shutdown too 
     *
     * the child will have to set its own signal handlers for this
     */
 
    for (;;) {
        /* try to start the children */
        while (nprocs < 1) {
            pid_t pid = fork();
 
            if (pid == 0) {
                /* child */
                 
                g_debug("%s: we are the child: %d",
                        G_STRLOC,
                        getpid());
                return 0;
            } else if (pid < 0) {
                /* fork() failed */
 
                g_critical("%s: fork() failed: %s (%d)",
                    G_STRLOC,
                    g_strerror(errno),
                    errno);
 
                return -1;
            } else {
                /* we are the angel, let's see what the child did */
                g_message("%s: [angel] we try to keep PID=%d alive",
                        G_STRLOC,
                        pid);
 
                /* forward a few signals that are sent to us to the child instead */
                signal(SIGINT, chassis_unix_signal_forward);
                signal(SIGTERM, chassis_unix_signal_forward);
                signal(SIGHUP, chassis_unix_signal_forward);
                signal(SIGUSR1, chassis_unix_signal_forward);
                signal(SIGUSR2, chassis_unix_signal_forward);
 
                child_pid = pid;
                nprocs++;
            }
        }
 
        if (child_pid != -1) {
            struct rusage rusage;
            int exit_status;
            pid_t exit_pid;
 
            g_debug("%s: waiting for %d",
                    G_STRLOC,
                    child_pid);
#ifdef HAVE_WAIT4
            exit_pid = wait4(child_pid, &exit_status, 0, &rusage);
#else
            memset(&rusage, 0, sizeof(rusage)); /* make sure everything is zero'ed out */
            exit_pid = waitpid(child_pid, &exit_status, 0);
#endif
            g_debug("%s: %d returned: %d",
                    G_STRLOC,
                    child_pid,
                    exit_pid);
 
            if (exit_pid == child_pid) {
                /* our child returned, let's see how it went */
                if (WIFEXITED(exit_status)) {
                    g_message("%s: [angel] PID=%d exited normally with exit-code = %d (it used %ld kBytes max)",
                            G_STRLOC,
                            child_pid,
                            WEXITSTATUS(exit_status),
                            rusage.ru_maxrss / 1024);
                    if (child_exit_status) *child_exit_status = WEXITSTATUS(exit_status);
                    return 1;
                } else if (WIFSIGNALED(exit_status)) {
                    int time_towait = 2;
                    /* our child died on a signal
                     *
                     * log it and restart */
 
                    g_critical("%s: [angel] PID=%d died on signal=%d (it used %ld kBytes max) ... waiting 3min before restart",
                            G_STRLOC,
                            child_pid,
                            WTERMSIG(exit_status),
                            rusage.ru_maxrss / 1024);
 
                    /**
                     * to make sure we don't loop as fast as we can, sleep a bit between 
                     * restarts
                     */
     
                    signal(SIGINT, SIG_DFL);
                    signal(SIGTERM, SIG_DFL);
                    signal(SIGHUP, SIG_DFL);
                    while (time_towait > 0) time_towait = sleep(time_towait);
 
                    nprocs--;
                    child_pid = -1;
                } else if (WIFSTOPPED(exit_status)) {
                } else {
                    g_assert_not_reached();
                }
            } else if (-1 == exit_pid) {
                /* EINTR is ok, all others bad */
                if (EINTR != errno) {
                    /* how can this happen ? */
                    g_critical("%s: wait4(%d, ...) failed: %s (%d)",
                        G_STRLOC,
                        child_pid,
                        g_strerror(errno),
                        errno);
 
                    return -1;
                }
            } else {
                g_assert_not_reached();
            }
        }
    }
#endif
}
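Stripped of logging and signal forwarding, the restart policy fits in a few lines. The following is a hypothetical sketch (keepalive_sketch and demo_worker are illustrative names, not MySQL Proxy code). One deliberate difference: chassis_unix_proc_keepalive() returns 0 in the child so that the caller continues into the main loop, whereas this sketch takes the worker body as a callback, which keeps it self-contained.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical angel loop modelled on chassis_unix_proc_keepalive().
 * work(attempt) is the worker body; if it returns, the child exits with 0.
 * Returns the worker's exit code once it exits normally, or -1 if
 * fork()/waitpid() fail. *restarts counts the restarts performed. */
int keepalive_sketch(void (*work)(int attempt), unsigned restart_delay,
                     int *restarts)
{
    int attempt;

    *restarts = 0;
    for (attempt = 0;; attempt++) {
        int status;
        pid_t r;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0) {               /* child: run the worker, never return */
            work(attempt);
            _exit(EXIT_SUCCESS);
        }

        do {                          /* no WUNTRACED, so stops are not reported */
            r = waitpid(pid, &status, 0);
        } while (r == -1 && errno == EINTR);
        if (r == -1)
            return -1;

        if (WIFEXITED(status))        /* clean exit: stop supervising */
            return WEXITSTATUS(status);

        /* otherwise WIFSIGNALED: crashed or killed, so pause and restart */
        unsigned left = restart_delay;
        while (left > 0)
            left = sleep(left);
        (*restarts)++;
    }
}

/* Example worker: killed by SIGKILL on the first attempt, exits with 3 after. */
void demo_worker(int attempt)
{
    if (attempt == 0)
        raise(SIGKILL);
    _exit(3);
}
```

With restart_delay set to 2 the sketch reproduces the 2-second pause of the original; the loop around sleep() mirrors the `while (time_towait > 0)` idiom, since sleep() can return early when a signal arrives.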

Next, the code in the worker child. Its main job is:

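Use libevent's interface to handle the three signals SIGTERM, SIGINT and SIGHUP. Handling signals through libevent lets I/O events, timer events and signal events all be processed in the same event-driven way. When the worker receives SIGTERM or SIGINT, its handler sets the control variable signal_shutdown to 1, which ends the event loop; SIGHUP only triggers log rotation.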

void chassis_set_shutdown_location(const gchar* location) {
    if (signal_shutdown == 0) g_message("Initiating shutdown, requested from %s", (location != NULL ? location : "signal handler"));
    signal_shutdown = 1;
}
 
gboolean chassis_is_shutdown() {
    return signal_shutdown == 1;
}
 
static void sigterm_handler(int G_GNUC_UNUSED fd, short G_GNUC_UNUSED event_type, void G_GNUC_UNUSED *_data) {
    chassis_set_shutdown_location(NULL);
}
 
static void sighup_handler(int G_GNUC_UNUSED fd, short G_GNUC_UNUSED event_type, void *_data) {
    chassis *chas = _data;
 
    g_message("received a SIGHUP, closing log file"); /* this should go into the old logfile */
 
    chassis_log_set_logrotate(chas->log);
     
    g_message("re-opened log file after SIGHUP"); /* ... and this into the new one */
}
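In MySQL Proxy the SIGHUP handler is dispatched from the libevent loop rather than running in signal context, so it may call logging functions directly. A program that installs a plain handler with sigaction() cannot safely do the same, because fopen() and fprintf() are not async-signal-safe. A common alternative, sketched here with hypothetical names (install_sighup_flag, maybe_reopen_log), is to only set a flag in the handler and reopen the log from the main loop.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <string.h>

static volatile sig_atomic_t rotate_logs = 0;

static void on_sighup(int sig)
{
    (void)sig;
    rotate_logs = 1;                  /* only set a flag inside the handler */
}

int install_sighup_flag(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Called from the main loop. Returns 1 if the log was reopened,
 * 0 if no SIGHUP arrived, -1 if reopening failed (*logfp is then NULL). */
int maybe_reopen_log(FILE **logfp, const char *path)
{
    FILE *f;

    if (!rotate_logs)
        return 0;
    rotate_logs = 0;
    fprintf(*logfp, "received a SIGHUP, closing log file\n");
    f = freopen(path, "a", *logfp);   /* closes the old stream, opens path again */
    *logfp = f;
    if (f == NULL)
        return -1;
    fprintf(f, "re-opened log file after SIGHUP\n");
    return 1;
}
```

With a tool such as logrotate, the log file is typically renamed first and SIGHUP is sent afterwards, so the reopen creates a fresh file under the original name.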

int chassis_mainloop(void *_chas) {
    chassis *chas = _chas;
    guint i;
    struct event ev_sigterm, ev_sigint;
#ifdef SIGHUP
    struct event ev_sighup;
#endif
    chassis_event_thread_t *mainloop_thread;
 
    /* redirect logging from libevent to glib */
    event_set_log_callback(event_log_use_glib);
 
 
    /* add a event-handler for the "main" events */
    mainloop_thread = chassis_event_thread_new();
    chassis_event_threads_init_thread(chas->threads, mainloop_thread, chas);
    chassis_event_threads_add(chas->threads, mainloop_thread);
 
    chas->event_base = mainloop_thread->event_base; /* all global events go to the 1st thread */
 
    g_assert(chas->event_base);
 
 
    /* setup all plugins all plugins */
    for (i = 0; i < chas->modules->len; i++) {
        chassis_plugin *p = chas->modules->pdata[i];
 
        g_assert(p->apply_config);
        if (0 != p->apply_config(chas, p->config)) {
            g_critical("%s: applying config of plugin %s failed",
                    G_STRLOC, p->name);
            return -1;
        }
    }
 
    /*
     * drop root privileges if requested
     */
#ifndef _WIN32
    if (chas->user) {
        struct passwd *user_info;
        uid_t user_id= geteuid();
 
        /* Don't bother if we aren't superuser */
        if (user_id) {
            g_critical("can only use the --user switch if running as root");
            return -1;
        }
 
        if (NULL == (user_info = getpwnam(chas->user))) {
            g_critical("unknown user: %s", chas->user);
            return -1;
        }
 
        if (chas->log->log_filename) {
            /* chown logfile */
            if (-1 == chown(chas->log->log_filename, user_info->pw_uid, user_info->pw_gid)) {
                g_critical("%s.%d: chown(%s) failed: %s",
                            __FILE__, __LINE__,
                            chas->log->log_filename,
                            g_strerror(errno) );
 
                return -1;
            }
        }
 
        setgid(user_info->pw_gid);
        setuid(user_info->pw_uid);
        g_debug("now running as user: %s (%d/%d)",
                chas->user,
                user_info->pw_uid,
                user_info->pw_gid );
    }
#endif
 
    signal_set(&ev_sigterm, SIGTERM, sigterm_handler, NULL);
    event_base_set(chas->event_base, &ev_sigterm);
    signal_add(&ev_sigterm, NULL);
 
    signal_set(&ev_sigint, SIGINT, sigterm_handler, NULL);
    event_base_set(chas->event_base, &ev_sigint);
    signal_add(&ev_sigint, NULL);
 
#ifdef SIGHUP
    signal_set(&ev_sighup, SIGHUP, sighup_handler, chas);
    event_base_set(chas->event_base, &ev_sighup);
    if (signal_add(&ev_sighup, NULL)) {
        g_critical("%s: signal_add(SIGHUP) failed", G_STRLOC);
    }
#endif
 
    if (chas->event_thread_count < 1) chas->event_thread_count = 1;
 
    /* create the event-threads
     *
     * - dup the async-queue-ping-fds
     * - setup the events notification
     * */
    for (i = 1; i < (guint)chas->event_thread_count; i++) { /* we already have 1 event-thread running, the main-thread */
        chassis_event_thread_t *event_thread;
     
        event_thread = chassis_event_thread_new();
        chassis_event_threads_init_thread(chas->threads, event_thread, chas);
        chassis_event_threads_add(chas->threads, event_thread);
    }
 
    /* start the event threads */
    if (chas->event_thread_count > 1) {
        chassis_event_threads_start(chas->threads);
    }
 
    /**
     * handle signals and all basic events into the main-thread
     *
     * block until we are asked to shutdown
     */
    chassis_event_thread_loop(mainloop_thread);
 
    signal_del(&ev_sigterm);
    signal_del(&ev_sigint);
#ifdef SIGHUP
    signal_del(&ev_sighup);
#endif
    return 0;
}
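libevent's internals are not reproduced here. As an illustration of the same idea with plain POSIX calls, the classic self-pipe trick turns a signal into a readable file descriptor, so it can be handled in the same poll() loop as sockets and timers. libevent's signal backend is built around a similar internal socket pair, but its details differ, and the names below (setup_signal_pipe, run_loop) are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int sig_pipe[2] = { -1, -1 };
static volatile sig_atomic_t signal_shutdown = 0;  /* same role as in chassis */

/* Async-signal-safe: only write the signal number into the pipe. */
static void on_signal(int sig)
{
    int saved_errno = errno;
    unsigned char b = (unsigned char)sig;
    ssize_t ignored = write(sig_pipe[1], &b, 1);

    (void)ignored;
    errno = saved_errno;
}

int setup_signal_pipe(void)
{
    struct sigaction sa;

    if (pipe(sig_pipe) != 0)
        return -1;
    /* non-blocking: a full pipe must never block the handler or the loop */
    if (fcntl(sig_pipe[0], F_SETFL, O_NONBLOCK) == -1 ||
        fcntl(sig_pipe[1], F_SETFL, O_NONBLOCK) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGTERM, &sa, NULL) != 0 || sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Event loop: a real server would poll its sockets too. Returns 0 on shutdown. */
int run_loop(void)
{
    struct pollfd pfd;

    pfd.fd = sig_pipe[0];
    pfd.events = POLLIN;
    while (!signal_shutdown) {
        unsigned char b;
        int n = poll(&pfd, 1, -1);

        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        while (read(sig_pipe[0], &b, 1) == 1)  /* drain pending signal bytes */
            if (b == SIGTERM || b == SIGINT)
                signal_shutdown = 1;           /* what sigterm_handler() does */
    }
    return 0;
}
```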

【Tests】

With the source analysis done, the following experiments check it in practice.

1. Start mysql-proxy with the keepalive feature enabled:

[root@Betty data]# mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
[root@Betty ~]# ps ajx 
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1 16766 16765 16765 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
16766 16767 16765 16765 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf

2. Send SIGINT to the daemon process:

[root@Betty ~]# kill -INT 16766

3. The MySQL Proxy log shows:

2013-03-19 19:31:38: (message) Initiating shutdown, requested from signal handler
2013-03-19 19:31:39: (message) shutting down normally, exit code is: 0
2013-03-19 19:31:39: (debug) chassis-unix-daemon.c:167: 16767 returned: 16767
2013-03-19 19:31:39: (message) chassis-unix-daemon.c:176: [angel] PID=16767 exited normally with exit-code = 0 (it used 1 kBytes max)
2013-03-19 19:31:39: (message) Initiating shutdown, requested from mysql-proxy-cli.c:606
2013-03-19 19:31:39: (message) shutting down normally, exit code is: 0

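Both the parent and the child exit. The daemon forwards SIGINT to its process group; the child's handler sets the global variable signal_shutdown to 1, so the child leaves its event loop and exits normally. The parent, blocked in waitpid (wait4), then sees a normal exit with exit code 0, stores it in *child_exit_status, and exits normally too.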

4. Repeat the above, but send SIGINT to the child process instead:

[root@Betty ~]# ps ajx 
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1 16872 16871 16871 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
16872 16873 16871 16871 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
 
[root@Betty ~]# kill -INT 16873

The log is identical to the previous one:

2013-03-19 20:03:49: (message) Initiating shutdown, requested from signal handler
2013-03-19 20:03:50: (message) shutting down normally, exit code is: 0
2013-03-19 20:03:50: (debug) chassis-unix-daemon.c:167: 16873 returned: 16873
2013-03-19 20:03:50: (message) chassis-unix-daemon.c:176: [angel] PID=16873 exited normally with exit-code = 0 (it used 1 kBytes max)
2013-03-19 20:03:50: (message) Initiating shutdown, requested from mysql-proxy-cli.c:606
2013-03-19 20:03:50: (message) shutting down normally, exit code is: 0

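5. The same experiment with -TERM (once against the child and once against the parent) gives exactly the same results, because the code handles SIGTERM and SIGINT identically.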

6. The same experiment with -HUP (once against the child and once against the parent) gives:

2013-03-19 20:10:03: (message) received a SIGHUP, closing log file
2013-03-19 20:10:03: (message) re-opened log file after SIGHUP

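These lines are printed by the child's SIGHUP handler (a SIGHUP sent to the parent is forwarded to the child). The handler only flags the log for rotation (rotate_logs = true) and does not set signal_shutdown = 1, so neither the child nor the parent exits.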

7. The same experiment with -KILL, sent to the child:

[root@Betty ~]# ps ajx 
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
 
    1 16902 16901 16901 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
16902 16903 16901 16901 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
 
[root@Betty ~]# kill -KILL 16903

The log output:

2013-03-19 20:09:38: (debug) chassis-unix-daemon.c:121: we are the child: 16903
2013-03-19 20:09:38: (critical) plugin proxy 0.8.3 started
2013-03-19 20:09:38: (debug) max open file-descriptors = 1024
2013-03-19 20:09:38: (message) proxy listening on port 172.16.40.60:4040
2013-03-19 20:09:38: (message) added read/write backend: 172.16.40.60:12345
2013-03-19 20:09:38: (message) chassis-unix-daemon.c:136: [angel] we try to keep PID=16903 alive
2013-03-19 20:09:38: (debug) chassis-unix-daemon.c:157: waiting for 16903
...
...
2013-03-19 20:31:36: (debug) chassis-unix-daemon.c:167: 16903 returned: 16903
2013-03-19 20:31:36: (critical) chassis-unix-daemon.c:189: [angel] PID=16903 died on signal=9 (it used 1 kBytes max) ... waiting 3min before restart
2013-03-19 20:31:38: (debug) chassis-unix-daemon.c:121: we are the child: 16947
2013-03-19 20:31:38: (critical) plugin proxy 0.8.3 started
2013-03-19 20:31:38: (debug) max open file-descriptors = 1024
2013-03-19 20:31:38: (message) proxy listening on port 172.16.40.60:4040
2013-03-19 20:31:38: (message) added read/write backend: 172.16.40.60:12345
2013-03-19 20:31:38: (message) chassis-unix-daemon.c:136: [angel] we try to keep PID=16947 alive
2013-03-19 20:31:38: (debug) chassis-unix-daemon.c:157: waiting for 16947

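Both the log and the code explain this: SIGKILL cannot be caught or ignored, so the child is killed and its status is reported as "died on signal=9", after which the parent restarts the child. Despite the "waiting 3min before restart" text in the log message, the new child appears 2 seconds later (20:31:36 to 20:31:38), because the code sleeps for time_towait = 2 seconds.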

Listing the processes again:

[root@Betty ~]# ps ajx 
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    1 16902 16901 16901 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
16902 16947 16901 16901 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf

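If SIGKILL is instead sent to the parent, the parent is killed outright and the child is adopted by init. init does nothing about keepalive, so if the child is then killed with SIGKILL as well, it is not restarted.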

8. The same experiment with -STOP, sent first to the child and then to the parent, each followed by -CONT:

[root@Betty ~]# ps ajx 
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
 
    1 16977 16976 16976 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
16977 16978 16976 16976 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
 
[root@Betty ~]# kill -STOP 16978
 
    1 16977 16976 16976 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
16977 16978 16976 16976 ?           -1 T        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
 
 
[root@Betty ~]# kill -CONT 16978
 
    1 16977 16976 16976 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
16977 16978 16976 16976 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
 
[root@Betty ~]# kill -STOP 16977
 
    1 16977 16976 16976 ?           -1 T        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
16977 16978 16976 16976 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
 
[root@Betty ~]# kill -CONT 16977
 
    1 16977 16976 16976 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf
16977 16978 16976 16976 ?           -1 S        0   0:00 mysql-proxy --defaults-file=/etc/mysql-proxy.cnf

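The explanation: SIGSTOP likewise cannot be caught or ignored, and its default action is to stop the process, as the T in the STAT column shows. The keepalive code does nothing special about it. Because wait4/waitpid is called with options 0 (no WUNTRACED), a stopped child is not reported at all: the parent simply stays blocked until the child terminates, and the empty WIFSTOPPED branch is never reached. After SIGCONT the child carries on as before.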
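This can be checked directly. The hypothetical helper below (stop_is_reported is an illustrative name) stops a child and asks waitpid() whether it sees the stop, once with options 0 as in chassis_unix_proc_keepalive() and once with WUNTRACED. It uses waitid() with WNOWAIT only to wait until the child has really stopped without consuming that state.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: fork a child that stops itself, then ask waitpid()
 * (with the given extra options) whether it reports the stop.
 * Returns 1 if reported, 0 if not, -1 on error. */
int stop_is_reported(int options)
{
    siginfo_t info;
    int status, reported;
    pid_t r, pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);               /* cannot be caught or ignored */
        _exit(EXIT_SUCCESS);          /* runs after SIGCONT */
    }

    /* block until the child is stopped, leaving its state unconsumed */
    if (waitid(P_PID, (id_t)pid, &info, WSTOPPED | WNOWAIT) != 0)
        return -1;

    r = waitpid(pid, &status, options | WNOHANG);
    reported = (r == pid && WIFSTOPPED(status));

    kill(pid, SIGCONT);               /* let the child finish, then reap it */
    waitpid(pid, &status, 0);
    return reported;
}
```

On a typical POSIX system, stop_is_reported(0) should return 0 and stop_is_reported(WUNTRACED) should return 1, which is why the stopped child in experiment 8 never wakes the parent.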

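【Summary】

Daemon and keepalive features are problems that come up again and again in server development, and this article has presented one way to implement them. Reading open-source code exposes you to classic solutions to such problems, and studying them in depth helps round out and strengthen your own knowledge. To close with a master's well-known saying: "Before the source code, there are no secrets." Have fun!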

====================================

For comparison, here are two more daemonize implementations, the first taken from memcached-1.4.14:

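#include <fcntl.h>      /* open, O_RDWR */
#include <stdio.h>      /* perror */
#include <stdlib.h>     /* EXIT_SUCCESS */
#include <unistd.h>     /* fork, setsid, chdir, dup2, close, _exit, STD*_FILENO */

int daemonize(int nochdir, int noclose)
{
    int fd;

    /* Fork once: the parent exits so the shell sees the command finish. */
    switch (fork()) {
    case -1:
        return (-1);
    case 0:
        break;
    default:
        _exit(EXIT_SUCCESS);
    }

    /* The child is not a process group leader, so setsid() can succeed. */
    if (setsid() == -1)
        return (-1);

    /* Optionally move to "/" so no mounted file system is kept busy. */
    if (nochdir == 0) {
        if(chdir("/") != 0) {
            perror("chdir");
            return (-1);
        }
    }

    /* Optionally point stdin, stdout and stderr at /dev/null. */
    if (noclose == 0 && (fd = open("/dev/null", O_RDWR, 0)) != -1) {
        if(dup2(fd, STDIN_FILENO) < 0) {
            perror("dup2 stdin");
            return (-1);
        }
        if(dup2(fd, STDOUT_FILENO) < 0) {
            perror("dup2 stdout");
            return (-1);
        }
        if(dup2(fd, STDERR_FILENO) < 0) {
            perror("dup2 stderr");
            return (-1);
        }

        /* Close the extra descriptor unless it already is 0, 1 or 2. */
        if (fd > STDERR_FILENO) {
            if(close(fd) < 0) {
                perror("close");
                return (-1);
            }
        }
    }
    return (0);
}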
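The memcached version skips umask(0) and the second fork, but its noclose branch implements the /dev/null rule directly. The sketch below, assuming POSIX and a hypothetical helper redirect_stdio_to_devnull() that condenses the same open and dup2() sequence, checks the result with fstat(): each of descriptors 0, 1 and 2 should be a character device with the same st_rdev as /dev/null. The check runs in a forked child so the caller's own stdio is left alone; check_redirect_in_child and fd_is_devnull are likewise hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: memcached's noclose == 0 branch, condensed. */
static int redirect_stdio_to_devnull(void)
{
    int fd = open("/dev/null", O_RDWR);

    if (fd == -1)
        return -1;
    if (dup2(fd, STDIN_FILENO) < 0 || dup2(fd, STDOUT_FILENO) < 0 ||
        dup2(fd, STDERR_FILENO) < 0) {
        if (fd > STDERR_FILENO)
            close(fd);
        return -1;
    }
    if (fd > STDERR_FILENO)     /* keep fd if it already is 0, 1 or 2 */
        close(fd);
    return 0;
}

/* 1 if fd is the same character device as /dev/null, else 0. */
static int fd_is_devnull(int fd)
{
    struct stat fs, ns;

    if (fstat(fd, &fs) == -1 || stat("/dev/null", &ns) == -1)
        return 0;
    return S_ISCHR(fs.st_mode) && fs.st_rdev == ns.st_rdev;
}

/* Redirect inside a child so the caller's stdio is untouched.
 * Returns the child's exit code: 0 means fds 0, 1 and 2 all hit /dev/null. */
static int check_redirect_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (redirect_stdio_to_devnull() != 0)
            _exit(2);
        _exit((fd_is_devnull(STDIN_FILENO) && fd_is_devnull(STDOUT_FILENO) &&
               fd_is_devnull(STDERR_FILENO)) ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```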


(The following code is taken from Twemproxy.)

static rstatus_t
nc_daemonize(int dump_core)
{
    rstatus_t status;
    pid_t pid, sid;
    int fd;
 
    pid = fork();
    switch (pid) {
    case -1:
        log_error("fork() failed: %s", strerror(errno));
        return NC_ERROR;
 
    case 0:
        break;
 
    default:
        /* parent terminates */
        _exit(0);
    }
 
    /* 1st child continues and becomes the session leader */
 
    sid = setsid();
    if (sid < 0) {
        log_error("setsid() failed: %s", strerror(errno));
        return NC_ERROR;
    }
 
    if (signal(SIGHUP, SIG_IGN) == SIG_ERR) {
        log_error("signal(SIGHUP, SIG_IGN) failed: %s", strerror(errno));
        return NC_ERROR;
    }
 
    pid = fork();
    switch (pid) {
    case -1:
        log_error("fork() failed: %s", strerror(errno));
        return NC_ERROR;
 
    case 0:
        break;
 
    default:
        /* 1st child terminates */
        _exit(0);
    }
 
    /* 2nd child continues */
 
    /* change working directory */
    if (dump_core == 0) {
        status = chdir("/");
        if (status < 0) {
            log_error("chdir(\"/\") failed: %s", strerror(errno));
            return NC_ERROR;
        }
    }
 
    /* clear file mode creation mask */
    umask(0);
 
    /* redirect stdin, stdout and stderr to "/dev/null" */
 
    fd = open("/dev/null", O_RDWR);
    if (fd < 0) {
        log_error("open(\"/dev/null\") failed: %s", strerror(errno));
        return NC_ERROR;
    }
 
    status = dup2(fd, STDIN_FILENO);
    if (status < 0) {
        log_error("dup2(%d, STDIN) failed: %s", fd, strerror(errno));
        close(fd);
        return NC_ERROR;
    }
 
    status = dup2(fd, STDOUT_FILENO);
    if (status < 0) {
        log_error("dup2(%d, STDOUT) failed: %s", fd, strerror(errno));
        close(fd);
        return NC_ERROR;
    }
 
    status = dup2(fd, STDERR_FILENO);
    if (status < 0) {
        log_error("dup2(%d, STDERR) failed: %s", fd, strerror(errno));
        close(fd);
        return NC_ERROR;
    }
 
    if (fd > STDERR_FILENO) {
        status = close(fd);
        if (status < 0) {
            log_error("close(%d) failed: %s", fd, strerror(errno));
            return NC_ERROR;
        }
    }
 
    return NC_OK;
}
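Twemproxy's comments ("1st child continues and becomes the session leader", "2nd child continues") spell out why it forks twice. The probe below is a hypothetical helper, not Twemproxy code: assuming POSIX, it repeats the fork, setsid, fork sequence, tests session leadership at each stage with getsid(0) == getpid(), and passes the result back to the original process through a pipe. Only the first child should be a session leader; the second child is not, and since only a session leader can acquire a controlling terminal, opening a terminal device later cannot give it one, even on System V-style systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Bits reported back by the 2nd child (hypothetical names). */
enum { FIRST_CHILD_IS_LEADER = 1, SECOND_CHILD_IS_LEADER = 2 };

static int probe_double_fork(void)
{
    int fds[2];
    unsigned char flags = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* 1st child */
        unsigned char f = 0;
        pid_t pid2;

        close(fds[0]);
        if (setsid() == -1)             /* new session, no controlling tty */
            _exit(1);
        if (getsid(0) == getpid())
            f |= FIRST_CHILD_IS_LEADER;
        pid2 = fork();
        if (pid2 != 0)                  /* 1st child terminates (or fork failed) */
            _exit(pid2 == -1 ? 1 : 0);
        if (getsid(0) == getpid())      /* 2nd child: still in the new session */
            f |= SECOND_CHILD_IS_LEADER;
        _exit(write(fds[1], &f, 1) == 1 ? 0 : 1);
    }
    close(fds[1]);
    waitpid(pid, NULL, 0);              /* reap the 1st child */
    n = read(fds[0], &flags, 1);        /* 0 bytes if no 2nd child reported */
    close(fds[0]);
    return n == 1 ? flags : -1;
}
```

A return value of FIRST_CHILD_IS_LEADER alone means the double fork behaved as the Twemproxy comments describe; the orphaned second child is reparented and reaped by init (or a subreaper).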



第七根弦's Tech Blog