Building a Highly Available Kubernetes Cluster

Learn how to build a highly available Kubernetes cluster on bare metal.

July 15, 2016 ta-ching chen

4 minute read

Prerequisites

Refer to the previous installation tutorial and make sure nothing has been missed.

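Build a reliable data store

Kubernetes relies on etcd to store all of the cluster's state. To build a complete highly available Kubernetes cluster, we therefore have to start with an etcd cluster that provides a reliable, highly available key-value store. Once the etcd cluster is up, its internal replication automatically synchronizes the stored data to the other etcd nodes.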

Stop the etcd service

sudo service etcd stop

Create the etcd cluster

First, copy the etcd binary to every node, then start it on each node.

sudo /opt/bin/etcd --name <node name> --data-dir <path to etcd data dir> \
--initial-advertise-peer-urls http://<node ip>:4000 \
--listen-peer-urls http://<node ip>:4000 \
--listen-client-urls http://127.0.0.1:4001,http://<node ip>:4001 \
--advertise-client-urls http://<node ip>:4001 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster <node1 name>=http://<node1 ip>:4000,<node2 name>=http://<node2 ip>:4000,<node3 name>=http://<node3 ip>:4000 \
--initial-cluster-state new

Example:
- infra0: 10.211.55.14
- infra1: 10.211.55.15
- infra2: 10.211.55.16

# infra0
sudo /opt/bin/etcd --name infra0 --data-dir /opt/etcd --initial-advertise-peer-urls http://10.211.55.14:4000 --listen-peer-urls http://10.211.55.14:4000 --listen-client-urls http://127.0.0.1:4001,http://10.211.55.14:4001 --advertise-client-urls http://10.211.55.14:4001 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://10.211.55.14:4000,infra1=http://10.211.55.15:4000,infra2=http://10.211.55.16:4000 --initial-cluster-state new
# infra1
sudo /opt/bin/etcd --name infra1 --data-dir /opt/etcd --initial-advertise-peer-urls http://10.211.55.15:4000 --listen-peer-urls http://10.211.55.15:4000 --listen-client-urls http://127.0.0.1:4001,http://10.211.55.15:4001 --advertise-client-urls http://10.211.55.15:4001 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://10.211.55.14:4000,infra1=http://10.211.55.15:4000,infra2=http://10.211.55.16:4000 --initial-cluster-state new
# infra2
sudo /opt/bin/etcd --name infra2 --data-dir /opt/etcd --initial-advertise-peer-urls http://10.211.55.16:4000 --listen-peer-urls http://10.211.55.16:4000 --listen-client-urls http://127.0.0.1:4001,http://10.211.55.16:4001 --advertise-client-urls http://10.211.55.16:4001 --initial-cluster-token etcd-cluster-1 --initial-cluster infra0=http://10.211.55.14:4000,infra1=http://10.211.55.15:4000,infra2=http://10.211.55.16:4000 --initial-cluster-state new
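
The --initial-cluster value has to be identical on every node, and it is easy to mistype by hand. As a minimal sketch, the helper below assembles that string from name=ip pairs. The function name build_initial_cluster is hypothetical (it is not part of etcd or Kubernetes), and the sketch assumes every member uses the same peer port, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of etcd): build the --initial-cluster value.
# Usage: build_initial_cluster <peer port> <name=ip> [<name=ip> ...]
build_initial_cluster() {
  peer_port="$1"
  shift
  out=""
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    ip="${pair#*=}"      # text after the first "="
    # Prefix a comma only when something has already been collected.
    out="${out:+$out,}${name}=http://${ip}:${peer_port}"
  done
  printf '%s\n' "$out"
}

# Same members as the example above.
build_initial_cluster 4000 infra0=10.211.55.14 infra1=10.211.55.15 infra2=10.211.55.16
# prints infra0=http://10.211.55.14:4000,infra1=http://10.211.55.15:4000,infra2=http://10.211.55.16:4000
```

The printed string can be passed as the --initial-cluster argument on each node, so all members start from exactly the same member list.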

Once the etcd cluster is up, you can run etcdctl ls to inspect the keys currently stored.

$ etcdctl ls
/registry
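
Listing keys shows that one member answers, but not that all three members have joined. The sketch below is one possible check, under the assumption that the etcdctl in use is the etcd v2 client used throughout this post and that its cluster-health subcommand ends its output with the line "cluster is healthy" when all is well. The function name cluster_is_healthy is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin reports a healthy cluster.
# Assumes the etcd v2 `etcdctl cluster-health` output format.
cluster_is_healthy() {
  grep -q '^cluster is healthy$'
}

# Only query etcd when etcdctl is actually installed on this machine.
if command -v etcdctl >/dev/null 2>&1; then
  if etcdctl cluster-health 2>/dev/null | cluster_is_healthy; then
    echo "etcd cluster OK"
  else
    echo "etcd cluster not healthy or not reachable"
  fi
fi
```

Running this on each node after starting etcd gives a quick signal before moving on to the Flannel and Kubernetes steps.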

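Rebuild the Flannel network

Flannel builds an overlay network across nodes so that Kubernetes Pods on different nodes can communicate with each other. Because Flannel stores its own network configuration in etcd, we have to rebuild the whole Flannel network after creating the etcd cluster to make sure traffic flows correctly.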

Rebuild the flannel configuration

First, run the following command on every node.

# the following command can be found in kubernetes/cluster/ubuntu/util.sh
# it rebuilds the flannel network config entry
FLANNEL_NET="172.16.0.0/16" KUBE_CONFIG_FILE="config-default.sh" DOCKER_OPTS="" ~/kube/reconfDocker.sh ai

After running it, /coreos.com/network/subnets contains the network information of each node. Flannel needs this information to route packets between the Kubernetes nodes.

$ etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/172.16.92.0-24
/coreos.com/network/subnets/172.16.80.0-24
/coreos.com/network/subnets/172.16.2.0-24

$ etcdctl get /coreos.com/network/subnets/172.16.2.0-24
{"PublicIP":"10.211.55.15","BackendType":"vxlan","BackendData":{"VtepMAC":"22:f2:fc:58:41:72"}}
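
To read these entries more easily, the sketch below converts a subnet key into CIDR notation (the key uses a dash where CIDR uses a slash) and pulls the PublicIP field out of an entry. Both function names are hypothetical, and the sed-based extraction assumes the flat, single-line JSON record shown above. A real JSON parser would be more robust.

```shell
#!/bin/sh
# Hypothetical helper: turn a flannel subnet key into CIDR notation.
subnet_key_to_cidr() {
  entry="${1##*/}"                                # drop the directory part of the key
  printf '%s/%s\n' "${entry%-*}" "${entry##*-}"   # 172.16.2.0-24 -> 172.16.2.0/24
}

# Hypothetical helper: read one subnet entry (JSON) on stdin, print its PublicIP.
subnet_public_ip() {
  sed -n 's/.*"PublicIP":"\([^"]*\)".*/\1/p'
}

subnet_key_to_cidr /coreos.com/network/subnets/172.16.2.0-24
# prints 172.16.2.0/24

echo '{"PublicIP":"10.211.55.15","BackendType":"vxlan","BackendData":{"VtepMAC":"22:f2:fc:58:41:72"}}' | subnet_public_ip
# prints 10.211.55.15
```

On a live cluster, the output of etcdctl get for a subnet key could be piped into subnet_public_ip to see which node owns that subnet.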

Set up redundant Kubernetes components

Next, copy the kube-apiserver, kube-controller-manager, and kube-scheduler binaries to every machine that should act as a master node, and run them there.

Default service configuration files

/etc/default/kube-apiserver

KUBE_APISERVER_OPTS=" --apiserver-count=<number of apiserver> --insecure-bind-address=0.0.0.0 --insecure-port=8080 --etcd-servers=http://127.0.0.1:4001 --logtostderr=true --service-cluster-ip-range=192.168.3.0/24 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,SecurityContextDeny,ResourceQuota --service-node-port-range=30000-32767 --advertise-address=<node ip, e.g 10.211.55.14> --client-ca-file=/srv/kubernetes/ca.crt --tls-cert-file=/srv/kubernetes/server.cert --tls-private-key-file=/srv/kubernetes/server.key"

/etc/default/kube-controller-manager

KUBE_CONTROLLER_MANAGER_OPTS=" --master=127.0.0.1:8080 --root-ca-file=/srv/kubernetes/ca.crt --service-account-private-key-file=/srv/kubernetes/server.key --v=2 --leader-elect=true --logtostderr=true"

/etc/default/kube-scheduler

KUBE_SCHEDULER_OPTS=" --logtostderr=true --master=127.0.0.1:8080 --v=2 --leader-elect=true"
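
Only two kube-apiserver flags above differ from node to node: --apiserver-count, which should equal the number of master nodes, and --advertise-address, which is the node's own IP. As a rough sketch, the hypothetical helper render_apiserver_ha_flags derives both from a list of master IPs. It only prints the flags and does not touch any configuration file.

```shell
#!/bin/sh
# Hypothetical helper: print the two per-node kube-apiserver flags.
# Usage: render_apiserver_ha_flags <this node ip> <master ip> [<master ip> ...]
render_apiserver_ha_flags() {
  node_ip="$1"
  shift
  # After the shift, $# is the number of master IPs that were passed in.
  printf '%s\n' "--apiserver-count=$# --advertise-address=${node_ip}"
}

# Flags for infra0 in a three-master setup.
render_apiserver_ha_flags 10.211.55.14 10.211.55.14 10.211.55.15 10.211.55.16
# prints --apiserver-count=3 --advertise-address=10.211.55.14
```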

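Create a highly available API endpoint

Although the previous steps run multiple kube-apiserver instances, the configuration files of the other services still accept only a single API endpoint address, so a single point of failure remains possible. We therefore run Pound on every node as a local API proxy that connects to multiple API endpoints, so that services keep working when any one endpoint fails.

Install Pound

Kubernetes uses the RESTful PATCH method internally, so we need Pound 2.7 or later for everything to work properly.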

Ubuntu 16.04+

$ sudo apt-get install pound

Ubuntu 14.04

Download Pound-2.7.tgz from the official Pound website and build it from source.

$ wget http://www.apsis.ch/pound/Pound-2.7.tgz
$ tar -zxvf Pound-2.7.tgz
$ cd Pound-2.7
$ ./configure && make

Configure Pound

Edit the backend IP addresses and ports in /etc/pound/pound.cfg. If you compiled Pound yourself, you have to create this file manually.

User        "www-data"
Group       "www-data"
LogLevel    1
## check backend every X secs:
Alive       1
## use hardware-acceleration card supported by openssl(1):
#SSLEngine  "<hw>"
# poundctl control socket
Control "/var/run/pound/poundctl.socket"
ListenHTTP
    Address 0.0.0.0
    Port    8080
    xHTTP       1
    Service
        BackEnd
            Address x.x.x.x
            Port    8081
        End
        BackEnd
            Address y.y.y.y
            Port    8081
        End
        BackEnd
            Address z.z.z.z
            Port    8081
        End
    End
End
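
The three BackEnd stanzas differ only in their address, so they can be generated instead of typed. The sketch below prints one stanza per API server IP, assuming all backends share one port (8081 in this post). The function name render_pound_backends is hypothetical, and the output is meant to be pasted into the Service block of pound.cfg, not to replace the whole file.

```shell
#!/bin/sh
# Hypothetical helper: print one Pound BackEnd stanza per API server address.
# Usage: render_pound_backends <backend port> <ip> [<ip> ...]
render_pound_backends() {
  backend_port="$1"
  shift
  for addr in "$@"; do
    printf '        BackEnd\n'
    printf '            Address %s\n' "$addr"
    printf '            Port    %s\n' "$backend_port"
    printf '        End\n'
  done
}

# Example with the three master nodes used earlier in this post.
render_pound_backends 8081 10.211.55.14 10.211.55.15 10.211.55.16
```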

Init script (SysV/LSB style) /etc/init.d/pound

#! /bin/sh
### BEGIN INIT INFO
# Provides:          pound
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Should-Start:      $named
# Should-Stop:       $named
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: reverse proxy and load balancer
# Description:       reverse proxy, load balancer and
#                    HTTPS front-end for Web servers
### END INIT INFO
#
# pound - reverse proxy, load-balancer and https front-end for web-servers
PATH=/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/pound
DESC="reverse proxy and load balancer"
NAME=pound
# Exit if the daemon does not exist (anymore)
test -f $DAEMON || exit 0
. /lib/lsb/init-functions
# Check if pound is configured or not
if [ -f "/etc/default/pound" ]
then
  . /etc/default/pound
  if [ "$startup" != "1" ]
  then
    log_warning_msg "$NAME will not start unconfigured."
    log_warning_msg "Please configure; afterwards, set startup=1 in /etc/default/pound."
    exit 0
  fi
else
  log_failure_msg "/etc/default/pound not found"
  exit 1
fi
# The real work of an init script
case "$1" in
  start)
    log_daemon_msg "Starting $DESC" "$NAME"
    if [ ! -d "/var/run/pound" ]
    then
      mkdir -p /var/run/pound
    fi
    start_daemon $DAEMON $POUND_ARGS
    log_end_msg $?
    ;;
  stop)
    log_daemon_msg "Stopping $DESC" "$NAME"
    killproc $DAEMON
    log_end_msg $?
    ;;
  restart|force-reload)
    log_daemon_msg "Restarting $DESC" "$NAME"
    killproc $DAEMON
    start_daemon $DAEMON $POUND_ARGS
    echo "."
    ;;
  status)
    pidofproc $DAEMON >/dev/null
    status=$?
    if [ $status -eq 0 ]; then
      log_success_msg "$NAME is running"
    else
      log_success_msg "$NAME is not running"
    fi
    exit $status
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|force-reload|status}"
    exit 1
    ;;
esac
# Fallthrough if work done.
exit 0
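
The init script above refuses to start Pound unless /etc/default/pound exists and sets startup=1. A minimal sketch of such a file follows. The POUND_DEFAULTS variable and the ./pound.default staging path are assumptions made for illustration, so the file can be reviewed before being copied into place as root.

```shell
#!/bin/sh
# Hypothetical staging path; the init script reads /etc/default/pound.
POUND_DEFAULTS="${POUND_DEFAULTS:-./pound.default}"

cat > "$POUND_DEFAULTS" <<'EOF'
# Read by /etc/init.d/pound; the script will not start Pound unless startup=1.
startup=1
# Optional extra arguments handed to the pound daemon (left empty here).
POUND_ARGS=""
EOF

# After reviewing the file, copy it into place, for example:
#   sudo cp ./pound.default /etc/default/pound
```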

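Modify the kube-apiserver configuration

In /etc/default/kube-apiserver, change the default port from 8080 to 8081, so that Pound can listen on 8080 and forward requests to the API servers on 8081.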

--insecure-port=8081
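
The same change can be scripted when there are several master nodes. The sketch below uses a hypothetical helper, set_insecure_port, and demonstrates it on a throwaway file. Pointing it at the real /etc/default/kube-apiserver would require root, and it assumes the flag appears in the --insecure-port=<number> form shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the --insecure-port value inside a defaults file.
# A backup copy with the suffix .bak is left next to the original.
set_insecure_port() {
  conf_file="$1"
  new_port="$2"
  sed -i.bak "s/--insecure-port=[0-9][0-9]*/--insecure-port=${new_port}/" "$conf_file"
}

# Demonstration on a temporary file instead of the real configuration.
demo_conf="$(mktemp)"
echo 'KUBE_APISERVER_OPTS=" --insecure-bind-address=0.0.0.0 --insecure-port=8080 --logtostderr=true"' > "$demo_conf"
set_insecure_port "$demo_conf" 8081
grep -o -e '--insecure-port=[0-9]*' "$demo_conf"   # prints --insecure-port=8081
rm -f "$demo_conf" "$demo_conf.bak"
```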

Restart the Kubernetes services

sudo service kube-apiserver restart
sudo service kube-controller-manager restart
sudo service kubelet restart
sudo service kube-proxy restart
sudo service kube-scheduler restart
sudo service pound restart

Troubleshooting

Running kubectl exec through the local proxy fails as follows:

$ kubectl exec -it <pod> bash
error: Timeout occured

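The cause is that Kubernetes uses SPDY to set up the connection to the node hosting the Pod, and Pound does not support SPDY yet. Use the following command instead, pointing kubectl directly at an API endpoint:

ip: the IP of an API endpoint
port: the port of the API endpoint, not the port of the local proxy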

$ kubectl -s <ip>:<port> exec -it <pod> bash
$ kubectl -s 127.0.0.1:8081 exec -it <pod> bash

If you repost, reproduce, or publish this article, please credit the source: https://tachingchen.com/tw/
