Nos artigos anteriores, pudemos explorar um pouco sobre a operação de Failover em um ambiente Data Guard, de forma manual. Para facilitar o trabalho e potencializar os recursos de alta disponibilidade, a Oracle disponibiliza o FSFO, que em resumo seria a opção de Failover de forma automática, respeitando regras e parâmetros definidos de forma premeditada e antecipada. Aqui vamos explorar um pouco sobre essa importante opção.
O FSFO nada mais seria que um OCI client construído dentro do utilitário DGMGRL, que invoca de forma automática o Failover de um ambiente, além de também conseguir realizar o “reinstate” do ambiente primary com falha (após sua disponilidade). Em termos de infraestrutura, exige a criação de uma entidade chamada \”Observer\”, que verifica de forma periódica as condições dos membros do Data Guard. Um ponto importante é que o Observer deve ser criado em um servidor/estrutura separado dos demais membros, já que ele fica como um ponto de falha único nessa estrutura (caso o mesmo fique fora, o ambiente como um todo ficará travado/congelado):

Alguns dos pré-requisitos seriam a configuração do Data Broker, Flashback Database habilitado em todos os membros, Standby Redo Log files criados em todos os membros, instalar o utilitário DGMGRL na máquina onde o Observer irá rodar (em nosso artigo, como é apenas um laboratório, criarei o Observer em uma máquina já existente) e configuração do tnsnames para comunicação entre os ambientes.
Os critérios para iniciar o FSFO podem incluir: problemas de conectividade ao primary através do standby ou observer, datafile offline, control file corrompido, dicionário corrompido, redo log inacessíveis, congelamento por esgotamento da área de archives, erros com prefixo “ORA-” ou por definição da aplicação (como se fosse o start do FSFO por API).
Checando condições do ambiente em que vamos utilizar:
[oracle@fornix1 ~]$ dgmgrl sys/oracle@CORTEX
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sun Apr 25 05:35:38 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "cortex"
Connected as SYSDBA.
DGMGRL> SHOW CONFIGURATION;
Configuration - cortex
Protection Mode: MaxPerformance
Members:
cortex - Primary database
cortexdr - Physical standby database
Fast-Start Failover: Disabled
Configuration Status:
SUCCESS (status updated 58 seconds ago)
DGMGRL> SHOW DATABASE CORTEX;
Database - cortex
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
cortex
Database Status:
SUCCESS
DGMGRL> SHOW DATABASE CORTEXDR;
Database - cortexdr
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Average Apply Rate: 18.00 KByte/s
Real Time Query: OFF
Instance(s):
CORTEXDR
Database Status:
SUCCESS
Conforme Oracle Note “Doc ID 1387859.1”, é importante nos certificarmos que o parâmetro “StaticConnectIdentifier” esteja devidamente preenchido em todos os membros para o pleno funcionamento do FSFO. O mesmo é usado pelo Broker para realizar a conexão remota aos bancos de dados. O atributo HOST é definido automaticamente pelo Broker, a partir do que está definido no parâmetro LOCAL_LISTENER:
Primário preenchido de forma correta:
DGMGRL> SHOW DATABASE CORTEX StaticConnectIdentifier;
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.109)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=cortex_DGMGRL.localdomain)(INSTANCE_NAME=cortex)(SERVER=DEDICATED)))'
DGMGRL>
Standby também:
DGMGRL> SHOW DATABASE CORTEXDR StaticConnectIdentifier;
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.110)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=cortexDR_DGMGRL.localdomain)(INSTANCE_NAME=CORTEXDR)(SERVER=DEDICATED)))'
DGMGRL>
Caso fosse necessário alterar o valor do parâmetro, a ordem correta é ajustar o HOST do LOCAL_LISTENER, reiniciar o listener, e realizar o “reset” do parâmetro dentro do Broker, para que o mesmo consiga coletar o novo valor definido (EDIT DATABASE **** RESET PROPERTY StaticConnectIdentifier;) e por último reiniciar o processo de MRP.
Checando que o recurso de FSFO está desabilitado:
DGMGRL> SHOW FAST_START FAILOVER;
Fast-Start Failover: Disabled
Protection Mode: MaxPerformance
Lag Limit: 30 seconds
Threshold: 30 seconds
Active Target: (none)
Potential Targets: (none)
Observer: (none)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
Definindo o target standby de cada banco de dados pelo FSFO:
DGMGRL> EDIT DATABASE CORTEX SET PROPERTY FastStartFailoverTarget = 'CORTEXDR';
Property "faststartfailovertarget" updated
DGMGRL> EDIT DATABASE CORTEXDR SET PROPERTY FastStartFailoverTarget = 'CORTEX';
Property "faststartfailovertarget" updated
DGMGRL>
Definindo o valor em 45 segundos do FSFO Threshold (que seria o tempo em que o primário passasse inacessível pelo Observer/Standby e que disparasse o processo de Failover):
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold=45;
Property "faststartfailoverthreshold" updated
Podemos definir também o tempo (em segundos) do Redo Apply Lag entre o standby e primary (o valor padrão deste parâmetro é 30, e o menor valor aceitável é 10 segundos):
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit = 120;
Property "faststartfailoverlaglimit" updated
Verificando que o parâmetro “FastStartFailoverPmyShutdown” está definido como TRUE:
DGMGRL> SHOW CONFIGURATION FastStartFailoverPmyShutdown
FastStartFailoverPmyShutdown = 'TRUE'
DGMGRL>
Mesma coisa para o parâmetro “FastStartFailoverAutoReinstate”:
DGMGRL> SHOW CONFIGURATION FastStartFailoverAutoReinstate;
FastStartFailoverAutoReinstate = 'TRUE'
DGMGRL>
Para habilitar o FSFO, basta rodar o comando abaixo:
DGMGRL> ENABLE FAST_START FAILOVER;
Enabled in Potential Data Loss Mode.
DGMGRL>
Iniciando o Observer na máquina do Standby (para ambientes corporativos, nunca faça isso, crie-o em uma máquina separada de todos os membros):
[oracle@fornix2 trace]$ dgmgrl sys/oracle@CORTEXDR;
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sun Apr 25 05:59:41 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "cortexDR"
Connected as SYSDBA.
DGMGRL> START OBSERVER FILE='/home/oracle/CORTEXFSFO.dat';
[W000 2021-04-25T05:59:46.156-03:00] FSFO target standby is cortexdr
Observer 'fornix2' started
[W000 2021-04-25T05:59:46.226-03:00] Observer trace level is set to USER
[W000 2021-04-25T05:59:46.226-03:00] Try to connect to the primary.
[W000 2021-04-25T05:59:46.226-03:00] Try to connect to the primary cortex.
[W000 2021-04-25T05:59:46.282-03:00] The standby cortexdr is ready to be a FSFO target
[W000 2021-04-25T05:59:46.282-03:00] Connection to the primary restored!
[W000 2021-04-25T05:59:48.283-03:00] Disconnecting from database cortex.
Já podemos ver o reflexo da nova configuração no primary:
DGMGRL> SHOW CONFIGURATION
Configuration - cortex
Protection Mode: MaxPerformance
Members:
cortex - Primary database
cortexdr - (*) Physical standby database
Fast-Start Failover: Enabled in Potential Data Loss Mode
Configuration Status:
SUCCESS (status updated 41 seconds ago)
DGMGRL> SHOW FAST_START FAILOVER
Fast-Start Failover: Enabled in Potential Data Loss Mode
Protection Mode: MaxPerformance
Lag Limit: 120 seconds
Threshold: 45 seconds
Active Target: cortexdr
Potential Targets: "CORTEXDR"
cortexdr valid
Observer: fornix2
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL>
Vendo o FSFO em ação ao indisponibilizar o primary:
[oracle@fornix1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Sun Apr 25 06:15:45 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
SQL> SELECT DB_UNIQUE_NAME,DATABASE_ROLE,OPEN_MODE FROM V$DATABASE;
DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE
------------------------------ ---------------- --------------------
cortex PRIMARY READ WRITE
SQL> SHU ABORT;
ORACLE instance shut down.
SQL> !ps -ef | grep pmon
grid 3308 1 0 04:56 ? 00:00:00 asm_pmon_+ASM
oracle 9057 9035 0 06:16 pts/0 00:00:00 /bin/bash -c ps -ef | grep pmon
oracle 9059 9057 0 06:16 pts/0 00:00:00 grep pmon
SQL>
Analisando comportamento do Observer:
[W000 2021-04-25T05:59:46.282-03:00] Connection to the primary restored!
[W000 2021-04-25T05:59:48.283-03:00] Disconnecting from database cortex.
[W000 2021-04-25T06:15:55.144-03:00] Primary database cannot be reached.
[W000 2021-04-25T06:15:55.144-03:00] Fast-Start Failover threshold has not exceeded. Retry for the next 45 seconds
[W000 2021-04-25T06:15:56.144-03:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[W000 2021-04-25T06:15:56.355-03:00] Primary database cannot be reached.
[W000 2021-04-25T06:15:57.355-03:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[W000 2021-04-25T06:15:57.360-03:00] Primary database cannot be reached.
[W000 2021-04-25T06:15:58.360-03:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[W000 2021-04-25T06:15:58.366-03:00] Primary database cannot be reached.
[W000 2021-04-25T06:15:59.366-03:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[W000 2021-04-25T06:16:37.653-03:00] Primary database cannot be reached.
[W000 2021-04-25T06:16:37.653-03:00] Fast-Start Failover threshold has not exceeded. Retry for the next 3 seconds
[W000 2021-04-25T06:16:38.653-03:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[W000 2021-04-25T06:16:38.659-03:00] Primary database cannot be reached.
[W000 2021-04-25T06:16:38.659-03:00] Fast-Start Failover threshold has not exceeded. Retry for the next 2 seconds
[W000 2021-04-25T06:16:39.659-03:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[W000 2021-04-25T06:16:39.664-03:00] Primary database cannot be reached.
[W000 2021-04-25T06:16:39.664-03:00] Fast-Start Failover threshold has not exceeded. Retry for the next 1 second
[W000 2021-04-25T06:16:40.664-03:00] Try to connect to the primary.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[W000 2021-04-25T06:16:40.671-03:00] Primary database cannot be reached.
[W000 2021-04-25T06:16:40.671-03:00] Fast-Start Failover threshold has expired.
[W000 2021-04-25T06:16:40.671-03:00] Try to connect to the standby.
[W000 2021-04-25T06:16:40.671-03:00] Making a last connection attempt to primary database before proceeding with Fast-Start Failover.
[W000 2021-04-25T06:16:40.671-03:00] Check if the standby is ready for failover.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[S002 2021-04-25T06:16:40.781-03:00] Fast-Start Failover started...
2021-04-25T06:16:40.781-03:00
Initiating Fast-Start Failover to database "cortexdr"...
[S002 2021-04-25T06:16:40.781-03:00] Initiating Fast-start Failover.
Performing failover NOW, please wait...
Failover succeeded, new primary is "cortexdr"
2021-04-25T06:17:21.360-03:00
[S002 2021-04-25T06:17:21.360-03:00] Fast-Start Failover finished...
[W000 2021-04-25T06:17:21.360-03:00] Failover succeeded. Restart pinging.
[W000 2021-04-25T06:17:21.684-03:00] Primary database has changed to cortexdr.
[W000 2021-04-25T06:17:21.686-03:00] Try to connect to the primary.
[W000 2021-04-25T06:17:21.686-03:00] Try to connect to the primary cortexdr.
[W000 2021-04-25T06:17:22.073-03:00] The standby cortex needs to be reinstated
[W000 2021-04-25T06:17:22.073-03:00] Try to connect to the new standby cortex.
[W000 2021-04-25T06:17:22.073-03:00] Connection to the primary restored!
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[W000 2021-04-25T06:17:24.075-03:00] Disconnecting from database cortexdr.
[W000 2021-04-25T06:17:52.119-03:00] Try to connect to the new standby cortex.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
[W000 2021-04-25T06:18:22.162-03:00] Try to connect to the new standby cortex.
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Unable to connect to database using cortex
Podemos observar que o Failover foi realizado com êxito, e é refletido na configuração do Data Guard. O que nos chama a atenção também sobre o log acima é que além do failover, ele continua tentando conexão com o novo standby, para poder realizar o reinstate do mesmo.
[oracle@fornix1 ~]$ dgmgrl sys/oracle@CORTEXDR
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sun Apr 25 06:19:40 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "CORTEXDR"
Connected as SYSDBA.
DGMGRL> SHOW CONFIGURATION;
Configuration - cortex
Protection Mode: MaxPerformance
Members:
cortexdr - Primary database
Warning: ORA-16824: multiple warnings, including fast-start failover-related warnings, detected for the database
cortex - (*) Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: Enabled in Potential Data Loss Mode
Configuration Status:
WARNING (status updated 22 seconds ago)
DGMGRL> SHOW DATABASE CORTEXDR;
Database - cortexdr
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
CORTEXDR
Database Warning(s):
ORA-16829: fast-start failover configuration is lagging
ORA-16869: fast-start failover target not initialized
Database Status:
WARNING
DGMGRL> SHOW DATABASE CORTEX;
Database - cortex
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: (unknown)
Apply Lag: (unknown)
Average Apply Rate: (unknown)
Real Time Query: OFF
Instance(s):
cortex
Database Status:
DISABLED - ORA-16661: the standby database needs to be reinstated
Montando o novo standby, de modo que o Observer o reconhecerá e procederá com o reinstate:
[oracle@fornix1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Sun Apr 25 06:22:39 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
Connected to an idle instance.
SQL> STARTUP MOUNT;
ORACLE instance started.
Total System Global Area 2583690520 bytes
Fixed Size 8899864 bytes
Variable Size 553648128 bytes
Database Buffers 2013265920 bytes
Redo Buffers 7876608 bytes
Database mounted.
SQL>
Reinstate realizado:
Unable to connect to database using cortex
[W000 2021-04-25T06:22:52.556-03:00] Try to connect to the new standby cortex.
ORA-12528: TNS:listener: all appropriate instances are blocking new connections
Unable to connect to database using cortex
[W000 2021-04-25T06:23:22.593-03:00] Try to connect to the new standby cortex.
[W000 2021-04-25T06:23:23.598-03:00] Connection to the new standby restored!
[W000 2021-04-25T06:23:25.599-03:00] Try to connect to the primary cortexdr.
[W000 2021-04-25T06:23:26.603-03:00] Connection to the primary restored!
[W000 2021-04-25T06:23:27.604-03:00] Wait for new primary to be ready to reinstate.
[W000 2021-04-25T06:23:28.604-03:00] New primary is now ready to reinstate.
[W000 2021-04-25T06:23:28.604-03:00] Issuing REINSTATE command.
2021-04-25T06:23:28.605-03:00
Initiating reinstatement for database "cortex"...
Reinstating database "cortex", please wait...
[W000 2021-04-25T06:24:16.671-03:00] The standby cortex is ready to be a FSFO target
Reinstatement of database "cortex" succeeded
2021-04-25T06:24:50.108-03:00
[W000 2021-04-25T06:24:50.713-03:00] Successfully reinstated database cortex.
Configuração do Data Guard atualizada:
[oracle@fornix1 ~]$ dgmgrl sys/oracle@CORTEX
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sun Apr 25 06:26:11 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "cortex"
Connected as SYSDBA.
DGMGRL> SHOW CONFIGURATION;
Configuration - cortex
Protection Mode: MaxPerformance
Members:
cortexdr - Primary database
cortex - (*) Physical standby database
Fast-Start Failover: Enabled in Potential Data Loss Mode
Configuration Status:
SUCCESS (status updated 47 seconds ago)
Realizando um switchover para o primário original:
[oracle@fornix1 ~]$ dgmgrl sys/oracle@CORTEXDR
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sun Apr 25 06:27:43 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "CORTEXDR"
Connected as SYSDBA.
DGMGRL> SWITCHOVER TO CORTEX;
Performing switchover NOW, please wait...
Operation requires a connection to database "cortex"
Connecting ...
Connected to "cortex"
Connected as SYSDBA.
New primary database "cortex" is opening...
Oracle Clusterware is restarting database "cortexdr" ...
Connected to an idle instance.
Connected to an idle instance.
Connected to "cortexDR"
Connected to "cortexDR"
Switchover succeeded, new primary is "cortex"
DGMGRL>
Operação pela perspectiva do Observer:
2021-04-25T06:24:50.108-03:00
[W000 2021-04-25T06:24:50.713-03:00] Successfully reinstated database cortex.
[W000 2021-04-25T06:28:09.118-03:00] Primary database has changed to cortex.
[W000 2021-04-25T06:28:09.121-03:00] Try to connect to the primary.
[W000 2021-04-25T06:28:09.121-03:00] Try to connect to the primary cortex.
[W000 2021-04-25T06:28:09.642-03:00] Connection to the primary restored!
[W000 2021-04-25T06:28:10.643-03:00] Disconnecting from database cortex.
[W000 2021-04-25T06:28:27.249-03:00] Standby database has changed to cortexdr.
[W000 2021-04-25T06:28:27.250-03:00] Try to connect to the primary.
[W000 2021-04-25T06:28:27.250-03:00] Try to connect to the primary cortex.
[W000 2021-04-25T06:28:28.910-03:00] Connection to the primary restored!
[W000 2021-04-25T06:28:30.911-03:00] Disconnecting from database cortex.
[W000 2021-04-25T06:28:54.953-03:00] The standby cortexdr is ready to be a FSFO target
Nova configuração do Data Guard:
[oracle@fornix1 ~]$ dgmgrl sys/oracle@CORTEX
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sun Apr 25 06:30:01 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "cortex"
Connected as SYSDBA.
DGMGRL> SHOW CONFIGURATION;
Configuration - cortex
Protection Mode: MaxPerformance
Members:
cortex - Primary database
cortexdr - (*) Physical standby database
Fast-Start Failover: Enabled in Potential Data Loss Mode
Configuration Status:
SUCCESS (status updated 25 seconds ago)
Para parar o Observer:
[oracle@fornix1 ~]$ dgmgrl sys/oracle@CORTEXDR
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Sun Apr 25 06:31:07 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "cortexDR"
Connected as SYSDBA.
DGMGRL> STOP OBSERVER;
Observer stopped.
DGMGRL>
[W000 2021-04-25T06:28:28.910-03:00] Connection to the primary restored!
[W000 2021-04-25T06:28:30.911-03:00] Disconnecting from database cortex.
[W000 2021-04-25T06:28:54.953-03:00] The standby cortexdr is ready to be a FSFO target
2021-04-25T06:31:16.233-03:00
Observer 'fornix2' stopped
[oracle@fornix2 trace]$
Por fim, para desabitilarmos o recurso do FSFO:
DGMGRL> DISABLE FAST_START FAILOVER;
Disabled.
Obs: Este procedimento foi criado pelo senhor Ahmed Baraka (www.ahmedbaraka.com) e foi apenas reproduzido por mim em um laboratório pessoal para fins de aprendizado.
Pingback: FSFO (Fast-Start Failover Operation) requested by DBMS_DG.INITIATE_FS_FAILOVER – Bruno Santos da Silva
Pingback: FSFO (Fast-Start Failover Operation) requested by DBMS_DG.INITIATE_FS_FAILOVER – SWIV