Then came the day when my new Nimble HF20 SAN from HPE arrived. The “old” Nimble CS235 is running out of space, and a new HF20 42TB SAN cost the same as extending the service contract and adding extra disk shelves to the CS235. That is after a 56% reduction from the price HPE originally quoted.

With the new model I now also get inline deduplication on top of inline compression, plus a considerably larger predictive cache: 6 x 480GB SSDs.

A few quick specs for the box:
1 x Intel(R) Xeon(R) Bronze 3104 CPU @ 1.70GHz
1 x 32GB DDR4 RAM
1 x 8GB Agiga NVRAM
21 x 2TB Seagate SAS disks (30TB usable after RAID)
6 x 480GB Intel SSDs
1 x Advantech 16GB M.2 SSD
4 x 1Gbit NICs
Dual controller active/passive
NimbleOS 5.0.5.200
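
The capacity figures in the list can be checked with a bit of arithmetic. Below is a minimal Python sketch that uses only the numbers quoted above; the variable names are made up for illustration, and since the post does not say how the difference splits between parity and spares, it is only reported as a total.

```python
# Figures taken from the spec list above.
HDD_COUNT = 21
HDD_SIZE_TB = 2
USABLE_TB = 30          # usable space after RAID, as stated in the post
SSD_COUNT = 6
SSD_SIZE_GB = 480

raw_tb = HDD_COUNT * HDD_SIZE_TB        # raw spinning-disk capacity
overhead_tb = raw_tb - USABLE_TB        # space not available for data
overhead_share = overhead_tb / raw_tb   # same overhead as a fraction of raw
cache_gb = SSD_COUNT * SSD_SIZE_GB      # total SSD capacity

print(f"Raw HDD capacity: {raw_tb} TB")
print(f"Overhead implied by the 30TB figure: {overhead_tb} TB ({overhead_share:.0%})")
print(f"Total SSD capacity: {cache_gb} GB")
```

The 42TB raw figure matches the “HF20 42TB” model name, so that name refers to raw rather than usable capacity.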

I know that the “old” storage models, from before HPE acquired Nimble, were based on Supermicro, but I was still surprised that the new models are 100% based on Intel servers. The motherboard itself is an Intel S2600BP, which is not bad, but why not use your own products when you have them?

The unit was unpacked, powered on and connected to the network, so on to the setup. The Nimble setup software simply could not find the unit, neither from my servers on the iSCSI network, nor from my laptop on the same VLAN, nor with a network cable straight into controller A.
After quite some time troubleshooting I opened a serial connection to controller A to see what was going on inside the box. Here is what I saw:

Copyright (c) 2006 – 2017 Intel Corporation. All rights reserved.
IFWI Version: SE5C620.86B.OR.64.2018.13.3.01.0740.selfboot
Primary Bios Version: SE5C620.86B.00.01.N310.C0DEV.032820180740
Backup Bios Version: SE5C620.86B.00.01.N310.032820180740
System is booting from BIOS Primary Area!
BMC Firmware Version: 1.43.C4D9E834
SDR Version: SDR Package 5.06
ME Firmware Version: 04.00.04.294
Platform ID: S2600BP
System memory detected: 40960 MB
Current memory speed: 2133 MT/s
Intel(R) Xeon(R) Bronze 3104 CPU @ 1.70GHz
Number of physical processors identified: 1
AHCI Capable Controller 1 enabling 8 ports of 6Gb/s SATA
AHCI Capable Controller 2 enabling 6 ports of 6Gb/s SATA

USB Keyboard detected
USB Mouse detected

BMC BaseBoard IP Address 1 : 0.0.0.0
BMC BaseBoard IP Address 2 : 0.0.0.0
BMC Dedicated NIC IP Address : 0.0.0.0

Press [Enter] to directly boot.
Press [F2] to enter setup and select boot options.
Press [F6] to show boot menu options.
Press [F12] to boot from network.
[ 0.000000] Linux version 4.4.36-583371-opt (build@centos-7-2-1511-v1-0) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 29 17:08:02 PDT 2018
¦[ 2.115044] md: reading super blocks
[ 2.118632] md: load sb for sda2 adding …
[ 2.122996] md: checking sda2 UUID and sb with others drives
[ 2.128666] md: considering sda2 …
[ 2.132252] md: adding sda2 …

================================================================================
Starting Micro Kernel First Stage Boot.

Date: Thu Jan 3 16:04:46 UTC 2019
USB version: 10.3 (583385)
Kernel version: 4.4.36-583371-opt
Boot device: sda2[0]

Disable SATA write cache on sda
ERROR: PLX not detected on Gen5 – normal bootup will fail!
Loading module: ipmi-msghandler
Loading module: ipmi-si
Loading module: ipmi-devintf
Failed to get controller id
Load crashdump kernel

Saving file /tmp/bios.ini in progress…

Successfully Completed
LSI SAS Expander boot failure sense utility.
Trigger not given – exiting.
ERROR: could not detect controller ID
[ 3.557824] IPMI WDG state is not valid (curr_timeout 0 new_timeout 600 state 3 action_val 3) … rearming
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
CTRLR-UNKNOWN (Maintenance Mode):/#

BOOM! LSI SAS Expander boot failure sense utility. I suspect this is where the fault lies. Controller A never finishes booting, so I checked controller B, which boots up perfectly normally. That means one working controller, so I don't understand why the Nimble setup software cannot find the new unit on the network.
Anyway, I removed controller A so that only controller B was running, but the unit still could not be found. Only when I moved controller B into controller A's slot did things start to happen.
So it seems that the controller in slot A has to be working for the first-time setup to run. I have asked Nimble support about this but have not received an answer.
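
The lines that stand out in the failed boot can also be pulled out of a captured serial log automatically. The following is a rough Python sketch, not a Nimble tool: the marker strings are simply picked from the log above, and the function name `find_failures` is made up for illustration. Note that “LSI SAS Expander boot failure sense utility.” is deliberately not used as a marker, because the same line also shows up when a healthy controller boots.

```python
import re

# Kernel-style timestamp prefix such as "[ 3.557824] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Markers picked by hand from the failed boot log; not an official list.
FAILURE_MARKERS = ("ERROR:", "Failed to", "Maintenance Mode")


def find_failures(log_text):
    """Return (line_number, text) for every line containing a marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        text = TIMESTAMP.sub("", line.strip())
        if any(marker in text for marker in FAILURE_MARKERS):
            hits.append((number, text))
    return hits


# Short excerpt copied from the failed controller A boot.
sample = """Disable SATA write cache on sda
ERROR: PLX not detected on Gen5 – normal bootup will fail!
Loading module: ipmi-msghandler
Failed to get controller id
ERROR: could not detect controller ID
CTRLR-UNKNOWN (Maintenance Mode):/#"""

for number, text in find_failures(sample):
    print(f"line {number}: {text}")
```

On this excerpt the sketch flags the PLX error, both controller ID messages and the maintenance mode prompt.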

I reported the array as faulty via InfoSight and was told they would send me a new controller.

I still wanted to see what might be wrong, so I took a closer look at the controller. There is an IT bridge board (AHWBPBGB24, I think) that connects to the bridge port and riser slots 3 and 4 on the motherboard.
On that bridge board sits an LSI 3008 IOC, which is the ASIC for the external SAS connectors on the back of the controller.
In riser slot 2 there is a board for 2 PCI Express buses, which in turn is connected to the IT bridge board. My guess is that the riser ports (2, 3 and 4) and the bridge port on the motherboard together form a continuous connection to the various devices in the server.

Looking closely at the riser board in slot 2, I could see that the connector linking it to the IT bridge board was slightly loose and could be “lifted” ½-1 mm off the PCB, so there was no electrical contact between the boards.

HPE Nimble HF20 riser card slot 2 with defect connectors

HPE Nimble HF20 riser slot cage

My best guess is that either the bridge board gets no power or there is no contact to the external LSI SAS connectors. That would at least make sense.

The SAS expander module itself looks like this:

The next task with the array is to set up replication between the old array and the new one, so I can replicate my volumes across via snapshots, perform the handover and change the Hyper-V configuration.

And for those interested, here is a normal boot of an HF20 controller:

Copyright (c) 2006 – 2017 Intel Corporation. All rights reserved.
IFWI Version: SE5C620.86B.OR.64.2018.13.3.01.0740.selfboot
Primary Bios Version: SE5C620.86B.00.01.N310.C0DEV.032820180740
Backup Bios Version: SE5C620.86B.00.01.N310.032820180740
System is booting from BIOS Primary Area!
BMC Firmware Version: 1.43.C4D9E834
SDR Version: SDR Package 5.06
ME Firmware Version: 04.00.04.294
Platform ID: S2600BP
System memory detected: 40960 MB
Current memory speed: 2133 MT/s
Intel(R) Xeon(R) Bronze 3104 CPU @ 1.70GHz
Number of physical processors identified: 1
AHCI Capable Controller 1 enabling 8 ports of 6Gb/s SATA
AHCI Capable Controller 2 enabling 6 ports of 6Gb/s SATA

USB Keyboard detected
USB Mouse detected

BMC BaseBoard IP Address 1 : 0.0.0.0
BMC BaseBoard IP Address 2 : 0.0.0.0
BMC Dedicated NIC IP Address : 0.0.0.0

Press [Enter] to directly boot.
Press [F2] to enter setup and select boot options.
Press [F6] to show boot menu options.
Press [F12] to boot from network.
[ 0.000000] Linux version 4.4.36-583371-opt (build@centos-7-2-1511-v1-0) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 29 17:08:02 PDT 2018
¦[ 14.563819] md: reading super blocks
[ 14.567401] md: load sb for sdab2 adding …
[ 14.571844] md: checking sdab2 UUID and sb with others drives
[ 14.577592] md: considering sdab2 …
[ 14.581256] md: adding sdab2 …

================================================================================
Starting Micro Kernel First Stage Boot.

Date: Thu Jan 3 15:19:52 UTC 2019
USB version: 10.3 (583385)
Kernel version: 4.4.36-583371-opt
Boot device: sdab2[0]

Disable SATA write cache on sdab
Loading module: ipmi-msghandler
Loading module: ipmi-si
Loading module: ipmi-devintf
Load crashdump kernel

Saving file /tmp/bios.ini in progress…

Successfully Completed
LSI SAS Expander boot failure sense utility.
Bring up interconnect
Loading NTBiNic modules
Assembling and Mounting HDD MD array
Trying throttle Peer I/O for boot_mk_boot
md UUID 2ed23140-acb3-2bcd-85eb-192c9f759f8c found on 21 drives using bank 0, partition 4
Did not find any head drives using bank 1
Selected bank 0 for md assembly
bank:0 AF-205129-20181215222249
My controller id is 1
Reading all physical volumes. This may take a while…
Reading all physical volumes. This may take a while…
Found volume group “vg1” using metadata type lvm2
5 logical volume(s) in volume group “vg1” now active
dumpe2fs 1.42.9 (28-Dec-2013)
dumpe2fs 1.42.9 (28-Dec-2013)
Backup superblock BS:4096, SB offsets:32768 98304 163840 229376 294912 819200 884736
Using root device md_d10p1
Verifying HDD MD kernel
Mk hook script not found.

******************************************
* Default option is 0, timeout is 2 sec *
* *
* 0. Normal mode *
* 1. No service mode *
* 2. Maintenance mode *
******************************************
Enter your option:

Booting into Normal Mode.
Loading kexec….
Kernel Image: vmlinuz-4.4.36-588731-opt
Command Line: console=nsklog init=/sbin/mk_boot.sh ro rootflags=data=ordered md_mod.aamd=0 console=ttyS0,115200 raid=part crashkernel=96M@256M panic=1 quiet rootdelay=10 processor.max_cstate=1 idle=halt mk_hdd_rt=md_d10p1 mk_usb_rt=md_d0p1
Saving mk log messages into ram …
[ 25.506451] IPMI WDG state is not valid (curr_timeout 0 new_timeout 600 state 3 action_val 3) … rearming
0 logical volume(s) in volume group “vg1” now active
mdadm: stopped /dev/md_d10
mdadm: stopped /dev/md_d12
mdadm: stopped /dev/md_d16
Disabling first expansion shelf phys: 0 1 2 3
Disabling first expansion shelf phys on right port: 36 37 38 39
[ 27.486985] MD reboot: dropping new IO requests
[ 27.491526] MD reboot: waiting for pending IOs to drain
[ 29.497416] MD reboot: resuming after draining IOs
[ 30.828524] kexec_core: Starting new kernel
[ 0.000000] Linux version 4.4.36-588731-opt (build@centos-7-2-1511-v1-0) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Tue Oct 9 17:34:10 PDT 2018
¦

================================================================================
Starting NimbleOS INITRAMFS boot stage
Probing for USB devices
Waiting 10 secs for USB discovery to complete
Assembling MD raid devices
md UUID 2ed23140-acb3-2bcd-85eb-192c9f759f8c found on 21 drives using bank 0, partition 4
Did not find any head drives using bank 1
Selected bank 0 for md assembly
bank:0 AF-205129-20181215222249
My controller id is 1
Reading all physical volumes. This may take a while…
Reading all physical volumes. This may take a while…
Found volume group “vg1” using metadata type lvm2
5 logical volume(s) in volume group “vg1” now active
Using root device md_d10p1
Disabling any HW watchdog timer
Saving ramfs logs
0 logical volume(s) in volume group “vg1” now active
================================================================================
Starting Nimble Array on Intel
mdadm: /dev/md_d0 has been started with 1 drive.
Loading module: loop
Loading module: dca
Loading module: ixgbe
Loading module: usbhid
Loading module: ipmi-msghandler
Loading module: ipmi-si
Loading module: ipmi-devintf
Loading module: i40e
Loading module: loop
Checking and mounting system filesystems
Mounting file systems
File-based locking initialisation failed.
Reading all physical volumes. This may take a while…
Found volume group “vg1” using metadata type lvm2
File-based locking initialisation failed.
5 logical volume(s) in volume group “vg1” now active

System Booting
Controller: B
Date: Thu Jan 3 15:20:32 UTC 2019
Restoring mk logs from RAM
Load crashdump kernel
Loading module: imc_smb nvdimm_core agigaram jedec_nvdimm nvdimm_smbus nvdimm_mem
9596 2019-01-03,15:20:48.948892+00 INFO: plat:main: writing pnvdimm_dsdsb -valid -armed -data_valid dsd off:size 4096:4242538496
Loading plx module if hw supports it
Loading NTBiNic modules
Loading module: ns_dhm
Enabling scsistats collection

Model: HF20
Version: 5.0.5.200-588749-opt
Controller SN: PQS2006AP11846058D
MAC Address: 00:e0:ed:48:b8:bc
NVDIMM SN: 1082821F

logger: IPMI watchdog params… [DONE]
Tuning system runtime parameters
Use default hostname ctl1
Bring up interface lo
Bring up interface eth1b
Bring up interface eth1a
Bring up interface eth0b
Setting eth0b rx-usecs to 7
Bring up interface eth0a
Setting eth0a rx-usecs to 7
Bring up interface i2
Bring up interface i1
Set up net filters
iptables: No chain/target/match by that name.
iptables: No chain/target/match by that name.
Starting syslogd
Checking update state
Checking system partition quick-rebuild config: [true]
md_d0: ro 1 raid10 1 bitmap 1
md_d10: ro 0 raid10 0 bitmap 1
Setting up memory for system partition quick-rebuild
Starting system partition quick-rebuild on md_d10
md_d12: ro 0 raid10 0 bitmap 1
Starting system partition quick-rebuild on md_d12
md_d14: ro 0 raid10 1 bitmap 1
md_d16: ro 0 raid10 0 bitmap 1
Starting system partition quick-rebuild on md_d16
Enabling first expansion shelf phys: 0 1 2 3
Enabling first expansion shelf phys on right port: 36 37 38 39
sleeping for 10 seconds to discover ES expanders
Check for required controller specific firmware upgrades…
Verifying Intel BIOS settings
Verifying PCI slot speeds
Local BMC IP: 0.0.0.0
^[[5~^[[6~Channel 1 is not a LAN channel
Starting pmd
Starting disk check

Nimble Storage Console

ctl1 login:
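
With both boot logs at hand, a quick way to spot where the failed controller diverges is to list the lines that appear only in its log. This is a small Python sketch under a few assumptions: kernel timestamps are stripped so they do not cause false differences, the helper names `normalise` and `only_in_first` are made up, and the two excerpts are shortened copies of the logs above rather than full captures.

```python
import re

# Kernel-style timestamp prefix such as "[ 14.563819] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalise(log_text):
    """Strip whitespace and timestamps, and drop empty lines."""
    lines = []
    for line in log_text.splitlines():
        text = TIMESTAMP.sub("", line.strip())
        if text:
            lines.append(text)
    return lines


def only_in_first(first_log, second_log):
    """Return lines of first_log that never occur in second_log."""
    seen = set(normalise(second_log))
    return [line for line in normalise(first_log) if line not in seen]


# Shortened excerpts from the failed (controller A) and normal boots.
failed_excerpt = """Loading module: ipmi-devintf
Failed to get controller id
Load crashdump kernel
LSI SAS Expander boot failure sense utility.
Trigger not given – exiting.
ERROR: could not detect controller ID"""

healthy_excerpt = """Loading module: ipmi-devintf
Load crashdump kernel
LSI SAS Expander boot failure sense utility.
Bring up interconnect"""

for line in only_in_first(failed_excerpt, healthy_excerpt):
    print(line)
```

Run on full captures, device-specific lines (for example `sda2` versus `sdab2`) will also show up as differences, so the output still needs to be read with some care.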

 
