Bug #17325
Closed
storage class 'drain' scenario test
0%
Description
for RT#127 support
- create a cluster with 2 volumes with replication 1
- make one of the volumes storage class 'DRAIN'
- run keep-balance
This should move all the blocks from one volume to the other. If so, document the process.
Updated by Nico César about 4 years ago
Started a cluster with arvie and ran the cwl1.2 compliance test to have a bunch of data in the keepstores. Then installed keep-balance in the keepproxy container, and this is the output:
root@0115289509ae:/app# keep-balance -once 2>&1 | jq -r '.msg'
sweep: start
listening
skipping test1-bi6l4-e95ff93d09bdc567 with service type "proxy"
get_state: start
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): retrieve index from test1-bi6l4-d79bdca77949726b (keep0:25107, disk)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): retrieve index from test1-bi6l4-4115e11442571448 (keep1:25108, disk)
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): add 525 entries to map
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): add 515 entries to map
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): added 525 entries to map at 1x (525 replicas)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): index done
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): added 515 entries to map at 1x (515 replicas)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): index done
collections: 0/3345
collections: 1000/3345
collections: 2000/3345
collections: 3000/3345
collections: 3345/3345
collections: 3345/3345
get_state: took 1.8234625100000001s
changeset_compute: start
changeset_compute: took 0.00203597s
===
1 replicas (1 blocks, 0 bytes) lost (0=have<want)
165 replicas (165 blocks, 725509297 bytes) underreplicated (0<have<want)
871 replicas (871 blocks, 995722077 bytes) just right (have=want)
2 replicas (2 blocks, 1129 bytes) overreplicated (have>want>0)
0 replicas (0 blocks, 0 bytes) unreferenced (have>want=0, new)
0 replicas (0 blocks, 0 bytes) garbage (have>want=0, old)
===
storage class "default": 1038 replicas (1038 blocks, 1721232503 bytes) needed
storage class "default": 2 replicas (2 blocks, 1129 bytes) unneeded
storage class "default": 165 replicas (165 blocks, 725509297 bytes) pulling
storage class "default": 1 replicas (1 blocks, 0 bytes) unachievable
===
1038 replicas (1038 blocks, 1721232503 bytes) total commitment (excluding unreferenced)
1040 replicas (1038 blocks, 1721233632 bytes) total usage
===
test1-bi6l4-d79bdca77949726b (keep0:25107, disk): ChangeSet{Pulls:88, Trashes:0}
test1-bi6l4-4115e11442571448 (keep1:25108, disk): ChangeSet{Pulls:77, Trashes:0}
===
Replication level distribution:
0: 1 #####
1: 1036 ##########################################################
2: 2 #########
===
sweep: took 1.8630594839999999s
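For comparing runs, the summary counts can be pulled out of the keep-balance messages with a small script. The following is only a sketch: the function name parse_summary and the regular expression are illustrative assumptions based on the message format seen in this ticket, not part of keep-balance itself.

```python
import re

# Matches summary messages as they appear in this ticket, for example:
# "165 replicas (165 blocks, 725509297 bytes) underreplicated (0<have<want)"
# The category list is taken from the output above (assumption: it is complete).
SUMMARY_RE = re.compile(
    r"(\d+) replicas \((\d+) blocks, (\d+) bytes\) "
    r"(lost|underreplicated|just right|overreplicated|unreferenced|garbage)\b"
)


def parse_summary(text):
    """Return {category: (replicas, blocks, bytes)} found in keep-balance output."""
    summary = {}
    for replicas, blocks, nbytes, category in SUMMARY_RE.findall(text):
        summary[category] = (int(replicas), int(blocks), int(nbytes))
    return summary


# Sample copied from the run above.
sample = (
    "1 replicas (1 blocks, 0 bytes) lost (0=have<want) "
    "165 replicas (165 blocks, 725509297 bytes) underreplicated (0<have<want) "
    "871 replicas (871 blocks, 995722077 bytes) just right (have=want) "
    "2 replicas (2 blocks, 1129 bytes) overreplicated (have>want>0)"
)
summary = parse_summary(sample)
print(summary["underreplicated"])
```

The per-storage-class lines ("needed", "unneeded", "pulling", "unachievable") are deliberately not matched, so the result only reflects the overall block states.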
Updated by Nico César about 4 years ago
I decided to add "archival" as described in https://doc.arvados.org/v2.1/admin/storage-classes.html
root@6f65f83fdd08:/app# keep-balance -once -commit-pulls=true -commit-trash=true 2>&1 | jq -r '.msg'
sweep: start
listening
skipping test1-bi6l4-e95ff93d09bdc567 with service type "proxy"
clearing existing trash lists, in case the new rendezvous order differs from previous run
send_trash_lists: start
send_trash_lists: took 0.000398614s
get_state: start
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): retrieve index from test1-bi6l4-4115e11442571448 (keep1:25108, disk)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): retrieve index from test1-bi6l4-d79bdca77949726b (keep0:25107, disk)
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): add 515 entries to map
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): added 515 entries to map at 1x (515 replicas)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): index done
collections: 0/3345
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): add 1038 entries to map
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): added 1038 entries to map at 1x (1038 replicas)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): index done
collections: 1000/3345
collections: 2000/3345
collections: 3000/3345
collections: 3345/3345
collections: 3345/3345
get_state: took 1.806154504s
changeset_compute: start
changeset_compute: took 0.004263266s
===
1 replicas (1 blocks, 0 bytes) lost (0=have<want)
0 replicas (0 blocks, 0 bytes) underreplicated (0<have<want)
523 replicas (523 blocks, 662217104 bytes) just right (have=want)
515 replicas (515 blocks, 1059015399 bytes) overreplicated (have>want>0)
0 replicas (0 blocks, 0 bytes) unreferenced (have>want=0, new)
0 replicas (0 blocks, 0 bytes) garbage (have>want=0, old)
===
storage class "archival": 0 replicas (0 blocks, 0 bytes) needed
storage class "archival": 515 replicas (515 blocks, 1059015399 bytes) unneeded
storage class "archival": 0 replicas (0 blocks, 0 bytes) pulling
storage class "archival": 0 replicas (0 blocks, 0 bytes) unachievable
===
storage class "default": 1038 replicas (1038 blocks, 1721232503 bytes) needed
storage class "default": 0 replicas (0 blocks, 0 bytes) unneeded
storage class "default": 0 replicas (0 blocks, 0 bytes) pulling
storage class "default": 1 replicas (1 blocks, 0 bytes) unachievable
===
1038 replicas (1038 blocks, 1721232503 bytes) total commitment (excluding unreferenced)
1553 replicas (1038 blocks, 2780247902 bytes) total usage
===
test1-bi6l4-4115e11442571448 (keep1:25108, disk): ChangeSet{Pulls:0, Trashes:0}
test1-bi6l4-d79bdca77949726b (keep0:25107, disk): ChangeSet{Pulls:0, Trashes:0}
===
Replication level distribution:
0: 1 ######
1: 523 ##########################################################
2: 515 ##########################################################
Updated by Nico César about 4 years ago
Last run:
root@6f65f83fdd08:/app# keep-balance -once -commit-pulls=true -commit-trash=true 2>&1 | jq -r '.msg'
sweep: start
listening
skipping test1-bi6l4-e95ff93d09bdc567 with service type "proxy"
clearing existing trash lists, in case the new rendezvous order differs from previous run
send_trash_lists: start
send_trash_lists: took 0.000426701s
get_state: start
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): retrieve index from test1-bi6l4-d79bdca77949726b (keep0:25107, disk)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): retrieve index from test1-bi6l4-4115e11442571448 (keep1:25108, disk)
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): add 515 entries to map
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): added 515 entries to map at 1x (515 replicas)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): index done
collections: 0/3345
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): add 1038 entries to map
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): added 1038 entries to map at 1x (1038 replicas)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): index done
collections: 1000/3345
collections: 2000/3345
collections: 3000/3345
collections: 3345/3345
collections: 3345/3345
get_state: took 1.910985301s
changeset_compute: start
changeset_compute: took 0.001566913s
===
1 replicas (1 blocks, 0 bytes) lost (0=have<want)
0 replicas (0 blocks, 0 bytes) underreplicated (0<have<want)
523 replicas (523 blocks, 662217104 bytes) just right (have=want)
515 replicas (515 blocks, 1059015399 bytes) overreplicated (have>want>0)
0 replicas (0 blocks, 0 bytes) unreferenced (have>want=0, new)
0 replicas (0 blocks, 0 bytes) garbage (have>want=0, old)
===
storage class "archival": 0 replicas (0 blocks, 0 bytes) needed
storage class "archival": 515 replicas (515 blocks, 1059015399 bytes) unneeded
storage class "archival": 0 replicas (0 blocks, 0 bytes) pulling
storage class "archival": 0 replicas (0 blocks, 0 bytes) unachievable
===
storage class "default": 1038 replicas (1038 blocks, 1721232503 bytes) needed
storage class "default": 0 replicas (0 blocks, 0 bytes) unneeded
storage class "default": 0 replicas (0 blocks, 0 bytes) pulling
storage class "default": 1 replicas (1 blocks, 0 bytes) unachievable
===
1038 replicas (1038 blocks, 1721232503 bytes) total commitment (excluding unreferenced)
1553 replicas (1038 blocks, 2780247902 bytes) total usage
===
test1-bi6l4-4115e11442571448 (keep1:25108, disk): ChangeSet{Pulls:0, Trashes:0}
test1-bi6l4-d79bdca77949726b (keep0:25107, disk): ChangeSet{Pulls:0, Trashes:0}
===
Replication level distribution:
0: 1 ######
1: 523 ##########################################################
2: 515 ##########################################################
===
send_pull_lists: start
send_pull_lists: took 0.000739257s
send_trash_lists: start
send_trash_lists: took 0.000432182s
sweep: took 2.034863354s
Updated by Nico César about 4 years ago
- Subject changed from storage class 'DRAIN' scenario test to storage class 'archival' scenario test
Updated by Nico César about 4 years ago
- Subject changed from storage class 'archival' scenario test to storage class 'drain' scenario test
Updated by Nico César about 4 years ago
Starting from scratch, so we use "drain" and document every step.
Installed keep-balance, added it to the config, and marked one of the buckets as read-only:
# keep-balance -once -commit-pulls=false -commit-trash=false 2>&1 | jq -r '.msg'
sweep: start
listening
skipping test1-bi6l4-e95ff93d09bdc567 with service type "proxy"
get_state: start
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): retrieve index from test1-bi6l4-d79bdca77949726b (keep0:25107, disk)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): retrieve index from test1-bi6l4-4115e11442571448 (keep1:25108, disk)
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): add 322 entries to map
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): added 322 entries to map at 1x (322 replicas)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): index done
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): add 353 entries to map
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): added 353 entries to map at 1x (353 replicas)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): index done
collections: 0/2090
collections: 1000/2090
collections: 2000/2090
collections: 2090/2090
collections: 2090/2090
get_state: took 1.180681104s
changeset_compute: start
changeset_compute: took 0.001066497s
===
1 replicas (1 blocks, 0 bytes) lost (0=have<want)
0 replicas (0 blocks, 0 bytes) underreplicated (0<have<want)
671 replicas (671 blocks, 493815320 bytes) just right (have=want)
2 replicas (2 blocks, 1129 bytes) overreplicated (have>want>0)
0 replicas (0 blocks, 0 bytes) unreferenced (have>want=0, new)
0 replicas (0 blocks, 0 bytes) garbage (have>want=0, old)
===
storage class "default": 673 replicas (673 blocks, 493816449 bytes) needed
storage class "default": 2 replicas (2 blocks, 1129 bytes) unneeded
storage class "default": 0 replicas (0 blocks, 0 bytes) pulling
storage class "default": 1 replicas (1 blocks, 0 bytes) unachievable
===
673 replicas (673 blocks, 493816449 bytes) total commitment (excluding unreferenced)
675 replicas (673 blocks, 493817578 bytes) total usage
===
test1-bi6l4-4115e11442571448 (keep1:25108, disk): ChangeSet{Pulls:0, Trashes:0}
test1-bi6l4-d79bdca77949726b (keep0:25107, disk): ChangeSet{Pulls:0, Trashes:0}
===
Replication level distribution:
0: 1 ######
1: 671 ##########################################################
2: 2 #########
===
sweep: took 1.2174236330000001s
Then change the configuration and add
StorageClasses:
  drain: true
to the specific volume and restart the corresponding keepstore, then run again:
root@2e1c4f0b4e89:/app# keep-balance -once -commit-pulls=false -commit-trash=false 2>&1 | jq -r '.msg'
sweep: start
listening
skipping test1-bi6l4-e95ff93d09bdc567 with service type "proxy"
get_state: start
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): retrieve index from test1-bi6l4-4115e11442571448 (keep1:25108, disk)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): retrieve index from test1-bi6l4-d79bdca77949726b (keep0:25107, disk)
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): add 353 entries to map
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): added 353 entries to map at 1x (353 replicas)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): index done
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): add 322 entries to map
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): added 322 entries to map at 1x (322 replicas)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): index done
collections: 0/2090
collections: 1000/2090
collections: 2000/2090
collections: 2090/2090
collections: 2090/2090
get_state: took 1.201796697s
changeset_compute: start
changeset_compute: took 0.001673737s
===
1 replicas (1 blocks, 0 bytes) lost (0=have<want)
320 replicas (320 blocks, 161808914 bytes) underreplicated (0<have<want)
355 replicas (353 blocks, 332008664 bytes) just right (have=want)
0 replicas (0 blocks, 0 bytes) overreplicated (have>want>0)
0 replicas (0 blocks, 0 bytes) unreferenced (have>want=0, new)
0 replicas (0 blocks, 0 bytes) garbage (have>want=0, old)
===
storage class "default": 353 replicas (353 blocks, 332007535 bytes) needed
storage class "default": 0 replicas (0 blocks, 0 bytes) unneeded
storage class "default": 320 replicas (320 blocks, 161808914 bytes) pulling
storage class "default": 1 replicas (1 blocks, 0 bytes) unachievable
===
storage class "drain": 322 replicas (322 blocks, 161810043 bytes) needed
storage class "drain": 0 replicas (0 blocks, 0 bytes) unneeded
storage class "drain": 0 replicas (0 blocks, 0 bytes) pulling
storage class "drain": 0 replicas (0 blocks, 0 bytes) unachievable
===
675 replicas (673 blocks, 493817578 bytes) total commitment (excluding unreferenced)
675 replicas (673 blocks, 493817578 bytes) total usage
===
test1-bi6l4-4115e11442571448 (keep1:25108, disk): ChangeSet{Pulls:0, Trashes:0}
test1-bi6l4-d79bdca77949726b (keep0:25107, disk): ChangeSet{Pulls:320, Trashes:0}
===
Replication level distribution:
0: 1 ######
1: 671 ##########################################################
2: 2 #########
===
sweep: took 1.241898839s
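In this dry run the 320 replicas reported as "pulling" for the "default" class show up as Pulls:320 in the ChangeSet for keep0. A rough way to check that correspondence in a script is sketched below. The helper names planned_pulls and pulling_total are made up for illustration, and the agreement between the two numbers is an observation from the runs in this ticket, not a documented guarantee.

```python
import re

# "<service uuid> (<host>:<port>, <type>): ChangeSet{Pulls:N, Trashes:M}"
CHANGESET_RE = re.compile(
    r"(\S+) \([^)]*\): ChangeSet\{Pulls:(\d+), Trashes:(\d+)\}"
)
# 'storage class "<name>": N replicas (N blocks, B bytes) pulling'
PULLING_RE = re.compile(
    r'storage class "[^"]+": (\d+) replicas \(\d+ blocks, \d+ bytes\) pulling'
)


def planned_pulls(text):
    """Return {service uuid: number of pulls} from the ChangeSet messages."""
    return {m.group(1): int(m.group(2)) for m in CHANGESET_RE.finditer(text)}


def pulling_total(text):
    """Sum the 'pulling' replica counts over all storage classes."""
    return sum(int(n) for n in PULLING_RE.findall(text))


# Sample copied from the dry run above.
sample = (
    'storage class "default": 320 replicas (320 blocks, 161808914 bytes) pulling '
    'storage class "drain": 0 replicas (0 blocks, 0 bytes) pulling '
    "test1-bi6l4-4115e11442571448 (keep1:25108, disk): ChangeSet{Pulls:0, Trashes:0} "
    "test1-bi6l4-d79bdca77949726b (keep0:25107, disk): ChangeSet{Pulls:320, Trashes:0}"
)
pulls = planned_pulls(sample)
print(pulls, pulling_total(sample))
```

The same check against the very first run in this ticket gives 88 + 77 = 165 pulls for 165 replicas marked "pulling".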
Updated by Nico César about 4 years ago
Now we rerun keep-balance twice with -commit-pulls=true -commit-trash=true:
root@2e1c4f0b4e89:/app# keep-balance -once -commit-pulls=true -commit-trash=true 2>&1 | jq -r '.msg'
(..)
root@2e1c4f0b4e89:/app# keep-balance -once -commit-pulls=true -commit-trash=true 2>&1 | jq -r '.msg'
sweep: start
listening
skipping test1-bi6l4-e95ff93d09bdc567 with service type "proxy"
clearing existing trash lists, in case the new rendezvous order differs from previous run
send_trash_lists: start
send_trash_lists: took 0.000416834s
get_state: start
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): retrieve index from test1-bi6l4-4115e11442571448 (keep1:25108, disk)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): retrieve index from test1-bi6l4-d79bdca77949726b (keep0:25107, disk)
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): add 322 entries to map
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): added 322 entries to map at 1x (322 replicas)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): index done
collections: 0/2090
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): add 673 entries to map
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): added 673 entries to map at 1x (673 replicas)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): index done
collections: 1000/2090
collections: 2000/2090
collections: 2090/2090
collections: 2090/2090
get_state: took 1.198901342s
changeset_compute: start
changeset_compute: took 0.001188916s
===
1 replicas (1 blocks, 0 bytes) lost (0=have<want)
0 replicas (0 blocks, 0 bytes) underreplicated (0<have<want)
995 replicas (673 blocks, 655626492 bytes) just right (have=want)
0 replicas (0 blocks, 0 bytes) overreplicated (have>want>0)
0 replicas (0 blocks, 0 bytes) unreferenced (have>want=0, new)
0 replicas (0 blocks, 0 bytes) garbage (have>want=0, old)
===
storage class "default": 673 replicas (673 blocks, 493816449 bytes) needed
storage class "default": 0 replicas (0 blocks, 0 bytes) unneeded
storage class "default": 0 replicas (0 blocks, 0 bytes) pulling
storage class "default": 1 replicas (1 blocks, 0 bytes) unachievable
===
storage class "drain": 322 replicas (322 blocks, 161810043 bytes) needed
storage class "drain": 0 replicas (0 blocks, 0 bytes) unneeded
storage class "drain": 0 replicas (0 blocks, 0 bytes) pulling
storage class "drain": 0 replicas (0 blocks, 0 bytes) unachievable
===
995 replicas (673 blocks, 655626492 bytes) total commitment (excluding unreferenced)
995 replicas (673 blocks, 655626492 bytes) total usage
===
test1-bi6l4-4115e11442571448 (keep1:25108, disk): ChangeSet{Pulls:0, Trashes:0}
test1-bi6l4-d79bdca77949726b (keep0:25107, disk): ChangeSet{Pulls:0, Trashes:0}
===
Replication level distribution:
0: 1 ######
1: 351 ##########################################################
2: 322 #########################################################
===
send_pull_lists: start
send_pull_lists: took 0.000583328s
send_trash_lists: start
send_trash_lists: took 0.000301969s
sweep: took 1.238894471s
and we make sure that nothing is underreplicated and everything is "just right":
0 replicas (0 blocks, 0 bytes) underreplicated (0<have<want)
995 replicas (673 blocks, 655626492 bytes) just right (have=want)
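The "run until nothing is underreplicated" step could be scripted roughly as follows. This is a sketch under assumptions: the function names, the max_sweeps and delay parameters, and the idea of waiting between sweeps are illustrative; only the keep-balance flags are taken from the commands in this ticket. The ticket pipes the output through jq -r '.msg', which suggests the raw output is JSON log lines, so the regular expression only relies on the part of the message before "(0<have<want)" and may need adjusting if the logger escapes characters differently.

```python
import re
import subprocess
import time

# Flags copied from the runs in this ticket.
COMMAND = ["keep-balance", "-once", "-commit-pulls=true", "-commit-trash=true"]
UNDER_RE = re.compile(r"(\d+) replicas \(\d+ blocks, \d+ bytes\) underreplicated")


def run_keep_balance():
    """Run one sweep and return combined stdout/stderr as text."""
    result = subprocess.run(
        COMMAND,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return result.stdout


def underreplicated(output):
    """Extract the underreplicated replica count from one sweep's output."""
    match = UNDER_RE.search(output)
    if match is None:
        raise ValueError("no 'underreplicated' line found in output")
    return int(match.group(1))


def sweep_until_replicated(run=run_keep_balance, max_sweeps=5, delay=60):
    """Repeat sweeps until one reports 0 underreplicated replicas.

    Returns the number of sweeps that were needed. The delay gives the
    keepstores time to work through their pull lists (hypothetical default).
    """
    for attempt in range(1, max_sweeps + 1):
        if underreplicated(run()) == 0:
            return attempt
        if attempt < max_sweeps:
            time.sleep(delay)
    raise RuntimeError(f"still underreplicated after {max_sweeps} sweeps")
```

Calling sweep_until_replicated() with the defaults would invoke the real keep-balance binary; passing a different run callable makes it possible to try the loop against saved output first.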
Now we can move the files out of the drained volume and restart the keepstore, just to check that everything is as expected. This is the output:
root@2e1c4f0b4e89:/app# keep-balance -once -commit-pulls=false -commit-trash=false 2>&1 | jq -r '.msg'
sweep: start
listening
skipping test1-bi6l4-e95ff93d09bdc567 with service type "proxy"
get_state: start
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): retrieve index from test1-bi6l4-4115e11442571448 (keep1:25108, disk)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): retrieve index from test1-bi6l4-d79bdca77949726b (keep0:25107, disk)
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): add 0 entries to map
test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): added 0 entries to map at 1x (0 replicas)
mount test1-nyw5e-111111111111111 () on test1-bi6l4-4115e11442571448 (keep1:25108, disk): index done
collections: 0/2090
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): add 673 entries to map
test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): added 673 entries to map at 1x (673 replicas)
mount test1-nyw5e-000000000000000 () on test1-bi6l4-d79bdca77949726b (keep0:25107, disk): index done
collections: 1000/2090
collections: 2000/2090
collections: 2090/2090
collections: 2090/2090
get_state: took 1.203116008s
changeset_compute: start
changeset_compute: took 0.00127101s
===
1 replicas (1 blocks, 0 bytes) lost (0=have<want)
0 replicas (0 blocks, 0 bytes) underreplicated (0<have<want)
673 replicas (673 blocks, 493816449 bytes) just right (have=want)
0 replicas (0 blocks, 0 bytes) overreplicated (have>want>0)
0 replicas (0 blocks, 0 bytes) unreferenced (have>want=0, new)
0 replicas (0 blocks, 0 bytes) garbage (have>want=0, old)
===
storage class "default": 673 replicas (673 blocks, 493816449 bytes) needed
storage class "default": 0 replicas (0 blocks, 0 bytes) unneeded
storage class "default": 0 replicas (0 blocks, 0 bytes) pulling
storage class "default": 1 replicas (1 blocks, 0 bytes) unachievable
===
673 replicas (673 blocks, 493816449 bytes) total commitment (excluding unreferenced)
673 replicas (673 blocks, 493816449 bytes) total usage
===
test1-bi6l4-4115e11442571448 (keep1:25108, disk): ChangeSet{Pulls:0, Trashes:0}
test1-bi6l4-d79bdca77949726b (keep0:25107, disk): ChangeSet{Pulls:0, Trashes:0}
===
Replication level distribution:
0: 1 ######
1: 673 ##########################################################
===
sweep: took 1.2420714959999999s
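As a cross-check on the numbers: in the committed run the "default" and "drain" classes together account for the total commitment (673 + 322 = 995 replicas, 493816449 + 161810043 = 655626492 bytes), and after the files are moved out only the 673 "default" replicas remain. The sketch below parses those lines and compares the sums. The helper names are hypothetical, and treating the per-class "needed" figures as additive is an observation from these runs rather than a documented property of keep-balance.

```python
import re

# 'storage class "<name>": N replicas (M blocks, B bytes) needed'
NEEDED_RE = re.compile(
    r'storage class "([^"]+)": (\d+) replicas \(\d+ blocks, (\d+) bytes\) needed'
)
# "N replicas (M blocks, B bytes) total commitment (excluding unreferenced)"
TOTAL_RE = re.compile(
    r"(\d+) replicas \(\d+ blocks, (\d+) bytes\) total commitment"
)


def needed_by_class(text):
    """Return {storage class: (replicas, bytes)} from the 'needed' messages."""
    return {
        name: (int(replicas), int(nbytes))
        for name, replicas, nbytes in NEEDED_RE.findall(text)
    }


def total_commitment(text):
    """Return (replicas, bytes) from the 'total commitment' message."""
    match = TOTAL_RE.search(text)
    if match is None:
        raise ValueError("no 'total commitment' line found")
    return int(match.group(1)), int(match.group(2))


def sums_match(text):
    """True if per-class 'needed' figures add up to the total commitment."""
    needed = needed_by_class(text).values()
    replicas = sum(r for r, _ in needed)
    nbytes = sum(b for _, b in needed)
    return (replicas, nbytes) == total_commitment(text)


# Samples copied from the committed run and from the final run.
with_drain = (
    'storage class "default": 673 replicas (673 blocks, 493816449 bytes) needed '
    'storage class "default": 0 replicas (0 blocks, 0 bytes) unneeded '
    'storage class "drain": 322 replicas (322 blocks, 161810043 bytes) needed '
    "995 replicas (673 blocks, 655626492 bytes) total commitment (excluding unreferenced)"
)
after_removal = (
    'storage class "default": 673 replicas (673 blocks, 493816449 bytes) needed '
    "673 replicas (673 blocks, 493816449 bytes) total commitment (excluding unreferenced)"
)
print(sums_match(with_drain), sums_match(after_removal))
```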
Updated by Nico César about 4 years ago
- Status changed from In Progress to Resolved
Sent the corresponding email for this test.