Array variables and their expansion in bash

Quoting, escaping, word splitting, and array usage are common stumbling blocks for new bash developers.

This post covers arrays in bash and a few related topics.

There are two sections. The first introduces the fundamentals and definitions of arrays in bash. The second works through a real-life example that uses an array; that example is the main reason this post was written.

If you already know arrays and their basic operations in bash, you can skip the first section and go straight to the second.


Array fundamentals

Bash array manual docs

Declaration

Bash has two types of arrays: indexed arrays (integer indices, starting at zero) and associative arrays (arbitrary string keys).

To declare a variable as an indexed array variable

foo[1]=bar
declare -a foo
declare -a foo[1] # exactly the same as declare -a foo; the subscript is ignored
foo=([1]=bar [5]=baar foo fooo)
Each line above is a separate command; any one of them creates an indexed array.
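
As a quick check of the compound assignment above, the following sketch prints the indices bash actually assigns. It assumes a reasonably recent bash is available as bash; values written without a subscript continue from the previous index plus one.

```shell
#!/usr/bin/env bash
# Compound assignment mixing explicit and implicit indices.
foo=([1]=bar [5]=baar foo fooo)

# Values without a subscript continue from the previous index plus one.
echo "${!foo[@]}"   # prints: 1 5 6 7
echo "${foo[6]}"    # prints: foo
echo "${#foo[@]}"   # prints: 4
```

Note that the array is sparse: indices 0, 2, 3 and 4 are simply unset.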

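To declare an associative array variable

declare -A asc_array # required: only -A creates an associative array
asc_array[key1]="value 1" # valid once asc_array has been declared with -A
asc_array=([key1]="value 1" [key2]="value 2") # valid once asc_array has been declared with -A
declare -A asc_array=([key1]="value 1" [key2]="value 2") # declaration and assignment in one command
Unlike an indexed array, an associative array must be declared with -A before it is used. Without that declaration, asc_array[key1]="value 1" creates an indexed array, and key1 is evaluated as an arithmetic expression (index 0 if key1 is unset).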
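
The following sketch illustrates why the -A declaration matters. The names indexed and assoc are hypothetical, chosen only for this illustration, and the example assumes bash 4 or later (associative arrays do not exist in older versions).

```shell
#!/usr/bin/env bash
# Without declare -A, the subscript is an arithmetic expression.
# key1 and key2 are unset variables here, so both evaluate to 0.
unset key1 key2
indexed[key1]="value 1"
indexed[key2]="value 2"
echo "${#indexed[@]}"   # prints: 1
echo "${indexed[0]}"    # prints: value 2

# With declare -A, the subscripts are string keys.
declare -A assoc
assoc[key1]="value 1"
assoc[key2]="value 2"
echo "${#assoc[@]}"     # prints: 2
echo "${assoc[key1]}"   # prints: value 1
```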

Like declare, the local and readonly builtins accept the -a and -A options to declare array variables.
The read builtin accepts the -a option to assign the words read from standard input to an indexed array.
The mapfile builtin (also available as readarray; the -t option is recommended because it strips the trailing newline from each line) takes a variable name and creates an indexed array with one element per line of standard input.
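
A minimal sketch of both builtins, assuming bash 4 or later (mapfile is not available in bash 3). The variable names words, lines and raw are made up for this example.

```shell
#!/usr/bin/env bash
# read -a splits one line into words and stores them in an indexed array.
read -r -a words <<< "alpha beta gamma"
echo "${#words[@]}"   # prints: 3

# mapfile -t stores one element per input line, without the trailing newline.
mapfile -t lines < <(printf '%s\n' "first line" "second line")
echo "${#lines[@]}"   # prints: 2
echo "${lines[1]}"    # prints: second line

# Without -t, each element keeps its trailing newline.
mapfile raw < <(printf '%s\n' "first line" "second line")
echo "${#raw[0]}"     # prints: 11 (10 characters plus the newline)
```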

In an indexed array, a negative index counts back from the end of the array, or more precisely from one past the highest index. For example, foo[-2] is the second-to-last element of the indexed array foo when the array has no gaps.
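
A short sketch of negative indices, assuming a bash version that supports them in expansions (4.2 or later, to my knowledge). The sparse array is a hypothetical case added to show that the count is relative to the highest index, not to the number of elements.

```shell
#!/usr/bin/env bash
foo=(a b c d)
echo "${foo[-1]}"   # prints: d
echo "${foo[-2]}"   # prints: c

# In a sparse array, -1 is the highest index (5) and -2 is index 4, which is unset.
sparse=([1]=x [5]=y)
echo "${sparse[-1]}"     # prints: y
echo "[${sparse[-2]}]"   # prints: []
```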

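Note that read is a shell builtin, not an external command. Without -r, read treats a backslash as an escape character, so a backslash at the end of a line continues the input on the next line: if you run read my_var and enter hello \, the prompt waits for more input. By contrast, if you run read -r my_var and enter hello \, the backslash is kept literally and read returns immediately.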
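
The effect of -r can also be seen non-interactively. This is a small sketch using a here-string; the variable names input, plain and raw are hypothetical.

```shell
#!/usr/bin/env bash
input='a\tb'   # single quotes: a literal backslash followed by t

# Without -r, the backslash escapes the next character and is then removed.
read plain <<< "$input"
# With -r, the backslash is kept as an ordinary character.
read -r raw <<< "$input"

printf '%s\n' "$plain"   # prints: atb
printf '%s\n' "$raw"     # prints: a\tb
```

For this reason read -r is usually the safer default when the input is arbitrary text.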

Operations

Adding elements to an array

#simply defining new key
foo[4]="new value"
asc_array[key3]="value 3"

#use += operator
foo+=("trailing val1" "trailing val2")
asc_array+=([key4]="value 4" [key5]="value 5")
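
To make the behavior of += concrete, here is a small sketch. The array bar is a hypothetical addition that shows a common pitfall: without parentheses, += appends a string to element 0 instead of adding an element.

```shell
#!/usr/bin/env bash
foo=([1]=bar [5]=baar)
foo+=("trailing val1" "trailing val2")
# New elements are placed after the highest existing index.
echo "${!foo[@]}"        # prints: 1 5 6 7

declare -A asc_array=([key1]="value 1")
asc_array+=([key4]="value 4")
echo "${#asc_array[@]}"  # prints: 2

# Pitfall: no parentheses means string concatenation onto element 0.
bar=(x y)
bar+=z
echo "${bar[0]}"         # prints: xz
echo "${#bar[@]}"        # prints: 2
```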

Deleting an element from an array

unset 'foo[4]' # quoted so that the shell does not treat foo[4] as a filename pattern
unset 'asc_array[key3]'
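
One consequence worth seeing: removing an element does not renumber the remaining ones. The sketch below shows this, plus one possible way to re-pack the indices by reassigning the array to its own expansion.

```shell
#!/usr/bin/env bash
foo=(a b c d)
unset 'foo[1]'
echo "${!foo[@]}"   # prints: 0 2 3
echo "${#foo[@]}"   # prints: 3

# Re-pack the indices by rebuilding the array from its own elements.
foo=("${foo[@]}")
echo "${!foo[@]}"   # prints: 0 1 2
echo "${foo[1]}"    # prints: c
```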

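Deleting an entire array

unset foo
unset 'foo[*]'
unset 'foo[@]'
unset asc_array
unset 'asc_array[*]'
unset 'asc_array[@]'
Note that unset name is the reliable form. The behavior of the [*] and [@] forms depends on the bash version: according to the Bash 5.2 manual, on an indexed array they remove all elements but keep the variable itself, and on an associative array they unset only the element whose key is * or @.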

Expansion

${foo[1]} and ${asc_array[key1]} each reference a single array element.
${foo[@]} and ${foo[*]} expand to all elements of the array foo.
Inside double quotes the two differ: "${foo[@]}" expands each element to a separate word, while "${foo[*]}" expands all elements to a single word.

Let's see an example with my_arr=("a b" "c d").
First, create a test script called test.sh that prints the number of arguments it receives:

#!/usr/bin/env bash
echo $#

Then call it with different expansions:

my_arr=("a b" "c d")
./test.sh ${my_arr} #print 2. same meaning as ./test.sh ${my_arr[0]}
./test.sh ${my_arr[0]} #print 2
./test.sh ${my_arr[1]} #print 2
./test.sh ${my_arr[@]} #print 4
./test.sh ${my_arr[*]} #print 4
./test.sh "${my_arr[@]}" #print 2
./test.sh "${my_arr[*]}" #print 1
./test.sh "e${my_arr[@]}f" #print 2. expanded to "ea b" and "c df"
./test.sh "e${my_arr[*]}f" #print 1. expanded to "ea b c df"
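
The same experiment can be run without a separate script file. This sketch replaces test.sh with a hypothetical function show_args that prints the argument count followed by each argument in angle brackets, so the word boundaries are visible.

```shell
#!/usr/bin/env bash
# Print the number of arguments, then each argument wrapped in <>.
show_args() {
  printf '%d:' "$#"
  printf ' <%s>' "$@"
  printf '\n'
}

my_arr=("a b" "c d")
show_args ${my_arr[@]}       # prints: 4: <a> <b> <c> <d>
show_args "${my_arr[@]}"     # prints: 2: <a b> <c d>
show_args "${my_arr[*]}"     # prints: 1: <a b c d>
show_args "e${my_arr[@]}f"   # prints: 2: <ea b> <c df>
show_args "e${my_arr[*]}f"   # prints: 1: <ea b c df>
```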

${#my_arr[@]} and ${#my_arr[*]} expand to the number of elements in the array.

${!my_arr[@]} and ${!my_arr[*]} expand to the list of indices (keys) of the array.

Referencing an array variable without a subscript is equivalent to referencing it with a subscript of 0.
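
A brief sketch of these three expansions. The arrays sparse and plain are hypothetical; a sparse array is used so that the indices and the element count visibly differ.

```shell
#!/usr/bin/env bash
sparse=([2]="a b" [7]="c d")
echo "${#sparse[@]}"   # prints: 2
echo "${!sparse[@]}"   # prints: 2 7

# Iterating over the indices works for sparse and associative arrays alike.
for i in "${!sparse[@]}"; do
  printf '%s=%s\n' "$i" "${sparse[$i]}"
done
# prints:
# 2=a b
# 7=c d

# No subscript means subscript 0.
plain=(x y)
echo "$plain"          # prints: x
```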


Notice

  • When an array is expanded as part of a larger word, [@] usually produces an unexpected result, so [*] is recommended in that case. When the expansion stands alone, as in "${name[@]}", [@] is usually the right choice (e.g. when the array holds a list of files or parameters).

For example, take the array flags=(--rm -it). The expansion "--flags=${flags[@]}" produces two words, --flags=--rm and -it. The expansion "--flags=${flags[*]}" is usually closer to what is intended, because [*] joins all values of the array into the single word --flags=--rm -it.
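
The flags case can be checked with set --, which replaces the positional parameters of the current shell. This is only a sketch; the variables n_at, n_star and joined are hypothetical helpers.

```shell
#!/usr/bin/env bash
flags=(--rm -it)

# [@] inside a larger word still yields one word per element.
set -- "--flags=${flags[@]}"
n_at=$#
echo "$n_at"     # prints: 2

# [*] yields a single word.
set -- "--flags=${flags[*]}"
n_star=$#
joined=$1
echo "$n_star"   # prints: 1
echo "$joined"   # prints: --flags=--rm -it
```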

  • [*] concatenation uses the first character of IFS as the separator
my_list=(a b)
echo "${my_list[*]}"
#print: a b

old_IFS="${IFS}"
IFS=,
echo "${my_list[*]}"
#print: a,b
IFS="${old_IFS}"

IFS=, eval 'echo "${my_list[*]}"'
#print: a,b
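
Another way to limit the IFS change, sketched here with a hypothetical helper join_by: declaring IFS as local inside a function restores the caller's value automatically when the function returns.

```shell
#!/usr/bin/env bash
# join_by SEP ITEM...: print the items joined by the first character of SEP.
join_by() {
  local IFS="$1"
  shift
  printf '%s\n' "$*"
}

my_list=(a b c)
join_by , "${my_list[@]}"       # prints: a,b,c
join_by : "${my_list[@]}"       # prints: a:b:c
printf '%s\n' "${my_list[*]}"   # prints: a b c (the caller's IFS is unchanged)
```

Because only the first character of IFS is used, this helper cannot join with a multi-character separator.
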
  • Because unquoted words inside the parentheses of an array assignment undergo word splitting and filename expansion, take care when the content between the parentheses is dynamic, e.g. when it comes from a variable or a command substitution.
ls *.sh # suppose this prints 3 files with the .sh extension
#a.sh b.sh c.sh
my_var="*.sh"
my_list=( ${my_var} )
echo ${#my_list[@]} #print 3

my_list=( $(echo "*.sh") )
echo ${#my_list[@]} #print 3

read and mapfile are recommended in this situation, because neither performs filename expansion on its input. Use mapfile when there are multiple lines, and read -a to split a single line into space-delimited words.

my_var="*.sh *.txt"
read -ra my_list <<< "${my_var}"
read -ra my_list < <(echo "${my_var}")

my_var='*.sh
*.txt'
mapfile -t my_list <<< "${my_var}"
mapfile -t my_list < <(echo "*.sh"; echo "*.txt")

The <( ... ) syntax is called process substitution.

ShellCheck rules related to arrays: SC2089, SC2206, SC2068, SC2145, SC2207.
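
To reproduce the filename expansion pitfall from the notice above in a predictable way, the following sketch creates three .sh files in a temporary directory. The names tmp_dir, globbed and literal are hypothetical, and the example assumes mktemp is available and the default glob options are in effect.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the glob result is predictable.
tmp_dir=$(mktemp -d) || exit 1
cd "$tmp_dir" || exit 1
touch a.sh b.sh c.sh

my_var="*.sh"

# Unquoted expansion inside ( ): the pattern is expanded to file names.
globbed=( ${my_var} )
echo "${#globbed[@]}"   # prints: 3

# read -a splits on IFS but performs no filename expansion.
read -ra literal <<< "${my_var}"
echo "${#literal[@]}"   # prints: 1
echo "${literal[0]}"    # prints: *.sh

# Clean up the temporary directory.
cd "$OLDPWD" && rm -rf "$tmp_dir"
```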


Real-life example

Suppose there is a list of C source files that need to be compiled. We (bash script writers) want to keep this list in a variable, both to separate concerns and to manipulate the list dynamically.

Without an array, all source file paths are stored in a single string variable

file_paths="my project/libs.c my project/main.c"

If a file path contains a space, a single string variable cannot meet this requirement. gcc ${file_paths} receives 4 file names: my, project/libs.c, my, and project/main.c. On the other hand, gcc "${file_paths}" receives only one file name, my project/libs.c my project/main.c.

With an array, this requirement can be achieved as follows

file_paths=("my project/libs.c" "my project/main.c")

To use this array, expand its value like gcc "${file_paths[@]}".
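
The three cases can be compared without a compiler. In this sketch a hypothetical function count_args stands in for gcc and simply reports how many arguments it received; file_paths_str is a made-up name for the string variant.

```shell
#!/usr/bin/env bash
# Stand-in for gcc: report the number of arguments received.
count_args() { echo "$#"; }

file_paths_str="my project/libs.c my project/main.c"
file_paths=("my project/libs.c" "my project/main.c")

count_args ${file_paths_str}      # prints: 4 (split on every space)
count_args "${file_paths_str}"    # prints: 1 (one long file name)
count_args "${file_paths[@]}"     # prints: 2 (one argument per file)
```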

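A slightly harder problem: how to send this command via ssh, or embed it in a string passed as a parameter to /bin/bash (e.g. inside a docker command).

Neither [@] nor [*] works on its own here, because the command is flattened into a single string and parsed again by the receiving shell, which loses the element boundaries. In this situation, printf can help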

ssh my_gcc_server -t "gcc $(printf "\"%s\" " "${file_paths[@]}")"

In a nutshell, the printf command inside the command substitution $() expands every element of file_paths into a string in which each element is wrapped in double quotes, i.e. "my project/libs.c" "my project/main.c". This simple quoting assumes the paths do not themselves contain characters that are special inside double quotes, such as ", $, a backtick, or \.

According to the official printf manual, when there are more arguments (2 here) than the format requires (\"%s\" consumes only one), the format is reused as many times as necessary:
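
The sketch below shows the string that printf builds, and then a possible alternative based on bash's %q format, which escapes each argument so that a shell can read it back. Here bash -c stands in for the remote shell reached through ssh, and the variable names quoted and escaped are hypothetical. Note that %q produces bash-style quoting, so this variant assumes the receiving shell is bash-compatible.

```shell
#!/usr/bin/env bash
file_paths=("my project/libs.c" "my project/main.c")

# The approach from the text: wrap every element in double quotes.
quoted=$(printf '"%s" ' "${file_paths[@]}")
printf '[%s]\n' "$quoted"
# prints: ["my project/libs.c" "my project/main.c" ]

# Alternative sketch: let printf %q escape each element for re-parsing.
escaped=$(printf '%q ' "${file_paths[@]}")

# bash -c parses the string again, as the remote shell would.
bash -c "printf '%s\n' ${escaped}"
# prints:
# my project/libs.c
# my project/main.c
```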

The format argument is reused as necessary to convert all the given arguments. For example, the command ‘printf %s a b’ outputs ‘ab’.
Missing arguments are treated as null strings or as zeros, depending on whether the context expects a string or a number. For example, the command ‘printf %sx%d’ prints ‘x0’.
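
Both rules quoted above can be observed with the bash printf builtin. A small sketch:

```shell
#!/usr/bin/env bash
# The format is reused until all arguments are consumed.
printf '%s' a b; echo             # prints: ab
printf '[%s]' a b c; echo         # prints: [a][b][c]

# Missing arguments become an empty string or zero.
printf '%sx%d\n'                  # prints: x0
printf '%s=%s\n' k1 v1 k2
# prints:
# k1=v1
# k2=
```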