Array variables and their expansion in bash
Quoting, word splitting, and array usage are common stumbling blocks for new bash developers.
This post covers arrays in bash and related topics.
There are two sections. The first introduces the fundamentals and definitions of arrays in bash. The second walks through a real-life example that makes use of an array; that example is the main reason this post was written.
If you already know arrays and the basic array operations in bash, you can skip the first section and go straight to the second.
Array fundamentals
Declaration
There are two types of array in bash: indexed arrays (integer indices, starting at zero) and associative arrays (arbitrary string keys).
To declare a variable as an indexed array variable
To declare an associative array variable
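As a minimal sketch, the two declarations could look like the following. The names foo and asc_array match the ones used later in this post; the element values are illustrative assumptions.

```bash
#!/usr/bin/env bash

# Indexed array: declare -a is optional when assigning with ( ... )
declare -a foo
foo=("value 0" "value 1" "value 2")

# Associative array: declare -A is required (needs bash 4.0 or later)
declare -A asc_array
asc_array=([key1]="value 1" [key2]="value 2")

echo "${foo[1]}"            # value 1
echo "${asc_array[key2]}"   # value 2
```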
Similar to declare, the local and readonly builtins accept the -a and -A options to declare array variables.
The read builtin accepts the -a option to assign a list of words read from standard input to an array.
The mapfile builtin (the -t flag, which strips the trailing newline from each line, is recommended) accepts a variable name and creates an indexed array whose elements hold the lines fed from standard input.
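A short sketch of both builtins, assuming bash 4.0 or later for mapfile. The variable names words and lines are illustrative.

```bash
#!/usr/bin/env bash

# read -a splits one input line into words (using IFS) and stores them in an array
read -r -a words <<< "alpha beta gamma"
echo "${#words[@]}"    # 3
echo "${words[1]}"     # beta

# mapfile -t stores each input line as one element, without the trailing newline
mapfile -t lines < <(printf '%s\n' "first line" "second line")
echo "${#lines[@]}"    # 2
echo "${lines[1]}"     # second line
```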
In an indexed array, negative indices count back from the end of the array, e.g. foo[-2] is the second-to-last element of the indexed array foo.
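For instance, with a four-element array (the values are illustrative, and a reasonably recent bash, 4.3 or later, is assumed):

```bash
#!/usr/bin/env bash

foo=(a b c d)
echo "${foo[-1]}"   # d, the last element
echo "${foo[-2]}"   # c, the second-to-last element
```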
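Note that read is a shell builtin, not an external command. Without the -r option, read treats a backslash as an escape character, and a backslash at the end of a line as a line continuation. If you run read my_var and then enter hello \, the prompt waits for another line of input. In contrast, if you run read -r my_var and then enter hello \, the prompt ends immediately and the backslash is kept literally.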
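The same behavior can be reproduced non-interactively by feeding the input through a here-string instead of typing it. This is only a sketch; the variable names input, joined, and raw are illustrative.

```bash
#!/usr/bin/env bash

# Two input lines: "hello \" followed by "world"
input=$'hello \\\nworld'

# Without -r the trailing backslash continues the line, so both lines are read
read joined <<< "${input}"
printf '%s\n' "${joined}"    # hello world

# With -r the backslash is literal and reading stops at the first newline
read -r raw <<< "${input}"
printf '%s\n' "${raw}"       # hello \
```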
Operations
Adding elements to an array
#simply defining new key
foo[4]="new value"
asc_array[key3]="value 3"
#use += operator
foo+=("trailing val1" "trailing val2")
asc_array+=([key4]="value 4" [key5]="value 5")
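Deleting an element from an array
#quote the argument so the shell does not treat the brackets as a glob pattern
unset 'foo[4]'
unset 'asc_array[key3]'
Deleting an entire array
#the effect of the * and @ subscripts has changed between bash versions; unset foo and unset asc_array are the portable forms
unset foo
unset 'foo[*]'
unset 'foo[@]'
unset asc_array
unset 'asc_array[*]'
unset 'asc_array[@]'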
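One consequence worth seeing in action: assigning to an arbitrary index or deleting an element leaves gaps, because indexed arrays in bash are sparse. A small sketch with illustrative values:

```bash
#!/usr/bin/env bash

foo=(a b c)                 # indices 0 1 2
foo[4]="new value"          # index 3 is skipped
foo+=("trailing val1")      # appended after the highest index, so it gets index 5
unset 'foo[1]'              # removes "b"; the other indices do not shift

echo "${!foo[@]}"           # 0 2 4 5
echo "${#foo[@]}"           # 4
```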
Expansion
${foo[1]} and ${asc_array[key1]} refer to a single array element.
${foo[@]} and ${foo[*]} expand to all members of the array foo.
"${foo[@]}" expands each element to a separate word, while "${foo[*]}" expands all elements to a single word.
Let's see an example with my_arr=("a b" "c d").
Create a test script called test.sh that prints the number of parameters passed to it
#!/usr/bin/env bash
echo $#
my_arr=("a b" "c d")
./test.sh ${my_arr} #print 2. same meaning as ./test.sh ${my_arr[0]}
./test.sh ${my_arr[1]} #print 2
./test.sh ${my_arr[@]} #print 4
./test.sh ${my_arr[*]} #print 4
./test.sh "${my_arr[@]}" #print 2
./test.sh "${my_arr[*]}" #print 1
./test.sh "e${my_arr[@]}f" #print 2. expanded to "ea b" and "c df"
./test.sh "e${my_arr[*]}f" #print 1. expanded to "ea b c df"
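To see not only how many words each expansion produces but also what they are, a helper function can print every argument in angle brackets. This is a sketch; show_args is a hypothetical name, not part of the original test script.

```bash
#!/usr/bin/env bash

# Print each received argument wrapped in <>, all on one line
show_args() { printf '<%s>' "$@"; echo; }

my_arr=("a b" "c d")
show_args ${my_arr[@]}        # <a><b><c><d>
show_args "${my_arr[@]}"      # <a b><c d>
show_args "${my_arr[*]}"      # <a b c d>
show_args "e${my_arr[@]}f"    # <ea b><c df>
```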
${#my_arr[@]} and ${#my_arr[*]} expand to the number of elements in the array.
${!my_arr[@]} and ${!my_arr[*]} expand to the list of indices (or keys) of the array.
Referencing an array variable without a subscript is equivalent to referencing it with a subscript of 0.
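A brief sketch of these three forms, reusing my_arr from the example and adding an illustrative associative array:

```bash
#!/usr/bin/env bash

my_arr=("a b" "c d")
declare -A asc_array=([key1]="value 1")

echo "${#my_arr[@]}"      # 2, the number of elements
echo "${!my_arr[@]}"      # 0 1, the indices
echo "${!asc_array[@]}"   # key1, the keys of the associative array
echo "${my_arr}"          # a b, same as ${my_arr[0]}
echo "${#my_arr}"         # 3, the length of element 0, not of the array
```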
Notice
- When an array is expanded inside a larger word, [@] usually produces an unexpected result, so [*] is recommended in that case. When the expansion stands alone, as in "${name[@]}", [@] is usually the right choice, especially when the array holds a list of files, parameters, and so on.
For example, take the array variable flags=(--rm -it). The expansion "--flags=${flags[@]}" produces two words, --flags=--rm and -it, while the expansion "--flags=${flags[*]}" usually gives the expected result, because * concatenates all values in the array into a single word, --flags=--rm -it.
The separator used by the * concatenation is the first character of the IFS value
my_list=(a b)
echo "${my_list[*]}"
#print: a b
old_IFS="${IFS}"
IFS=,
echo "${my_list[*]}"
#print: a,b
IFS="${old_IFS}"
IFS=, eval 'echo "${my_list[*]}"'
#print: a,b
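An alternative to saving and restoring IFS by hand is to confine the change to a function with local. The helper below is a sketch; join_by is a hypothetical name.

```bash
#!/usr/bin/env bash

# Join all arguments after the first, using the first argument as the separator.
# Only the first character of the separator is used, as with any "$*" expansion.
join_by() {
  local IFS="$1"   # local keeps the IFS change inside this function
  shift
  printf '%s\n' "$*"
}

my_list=(a b c)
join_by , "${my_list[@]}"    # a,b,c
join_by : "${my_list[@]}"    # a:b:c
```

Because IFS is declared local, the caller's IFS is untouched after the function returns.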
- Because the words between the parentheses of an array assignment undergo word splitting and filename expansion, care should be taken when that content is dynamic, such as an unquoted variable or command substitution.
ls *.sh #e.g. prints 3 files with the .sh extension
#a.sh b.sh c.sh
my_var="*.sh"
my_list=( ${my_var} )
echo ${#my_list[@]} #print 3
my_list=( $(echo "*.sh") )
echo ${#my_list[@]} #print 3
read and mapfile are recommended in this situation. mapfile is used if there are multiple lines, while read is used to split a single line into space-delimited words.
my_var="*.sh *.txt"
read -ra my_list <<< "${my_var}"
read -ra my_list < <(echo "${my_var}")
my_var='*.sh
*.txt'
mapfile -t my_list <<< "${my_var}"
mapfile -t my_list < <(echo "*.sh"; echo "*.txt")
The <( ... ) syntax is called process substitution.
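The difference between the two approaches can be checked side by side. The sketch below works in a temporary directory so that the glob has files to match; tmp_dir, globbed, and literal are illustrative names.

```bash
#!/usr/bin/env bash

# Work in a scratch directory containing three .sh files
tmp_dir="$(mktemp -d)"
cd "${tmp_dir}" || exit 1
touch a.sh b.sh c.sh

my_var="*.sh"
globbed=( ${my_var} )               # unquoted: the pattern expands to file names
read -ra literal <<< "${my_var}"    # read: the pattern is kept as a literal word

echo "${#globbed[@]}"    # 3
echo "${#literal[@]}"    # 1
echo "${literal[0]}"     # *.sh

# Clean up the scratch directory
cd "${OLDPWD}" || exit 1
rm -rf "${tmp_dir}"
```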
ShellCheck rules related to arrays: SC2089, SC2206, SC2068, SC2145, SC2207.
Real-life example
There is a list of C source files that need to be compiled. We (bash script writers) want to keep this list in a variable, both to separate concerns and to manipulate the data dynamically.
Without an array, all source file paths are stored in a single string variable
file_paths="my project/libs.c my project/main.c"
If a file path contains a space, a single string variable cannot meet this requirement. gcc ${file_paths} sees 4 files: my, project/libs.c, my, and project/main.c. On the other hand, gcc "${file_paths}" sees only one file, named my project/libs.c my project/main.c.
With an array, this requirement can be met as follows
file_paths=("my project/libs.c" "my project/main.c")
To use this array, expand it as gcc "${file_paths[@]}".
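The three cases can be compared without a compiler by substituting a stand-in for gcc. In this sketch, fake_gcc, file_paths_str, and the extra utils.c path are hypothetical; they only illustrate how many arguments the command would receive.

```bash
#!/usr/bin/env bash

# Stand-in for gcc: prints one line per argument it receives
fake_gcc() { printf 'file: %s\n' "$@"; }

file_paths_str="my project/libs.c my project/main.c"
file_paths=("my project/libs.c" "my project/main.c")
file_paths+=("my project/utils.c")   # the list can be extended dynamically

fake_gcc ${file_paths_str}       # 4 lines: the paths are split at the spaces
fake_gcc "${file_paths_str}"     # 1 line: the whole string is one argument
fake_gcc "${file_paths[@]}"      # 3 lines: one per file, spaces preserved
```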
A slightly harder problem: how to send this command over ssh, or embed it in a string passed as a parameter to /bin/bash (e.g. inside a docker command).
Neither [@] nor [*] works here, because the receiving shell parses the string again and splits the paths at their spaces. In this situation, printf can help
ssh my_gcc_server -t "gcc $(printf "\"%s\" " "${file_paths[@]}")"
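In a nutshell, the printf command inside the command substitution $() expands all elements of the file_paths array into a single string in which each element is wrapped in double quotes, i.e. "my project/libs.c" "my project/main.c". This handles spaces, but not paths that themselves contain double quotes, $, or backticks.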
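A possible variation, not the approach used above, is the %q format of bash's printf builtin, which quotes each element so that a shell can read it back as a single word. The sketch below uses bash -c as a stand-in for the remote shell and printf as a stand-in for gcc; it assumes the receiving shell is bash, since %q produces bash-style quoting.

```bash
#!/usr/bin/env bash

file_paths=("my project/libs.c" "my project/main.c")

# %q escapes each element; the format is reused for every array element
quoted="$(printf '%q ' "${file_paths[@]}")"

# The inner shell re-parses the string and recovers exactly two arguments
bash -c "printf '<%s>\n' ${quoted}"
# <my project/libs.c>
# <my project/main.c>
```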
According to the official printf manual, when there are more arguments (2 here) than the format requires (\"%s\" consumes only one), the format is reused as many times as necessary.
The format argument is reused as necessary to convert all the given arguments. For example, the command ‘printf %s a b’ outputs ‘ab’.
Missing arguments are treated as null strings or as zeros, depending on whether the context expects a string or a number. For example, the command ‘printf %sx%d’ prints ‘x0’.
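Both rules can be checked directly with bash's printf builtin; the arguments below are illustrative.

```bash
#!/usr/bin/env bash

# The format is reused for each extra argument
printf '%s\n' a b             # prints a and b on separate lines
printf '"%s" ' "x y" "z"      # prints "x y" "z" followed by a space
echo

# Missing arguments become an empty string (%s) or zero (%d)
printf '%sx%d\n'              # x0
```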