Updating JSON objects with JQ
Parsing entries in a JSON array
Given the following array containing directory paths:
Now stored as the paths variable:
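As a minimal sketch, the snippet below assumes the array is the same five-path example used by the script in "Putting it all together"; the data is an assumption carried over from that script, not a separate listing.

```shell
#!/usr/bin/env bash
# Assumed example data: the five paths used in "Putting it all together".
paths='[
  "/home/user/project/file1.txt",
  "/home/user/project/subdir/file2.log",
  "/home/user/data/foo.txt",
  "/home/user/data/bar.log",
  "/home/user/data/baz.txt"
]'
# Sanity check: confirm jq can parse the value and count its entries.
count=$(jq 'length' <<< "$paths")
echo "paths contains ${count} entries"
# → paths contains 5 entries
```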
Removing the prefix
Removing the prefix with gsub
We can use gsub to remove the /home/user/ directory prefix:
Which produces:
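A minimal sketch of the gsub step and its result, assuming paths holds the five-path example array (the same call appears in the "Putting it all together" script):

```shell
#!/usr/bin/env bash
# Assumed example data (same five paths as the combined script).
paths='["/home/user/project/file1.txt","/home/user/project/subdir/file2.log","/home/user/data/foo.txt","/home/user/data/bar.log","/home/user/data/baz.txt"]'
# Update every element in place, replacing the leading /home/user/ with nothing.
no_prefix=$(jq -c '.[] |= gsub("^/home/user/";"")' <<< "$paths")
echo "$no_prefix"
# → ["project/file1.txt","project/subdir/file2.log","data/foo.txt","data/bar.log","data/baz.txt"]
```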
Removing the prefix with JQ map
We can also use JQ map to remove the /home/user/ directory prefix:
Which produces:
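One way the map variant might look, assuming the same five-path example array. Pairing map with sub is an assumption here, chosen because the later sections use the same combination:

```shell
#!/usr/bin/env bash
# Assumed example data (same five paths as the combined script).
paths='["/home/user/project/file1.txt","/home/user/project/subdir/file2.log","/home/user/data/foo.txt","/home/user/data/bar.log","/home/user/data/baz.txt"]'
# map applies the filter to each element and collects the results in a new array.
no_prefix_map=$(jq -c 'map(sub("^/home/user/";""))' <<< "$paths")
echo "$no_prefix_map"
# → ["project/file1.txt","project/subdir/file2.log","data/foo.txt","data/bar.log","data/baz.txt"]
```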
Selecting entries based on a pattern
Selecting entries that match a pattern
Use select(test(...)) to create a new array containing only paths that end in .txt:
Which produces:
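A minimal sketch of the select step, assuming the five-path example array with the prefix still in place:

```shell
#!/usr/bin/env bash
# Assumed example data (same five paths as the combined script).
paths='["/home/user/project/file1.txt","/home/user/project/subdir/file2.log","/home/user/data/foo.txt","/home/user/data/bar.log","/home/user/data/baz.txt"]'
# test() returns true when the regex matches; select() keeps only those elements.
# The doubled backslash is jq string escaping for the regex \.txt$
only_txt=$(jq -c 'map(select(test("\\.txt$")))' <<< "$paths")
echo "$only_txt"
# → ["/home/user/project/file1.txt","/home/user/data/foo.txt","/home/user/data/baz.txt"]
```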
Selecting entries that do not match a pattern
Use select(test(...) | not) to create a new array excluding paths that end in .log:
Which produces:
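A minimal sketch of the negated form, again assuming the five-path example array. Note that not is piped from the result of test, inside select:

```shell
#!/usr/bin/env bash
# Assumed example data (same five paths as the combined script).
paths='["/home/user/project/file1.txt","/home/user/project/subdir/file2.log","/home/user/data/foo.txt","/home/user/data/bar.log","/home/user/data/baz.txt"]'
# Negate the test result so that matching (.log) entries are dropped.
no_log=$(jq -c 'map(select(test("\\.log$") | not))' <<< "$paths")
echo "$no_log"
# → ["/home/user/project/file1.txt","/home/user/data/foo.txt","/home/user/data/baz.txt"]
```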
Putting it all together
We now have an updated array stored as the paths variable:
#!/usr/bin/env bash
# updated array
paths='[
"/home/user/project/file1.txt",
"/home/user/project/subdir/file2.log",
"/home/user/data/foo.txt",
"/home/user/data/bar.log",
"/home/user/data/baz.txt"
]'
# 1) remove the '/home/user/' prefix
no_prefix=$(jq -c '.[] |= gsub("^/home/user/";"")' <<< "$paths")
# 2) keep only '.txt' files
only_txt_files=$(jq -c 'map(select(test("\\.txt$")))' <<< "$no_prefix")
# 3) exclude '.txt' files
no_txt_files=$(jq -c 'map(select(test("\\.txt$") | not))' <<< "$no_prefix")
# 4) print the results
echo "Only '.txt' files: "
echo " ${only_txt_files}"
echo ""
echo "Exclude '.txt' files: "
echo " ${no_txt_files}"
Which produces:
Only '.txt' files:
["project/file1.txt","data/foo.txt","data/baz.txt"]
Exclude '.txt' files:
["project/subdir/file2.log","data/bar.log"]
Updating the JSON object
Updated JSON array
Now we have a more varied set of data in our JSON array.
Given the following array containing directory paths:
[
"/home/user/projects/my-test-abc/foo.txt",
"/home/user/projects/my-test-abc/bar.log",
"/home/user/projects/dev-jkl/foo.txt",
"/home/user/projects/dev-jkl/bar.log",
"/home/user/projects/dev-jkl/baz.log",
"/home/user/projects/primary-test-project-xyz/foo.txt",
"/home/user/projects/primary-test-project-xyz/bar.log",
"/home/user/projects/primary-test-project-xyz/baz.txt"
]
Now stored as the paths variable:
paths='[
"/home/user/projects/my-test-abc/foo.txt",
"/home/user/projects/my-test-abc/bar.log",
"/home/user/projects/dev-jkl/foo.txt",
"/home/user/projects/dev-jkl/bar.log",
"/home/user/projects/dev-jkl/baz.log",
"/home/user/projects/primary-test-project-xyz/foo.txt",
"/home/user/projects/primary-test-project-xyz/bar.log",
"/home/user/projects/primary-test-project-xyz/baz.txt"
]'
The planned output
I want the output to be usable by a GitHub Actions matrix, so it should be:
{
"projects": [
"my-test-abc",
"dev-jkl",
"primary-test-project-xyz"
],
"include": [
{
"project": "my-test-abc",
"files": [
"my-test-abc/foo.txt",
"my-test-abc/bar.log"
]
},
{
"project": "dev-jkl",
"files": [
"dev-jkl/foo.txt",
"dev-jkl/bar.log",
"dev-jkl/baz.log"
]
},
{
"project": "primary-test-project-xyz",
"files": [
"primary-test-project-xyz/foo.txt",
"primary-test-project-xyz/bar.log",
"primary-test-project-xyz/baz.txt"
]
}
]
}
Building the new JSON object
jq -c '
# 1) Strip off the "/home/user/projects/" directory prefix
map(sub("^/home/user/projects/";""))
# 2) Turn each element into an object with "project" and "file" keys
| map({ project: (split("/")[0]), file: . })
# 3) (Optional) Sort by project; group_by also sorts by its key, so this step is redundant
| sort_by(.project)
# 4) Group into arrays by project name
| group_by(.project)
# 5) Build the final output object:
| {
projects: map(.[0].project), # a simple list of project names
include: map({
project: .[0].project, # the project name
files: map(.file) # all files in that project
})
}
' <<<"$paths"
Which produces:
{"projects":["dev-jkl","my-test-abc","primary-test-project-xyz"],"include":[{"project":"dev-jkl","files":["dev-jkl/foo.txt","dev-jkl/bar.log","dev-jkl/baz.log"]},{"project":"my-test-abc","files":["my-test-abc/foo.txt","my-test-abc/bar.log"]},{"project":"primary-test-project-xyz","files":["primary-test-project-xyz/foo.txt","primary-test-project-xyz/bar.log","primary-test-project-xyz/baz.txt"]}]}
About the key filters
Some information about the key filters:
- map(sub("^/home/user/projects/";""))
  - Removes the fixed leading path so you are left with "my-test-abc/foo.txt", etc.
- map({ project: (split("/")[0]), file: . })
  - Splits each string on / and uses the first segment as project, the whole string as file.
- sort_by(.project) | group_by(.project)
  - Buckets entries with the same project into arrays. group_by already sorts the groups by key, so the explicit sort_by is optional.
- Building the output object
  {
    projects: map(.[0].project), # (1)
    include: map({ # (2)
      project: .[0].project,
      files: map(.file)
    })
  }
  1. projects becomes a flat array of each group’s name.
  2. include is an array of { project, files } objects.
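To illustrate the grouping step on its own, here is a small sketch using a hypothetical, deliberately unsorted entries array. It shows that group_by returns its groups sorted by key even without a preceding sort_by:

```shell
#!/usr/bin/env bash
# Hypothetical input: project/file objects in unsorted order.
entries='[
  {"project":"my-test-abc","file":"my-test-abc/foo.txt"},
  {"project":"dev-jkl","file":"dev-jkl/foo.txt"},
  {"project":"my-test-abc","file":"my-test-abc/bar.log"}
]'
# group_by buckets by .project; each bucket is then reduced to { project, files }.
groups=$(jq -c 'group_by(.project) | map({ project: .[0].project, files: map(.file) })' <<< "$entries")
echo "$groups"
# → [{"project":"dev-jkl","files":["dev-jkl/foo.txt"]},{"project":"my-test-abc","files":["my-test-abc/foo.txt","my-test-abc/bar.log"]}]
```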
Storing the output as a variable
matrix=$(jq -c '
map(sub("^/home/user/projects/";""))
| map({ project: (split("/")[0]), file: . })
| sort_by(.project)
| group_by(.project)
| {
projects: map(.[0].project),
include: map({ project: .[0].project, files: map(.file) })
}
' <<<"$paths")
Which produces:
~ echo $matrix
{"projects":["dev-jkl","my-test-abc","primary-test-project-xyz"],"include":[{"project":"dev-jkl","files":["dev-jkl/foo.txt","dev-jkl/bar.log","dev-jkl/baz.log"]},{"project":"my-test-abc","files":["my-test-abc/foo.txt","my-test-abc/bar.log"]},{"project":"primary-test-project-xyz","files":["primary-test-project-xyz/foo.txt","primary-test-project-xyz/bar.log","primary-test-project-xyz/baz.txt"]}]}
Or, for more readable output, pipe it through jq with echo $matrix | jq:
~ echo $matrix | jq
{
"projects": [
"dev-jkl",
"my-test-abc",
"primary-test-project-xyz"
],
"include": [
{
"project": "dev-jkl",
"files": [
"dev-jkl/foo.txt",
"dev-jkl/bar.log",
"dev-jkl/baz.log"
]
},
{
"project": "my-test-abc",
"files": [
"my-test-abc/foo.txt",
"my-test-abc/bar.log"
]
},
{
"project": "primary-test-project-xyz",
"files": [
"primary-test-project-xyz/foo.txt",
"primary-test-project-xyz/bar.log",
"primary-test-project-xyz/baz.txt"
]
}
]
}
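Since the goal is a GitHub Actions matrix, one way to hand the value to later workflow steps is to append it to the file named by the GITHUB_OUTPUT environment variable. This is a rough sketch: the shortened matrix value, the output name matrix, and the temp-file fallback for local runs are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Shortened, hypothetical stand-in for the matrix value built above.
matrix='{"projects":["dev-jkl"],"include":[{"project":"dev-jkl","files":["dev-jkl/foo.txt"]}]}'
# GitHub Actions sets GITHUB_OUTPUT for each step; fall back to a temp file
# so the sketch also runs outside a workflow.
output_file="${GITHUB_OUTPUT:-$(mktemp)}"
# Append a name=value line; "matrix" is an output name chosen for this example.
echo "matrix=${matrix}" >> "$output_file"
```

Because the value was produced with jq -c, it sits on a single line, which is what the name=value form expects. A later job could then parse it with fromJSON in its matrix expression.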
Other changes or improvements
The following changes or improvements can be made:
- The order of the groups is set by group_by, which sorts by project name; to use a different order, add a sort_by after group_by rather than changing the sort_by(.project) before it
- We can rename the project slugs by inserting a | gsub("^(my-|dev-|primary-test-project-)";"") on .project
  - This could be helpful if we need to drop prefixes
  - It would probably make sense to parse the directory name prefix into a variable like env
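The slug rename could be sketched as follows, using a hypothetical three-path subset of the array. The sketch uses sub instead of gsub, which is enough here because the pattern is anchored to the start of the string:

```shell
#!/usr/bin/env bash
# Hypothetical three-path subset of the example array.
paths='["/home/user/projects/my-test-abc/foo.txt","/home/user/projects/dev-jkl/foo.txt","/home/user/projects/dev-jkl/bar.log"]'
matrix=$(jq -c '
  map(sub("^/home/user/projects/";""))
  # Strip the known prefixes from the project slug only; file keeps the full relative path.
  | map({ project: (split("/")[0] | sub("^(my-|dev-|primary-test-project-)";"")), file: . })
  | group_by(.project)
  | {
      projects: map(.[0].project),
      include: map({ project: .[0].project, files: map(.file) })
    }
' <<< "$paths")
echo "$matrix"
# → {"projects":["jkl","test-abc"],"include":[{"project":"jkl","files":["dev-jkl/foo.txt","dev-jkl/bar.log"]},{"project":"test-abc","files":["my-test-abc/foo.txt"]}]}
```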
Validating the paths variable
If there is a situation where the paths variable is set to a single path that is not formatted as a JSON array, we can check it and fix it using jq:
Checking the paths variable:
We can evaluate the 'type' using jq:
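A minimal sketch of the check, using one well-formed array and one bare path as assumed example values:

```shell
#!/usr/bin/env bash
# A well-formed JSON array reports its type as "array".
paths='["/home/user/projects/my-test-abc/foo.txt"]'
paths_type=$(jq -r 'type' <<< "$paths")
echo "$paths_type"
# → array

# A bare path is not valid JSON, so jq fails to parse it and exits non-zero.
bare='/home/user/projects/my-test-abc/foo.txt'
if jq -e 'type == "array"' <<< "$bare" >/dev/null 2>&1; then
  bare_is_array=yes
else
  bare_is_array=no
fi
echo "$bare_is_array"
# → no
```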
Fixing the paths variable:
We can wrap the value in a JSON array using jq:
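A minimal sketch of the fix, assuming paths currently holds a single bare path:

```shell
#!/usr/bin/env bash
paths='/home/user/projects/my-test-abc/foo.txt'
# --arg passes the shell value in as a jq string; -n tells jq not to read any input.
fixed=$(jq -nc --arg p "$paths" '[$p]')
echo "$fixed"
# → ["/home/user/projects/my-test-abc/foo.txt"]
```

Passing the value with --arg rather than interpolating it into the filter lets jq handle any quoting or escaping the path needs.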
Putting it together:
paths='/home/user/projects/my-test-abc/foo.txt'
# 1) if it's not already a JSON array, wrap it in one
if ! jq -e 'type == "array"' <<<"$paths" >/dev/null 2>&1; then
  paths=$(jq -nc --arg p "$paths" '[$p]')
fi
# 2) now $paths is guaranteed to be a JSON array
echo "Normalized paths: $paths"
# → Normalized paths: ["/home/user/projects/my-test-abc/foo.txt"]