
Updating JSON objects with JQ

Note

This page was originally posted in the Updating JSON objects with JQ blog.

Parsing entries in a JSON array

Given the following array containing file paths:

[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log"
]

The same array stored in the paths shell variable:

paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log"
]'

Removing the prefix

Removing the prefix with gsub

We can use gsub to remove the /home/user/ directory prefix:

gsub_no_prefix=$(jq -c '.[] |= gsub("^/home/user/";"")' <<< "$paths")

Which produces:

~ echo "$gsub_no_prefix"
["project/file1.txt","project/subdir/file2.log"]
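When the prefix is a fixed string rather than a pattern, jq's built-in ltrimstr is a regex-free alternative to gsub, and passing the prefix in with --arg keeps it out of the filter text. A minimal sketch; the prefix and ltrim_no_prefix variable names are illustrative:

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log"
]'

# the literal prefix to strip; it reaches jq as $prefix, so no regex escaping is needed
prefix='/home/user/'

# ltrimstr removes the prefix only from strings that actually start with it
ltrim_no_prefix=$(jq -c --arg prefix "$prefix" 'map(ltrimstr($prefix))' <<< "$paths")

echo "$ltrim_no_prefix"
# → ["project/file1.txt","project/subdir/file2.log"]
```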

Removing the prefix with JQ map

We can also use JQ map to remove the /home/user/ directory prefix:

map_no_prefix=$(echo "$paths" |
  jq -c 'map(gsub("^/home/user/";""))'
)

Which produces:

~ echo "$map_no_prefix"
["project/file1.txt","project/subdir/file2.log"]
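For arrays, map(f) is shorthand for [.[] | f], and the .[] |= f update form used earlier gives the same result here. A small sketch comparing the three spellings (the variable names are illustrative; sub is used because the anchored pattern can match at most once per string):

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log"
]'

# map(f): apply f to every element and collect the results in a new array
via_map=$(jq -c 'map(sub("^/home/user/";""))' <<< "$paths")

# the same thing spelled out: iterate with .[] and wrap the results in [ ]
via_brackets=$(jq -c '[.[] | sub("^/home/user/";"")]' <<< "$paths")

# update-assignment: rewrite each element of the existing array in place
via_update=$(jq -c '.[] |= sub("^/home/user/";"")' <<< "$paths")

echo "$via_map"
echo "$via_brackets"
echo "$via_update"
```

The forms can behave differently when f yields no output, or more than one output, for an element, so map is usually the clearer choice for a plain per-element transformation.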

Selecting entries based on a pattern

Selecting entries that match a pattern

Use select(test(...)) to create a new array containing only paths that end in .txt:

dot_txt=$(jq -c 'map(select(test("\\.txt$")))' <<< "$paths")

Which produces:

~ echo "$dot_txt"
["/home/user/project/file1.txt"]
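If only a fixed suffix matters, endswith avoids regular expressions altogether, so the dot needs no escaping. A sketch comparing it with the test version (the variable names are illustrative):

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log"
]'

# endswith is a plain suffix comparison: "." is just a dot here
txt_by_suffix=$(jq -c 'map(select(endswith(".txt")))' <<< "$paths")

# the regex version for comparison: "\\." in the jq string is a literal dot in the regex
txt_by_regex=$(jq -c 'map(select(test("\\.txt$")))' <<< "$paths")

echo "$txt_by_suffix"
# → ["/home/user/project/file1.txt"]
echo "$txt_by_regex"
# → ["/home/user/project/file1.txt"]
```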

Selecting entries that do not match a pattern

Use select(test(...) | not) to create a new array excluding paths that end in .log:

no_dot_log=$(jq -c 'map(select(test("\\.log$") | not))' <<< "$paths")

Which produces:

~ echo "$no_dot_log"
["/home/user/project/file1.txt"]
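Inside a jq string literal the backslash has to be escaped itself, which is why the filters above write "\\.log$". Passing the pattern in with --arg avoids the double escaping, because the single-quoted shell value reaches jq unchanged. A sketch; exclude_re is a hypothetical variable name and .tmp is only an illustrative second extension:

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log"
]'

# the regex arrives in jq as $re exactly as written here, so one backslash is enough
exclude_re='\.(log|tmp)$'

# keep only the entries that do NOT match $re
filtered=$(jq -c --arg re "$exclude_re" 'map(select(test($re) | not))' <<< "$paths")

echo "$filtered"
# → ["/home/user/project/file1.txt"]
```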

Putting it all together

The following script puts these steps together, using an expanded paths array:

#!/usr/bin/env bash

# updated array
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log",
    "/home/user/data/foo.txt",
    "/home/user/data/bar.log",
    "/home/user/data/baz.txt"
]'

# 1) remove the '/home/user/' prefix
no_prefix=$(jq -c '.[] |= gsub("^/home/user/";"")' <<< "$paths")

# 2) keep only '.txt' files
only_txt_files=$(jq -c 'map(select(test("\\.txt$")))' <<< "$no_prefix")

# 3) exclude '.txt' files
no_txt_files=$(jq -c 'map(select(test("\\.txt$") | not))' <<< "$no_prefix")

# 4) print the results
echo "Only '.txt' files: "
echo "  ${only_txt_files}"
echo ""
echo "Exclude '.txt' files: "
echo "  ${no_txt_files}"

Which produces:

Only '.txt' files:
  ["project/file1.txt","data/foo.txt","data/baz.txt"]

Exclude '.txt' files:
  ["project/subdir/file2.log","data/bar.log"]
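The script above runs jq once per step. As a sketch, the prefix can be stripped once and both lists built from the same intermediate array in a single call; the only_txt and no_txt keys are hypothetical names:

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log",
    "/home/user/data/foo.txt",
    "/home/user/data/bar.log",
    "/home/user/data/baz.txt"
]'

# strip the prefix once, then build both lists from the same intermediate array
split_files=$(jq -c '
  map(sub("^/home/user/";""))
  | {
      only_txt: map(select(test("\\.txt$"))),
      no_txt:   map(select(test("\\.txt$") | not))
    }
' <<< "$paths")

# pull each list back out of the combined object
only_txt_files=$(jq -c '.only_txt' <<< "$split_files")
no_txt_files=$(jq -c '.no_txt' <<< "$split_files")

echo "Only '.txt' files: ${only_txt_files}"
echo "Exclude '.txt' files: ${no_txt_files}"
```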

Updating the JSON object

Updated JSON array

Next, consider a larger array in which the files are spread across several project directories:

[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/my-test-abc/bar.log",
    "/home/user/projects/dev-jkl/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/dev-jkl/baz.log",
    "/home/user/projects/primary-test-project-xyz/foo.txt",
    "/home/user/projects/primary-test-project-xyz/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]

The same array stored in the paths shell variable:

paths='[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/my-test-abc/bar.log",
    "/home/user/projects/dev-jkl/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/dev-jkl/baz.log",
    "/home/user/projects/primary-test-project-xyz/foo.txt",
    "/home/user/projects/primary-test-project-xyz/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]'

The planned output

I want the output to be usable as a GitHub Actions matrix, so it should look like this:

{
    "projects": [
        "my-test-abc",
        "dev-jkl",
        "primary-test-project-xyz"
    ],
    "include": [
        {
            "project": "my-test-abc",
            "files": [
                "my-test-abc/foo.txt",
                "my-test-abc/bar.log"
            ]
        },
        {
            "project": "dev-jkl",
            "files": [
                "dev-jkl/foo.txt",
                "dev-jkl/bar.log",
                "dev-jkl/baz.log"
            ]
        },
        {
            "project": "primary-test-project-xyz",
            "files": [
                "primary-test-project-xyz/foo.txt",
                "primary-test-project-xyz/bar.log",
                "primary-test-project-xyz/baz.txt"
            ]
        }
    ]
}

Building the new JSON object

jq -c '
  # 1) Strip off the /home/user/projects/ directory prefix
  map(sub("^/home/user/projects/";""))

  # 2) Turn each element into an object with project and file keys
  | map({ project: (split("/")[0]), file: . })

  # 3) (Optional) Sort by project; group_by also sorts, so this is redundant but harmless
  | sort_by(.project)

  # 4) Group into arrays by project name
  | group_by(.project)

  # 5) Build the final output object:
  | {
      projects:   map(.[0].project),                # a simple list of project names
      include:    map({
                     project: .[0].project,         # the project name
                     files:   map(.file)            # all files in that project
                 })
    }
' <<<"$paths"

Which produces:

{"projects":["dev-jkl","my-test-abc","primary-test-project-xyz"],"include":[{"project":"dev-jkl","files":["dev-jkl/foo.txt","dev-jkl/bar.log","dev-jkl/baz.log"]},{"project":"my-test-abc","files":["my-test-abc/foo.txt","my-test-abc/bar.log"]},{"project":"primary-test-project-xyz","files":["primary-test-project-xyz/foo.txt","primary-test-project-xyz/bar.log","primary-test-project-xyz/baz.txt"]}]}
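To see what the pipeline does at each stage, it can help to stop it early. A sketch that runs only steps 1 and 2, and then step 4 on that result, using a shortened two-path copy of the array (the stage variable names are illustrative):

```shell
#!/usr/bin/env bash
# shortened, illustrative copy of the paths array
paths='[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/dev-jkl/bar.log"
]'

# steps 1-2: strip the prefix and build { project, file } objects
stage2=$(jq -c '
  map(sub("^/home/user/projects/";""))
  | map({ project: (split("/")[0]), file: . })
' <<<"$paths")
echo "$stage2"
# → [{"project":"my-test-abc","file":"my-test-abc/foo.txt"},{"project":"dev-jkl","file":"dev-jkl/bar.log"}]

# step 4 on top of that: one inner array per project, ordered by project name
stage4=$(jq -c 'group_by(.project)' <<<"$stage2")
echo "$stage4"
# → [[{"project":"dev-jkl","file":"dev-jkl/bar.log"}],[{"project":"my-test-abc","file":"my-test-abc/foo.txt"}]]
```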

About the key filters

Some information about the key filters:

  • map(sub("^/home/user/projects/";""))

    • Removes the fixed leading path so you are left with "my-test-abc/foo.txt", etc.
  • map({ project: (split("/")[0]), file: . })

    • Splits each string on / and uses the first segment as project, the whole string as file.
  • sort_by(.project) | group_by(.project)

    • group_by buckets entries with the same project into arrays and returns the groups sorted by project name, so the preceding sort_by is redundant, though harmless.
  • Building the output object

    • projects becomes a flat array of each group’s name.
    • include is an array of { project, files } objects.
    {
      projects: map(.[0].project),
      include:  map({
                  project: .[0].project,
                  files:   map(.file)
                })
    }
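Because group_by sorts its input by the grouping key before bucketing, the result is the same with or without the explicit sort_by. A sketch that checks this by running the pipeline both ways (the prepare and summarize variables only hold shared filter text and are illustrative names):

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/my-test-abc/bar.log",
    "/home/user/projects/dev-jkl/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/dev-jkl/baz.log",
    "/home/user/projects/primary-test-project-xyz/foo.txt",
    "/home/user/projects/primary-test-project-xyz/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]'

# shared filter text: steps 1-2, and the per-group summary used for include
prepare='map(sub("^/home/user/projects/";"")) | map({ project: (split("/")[0]), file: . })'
summarize='map({ project: .[0].project, files: map(.file) })'

# with the explicit sort_by ...
with_sort=$(jq -c "$prepare | sort_by(.project) | group_by(.project) | $summarize" <<<"$paths")

# ... and without it
without_sort=$(jq -c "$prepare | group_by(.project) | $summarize" <<<"$paths")

if [ "$with_sort" = "$without_sort" ]; then
    echo "identical: $with_sort"
fi
```

jq's sort is stable, so the files inside each group also keep their original order in both variants.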

Storing the output as a variable

matrix=$(jq -c '
  map(sub("^/home/user/projects/";""))
  | map({ project: (split("/")[0]), file: . })
  | sort_by(.project)
  | group_by(.project)
  | {
      projects: map(.[0].project),
      include:  map({ project: .[0].project, files: map(.file) })
    }
' <<<"$paths")

Which produces:

~ echo "$matrix"
{"projects":["dev-jkl","my-test-abc","primary-test-project-xyz"],"include":[{"project":"dev-jkl","files":["dev-jkl/foo.txt","dev-jkl/bar.log","dev-jkl/baz.log"]},{"project":"my-test-abc","files":["my-test-abc/foo.txt","my-test-abc/bar.log"]},{"project":"primary-test-project-xyz","files":["primary-test-project-xyz/foo.txt","primary-test-project-xyz/bar.log","primary-test-project-xyz/baz.txt"]}]}

For more readable output, pipe it back through jq:

~ echo "$matrix" | jq .
{
  "projects": [
    "dev-jkl",
    "my-test-abc",
    "primary-test-project-xyz"
  ],
  "include": [
    {
      "project": "dev-jkl",
      "files": [
        "dev-jkl/foo.txt",
        "dev-jkl/bar.log",
        "dev-jkl/baz.log"
      ]
    },
    {
      "project": "my-test-abc",
      "files": [
        "my-test-abc/foo.txt",
        "my-test-abc/bar.log"
      ]
    },
    {
      "project": "primary-test-project-xyz",
      "files": [
        "primary-test-project-xyz/foo.txt",
        "primary-test-project-xyz/bar.log",
        "primary-test-project-xyz/baz.txt"
      ]
    }
  ]
}
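To hand the compact JSON to a GitHub Actions matrix, a step can append a name=value line to the file named by the GITHUB_OUTPUT environment variable; a later job can then turn the string back into an object with the fromJSON expression function (the workflow also has to map the step output to a job output). A sketch, assuming an output named matrix and falling back to a temporary file when run outside Actions:

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/my-test-abc/bar.log",
    "/home/user/projects/dev-jkl/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/dev-jkl/baz.log",
    "/home/user/projects/primary-test-project-xyz/foo.txt",
    "/home/user/projects/primary-test-project-xyz/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]'

matrix=$(jq -c '
  map(sub("^/home/user/projects/";""))
  | map({ project: (split("/")[0]), file: . })
  | sort_by(.project)
  | group_by(.project)
  | {
      projects: map(.[0].project),
      include:  map({ project: .[0].project, files: map(.file) })
    }
' <<<"$paths")

# GITHUB_OUTPUT is set by the Actions runner; outside Actions, use a temp file instead
output_file=${GITHUB_OUTPUT:-$(mktemp)}

# one name=value line per output; jq -c above keeps the JSON on a single line
echo "matrix=${matrix}" >> "$output_file"

cat "$output_file"
```

If the value ever spans several lines (for example when jq runs without -c), GITHUB_OUTPUT needs its multiline delimiter syntax instead of a single name=value line.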

Other changes or improvements

The following changes or improvements can be made:

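  • The order of the groups is not controlled by sort_by(.project): group_by(.project) always returns them in ascending order of project name, so removing sort_by changes nothing. To change the order, re-sort after grouping, e.g. append | reverse after group_by(.project)
  • We can rename the project slugs by piping the project value through gsub("^(my-|dev-|primary-test-project-)";""), e.g. project: (split("/")[0] | gsub("^(my-|dev-|primary-test-project-)";""))
    • This could be helpful if we need to drop prefixes (my-test-abc becomes test-abc, dev-jkl becomes jkl)
    • It would probably make sense to capture the dropped prefix in a separate field, such as env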
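As a sketch of the last two points, jq's capture can split each directory name into a prefix and a slug in one step. The env and project key names and the list of prefixes are assumptions based on this example data:

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]'

# capture turns the named groups into an object: env holds the (assumed) prefix,
# project holds the remaining slug; + { file: . } adds the stripped path
entries=$(jq -c '
  map(sub("^/home/user/projects/";""))
  | map(
      (split("/")[0] | capture("^(?<env>my|dev|primary-test-project)-(?<project>.+)$"))
      + { file: . }
    )
' <<<"$paths")

echo "$entries"
```

For example, dev-jkl/bar.log becomes an object with env "dev", project "jkl" and file "dev-jkl/bar.log". Directories that do not match the pattern produce no output from capture and are silently dropped by map, so the prefix list has to cover every directory in the input.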

Validating the paths variable

If the paths variable might be set to a single bare path instead of a JSON array, we can check for that and fix it with jq:

Checking the paths variable:

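We can evaluate the type with jq. The -e flag makes the exit status reflect the result: 0 only when the last output is neither false nor null. A bare path is not valid JSON, so jq fails to parse it and also exits with a non-zero status: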

jq -e 'type == "array"' <<<"$paths"

Fixing the paths variable:

We can wrap the raw value in a JSON array; --arg passes it to jq as a string, so no extra quoting is needed:

jq -nc --arg p "$paths" '[$p]'

Putting it together:

paths='/home/user/projects/my-test-abc/foo.txt'

# 1) if it's not already a JSON array, wrap it in one
if ! jq -e 'type == "array"' <<<"$paths" >/dev/null 2>&1; then
    paths=$(jq -nc --arg p "$paths" '[$p]')
fi

# 2) now $paths is guaranteed to be a JSON array
echo "Normalized paths: $paths"
# → Normalized paths: ["/home/user/projects/my-test-abc/foo.txt"]
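The same check can be wrapped in a small function that also accepts several newline-separated bare paths, using -R (raw input) together with -n and inputs to collect each line as a string. A sketch; normalize_paths is a hypothetical helper name:

```shell
#!/usr/bin/env bash

# hypothetical helper: print the value of $1 as a compact JSON array of paths
normalize_paths() {
    local value=$1
    if jq -e 'type == "array"' <<<"$value" >/dev/null 2>&1; then
        # already a JSON array: pass it through in compact form
        jq -c '.' <<<"$value"
    else
        # otherwise treat every non-empty line as one raw path string
        jq -Rnc '[inputs | select(length > 0)]' <<<"$value"
    fi
}

normalize_paths '/home/user/projects/my-test-abc/foo.txt'
# → ["/home/user/projects/my-test-abc/foo.txt"]

normalize_paths $'/home/user/projects/dev-jkl/foo.txt\n/home/user/projects/dev-jkl/bar.log'
# → ["/home/user/projects/dev-jkl/foo.txt","/home/user/projects/dev-jkl/bar.log"]
```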