

Updating JSON objects with JQ

Parsing entries in a JSON array

Given the following array containing file paths:

[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log"
]

Now stored as the paths variable:

paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log"
]'

Removing the prefix

Removing the prefix with gsub

We can use gsub to remove the /home/user/ directory prefix:

gsub_no_prefix=$(jq -c '.[] |= gsub("^/home/user/";"")' <<< "$paths")

Which produces:

~ echo $gsub_no_prefix
["project/file1.txt","project/subdir/file2.log"]
Removing the prefix with JQ map

We can also use JQ map to remove the /home/user/ directory prefix:

map_no_prefix=$(echo "$paths" |
  jq -c 'map(gsub("^/home/user/";""))'
)

Which produces:

~ echo $map_no_prefix
["project/file1.txt","project/subdir/file2.log"]
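Because the prefix here is a fixed string rather than a pattern, jq's built-in ltrimstr can also remove it without a regular expression. A minimal sketch, reusing the same two-element paths array (ltrim_no_prefix is just an illustrative variable name):

```shell
#!/usr/bin/env bash
# Same input as above
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log"
]'

# ltrimstr removes the prefix only if the string starts with it;
# strings without the prefix are returned unchanged
ltrim_no_prefix=$(jq -c 'map(ltrimstr("/home/user/"))' <<< "$paths")

echo "$ltrim_no_prefix"
# → ["project/file1.txt","project/subdir/file2.log"]
```

Unlike the regex versions, ltrimstr treats its argument as a literal string, so a prefix containing characters such as . or + needs no escaping.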

Selecting entries based on a pattern

Selecting entries that match a pattern

Use select(test(...)) to create a new array containing only paths that end in .txt:

dot_txt=$(jq -c 'map(select(test("\\.txt$")))' <<< "$paths")

Which produces:

~ echo $dot_txt
["/home/user/project/file1.txt"]
Selecting entries that do not match a pattern

Use select(test(...) | not) to create a new array excluding paths that end in .log:

no_dot_log=$(jq -c 'map(select(test("\\.log$") | not))' <<< "$paths")

Which produces:

~ echo $no_dot_log
["/home/user/project/file1.txt"]
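test also accepts a second argument with regex flags. As a sketch, the "i" flag makes the match case-insensitive; the FILE3.TXT entry below is a hypothetical path added only to show the difference:

```shell
#!/usr/bin/env bash
# Hypothetical input: one path has an upper-case extension
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log",
    "/home/user/project/FILE3.TXT"
]'

# Case-sensitive: only the lower-case ".txt" path matches
dot_txt=$(jq -c 'map(select(test("\\.txt$")))' <<< "$paths")

# The "i" flag makes the regex case-insensitive, so ".TXT" matches too
dot_txt_any_case=$(jq -c 'map(select(test("\\.txt$"; "i")))' <<< "$paths")

echo "$dot_txt"           # → ["/home/user/project/file1.txt"]
echo "$dot_txt_any_case"  # → ["/home/user/project/file1.txt","/home/user/project/FILE3.TXT"]
```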

Putting it all together

Here is everything in one script, using an expanded paths array:

#!/usr/bin/env bash

# updated array
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log",
    "/home/user/data/foo.txt",
    "/home/user/data/bar.log",
    "/home/user/data/baz.txt"
]'

# 1) remove the '/home/user/' prefix
no_prefix=$(jq -c '.[] |= gsub("^/home/user/";"")' <<< "$paths")

# 2) keep only '.txt' files
only_txt_files=$(jq -c 'map(select(test("\\.txt$")))' <<< "$no_prefix")

# 3) exclude '.txt' files
no_txt_files=$(jq -c 'map(select(test("\\.txt$") | not))' <<< "$no_prefix")

# 4) print the results
echo "Only '.txt' files: "
echo "  ${only_txt_files}"
echo ""
echo "Exclude '.txt' files: "
echo "  ${no_txt_files}"

Which produces:

Only '.txt' files:
  ["project/file1.txt","data/foo.txt","data/baz.txt"]

Exclude '.txt' files:
  ["project/subdir/file2.log","data/bar.log"]
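The script above calls jq three times. As an alternative sketch (not the approach used above), a single jq program can strip the prefix once and print both filtered arrays, one per line, which bash then reads into the same two variables:

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/project/file1.txt",
    "/home/user/project/subdir/file2.log",
    "/home/user/data/foo.txt",
    "/home/user/data/bar.log",
    "/home/user/data/baz.txt"
]'

# One jq run: strip the prefix, then output two compact arrays.
# "," binds tighter than "|", so both map(...) calls receive the prefix-free array.
# Each array is printed on its own line and read into its own variable.
{ read -r only_txt_files; read -r no_txt_files; } < <(jq -c '
  map(sub("^/home/user/";""))
  | map(select(test("\\.txt$"))), map(select(test("\\.txt$") | not))
' <<< "$paths")

echo "Only '.txt' files: "
echo "  ${only_txt_files}"
echo ""
echo "Exclude '.txt' files: "
echo "  ${no_txt_files}"
```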

Updating the JSON object

Updated JSON array

Now suppose the array contains files from several project directories:

[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/my-test-abc/bar.log",
    "/home/user/projects/dev-jkl/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/dev-jkl/baz.log",
    "/home/user/projects/primary-test-project-xyz/foo.txt",
    "/home/user/projects/primary-test-project-xyz/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]

Now stored as the paths variable:

paths='[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/my-test-abc/bar.log",
    "/home/user/projects/dev-jkl/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/dev-jkl/baz.log",
    "/home/user/projects/primary-test-project-xyz/foo.txt",
    "/home/user/projects/primary-test-project-xyz/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]'

The planned output

I want the output to be usable as a GitHub Actions matrix, so it should be:

{
    "projects": [
        "my-test-abc",
        "dev-jkl",
        "primary-test-project-xyz"
    ],
    "include": [
        {
            "project": "my-test-abc",
            "files": [
                "my-test-abc/foo.txt",
                "my-test-abc/bar.log"
            ]
        },
        {
            "project": "dev-jkl",
            "files": [
                "dev-jkl/foo.txt",
                "dev-jkl/bar.log",
                "dev-jkl/baz.log"
            ]
        },
        {
            "project": "primary-test-project-xyz",
            "files": [
                "primary-test-project-xyz/foo.txt",
                "primary-test-project-xyz/bar.log",
                "primary-test-project-xyz/baz.txt"
            ]
        }
    ]
}

Building the new JSON object

jq -c '
  # 1) Strip off the /home/user/projects/ directory prefix
  map(sub("^/home/user/projects/";""))

  # 2) Turn each element into an object with "project" and "file" keys
  | map({ project: (split("/")[0]), file: . })

  # 3) (Optional) Sort by project; group_by already sorts by its key, so this is redundant
  | sort_by(.project)

  # 4) Group into arrays by project name
  | group_by(.project)

  # 5) Build the final output object:
  | {
      projects:   map(.[0].project),                # a simple list of project names
      include:    map({
                     project: .[0].project,         # the project name
                     files:   map(.file)            # all files in that project
                 })
    }
' <<<"$paths"

Which produces:

{"projects":["dev-jkl","my-test-abc","primary-test-project-xyz"],"include":[{"project":"dev-jkl","files":["dev-jkl/foo.txt","dev-jkl/bar.log","dev-jkl/baz.log"]},{"project":"my-test-abc","files":["my-test-abc/foo.txt","my-test-abc/bar.log"]},{"project":"primary-test-project-xyz","files":["primary-test-project-xyz/foo.txt","primary-test-project-xyz/bar.log","primary-test-project-xyz/baz.txt"]}]}
About the key filters

Some information about the key filters:

  • map(sub("^/home/user/projects/";""))

    • Removes the fixed leading path so you are left with "my-test-abc/foo.txt", etc.
  • map({ project: (split("/")[0]), file: . })

    • Splits each string on / and uses the first segment as project, the whole string as file.
  • sort_by(.project) | group_by(.project)

    • Ensures identical projects are adjacent, then buckets them into arrays.
  • Building the output object

    • projects becomes a flat array of each group’s name.
    • include is an array of { project, files } objects.
    {
      projects: map(.[0].project),
      include:  map({
                  project: .[0].project,
                  files:   map(.file)
                })
    }
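The jq manual states that group_by returns its groups sorted by the grouping key, so the optional sort_by(.project) step should not change the result. A quick sketch to compare both versions on the same paths array (prep, with_sort and without_sort are illustrative names):

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/my-test-abc/bar.log",
    "/home/user/projects/dev-jkl/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/dev-jkl/baz.log",
    "/home/user/projects/primary-test-project-xyz/foo.txt",
    "/home/user/projects/primary-test-project-xyz/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]'

# Shared steps 1 and 2: strip the prefix and build {project, file} objects
prep='map(sub("^/home/user/projects/";"")) | map({ project: (split("/")[0]), file: . })'

# The same pipeline with and without the explicit sort_by before group_by
with_sort=$(jq -c "$prep | sort_by(.project) | group_by(.project) | map(.[0].project)" <<< "$paths")
without_sort=$(jq -c "$prep | group_by(.project) | map(.[0].project)" <<< "$paths")

echo "$with_sort"     # → ["dev-jkl","my-test-abc","primary-test-project-xyz"]
echo "$without_sort"  # → ["dev-jkl","my-test-abc","primary-test-project-xyz"]
```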

Storing the output as a variable

matrix=$(jq -c '
  map(sub("^/home/user/projects/";""))
  | map({ project: (split("/")[0]), file: . })
  | sort_by(.project)
  | group_by(.project)
  | {
      projects: map(.[0].project),
      include:  map({ project: .[0].project, files: map(.file) })
    }
' <<<"$paths")

Which produces:

~ echo $matrix
{"projects":["dev-jkl","my-test-abc","primary-test-project-xyz"],"include":[{"project":"dev-jkl","files":["dev-jkl/foo.txt","dev-jkl/bar.log","dev-jkl/baz.log"]},{"project":"my-test-abc","files":["my-test-abc/foo.txt","my-test-abc/bar.log"]},{"project":"primary-test-project-xyz","files":["primary-test-project-xyz/foo.txt","primary-test-project-xyz/bar.log","primary-test-project-xyz/baz.txt"]}]}

Or, for more readable output, pipe it back through jq:

~ echo $matrix | jq
{
  "projects": [
    "dev-jkl",
    "my-test-abc",
    "primary-test-project-xyz"
  ],
  "include": [
    {
      "project": "dev-jkl",
      "files": [
        "dev-jkl/foo.txt",
        "dev-jkl/bar.log",
        "dev-jkl/baz.log"
      ]
    },
    {
      "project": "my-test-abc",
      "files": [
        "my-test-abc/foo.txt",
        "my-test-abc/bar.log"
      ]
    },
    {
      "project": "primary-test-project-xyz",
      "files": [
        "primary-test-project-xyz/foo.txt",
        "primary-test-project-xyz/bar.log",
        "primary-test-project-xyz/baz.txt"
      ]
    }
  ]
}
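Since the goal is a GitHub Actions matrix, the compact JSON can be published as a step output by appending a name=value line to the file named by $GITHUB_OUTPUT, which the runner provides. A minimal sketch; the output name matrix is an assumption, and outside a runner the script falls back to a temporary file so it can be tried locally:

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/my-test-abc/bar.log",
    "/home/user/projects/dev-jkl/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/dev-jkl/baz.log",
    "/home/user/projects/primary-test-project-xyz/foo.txt",
    "/home/user/projects/primary-test-project-xyz/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]'

# Outside a GitHub Actions runner, fall back to a temporary file
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Same filter as above; -c keeps the whole object on a single line
matrix=$(jq -c '
  map(sub("^/home/user/projects/";""))
  | map({ project: (split("/")[0]), file: . })
  | sort_by(.project)
  | group_by(.project)
  | {
      projects: map(.[0].project),
      include:  map({ project: .[0].project, files: map(.file) })
    }
' <<< "$paths")

# Step outputs are written as name=value lines (the name "matrix" is illustrative)
echo "matrix=${matrix}" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

In a workflow, this step output would typically be mapped to a job output and read in a later job's strategy with fromJSON.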

Other changes or improvements

The following changes or improvements can be made:

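  • The output order comes from group_by, which sorts the groups by project name, so removing sort_by(.project) does not change it; a different order (e.g. the original input order) needs an extra step after group_by
  • We can rename the project slugs by piping the project name through gsub, e.g. project: (split("/")[0] | gsub("^(my-|dev-|primary-test-project-)";""))
    • This could be helpful if we need to drop prefixes
    • It would probably make sense to parse the directory name prefix into a variable such as env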
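A sketch of the renaming idea, with the root directory and the slug pattern passed in through jq's --arg option instead of being hard-coded. The root and slug_prefixes names are illustrative; sub is used instead of gsub because the pattern is anchored and can only match once:

```shell
#!/usr/bin/env bash
paths='[
    "/home/user/projects/my-test-abc/foo.txt",
    "/home/user/projects/my-test-abc/bar.log",
    "/home/user/projects/dev-jkl/foo.txt",
    "/home/user/projects/dev-jkl/bar.log",
    "/home/user/projects/dev-jkl/baz.log",
    "/home/user/projects/primary-test-project-xyz/foo.txt",
    "/home/user/projects/primary-test-project-xyz/bar.log",
    "/home/user/projects/primary-test-project-xyz/baz.txt"
]'

# Illustrative values: the directory prefix and the slug prefixes to drop
root="/home/user/projects/"
slug_prefixes='^(my-|dev-|primary-test-project-)'

renamed=$(jq -c --arg root "$root" --arg slug "$slug_prefixes" '
  # Remove the literal directory prefix passed in as $root
  map(ltrimstr($root))
  # Shorten only the project name; file keeps the original directory name
  | map({ project: (split("/")[0] | sub($slug; "")), file: . })
  | group_by(.project)
  | {
      projects: map(.[0].project),
      include:  map({ project: .[0].project, files: map(.file) })
    }
' <<< "$paths")

jq -c '.projects' <<< "$renamed"
# → ["jkl","test-abc","xyz"]
```

Because group_by sorts by the shortened names, the group order changes as well.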

Validating the paths variable

If the paths variable is set to a single path that is not formatted as a JSON array, we can detect and fix that with jq:

Checking the paths variable:

We can evaluate the 'type' using jq:

jq -e 'type == "array"' <<<"$paths"

Fixing the paths variable:

We can wrap the value in a JSON array using jq:

jq -nc --arg p "$paths" '[$p]'

Putting it together:

paths='/home/user/projects/my-test-abc/foo.txt'

# 1) if it's not already a JSON array, wrap it in one
if ! jq -e 'type == "array"' <<<"$paths" >/dev/null 2>&1; then
    paths=$(jq -nc --arg p "$paths" '[$p]')
    #
    # now $paths is guaranteed to be a JSON array
    echo "Normalized paths: $paths"
    # → Normalized paths: ["/home/user/projects/my-test-abc/foo.txt"]
fi
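The same check can be wrapped in a reusable function. As a sketch, the hypothetical to_json_array below passes a JSON array through unchanged and otherwise treats its argument as one path per line, so a single path and a newline-separated list (built here with bash's $'...' quoting) both become a JSON array:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print $1 as a compact JSON array of paths
to_json_array() {
    if jq -e 'type == "array"' <<< "$1" > /dev/null 2>&1; then
        # Already a JSON array: re-emit it compactly
        jq -c '.' <<< "$1"
    else
        # Raw text: read the whole input as one string (-R -s), split on newlines,
        # and drop empty entries such as the one left by the trailing newline
        jq -R -s -c 'split("\n") | map(select(length > 0))' <<< "$1"
    fi
}

single=$(to_json_array '/home/user/projects/my-test-abc/foo.txt')
multi=$(to_json_array $'/home/user/projects/dev-jkl/foo.txt\n/home/user/projects/dev-jkl/bar.log')
already=$(to_json_array '["/home/user/projects/dev-jkl/baz.log"]')

echo "$single"   # → ["/home/user/projects/my-test-abc/foo.txt"]
echo "$multi"    # → ["/home/user/projects/dev-jkl/foo.txt","/home/user/projects/dev-jkl/bar.log"]
echo "$already"  # → ["/home/user/projects/dev-jkl/baz.log"]
```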

Multi-line strings in YAML

Content Source

This content is from this wonderful answer on Stack Overflow.

There are nine (or 63*, depending on how you count) different ways to write multi-line strings in YAML.

TL;DR

  • Use > if you want to break a string up for readability but for it to still be treated as a single-line string: interior line breaks will be stripped out, there will only be one line break at the end:
        key: >
          Your long
          string here.
  • Use | if you want those line breaks to be preserved as \n (for instance, embedded markdown with paragraphs).
        key: |
          ### Heading

          * Bullet
          * Points
  • Use >- or |- instead if you don't want a line break appended at the end.

  • Use "" if you need to split lines in the middle of words or want to literally type line breaks as \n:

        key: "Antidisestab\
         lishmentarianism.\n\nGet on it."
  • YAML is crazy.

Block scalar styles (>, |)

These allow characters such as \ and " without escaping, and add a new line (\n) to the end of your string.

> Folded style removes single newlines within the string (but adds one at the end, and converts double newlines to singles):

    Key: >
      this is my very very very
      long string

this is my very very very long string\n

Extra leading space is retained and causes extra newlines. See note below.

Advice: Use this. Usually this is what you want.

| Literal style turns every newline within the string into a literal newline, and adds one at the end:

    Key: |
      this is my very very very 
      long string

this is my very very very\nlong string\n

Here's the official definition from the YAML Spec 1.2.2:

Scalar content can be written in block notation, using a literal style (indicated by “|”) where all line breaks are significant. Alternatively, they can be written with the folded style (denoted by “>”) where each line break is folded to a space unless it ends an empty or a more-indented line.

Advice: Use this for inserting formatted text (especially Markdown) as a value.

Block styles with block chomping indicator (>-, |-, >+, |+)

You can control the handling of the final new line in the string, and any trailing blank lines (\n\n) by adding a block chomping indicator character:

  • >, |: "clip": keep the line feed, remove the trailing blank lines.
  • >-, |-: "strip": remove the line feed, remove the trailing blank lines.
  • >+, |+: "keep": keep the line feed, keep trailing blank lines.

"Flow" scalar styles (plain, ", ')

These have limited escaping, and construct a single-line string with no new line characters. They can begin on the same line as the key, or with additional newlines first, which are stripped. Doubled newline characters become one newline.

plain style (no escaping, no # or : combinations, first character can't be ", ' or many other punctuation characters):

    Key: this is my very very very 
      long string

Advice: Avoid. May look convenient, but you're liable to shoot yourself in the foot by accidentally using forbidden punctuation and triggering a syntax error.

double-quoted style (\ and " must be escaped by \, newlines can be inserted with a literal \n sequence, lines can be concatenated without spaces with trailing \):

    Key: "this is my very very \"very\" loooo\
      ng string.\n\nLove, YAML."

"this is my very very \"very\" loooong string.\n\nLove, YAML."

Advice: Use in very specific situations. This is the only way you can break a very long token (like a URL) across lines without adding spaces. And maybe adding newlines mid-line is conceivably useful.

single-quoted style (literal ' must be doubled, no special characters, possibly useful for expressing strings starting with double quotes):

    Key: 'this is my very very "very"
      long string, isn''t it.'

"this is my very very \"very\" long string, isn't it."

Advice: Avoid. Very few benefits, mostly inconvenience.

Block styles with indentation indicators

Just in case the above isn't enough for you, you can add a "block indentation indicator" (after your block chomping indicator, if you have one):

    - >8
            My long string
            starts over here
    - |+1
     This one
     starts here

Note: Leading spaces in Folded style (>)

If you insert extra spaces at the start of not-the-first lines in Folded style, they will be kept, with a bonus newline. (This doesn't happen with flow styles.) Section 6.5 says:

In addition, folding does not apply to line breaks surrounding text lines that contain leading white space. Note that such a more-indented line may consist only of such leading white space.

    - >
        my long
          string

        many spaces above
    - my long
          string

        many spaces above

["my long\n string\n \nmany spaces above\n","my long string\nmany spaces above"]

Summary

In this table, _ means a space character and \n means a newline character, except where noted. "Leading space" refers to an additional space character on the second line, when the first is only spaces (which establishes the indent).

                                        >     |     >-    |-    >+    |+    plain "     '
    Spaces/newlines converted to:
      Trailing space →                  _     _     _     _     _     _
      Leading space →                   \n_   \n_   \n_   \n_   \n_   \n_
      Single newline →                  _     \n    _     \n    _     \n    _     _     _
      Double newline →                  \n    \n\n  \n    \n\n  \n    \n\n  \n    \n    \n
      Final newline →                   \n    \n                \n    \n
      Final double newline →            \n    \n                \n\n  \n\n
    How to create a literal:
      Single quote                      '     '     '     '     '     '     '     '     ''
      Double quote                      "     "     "     "     "     "     "     \"    "
      Backslash                         \     \     \     \     \     \     \     \\    \
    Other features:
      In-line newlines with literal \n  🚫    🚫    🚫    🚫    🚫    🚫    🚫    ✅    🚫
      Spaceless newlines with \         🚫    🚫    🚫    🚫    🚫    🚫    🚫    ✅    🚫
      # or : in value                   ✅    ✅    ✅    ✅    ✅    ✅    🚫    ✅    ✅
      Can start on same line as key     🚫    🚫    🚫    🚫    🚫    🚫    ✅    ✅    ✅

Examples

Note the trailing spaces on the line before "spaces."

    - >
      very "long"
      'string' with

      paragraph gap, \n and        
      spaces.
    - | 
      very "long"
      'string' with

      paragraph gap, \n and        
      spaces.
    - very "long"
      'string' with

      paragraph gap, \n and        
      spaces.
    - "very \"long\"
      'string' with

      paragraph gap, \n and        
      s\
      p\
      a\
      c\
      e\
      s."
    - 'very "long"
      ''string'' with

      paragraph gap, \n and        
      spaces.'
    - >- 
      very "long"
      'string' with

      paragraph gap, \n and        
      spaces.

    [
      "very \"long\" 'string' with\nparagraph gap, \\n and         spaces.\n", 
      "very \"long\"\n'string' with\n\nparagraph gap, \\n and        \nspaces.\n", 
      "very \"long\" 'string' with\nparagraph gap, \\n and spaces.", 
      "very \"long\" 'string' with\nparagraph gap, \n and spaces.", 
      "very \"long\" 'string' with\nparagraph gap, \\n and spaces.", 
      "very \"long\" 'string' with\nparagraph gap, \\n and         spaces."
    ]

*2 block styles, each with 2 possible block chomping indicators (or none), and with 9 possible indentation indicators (or none), 1 plain style and 2 quoted styles: 2 x (2 + 1) x (9 + 1) + 1 + 2 = 63

Some of this information has also been summarised here.

Encouraging git hygiene with commitlint

I added a GitHub Action that uses commitlint to my GitHub Actions Monorepo. The GitHub Action is based on commitlint (commitlint GitHub) and was added to encourage (enforce?) good git hygiene. Note: the original actions-ci workflow was added in the v0.1.12 release.

The workflow originated from the CI setup GitHub Actions section of the commitlint guides. The example workflow needed to be updated in order to run, but it should be working now.

The default commitlint configuration:

module.exports = {
    extends: [
        "@commitlint/config-conventional"
    ],
}
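The conventional config expects commit headers shaped like type(scope): subject, with the type drawn from a fixed list. The sketch below is not commitlint; it is a rough bash approximation of that header shape for seeing which messages would pass. The function name, the regex and the type list (assumed to mirror @commitlint/config-conventional) are assumptions, and commitlint's real rules (header length, subject case, and so on) are stricter:

```shell
#!/usr/bin/env bash
# Rough approximation of a Conventional Commits header check (not commitlint itself)
is_conventional_header() {
    # Assumed type list, modelled on @commitlint/config-conventional
    local types='build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test'
    # type, optional (scope), optional "!" for breaking changes, ": ", non-empty subject
    local pattern="^(${types})(\([^()]+\))?!?: .+"
    [[ $1 =~ $pattern ]]
}

for msg in "feat(ci): add commitlint workflow" "fix: handle single path input" "Added some stuff"; do
    if is_conventional_header "$msg"; then
        echo "ok:   $msg"
    else
        echo "fail: $msg"
    fi
done
```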

Enforcing good git hygiene

Proper commit messages (and pull requests) also make it easier to automate releases. For example, the semantic-release tool can be used in a GitHub Action via this semantic-release-action.

Here are some other write-ups on the topic:

  • https://www.vantage-ai.com/blog/how-to-enforce-good-pull-requests-on-github
  • https://hugooodias.medium.com/the-anatomy-of-a-perfect-pull-request-567382bb6067
Resources

commitlint guide links:

Actions that can be used with commitlint:

  • https://github.com/webiny/action-conventional-commits
  • https://github.com/wagoid/commitlint-github-action
  • https://github.com/commitizen/conventional-commit-types
  • https://github.com/amannn/action-semantic-pull-request
  • (deprecated) https://github.com/squash-commit-app/squash-commit-app
  • (deprecated) https://github.com/zeke/semantic-pull-requests

Examples with a semantic.yml file within a GitHub repo:

  • https://github.com/GoogleChrome/lighthouse-ci/blob/main/.github/semantic.yml
  • https://github.com/minecrafthome/minecrafthome/blob/master/semantic.yml
  • https://github.com/meltano/sdk/blob/main/.github/semantic.yml
  • https://github.com/vectordotdev/vector/blob/master/.github/semantic.yml

Here are links to other resources:

  • https://github.blog/changelog/2022-05-11-default-to-pr-titles-for-squash-merge-commit-messages/
  • https://semantic-release.gitbook.io/semantic-release/recipes/ci-configurations/github-actions
  • https://jamiewen00.medium.com/integrate-commitlint-to-your-repository-67d6524d0d24
  • https://ajcwebdev.com/semantic-github/