The recent leak, along with Ian from Packetlabs bringing it to my attention, made me super interested in the malware I've been hearing so much about. The internet is always set ablaze whenever these happen, so it was pretty exciting to get the chance to get hands-on with the sample.

Published openly by a group calling themselves TeamPCP, the repository's most recent commit message, “Shai-Hulud: A Gift From TeamPCP”, is an apt Dune reference to the great sandworm. Not sure how much of a gift that is, but it's a gift nonetheless…?

With all that preamble out of the way: I mainly used static analysis here, with a smaller amount of dynamic analysis done on a Linux VM I'm running on my own VPS. The goal of this blog post is to understand how this thing works, where it hurts, and what defenders can actually do about it.

What Are We Looking At?

Shai-Hulud is a weaponized npm package, a TypeScript/Bun project that tries to blend in as a normal dependency while doing the following things in the background:

  • Harvesting credentials from your local filesystem, shell environment, CI runners, cloud providers, Kubernetes clusters, and secrets managers
  • Exfiltrating everything to an attacker-controlled C2 domain (git-tanstack.com) or falling back to GitHub itself as a covert channel
  • Using any npm tokens it finds to backdoor packages you maintain and republish them with a preinstall hook
  • Using any GitHub tokens it finds to dump your GitHub Actions secrets via a disguised workflow (called Run Copilot)
  • Planting autorun hooks in .vscode/tasks.json and .claude/settings.json for persistence across repository clones
  • Installing a deadman switch on your system that executes rm -rf ~/ if you revoke its GitHub token without first killing the monitor

That last one deserves a re-read, because it's a truly mind-blowing thing to include. If the malware installs itself and you revoke the token it stole without first shutting down the persistence service, it nukes your home directory. I will expand on this later on, but it's worth calling out for how nasty it is.

The Entry Point: How It Gets In

The primary infection vector is an npm package with a preinstall lifecycle script. When any developer or CI runner executes npm install on an infected package, the loader fires before the package itself is ready to use.

The preinstall hook is simple:

preinstall: "node setup.mjs"

The setup.mjs file, which the malware writes into the victim package, downloads Bun 1.3.13 from the official GitHub releases page if Bun is not present, then executes the malware’s main bundle. This deliberately uses a legitimate binary download path to avoid static detections that look for base64-encoded payloads or custom loaders.

There are also two alternative loaders embedded in the source, a Bash loader (BASH_LOADER.sh) and a Python loader (PYTHON_LOADER.py), for environments where one delivery method may not work.

Reference Note: The Bash loader has a logic quirk where it exits immediately if Bun is already installed, and config.mjs returns early if bun is on PATH. This appears, to me at least, to be a potential bug within the code. I haven't tested it, however.

Smart? Evasion Before Collection Starts

Before doing anything impactful on the victim, the malware runs a preflight() function that performs a series of checks controlling whether, and how, it proceeds.

  1. The first thing preflight() does is check whether it’s running inside GitHub Actions against a specific repository (/opensearch-js) with a specific workflow (release-drafter.yml). If that exact context is detected, it skips everything else, immediately attempts to publish a backdoored version of @opensearch-project/opensearch using GitHub’s OIDC trusted publishing mechanism, and then exits. I'm not sure what to make of this: either the released sample is targeted specifically at that project, or the worm has a precision strike built into its propagation behavior.

  2. If the system locale or any of LC_ALL, LC_MESSAGES, LANGUAGE, or LANG starts with ru, the process exits cleanly. It does not proceed with any additional activity, exploitative or reconnaissance. Excluding Russian-locale systems is a well-known pattern among Russian-speaking threat actors, so this suggests that TeamPCP, or whoever authored the malware, may be operating out of Russia, though on its own it isn't proof.

  3. The malware checks for around 30 CI/CD environment variables, covering GitHub Actions, GitLab, Travis CI, CircleCI, Jenkins, Azure Pipelines, AWS CodeBuild, Buildkite, Netlify, Vercel, Cloudflare Pages, etc. If it’s running in CI, it proceeds directly to collection. If it’s on a developer workstation, it daemonizes first.

  4. On non-CI systems, the process respawns itself as a detached child process with __DAEMONIZED=1 set in the environment, ignores stdio, and exits the parent. This means a developer running npm install in a terminal sees nothing suspicious while the malware continues running silently in the background.

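  5. The signal handlers are registered via obfuscated strings (scramble("SIGINT"), scramble("SIGTERM")) and are explicitly no-op functions (() => {}). In Node (and presumably Bun), attaching a listener to a signal replaces its default exit behavior, so Ctrl+C (SIGINT) and a default kill (SIGTERM) won't stop the process; SIGKILL still will.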
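To make the preflight flow above concrete, here's a rough TypeScript sketch of the decision logic (locale kill switch first, then CI versus workstation). This is my own re-creation, not code from the sample: CI_MARKERS is an assumed handful of well-known CI variables rather than the ~30 the malware actually checks, and the targeted opensearch-js branch from step 1 is left out.

```typescript
// Re-creation of the preflight gating described above (not the sample's code).
type Env = Record<string, string | undefined>;

const LOCALE_VARS = ["LC_ALL", "LC_MESSAGES", "LANGUAGE", "LANG"];

// Assumed, illustrative subset of CI marker variables.
const CI_MARKERS = [
  "GITHUB_ACTIONS", // GitHub Actions
  "GITLAB_CI", // GitLab
  "TRAVIS", // Travis CI
  "CIRCLECI", // CircleCI
  "JENKINS_URL", // Jenkins
  "TF_BUILD", // Azure Pipelines
  "CODEBUILD_BUILD_ID", // AWS CodeBuild
  "BUILDKITE", // Buildkite
  "NETLIFY", // Netlify
  "VERCEL", // Vercel
  "CF_PAGES", // Cloudflare Pages
];

type PreflightDecision = "exit" | "collect-now" | "daemonize-then-collect";

// True if the system locale or any locale variable starts with "ru".
function isRuLocale(env: Env, systemLocale: string): boolean {
  const values = [systemLocale, ...LOCALE_VARS.map((name) => env[name] ?? "")];
  return values.some((value) => value.startsWith("ru"));
}

// True if any CI marker variable is set to a non-empty value.
function isCI(env: Env): boolean {
  return CI_MARKERS.some((name) => (env[name] ?? "") !== "");
}

function preflightDecision(env: Env, systemLocale: string): PreflightDecision {
  if (isRuLocale(env, systemLocale)) return "exit";
  if (isCI(env)) return "collect-now";
  // On a workstation the parent respawns a detached child with __DAEMONIZED=1;
  // presumably that child goes straight to collection instead of respawning.
  return env["__DAEMONIZED"] === "1" ? "collect-now" : "daemonize-then-collect";
}
```

In a live process you would call it with process.env and Intl.DateTimeFormat().resolvedOptions().locale as the system locale.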

Credential Collection: House-Harkonnen Level Looting

I like this title; I thought it was a good play on words xD

Having completed the initial infection steps noted above, Shai-Hulud begins collecting information and artifacts from the victim in three stages.

Step 1

Three collectors run immediately and in parallel: filesystem, shell, and GitHub Actions runner memory. These are selected first because their results are needed not just for exfiltration, but also to decide how to propagate and exfiltrate. Let's look at what that means below.

The malware scans OS-specific hotspot lists, reading up to 5 MB per file. On Linux, this includes:

  • npm: .npmrc, ~/.npmrc, yarn config
  • Git: .git/config, .git-credentials, ~/.gitconfig
  • Cloud: AWS credentials, Azure token caches, GCP credential DBs, Terraform credentials
  • Kubernetes: ~/.kube/config, service account tokens
  • SSH: private keys, config, authorized_keys, known_hosts
  • Shell history: .bash_history, .zsh_history, .python_history, .node_repl_history, .viminfo
  • Application data: Docker config, Slack cookies, Discord LevelDB, Telegram, Signal, FileZilla, VPN profiles
  • Crypto wallets: Bitcoin, Litecoin, Dogecoin, Zcash, Monero, Ethereum keystores, Exodus, Electrum
  • Development tooling: .claude.json, .claude/mcp.json, .kiro/settings/mcp.json
  • Web configs: .env, .env.local, .env.production, database.yml, wp-config.php

All the files it looks for are defined in src/filesystem/filresystem.ts. GitHub PATs and npm tokens are extracted from these files using regex patterns (ghp_/gho_ followed by 36 alphanumeric characters, npm_ followed by 36 or more).
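For defenders who want to sweep their own config files for the same token shapes before the malware does, here's a minimal scanner built from the patterns described above. The \b word boundaries and the redact() helper are my own additions, and real token formats can change, so treat it as a starting point rather than a complete detector.

```typescript
// Token shapes as described: ghp_/gho_ + exactly 36 alphanumerics,
// npm_ + 36 or more. The \b word boundaries are an added assumption.
const GITHUB_TOKEN_RE = /\bgh[po]_[A-Za-z0-9]{36}\b/g;
const NPM_TOKEN_RE = /\bnpm_[A-Za-z0-9]{36,}\b/g;

interface TokenHit {
  kind: "github" | "npm";
  token: string;
}

// Returns every GitHub-shaped hit first, then every npm-shaped hit.
function findTokens(text: string): TokenHit[] {
  const github = (text.match(GITHUB_TOKEN_RE) ?? []).map((token) => ({ kind: "github" as const, token }));
  const npm = (text.match(NPM_TOKEN_RE) ?? []).map((token) => ({ kind: "npm" as const, token }));
  return [...github, ...npm];
}

// Keep only the prefix and the last four characters when logging a finding.
function redact(token: string): string {
  return `${token.slice(0, token.indexOf("_") + 1)}...${token.slice(-4)}`;
}
```

Anything this finds on a developer machine is a token the malware would have found too, so it belongs on the rotation list.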

Shell. Runs gh auth token via execSync and captures the output. It also captures the entire process.env object, meaning every environment variable present at install time is exfiltrated.

GitHub Actions runner memory. If GITHUB_ACTIONS=true and RUNNER_OS=Linux, the malware pipes an embedded Python script into sudo python3, which reads /proc/<Runner.Worker pid>/mem directly, strips null bytes, and greps for serialized secret objects matching:

"[^"]+":\{"value":"[^"]*","isSecret":true\}

This bypasses GitHub’s log masking: masked secrets don’t appear in logs, but they do live in the runner process’s memory, and this scrapes them directly. The only secret it skips is github_token itself; everything else gets taken.
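To show what that pattern actually captures, here it is applied to a hand-written string. The capture group around the secret name is my own addition, the function only returns names (handy for knowing what to rotate), and nothing here touches process memory.

```typescript
// The secret-object pattern quoted above, with a capture group added (an
// assumption) so the secret name can be pulled out. Input is a synthetic string.
function exposedSecretNames(blob: string): string[] {
  const re = /"([^"]+)":\{"value":"[^"]*","isSecret":true\}/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(blob)) !== null) {
    // github_token is the one secret the sample reportedly skips.
    if (match[1] !== "github_token") names.push(match[1]);
  }
  return names;
}
```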

Step 2

After stage one, the malware's collection expands to:

  1. AWS: Credential resolution (in src/providers/aws/credintials.ts) attempts environment variables (with a bug: the key is written as AWS_ACCESS_KEY_ID] with a stray bracket, so this path quietly fails), a web identity token file, ECS container credentials from the metadata endpoint (169.254.170.2), EC2 IMDSv2 from 169.254.169.254, and AWS profiles. Valid credentials are then used to call STS GetCallerIdentity, enumerate SSM parameters with decryption, and list/retrieve Secrets Manager secrets.

  2. Kubernetes: Collects the in-cluster service account token or kubeconfig token, enumerates namespaces, and retrieves all secrets from accessible namespaces (excluding common system namespaces). Secret data is decoded from base64, and more than 10 regex patterns are then run over it to extract GitHub tokens, npm tokens, Kubernetes JWTs, AWS keys, GCP credentials, Azure secrets, database connection strings, Stripe keys, Slack tokens, Twilio keys, private keys, Docker auth, and more.

  3. HashiCorp Vault: Tries to authenticate using VAULT_TOKEN, VAULT_AUTH_TOKEN, VAULT_API_TOKEN, token files at a dozen paths, Kubernetes auth using the service account JWT, and AWS auth. Defaults to http://127.0.0.1:8200 unless VAULT_ADDR is set. Enumerates KV mounts and also probes common mount names (secret, kv, cubbyhole, secret-v2). These findings can be seen in src/providers/vault/vault-secret.ts.

The IoCs and regex patterns the infection looks for, which I have referenced here, are documented separately on my blog at the following link: https://www.abdulmhsblog.com/cheatsheets/shaihulud-iocs/ It includes many of the IoCs as well as a quick checklist of things to look at when considering containment.

Exfiltration: Three Fallback Channels

Shai-Hulud is pretty nifty in that it uses more than one method of getting its loot back to its owner, as well as propagating itself further down the supply chain.

Channel 1: Primary C2

All collected data is gzip-compressed, encrypted with a random AES-256-GCM key, and then that key is encrypted with an RSA public key embedded in the binary. The resulting envelope is POSTed to:

https://git-tanstack.com:443/router

The domain is health-checked with a GET request that expects HTTP 400 or 404 back (i.e., the C2 is considered “healthy” if it returns a client error). To me at least, this looks like a deliberate attempt to resemble legitimate traffic from a WAF or firewall perspective.
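The envelope described above is standard hybrid encryption. Here's a sketch using Node's built-in crypto and zlib modules; the field layout, RSA-OAEP with SHA-256, and the 12-byte IV are my assumptions, not values pulled from the sample. The point it illustrates is that anyone holding only the public key (including defenders who extract it from the binary) cannot decrypt the results.

```typescript
import {
  constants,
  createCipheriv,
  createDecipheriv,
  generateKeyPairSync,
  privateDecrypt,
  publicEncrypt,
  randomBytes,
} from "node:crypto";
import { gunzipSync, gzipSync } from "node:zlib";

// Assumed layout; the sample's real serialization may differ.
interface Envelope {
  wrappedKey: Buffer; // AES key encrypted with the RSA public key
  iv: Buffer; // 12-byte GCM nonce
  authTag: Buffer; // 16-byte GCM authentication tag
  ciphertext: Buffer; // AES-256-GCM(gzip(plaintext))
}

const OAEP = { padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" };

// gzip, encrypt under a fresh random AES-256 key, then wrap that key with RSA.
function sealEnvelope(plaintext: string, publicKeyPem: string): Envelope {
  const key = randomBytes(32);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(gzipSync(plaintext)), cipher.final()]);
  const wrappedKey = publicEncrypt({ key: publicKeyPem, ...OAEP }, key);
  return { wrappedKey, iv, authTag: cipher.getAuthTag(), ciphertext };
}

// Without the matching RSA private key, the AES key (and the data) stay opaque.
function openEnvelope(envelope: Envelope, privateKeyPem: string): string {
  const key = privateDecrypt({ key: privateKeyPem, ...OAEP }, envelope.wrappedKey);
  const decipher = createDecipheriv("aes-256-gcm", key, envelope.iv);
  decipher.setAuthTag(envelope.authTag);
  const gz = Buffer.concat([decipher.update(envelope.ciphertext), decipher.final()]);
  return gunzipSync(gz).toString("utf8");
}

// The healthcheck rule: a 400 or 404 response means the C2 is considered up.
function isC2Healthy(status: number): boolean {
  return status === 400 || status === 404;
}

// Usage with a throwaway key pair standing in for the operator's keys.
const demoKeys = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs8", format: "pem" },
});
const sealed = sealEnvelope('{"example":"loot"}', demoKeys.publicKey);
```

Opening the envelope with any private key other than the matching one fails at the RSA unwrap step, which is exactly why the results committed to public repos are unreadable to everyone but the operator.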

Channel 2: Signed Commit Domain Discovery

If the primary C2 is unreachable, the malware searches GitHub’s public commit API for commits containing the string thebeautifulmarchoftime (note: the signature parser looks for thebeautifulsnadsoftime; these differ, which is either an intentional confusion layer to throw defenders off or a bug). A valid signed message provides a fallback C2 domain, verified against an embedded verify_key.pub RSA public key.

Channel 3: GitHub as the Exfiltration Channel

If both of the above fail and the malware has a valid GitHub token (either found locally or fetched from the public commit broker string IfYouRevokeThisTokenItWillWipeTheComputerOfTheOwner), it creates a new public GitHub repository with:

  • A Dune-themed random name (<adjective>-<noun>-<0-999>); these are listed on my Shai-Hulud IoCs page linked above
  • Description: Shai-Hulud: Here We Go Again
  • Encrypted result files committed to results/results-<timestamp>-<counter>.json

Remember that the data is encrypted to the malware operators' keys, so while it may be sitting in the open, only the holder of the matching RSA private key can read it. The embedded public key was used to encrypt the AES key, not to sign the data.
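If you want to hunt for these repos across your org's accounts, a rough indicator function like the one below is a reasonable start. It's a sketch: the name regex only checks the <adjective>-<noun>-<0-999> shape, since the actual word lists live on the IoC page, and the all-digits <timestamp> format is an assumption.

```typescript
// Heuristics for the GitHub exfiltration repos described above.
const EXFIL_DESCRIPTION = "Shai-Hulud: Here We Go Again";
// Shape only; the real adjective/noun lists would make this far less noisy.
const DUNE_NAME_RE = /^[a-z]+-[a-z]+-\d{1,3}$/;
// Assumes <timestamp> and <counter> are plain digits.
const RESULT_FILE_RE = /^results\/results-\d+-\d+\.json$/;

interface RepoSummary {
  name: string;
  description: string | null;
  paths: string[]; // file paths on the default branch
}

function exfilRepoIndicators(repo: RepoSummary): string[] {
  const reasons: string[] = [];
  if (repo.description === EXFIL_DESCRIPTION) reasons.push("description");
  if (DUNE_NAME_RE.test(repo.name)) reasons.push("name-shape");
  if (repo.paths.some((p) => RESULT_FILE_RE.test(p))) reasons.push("results-file");
  return reasons;
}
```

The description match is the strongest signal; the name shape on its own will catch plenty of legitimate repos.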

Propagation: How It Spreads

This is where Shai-Hulud earns its wormy name. I have tried my best to technically assess how it moves and jumps, but I had limited success running it dynamically to observe all of these elements, so a lot of this was derived from reading the source code.

npm Package Backdooring

The infection starts off by querying the npm registry with any npm tokens that it finds. The worm appears to be looking specifically for automation tokens with bypass_2fa: true and package:write permissions.

The reason is that these tokens are what allow it to become a worm: they let it modify and publish npm packages programmatically without ever being stopped by a 2FA prompt.

If the worm finds an npm token with matching permissions, it then does the following:

  1. Downloads the latest tarball of every package the token can publish
  2. Extracts each tarball
  3. Injects the malware bundle as opensearch_init.js
  4. Writes setup.mjs (the Bun downloader/loader)
  5. Replaces all package scripts with just preinstall: "node setup.mjs"
  6. Bumps the patch version by 1
  7. Repacks and republishes to npm

Additionally, I suspect that TLS certificate verification is disabled for the publish request. The User-Agent is also spoofed to npm/11.13.1 node/v24.10.0 to blend in with legitimate npm client traffic.

This is ultimately how every other developer who downloads and installs the compromised package ends up becoming the next node in the graph of compromise, kick-starting the process from the top of this post all over again.
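The mutation steps above leave fairly distinctive traces in a published tarball. Here's a sketch of a check you could run over a package's package.json and file list; it assumes paths as they appear inside an npm tarball (prefixed with package/), and the "all other scripts removed" signal will also fire on legitimate packages that only have a preinstall script.

```typescript
// Checks a manifest and tarball file list against the mutation steps above.
interface PackageManifest {
  name: string;
  version: string;
  scripts?: Record<string, string>;
}

function backdoorIndicators(manifest: PackageManifest, tarballPaths: string[]): string[] {
  const reasons: string[] = [];
  const scripts = manifest.scripts ?? {};
  const scriptNames = Object.keys(scripts);
  if (scripts["preinstall"] === "node setup.mjs") reasons.push("preinstall runs setup.mjs");
  // Step 5 replaces every script with just the preinstall hook.
  if (scriptNames.length === 1 && scriptNames[0] === "preinstall") {
    reasons.push("all other scripts removed");
  }
  // npm tarballs place package contents under a top-level "package/" directory.
  const files = new Set(tarballPaths.map((p) => p.replace(/^package\//, "")));
  for (const name of ["setup.mjs", "opensearch_init.js"]) {
    if (files.has(name)) reasons.push(`ships ${name}`);
  }
  return reasons;
}
```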

GitHub Actions Secret Dumping

If a discovered/looted GitHub token has workflow scope, the worm creates a branch named dependabot/github_actions/format/setup-formatter (named to look like automated Dependabot activity) in writable repositories, commits a workflow file at .github/workflows/codeql_analysis.yml using the committer identity github-advanced-security[bot], and waits for GitHub to execute it.

The embedded workflow, named “Run Copilot” to blend in, writes ${{ toJSON(secrets) }} to a file and uploads it as an artifact named format-results. The malware then downloads the artifact, parses the secrets, and deletes both the workflow run and the branch in an attempt to clean up.
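Both the workflow and the branch are easy to grep for. The sketch below works on raw YAML text with plain regexes rather than a YAML parser, so it's a triage aid and not a complete detector; the indicator strings come from this post, and the function names are my own.

```typescript
// Flags workflow YAML (as text) matching the dump-and-exfiltrate pattern above.
function workflowIndicators(path: string, yaml: string): string[] {
  const reasons: string[] = [];
  const dumpsSecrets = /toJSON\(\s*secrets\s*\)/.test(yaml);
  const uploadsArtifact = /uses:\s*["']?actions\/upload-artifact@/.test(yaml);
  if (dumpsSecrets && uploadsArtifact) reasons.push("toJSON(secrets) + upload-artifact");
  if (/^\s*name:\s*["']?Run Copilot["']?\s*$/m.test(yaml)) reasons.push('named "Run Copilot"');
  if (/\bformat-results\b/.test(yaml)) reasons.push("format-results artifact");
  if (path === ".github/workflows/codeql_analysis.yml") reasons.push("known IoC path");
  return reasons;
}

// The branch name the worm uses to look like Dependabot.
function isIocBranch(ref: string): boolean {
  return ref.replace(/^refs\/heads\//, "") === "dependabot/github_actions/format/setup-formatter";
}
```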

Repository Autorun Backdoor

Using GitHub tokens with ghs_* form (short-lived GitHub Actions tokens), the malware commits to up to 50 recently active branches (excluding dependabot/** and copilot/**) in accessible repositories. The files it plants:

  • .vscode/tasks.json; a VS Code task with "runOn": "folderOpen" that executes node .claude/setup.mjs the next time any developer opens the project
  • .claude/settings.json; a Claude Code session hook that runs node .vscode/setup.mjs on SessionStart
  • .claude/setup.mjs and .vscode/setup.mjs; the Bun loader

The commit message is chore: update dependencies with a Co-authored-by: claude <claude@users.noreply.github.com> trailer. The design is overall pretty smart and shows the marks of professionals. The caveat, though, is that this still sticks out if your company doesn't use Claude or doesn't allow AI co-author trailers in commits, so alerting on unexpected trailers like this can be a pretty strong way to catch it.
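As a rough sketch of that alerting idea, the function below scores a commit on the subject line, the Claude co-author trailer, and the autorun files it touches. The CommitSummary shape is hypothetical; you'd populate it from whatever your Git hosting API or git log output gives you. On its own, "chore: update dependencies" is common, so it's the combination that matters.

```typescript
// Indicators for the autorun-backdoor commits described above.
const AUTORUN_PATHS = new Set([
  ".vscode/tasks.json",
  ".vscode/setup.mjs",
  ".claude/settings.json",
  ".claude/setup.mjs",
]);

// Hypothetical shape; fill it from your Git host's API or git log.
interface CommitSummary {
  message: string;
  changedPaths: string[];
}

function autorunCommitIndicators(commit: CommitSummary): string[] {
  const reasons: string[] = [];
  const subject = commit.message.split("\n")[0].trim();
  if (subject === "chore: update dependencies") reasons.push("subject");
  if (/^Co-authored-by:\s*claude <claude@users\.noreply\.github\.com>\s*$/im.test(commit.message)) {
    reasons.push("claude co-author trailer");
  }
  for (const p of commit.changedPaths) {
    if (AUTORUN_PATHS.has(p)) reasons.push(`touches ${p}`);
  }
  return reasons;
}
```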

The Deadman Switch: The Nastiest Thing About It

DEADMAN_SWITCH.sh is the piece of the infection that is pretty House Harkonnen coded.

When the malware exfiltrates data via a stolen GitHub token and that token belongs to a solo user (no organization), it installs a user-level background service that runs every 60 seconds for 24 hours. The service:

  1. Calls https://api.github.com/user with the stolen token
  2. If GitHub returns HTTP 40x (token revoked, account suspended, etc.), it executes the stored handler

The stored handler is rm -rf ~/.

This creates a sort of landmine for incident responders. The first response to discovering a compromised GitHub token is to revoke it. If this deadman monitor is running when you do that, and your machine is online, it reacts by deleting your whole home directory.

In my opinion, the correct containment order is:

  1. Disconnect the machine from the network
  2. Check for and remove the monitor service (paths listed in the IoC file)
  3. Stop and disable the service
  4. Then revoke the token from a different machine

Doing the 4th step from a different machine isn’t strictly necessary, but as long as we are careful to kill the process and its persistence, disconnect the infected host, and only then revoke the token, the blast radius should be reduced significantly.

Obfuscation: Making Triage Hard

No good malware infection is without its obstacles! The code in the repo isn't what directly gets deployed on victims' computers. Instead, the following steps are taken to obfuscate it and complicate DFIR efforts:

  • A scramble() wrapper obfuscates string literals throughout the source
  • Build scripts replace environment variable references (process.env.FOO) with runtime-decoded equivalents
  • A per-build passphrase is used to decode strings at runtime
  • All assets (loaders, workflow YAML, settings files, public keys, the deadman script) are AES-256-GCM encrypted and gzip-compressed into a generated TypeScript file (src/generated/index.ts)
  • javascript-obfuscator applies control-flow flattening and base64 string array encoding to the final bundle
  • Log calls can be stripped entirely if isSilent is set to true

Since people like to see it, and I often see it on the nicer, more professional blogs, I recruited my trusty AI to take my blog and see how well it could map it against the malware.

Bugs, Quirks, and Friends

A few observations from analyzing this were rather surprising. No software is without its bugs and flaws (otherwise I would be out of a job), and this is no exception.

Some of these may be flaws in my analysis process, gaps in my knowledge, or a lack of foresight, but here are a few things that appeared to be off:

Worm-related filename mismatch: The npm mutator injects the malware bundle as opensearch_init.js, but the config.mjs loader it writes expects to execute ai_init.js. Unless a build process renames the bundle outside of what’s visible here, the preinstall execution chain may silently fail in the packaged version.

My reasoning for this follows:

// src/utils/config.ts:1
export const SCRIPT_NAME = scramble("opensearch_init.js");
// src/mutator/npm/index.ts:59,65,68,71
copyFileSync(Bun.main, path.join(tmpDir, "package", SCRIPT_NAME));
const pkgSetupPath = path.join(tmpDir, "package", scramble("setup.mjs"));
pkg.scripts.preinstall = scramble("node setup.mjs");
await Bun.write(pkgSetupPath, config);

So the npm mutator copies the current malware entry file into the victim package as opensearch_init.js, then writes setup.mjs from config.

But that config asset says:

// src/assets/config.mjs:12
const E = "ai_init.js";

and later:

// src/assets/config.mjs:187,195
const ep = path.join(D, E);
execFileSync(bp, [ep], { stdio: "inherit", cwd: D });

Note that there are also additional loader/executor paths. The GitHub branch mutator also writes .claude/opensearch_init.js plus .claude/setup.mjs/.vscode/setup.mjs, and the OIDC path injects an optional dependency instead of using the same preinstall copy path. A campaign could succeed through a variant or path not represented exactly by this repo.

AWS credential misspelling: The environment-variable credential source reads process.env["AWS_ACCESS_KEY_ID]"] with an extra closing bracket. This means AWS credentials from the environment won’t be captured by this specific path. Other AWS paths (profiles, IMDS, ECS) are unaffected.

C2 fallback search/verify mismatch: The search query for the signed fallback domain is thebeautifulmarchoftime, but the verification regex looks for thebeautifulsnadsoftime. These are different strings, so fallback domain discovery is likely broken in this specific version of the malware, though it could also be doing two separate things that I have conflated.
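Both of the last two quirks are easy to see in isolation. The snippet below is an illustrative reduction, not the sample's code: the stray bracket makes the environment lookup return undefined even when the variable is set, and I've modelled the verification step as a plain substring regex (an assumption), under which a commit found by the search term only verifies if it also happens to contain the other string.

```typescript
type Env = Record<string, string | undefined>;

// AWS quirk: the lookup key carries a stray "]", so it never matches.
function readAccessKeyAsWritten(env: Env): string | undefined {
  return env["AWS_ACCESS_KEY_ID]"];
}

// Fallback quirk: the search term and the verification string differ.
const SEARCH_TERM = "thebeautifulmarchoftime";
const VERIFY_RE = /thebeautifulsnadsoftime/; // modelled as a plain regex (assumption)

function fallbackCommitAccepted(message: string): boolean {
  const foundBySearch = message.includes(SEARCH_TERM);
  return foundBySearch && VERIFY_RE.test(message);
}
```

Neither quirk is a reason to relax: the other AWS credential paths still work, and a future build could fix either string.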

npm-based propagation doesn't happen on Windows: NpmClient.execute() gates on darwin/linux, so Windows developer machines won't be used as npm propagation vectors. This is helpful to note when focusing response efforts.

Final Thoughts

Shai-Hulud is a well-designed worm. As expected, it's been wreaking havoc across the industry and has certainly been potent. I hope the analysis in this post gives defenders and incident responders enough detail to counter the threat appropriately when it appears.

For red and purple teamers, this might give you some ideas for testing your clients' CI/CD pipelines on your next engagement ; )

Defensive/DevSecOps Advice

Response When Compromise Is Suspected

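Before anything else, do not revoke GitHub tokens from a potentially infected machine. Check for the deadman monitor first, since if it's there, revoking will get your home directory nuked. Keep in mind that the monitor reacts to the token being revoked, wherever the revocation happens, so it needs to be removed (or the machine isolated) before any revocation.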

Look for these files/services:

  • ~/.config/gh-token-monitor/
  • ~/.local/bin/gh-token-monitor.sh
  • ~/Library/LaunchAgents/com.user.gh-token-monitor.plist
  • ~/.config/systemd/user/gh-token-monitor.service
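Here's a small read-only check for those paths. It only tests whether the files exist; it does not stop services, touch the network, or use any token, so it's safe to run before containment. The paths are taken from the list above and resolved against the current user's home directory.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Deadman-switch IoC paths from the list above, relative to $HOME.
const DEADMAN_IOCS = [
  ".config/gh-token-monitor",
  ".local/bin/gh-token-monitor.sh",
  "Library/LaunchAgents/com.user.gh-token-monitor.plist",
  ".config/systemd/user/gh-token-monitor.service",
];

// Read-only: existence checks only, no service, network, or token access.
function findDeadmanArtifacts(home: string = os.homedir()): string[] {
  return DEADMAN_IOCS.map((rel) => path.join(home, rel)).filter((p) => fs.existsSync(p));
}

const foundOnThisHost = findDeadmanArtifacts();
if (foundOnThisHost.length > 0) {
  console.log("Possible deadman monitor present. Isolate the host before revoking tokens:");
  for (const p of foundOnThisHost) console.log(`  ${p}`);
}
```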

If any of these are present, my advice would be: disconnect network, disable/remove the service, then revoke tokens from a clean machine.

After containing the threat, or in parallel with containment efforts, the following should be tracked down and checked/sanitized:

  • npm packages you maintain for unexpected preinstall scripts, new setup.mjs files, or patch-only version bumps with no legitimate code changes
  • GitHub repositories for branch dependabot/github_actions/format/setup-formatter, workflow .github/workflows/codeql_analysis.yml, and artifacts named format-results
  • Your GitHub account for newly created public repos with Dune-themed names or the description Shai-Hulud: Here We Go Again
  • Rotate npm, GitHub, AWS, Kubernetes, Vault, GCP, Azure, SSH, Docker, Slack, Stripe, Twilio, and database credentials

General Developer Workstation Hardening Stuff

  • Use npm ci --ignore-scripts in CI pipelines where lifecycle scripts aren’t required
  • Use isolated containers for installing and auditing untrusted packages
  • Users should avoid storing long-lived PATs locally; prefer short-lived, device-bound tokens
  • Where possible, minimize GitHub token scopes; the workflow scope is particularly important, as the ways it can be abused are dangerous
  • Keep cloud credentials short-lived; require MFA/session brokering
  • Apply an egress allowlist to package install environments so that unexpected POSTs to arbitrary domains are blocked where possible
  • Disable automatic VS Code task execution from untrusted repositories. Review .vscode/tasks.json before opening any project
  • Treat .claude/settings.json and similar AI tool hook configs as executable code
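Since those hook files are effectively executable code, they're worth scanning before opening a repo. The sketch below does text matching rather than JSON.parse so it still works on tasks.json files that contain comments; the Claude Code "SessionStart" key check is based on the hook described in this post, and the function name is hypothetical.

```typescript
interface AutorunFinding {
  file: string;
  reason: string;
}

// Text-based scan of editor/AI-tool hook files (tolerates JSONC comments).
function scanAutorunConfig(file: string, text: string): AutorunFinding[] {
  const findings: AutorunFinding[] = [];
  if (file.endsWith(".vscode/tasks.json") && /"runOn"\s*:\s*"folderOpen"/.test(text)) {
    findings.push({ file, reason: "task runs automatically on folder open" });
  }
  if (file.endsWith(".claude/settings.json") && /"SessionStart"\s*:/.test(text)) {
    findings.push({ file, reason: "Claude Code SessionStart hook" });
  }
  // The loaders planted by this worm live at .claude/setup.mjs and .vscode/setup.mjs.
  if (/node\s+\.(claude|vscode)\/setup\.mjs/.test(text)) {
    findings.push({ file, reason: "invokes a setup.mjs loader" });
  }
  return findings;
}
```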

npm Advice and Registry Controls

  • Monitor for packages that suddenly gain a preinstall script, especially alongside a patch-only version bump. Note this heuristic is fairly noisy, so take it with a grain of salt
  • Flag packages adding small loader files that download external binaries (specifically Bun from oven-sh/bun/releases)
  • Restrict automation tokens to individual packages; eliminate bypass_2fa tokens where possible
  • Do not treat Sigstore provenance as sufficient if the trusted publishing workflow can itself be modified
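A rough version of the first two heuristics might look like this. It only understands plain x.y.z versions (pre-releases are ignored), takes the loader file's text as an optional input, and, as noted above, will produce false positives, so it's better suited to prioritizing review than to blocking.

```typescript
interface ReleaseManifest {
  version: string;
  scripts?: Record<string, string>;
}

// Plain x.y.z only; pre-release and build metadata are deliberately not handled.
function parseVersion(v: string): [number, number, number] | null {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// True when only the patch number went up, and by exactly one.
function isPatchOnlyBump(prev: string, next: string): boolean {
  const a = parseVersion(prev);
  const b = parseVersion(next);
  return a !== null && b !== null && a[0] === b[0] && a[1] === b[1] && b[2] === a[2] + 1;
}

function flagRelease(prev: ReleaseManifest, next: ReleaseManifest, loaderText = ""): string[] {
  const reasons: string[] = [];
  const gainedPreinstall = !prev.scripts?.["preinstall"] && Boolean(next.scripts?.["preinstall"]);
  if (gainedPreinstall && isPatchOnlyBump(prev.version, next.version)) {
    reasons.push("new preinstall in patch-only bump");
  }
  if (/github\.com\/oven-sh\/bun\/releases/.test(loaderText)) reasons.push("loader downloads Bun");
  return reasons;
}
```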

General GitHub Organization Controls

  • Set default GITHUB_TOKEN permissions to read-only so that workflows require explicit per-workflow elevation
  • Protect release and default branches; require signed commits and restrict who can create or modify workflow files
  • Alert on any newly added or changed workflow that combines toJSON(secrets) with upload-artifact
  • Monitor commit messages for IfYouRevokeThisTokenItWillWipeTheComputerOfTheOwner
  • Alert on public repo creation via tokens, especially repos with unusual names and results/*.json files

Some Cloud and CI Specific Advice

  • Keep cloud admin credentials out of developer environments that install npm packages
  • Set the EC2 IMDSv2 hop limit to 1 on build hosts so bridged build containers can't reach IMDS; restrict IMDS access
  • Use projected short-lived Kubernetes service account tokens; apply tight RBAC
  • Restrict Vault token policies; require short TTLs and monitor broad KV listing
  • Scope GitHub Actions secrets to environments with protection rules so that secrets aren't broadly available to all workflows

I've seen a lot of fancy blog posts of this nature include one, and quite frankly I'm not the greatest at it, so I fed my blog post into AI and had it produce this:

ATT&CK Mapping

| Tactic | Technique | Implementation |
| --- | --- | --- |
| Initial Access | Supply Chain Compromise: Compromise Software Supply Chain (T1195.002) | Malicious npm package preinstall hooks |
| Execution | Command and Scripting Interpreter (T1059) | preinstall, VS Code folder-open tasks, Claude Code session hooks, GitHub Actions workflows |
| Persistence | Create or Modify System Process: Launch Agent / Systemd Service (T1543.001, T1543.002) | macOS LaunchAgent, Linux systemd user service |
| Defense Evasion | Obfuscated Files or Information (T1027) | String scrambling, asset encryption, JS obfuscator, log stripping, daemonization |
| Defense Evasion | Masquerading (T1036) | Bot-like committer names, Dependabot-style branch names, CodeQL workflow names |
| Credential Access | Unsecured Credentials: Credentials In Files (T1552.001) | Filesystem hotspot scan |
| Credential Access | Unsecured Credentials (T1552) | Full process.env capture |
| Credential Access | Unsecured Credentials: Cloud Instance Metadata API (T1552.005) | EC2 IMDSv2, ECS metadata endpoint |
| Collection | Data from Local System (T1005) | Filesystem collection |
| Collection | Data from Cloud Storage (T1530) | AWS SSM, Secrets Manager, Kubernetes secrets, Vault KV |
| Exfiltration | Exfiltration Over C2 Channel (T1041) | HTTPS POST to git-tanstack.com |
| Exfiltration | Exfiltration Over Web Service: Exfiltration to Code Repository (T1567.001) | GitHub public repo commits |
| Impact | Data Destruction (T1485) | rm -rf ~/ via deadman handler |
| Impact | Resource Hijacking (T1496) | npm package republication |