Working with monorepos

Learn strategies for working effectively with large monorepos.

Large monorepos are a reality for many organizations. Because a monorepo can contain anywhere from tens to hundreds of packages, scanning every package can take a significant amount of time. Acceptable scan times vary by development team and pipeline, but in general development teams need fast test feedback to stay productive, while security teams need full visibility across the monorepo. These two needs can conflict without performance engineering or an asynchronous scanning strategy. This documentation outlines performance engineering and scanning strategies for large monorepos.

Asynchronous scanning strategies

When scanning a large monorepo, a common approach taken by security teams is to run an asynchronous cron job outside of a CI/CD-based environment. This is often the path of least friction, but it is limiting: inline blocking of critical issues is generally not possible with this approach. We would be remiss not to mention it as a scanning strategy for monorepos, but it is NOT recommended beyond serving as a first step to gain initial visibility into a large monorepo.

Performance enhancements for inline scanning strategies

The following performance enhancements may be used with Endor Labs to enable the scanning of large monorepos:

Scoping scans based on changed files

Path filters are readily available for many CI/CD systems. For example, with GitHub Actions, the dorny/paths-filter action is a readily accessible way to define a set of filters by path. Path filtering is generally the most effective way to handle monorepo deployments, but it requires the highest investment of human time. That investment is repaid by the time saved from no longer scanning everything on each change.

With path filters in place, you can scope each scan to the parts of the monorepo that have actually changed. For example, when files under the ui/ directory have been modified, you can scan only the packages housed under that directory by running endorctl scan --include=ui/.
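
As a rough sketch of this idea, the shell functions below turn a list of changed file paths into one scoped scan command per top-level directory. The function names changed_dirs and scan_commands are hypothetical, and scanning one top-level directory at a time is an assumption about the repository layout. The commands are printed rather than executed so that they can be reviewed first.

```shell
# Sketch only: changed_dirs and scan_commands are hypothetical helper names.
# Assumption: each top-level directory (ui/, backend/, ...) is scanned on its own.

# Read changed file paths on stdin and print each top-level directory once.
changed_dirs() {
  # Paths without a slash (files at the repository root) are skipped.
  grep '/' | cut -d/ -f1 | sort -u | sed 's|$|/|'
}

# Print (do not run) one scoped scan command per changed directory.
scan_commands() {
  changed_dirs | while read -r dir; do
    echo "endorctl scan --include=${dir}"
  done
}
```

For example, git diff --name-only origin/main...HEAD | scan_commands would print the scoped scans for the current branch, assuming origin/main is the branch you compare against.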

With a path filtering approach, each team working in the monorepo is responsible for the packages it maintains. In general, each team is associated with one or several predefined directory paths.

Parallelizing scans for many packages

When scanning a large monorepo, organizations can choose to regularly scan the whole monorepo by splitting the work across the packages or directories they'd like to scan. Separate jobs may be created that scan each directory simultaneously.
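
Where separate CI jobs are not an option, the same idea can be approximated inside a single shell session. The following is a minimal sketch, not a definitive implementation: run_parallel and endor_scan are hypothetical names, and it assumes endorctl is already installed and authenticated. Each directory is scanned in a background process, and the function returns a non-zero status if any scan fails.

```shell
# Sketch only: endor_scan and run_parallel are hypothetical helper names.
# Assumption: endorctl is installed and authenticated in this environment.
endor_scan() {
  endorctl scan "$@"
}

# Usage: run_parallel <command> <dir>...
# Starts "<command> --include=<dir>" in the background for every directory,
# then waits for all of them and returns non-zero if any run failed.
run_parallel() {
  scan_cmd="$1"
  shift
  pids=""
  for dir in "$@"; do
    "$scan_cmd" "--include=${dir}" &
    pids="$pids $!"
  done
  status=0
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

A call such as run_parallel endor_scan ui/ backend/ would start both scoped scans at once. Passing the scan command as the first argument keeps the function easy to dry-run with a stub.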

Parallelizing with scoped scans

Running scoped scans in parallel, each with its own include pattern, is a common performance optimization for monorepos.

Below is an example parallel GitHub Actions workflow that can be used as a reference. In this example, the ui/ and backend/ directories are scanned simultaneously and the results are aggregated by Endor Labs.

This approach can improve the overall scan performance across a monorepo where each directory can be scanned independently.

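name: Parallel Actions
on:
  push:
    branches: [main]
# Assumes endorctl is already installed and authenticated on the runner.
jobs:
  scan-ui:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so there is code to scan.
      - uses: actions/checkout@v4
      - name: UI Endor Labs Scan
        run: endorctl scan --include=ui/
  scan-backend:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so there is code to scan.
      - uses: actions/checkout@v4
      - name: Backend Endor Labs Scan
        run: endorctl scan --include=backend/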

To include packages based on their directory, use the command:

endorctl scan --include="directory/path/"

See scoping scans for more information on approaches to scoping scans.

Parallelizing across languages

For teams that work out of smaller monorepos, it is often most reasonable to parallelize scanning by language and to optimize performance for individual languages as needed.

Below is an example parallel GitHub Actions workflow that can be used as a reference. In this example, JavaScript and Java are scanned at the same time and the results are aggregated by Endor Labs. This approach can improve the overall scan performance across a monorepo with multiple languages.

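name: Parallel Actions
on:
  push:
    branches: [main]
# Assumes endorctl is already installed and authenticated on the runner.
jobs:
  scan-java:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so there is code to scan.
      - uses: actions/checkout@v4
      - name: Java Endor Labs Scan
        run: endorctl scan --languages=java
  scan-javascript:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so there is code to scan.
      - uses: actions/checkout@v4
      - name: JavaScript Endor Labs Scan
        run: endorctl scan --languages=javascript,typescript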

To scan a project for only packages written in TypeScript or JavaScript, use the command:

endorctl scan --languages=javascript,typescript

To scan a project for only packages written in Java, use the command:

endorctl scan --languages=java
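
The language-based commands above can also be combined with changed-file detection so that only the affected language scans run. The sketch below is illustrative only: changed_languages is a hypothetical function name, and the mapping from file extensions to languages is an assumption that covers source files but not manifest files such as pom.xml or package.json.

```shell
# Sketch only: changed_languages is a hypothetical helper name.
# Assumption: the extension-to-language mapping is illustrative, not exhaustive.

# Read changed file paths on stdin and print one --languages value
# for each language group that has at least one changed source file.
changed_languages() {
  while read -r path; do
    case "$path" in
      *.java)                 echo "java" ;;
      *.js|*.jsx|*.ts|*.tsx)  echo "javascript,typescript" ;;
    esac
  done | sort -u
}
```

Each printed value can then be passed to a scan, for example endorctl scan --languages=java, in its own parallel job.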