Scanipy
Documentation

CLI Reference

Complete command-line options reference for Scanipy.

Command Usage

Run from source using python scanipy.py.

Synopsis

python scanipy.py --query QUERY [OPTIONS]

Required Arguments

Option Short Description
--query -q Code pattern to search for

Search Options

Option Short Description Default
--language -l Programming language filter None
--extension -e File extension filter None
--keywords -k Comma-separated keywords to filter by None
--additional-params Extra GitHub search parameters None
--pages -p Max pages to retrieve (max 10) 5
--search-strategy -s Search strategy: tiered or greedy tiered
--sort-by Sort by: stars or updated stars

API Service URLs

Option Description Default
--github-api-url GitHub Search API service URL http://localhost:8001 or $GITHUB_API_URL
--api-url Semgrep API service URL (required for --run-semgrep) $SEMGREP_API_URL

Authentication

Option Description Default
--github-token GitHub Personal Access Token $GITHUB_TOKEN

Output Options

Option Short Description Default
--output -o Output JSON file path repos.json
--input-file -i Load repos from JSON file (skip search) None
--verbose -v Enable verbose output False

Semgrep Options

Option Description Default
--run-semgrep Run Semgrep on top 10 repos (requires --api-url and --s3-bucket) False
--semgrep-args Additional Semgrep arguments None
--pro Use Semgrep Pro False
--rules Custom Semgrep rules path None
--api-url Semgrep API service URL (required for --run-semgrep) $SEMGREP_API_URL
--s3-bucket S3 bucket for storing results (required for --run-semgrep) None
--k8s-namespace Kubernetes namespace for jobs default
--max-parallel-jobs Maximum parallel Kubernetes jobs 10
--max-wait-time Maximum time to wait for analysis (seconds) 3600
--poll-interval Interval between status polls (seconds) 10

CodeQL Options

Option Description Default
--run-codeql Run CodeQL on top 10 repos False
--codeql-queries CodeQL query suite or path Default suite
--codeql-format CodeQL output format (sarif-latest, csv, text) sarif-latest
--codeql-output-dir Directory to save SARIF results ./codeql_results
--codeql-results-db SQLite database for storing analysis results None
--codeql-resume Resume previous CodeQL analysis False

Environment Variables

Variable Description
GITHUB_TOKEN GitHub Personal Access Token
GITHUB_API_URL GitHub Search API service URL (default: http://localhost:8001)
SEMGREP_API_URL Semgrep API service URL

Examples

Basic Search

python scanipy.py --query "pickle.loads" --language python

With Semgrep Analysis

python scanipy.py --query "extractall" --language python --run-semgrep \
  --api-url http://scanipy-api:8000 \
  --s3-bucket scanipy-results \
  --k8s-namespace scanipy \
  --max-parallel-jobs 20

With CodeQL Analysis

python scanipy.py --query "extractall" --language python --run-codeql \
  --codeql-output-dir ./results

Full Example

python scanipy.py \
  --query "subprocess" \
  --language python \
  --keywords "shell=True,user" \
  --pages 10 \
  --search-strategy tiered \
  --run-semgrep \
  --api-url http://scanipy-api:8000 \
  --s3-bucket scanipy-results \
  --verbose

Exit Codes

Code Description
0 Success
1 Error (missing token, invalid arguments, etc.)
2 Argument parsing error