Scanipy
Documentation

Usage Guide

Command Usage

Run Scanipy from source by invoking python scanipy.py with the options described below.

Basic Search

Search for a code pattern across GitHub repositories:

# Search for a code pattern
python scanipy.py --query "pickle.loads"

# Search with a specific language
python scanipy.py --query "eval(" --language python

Language & Extension Filtering

# Search in Python files only
python scanipy.py --query "subprocess.call" --language python

# Search in specific file extensions
python scanipy.py --query "os.system" --extension ".py"

# Combine both
python scanipy.py --query "exec(" --language python --extension ".py"

Keyword Filtering

Restrict results to files that contain specific keywords, passed as a comma-separated list:

# Find extractall usage that also mentions path, directory, or zip
python scanipy.py --query "extractall" --keywords "path,directory,zip"
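
The idea behind keyword filtering can be sketched in a few lines of Python. This is only an illustration, not Scanipy's actual implementation: the helper names parse_keywords and matches_keywords are hypothetical, and the sketch assumes a file qualifies when it contains any one of the keywords, compared case-insensitively. The guide does not say whether Scanipy requires any or all of them.

```python
def parse_keywords(raw: str) -> list[str]:
    """Split a comma-separated keyword string, dropping empty entries."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def matches_keywords(text: str, keywords: list[str]) -> bool:
    """Assumption: a file matches if it contains ANY keyword (case-insensitive)."""
    lowered = text.lower()
    return any(k.lower() in lowered for k in keywords)


# Same keyword list as the command above.
keywords = parse_keywords("path,directory,zip")

# Hypothetical file content used only for this illustration.
sample = "with zipfile.ZipFile(src) as zf:\n    zf.extractall(target_path)"
print(matches_keywords(sample, keywords))
```

Swapping any for all in matches_keywords would give the stricter behavior, where every keyword must be present.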

Search Strategies

Scanipy offers two search strategies:

| Strategy | Description | Best For |
|----------|-------------|----------|
| tiered (default) | Searches repositories in star tiers (100k+, 50k-100k, 20k-50k, etc.) | Finding popular, well-maintained code |
| greedy | Standard GitHub search, faster but may miss high-star repos | Quick searches, less popular patterns |

# Use tiered search (default) - prioritizes popular repos
python scanipy.py --query "extractall" --search-strategy tiered

# Use greedy search - faster but less targeted
python scanipy.py --query "extractall" --search-strategy greedy
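
To make the tiered strategy more concrete, the sketch below turns the star boundaries named above into GitHub stars: search qualifiers. It is an assumption about how such tiers could be expressed, not a description of Scanipy's internals. The function name star_tier_qualifiers is hypothetical, the tiers are treated as non-overlapping (which the guide does not specify), and tiers below 20k are left out because the guide elides them with "etc.".

```python
def star_tier_qualifiers(bounds: list[int]) -> list[str]:
    """Turn descending star boundaries into GitHub `stars:` qualifiers.

    The first boundary becomes an open-ended top tier; each following pair
    becomes an inclusive range that stops just below the tier above it.
    """
    qualifiers = [f"stars:>={bounds[0]}"]
    for upper, lower in zip(bounds, bounds[1:]):
        qualifiers.append(f"stars:{lower}..{upper - 1}")
    return qualifiers


# Boundaries taken from the tiers listed in this guide (100k+, 50k-100k, 20k-50k).
TIER_BOUNDS = [100_000, 50_000, 20_000]

for qualifier in star_tier_qualifiers(TIER_BOUNDS):
    # One search per tier, highest-starred repositories first.
    print(f'"extractall" {qualifier}')
```

Running one query per tier is what lets high-star repositories surface first, at the cost of more requests than a single greedy search.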

Sorting Results

# Sort by stars (default)
python scanipy.py --query "extractall" --sort-by stars

# Sort by recently updated
python scanipy.py --query "extractall" --sort-by updated

Pagination

Control how many pages of results to retrieve:

# Get more results (max 10 pages)
python scanipy.py --query "extractall" --pages 10

# Quick search with fewer results
python scanipy.py --query "extractall" --pages 2

Output Options

# Save results to a custom file
python scanipy.py --query "extractall" --output my_results.json

# Enable verbose output
python scanipy.py --query "extractall" --verbose

Using Saved Results

By default, Scanipy saves search results to repos.json (use --output to choose a different file). You can continue the analysis later from that file without re-running the search:

# First, run a search (results saved to repos.json)
python scanipy.py --query "memcpy" --language c --output repos.json

# Later, continue with analysis using saved results
python scanipy.py --query "memcpy" --input-file repos.json --run-semgrep

# Use a custom input file
python scanipy.py --query "extractall" -i my_repos.json --run-semgrep
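
Before passing a saved file to --input-file, it can be worth checking that the file exists and parses as JSON. The sketch below does only that. The helper name load_saved_results is hypothetical, and because this guide does not document the structure of repos.json, the code makes no assumptions about its fields.

```python
import json
from pathlib import Path


def load_saved_results(path="repos.json"):
    """Return the parsed JSON from a saved results file, or None if it is missing."""
    file_path = Path(path)
    if not file_path.exists():
        return None
    with file_path.open(encoding="utf-8") as fh:
        return json.load(fh)


data = load_saved_results("repos.json")
if data is None:
    print("repos.json not found; run a search first")
else:
    # Only the top-level shape is reported, since the schema is not documented here.
    print(f"loaded a {type(data).__name__} with {len(data)} top-level entries")
```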

Advanced GitHub Search

Pass extra GitHub search qualifiers through --additional-params:

# Search with GitHub search qualifiers
python scanipy.py --query "eval(" --additional-params "stars:>1000 -org:microsoft"

# Combine multiple filters
python scanipy.py \
  --query "subprocess" \
  --language python \
  --keywords "shell=True,user" \
  --pages 10 \
  --search-strategy tiered
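
When several filters are combined, it can be convenient to assemble the command from a script. The following is a minimal sketch that builds the argument list for the combined example above. The helper build_scanipy_args is hypothetical and uses only flags shown in this guide. It prints the command instead of running it.

```python
import shlex


def build_scanipy_args(query, language=None, keywords=None, pages=None, strategy=None):
    """Build an argv list for scanipy.py from the options documented in this guide."""
    args = ["python", "scanipy.py", "--query", query]
    if language:
        args += ["--language", language]
    if keywords:
        # Keywords are passed as a single comma-separated value.
        args += ["--keywords", ",".join(keywords)]
    if pages is not None:
        args += ["--pages", str(pages)]
    if strategy:
        args += ["--search-strategy", strategy]
    return args


args = build_scanipy_args(
    "subprocess",
    language="python",
    keywords=["shell=True", "user"],
    pages=10,
    strategy="tiered",
)

# Print a shell-safe rendering; subprocess.run(args, check=True) would execute it.
print(shlex.join(args))
```

Passing the list form to subprocess.run avoids shell quoting problems with queries such as "eval(".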