CodeQL Integration
Scanipy can run CodeQL semantic analysis on the repositories returned by a search. CodeQL is GitHub's code analysis engine and provides deep semantic security scanning.
Command Usage
Run from source using python scanipy.py.
Prerequisites
Install the CodeQL CLI before using this feature:
- Download from GitHub Releases
- Extract and add to your PATH
# Download and extract (Linux)
wget https://github.com/github/codeql-cli-binaries/releases/latest/download/codeql-linux64.zip
unzip codeql-linux64.zip
# Add to PATH
export PATH="$PWD/codeql:$PATH"
# Verify installation
codeql --version
For detailed instructions, see the CodeQL CLI documentation.
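Before enabling --run-codeql, it can help to confirm that the CLI is reachable from the environment that runs scanipy. The Python sketch below does the equivalent of running codeql --version. The function name codeql_version is hypothetical and not part of scanipy.

```python
import shutil
import subprocess
from typing import Optional


def codeql_version() -> Optional[str]:
    """Return the first line of `codeql --version`, or None if it cannot be run."""
    exe = shutil.which("codeql")  # searches PATH, like the shell does
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(codeql_version() or "CodeQL CLI not found on PATH")
```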
Basic Usage
CodeQL analysis requires a language, passed with --language:
python scanipy.py --query "extractall" --language python --run-codeql
Supported Languages
| Language | CodeQL Identifier |
|---|---|
| Python | python |
| JavaScript | javascript |
| TypeScript | javascript (uses JS extractor) |
| Java | java |
| Kotlin | java (uses Java extractor) |
| C | cpp |
| C++ | cpp |
| C# | csharp |
| Go | go |
| Ruby | ruby |
| Swift | swift |
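The table amounts to a small lookup from a language name to the identifier you pass to --language. The sketch below encodes it as a dictionary. codeql_language is a hypothetical helper, not necessarily how scanipy resolves languages.

```python
# Hypothetical lookup mirroring the Supported Languages table.
CODEQL_LANGUAGES = {
    "python": "python",
    "javascript": "javascript",
    "typescript": "javascript",  # TypeScript uses the JavaScript extractor
    "java": "java",
    "kotlin": "java",  # Kotlin uses the Java extractor
    "c": "cpp",
    "c++": "cpp",
    "c#": "csharp",
    "go": "go",
    "ruby": "ruby",
    "swift": "swift",
}


def codeql_language(language: str) -> str:
    """Map a human-readable language name to its CodeQL identifier."""
    try:
        return CODEQL_LANGUAGES[language.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported language for CodeQL: {language!r}") from None
```

For example, a TypeScript project is analyzed with --language javascript.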
Custom Query Suites
# Use a different query suite
python scanipy.py --query "extractall" --language python --run-codeql \
--codeql-queries "python-security-extended"
# Run a specific query for faster analysis
python scanipy.py --query "extractall" --language python --run-codeql \
--codeql-queries "codeql/python-queries:Security/CWE-022/TarSlip.ql"
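With the CodeQL CLI, analyzing one repository takes two steps: build a database with codeql database create, then run queries against it with codeql database analyze. The sketch below assembles those two command lines, assuming scanipy passes --codeql-queries and --codeql-format through unchanged. build_codeql_commands is a hypothetical name, and scanipy may use different paths or extra flags.

```python
from pathlib import Path
from typing import List, Optional


def build_codeql_commands(
    source_root: Path,
    database: Path,
    language: str,
    queries: Optional[str],
    output: Path,
    output_format: str = "sarif-latest",
) -> List[List[str]]:
    """Return the create and analyze command lines for one repository (hypothetical)."""
    create = [
        "codeql", "database", "create", str(database),
        f"--language={language}",
        f"--source-root={source_root}",
    ]
    analyze = ["codeql", "database", "analyze", str(database)]
    if queries:
        # A suite, pack, or pack:path spec such as
        # codeql/python-queries:Security/CWE-022/TarSlip.ql
        analyze.append(queries)
    # With no query spec, CodeQL runs the language's default suite.
    analyze += [f"--format={output_format}", f"--output={output}"]
    return [create, analyze]
```

Passing a single .ql file instead of a whole suite is what makes the second example above faster: CodeQL evaluates only that query against the database.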
Output Formats
# SARIF format (default)
python scanipy.py --query "extractall" --language python --run-codeql \
--codeql-format sarif-latest
# CSV format
python scanipy.py --query "extractall" --language python --run-codeql \
--codeql-format csv
# Text format
python scanipy.py --query "extractall" --language python --run-codeql \
--codeql-format text
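If scanipy forwards --codeql-format csv to codeql database analyze (an assumption), each output line is one alert with no header row and nine columns, in the order documented for CodeQL's CSV output. A minimal reader, with parse_codeql_csv as a hypothetical helper:

```python
import csv
import io
from typing import Dict, List

# Column order of CodeQL's CSV output (it has no header row).
CSV_COLUMNS = [
    "name", "description", "severity", "message",
    "path", "start_line", "start_column", "end_line", "end_column",
]


def parse_codeql_csv(text: str) -> List[Dict[str, str]]:
    """Turn CodeQL CSV output into one dict per alert, skipping blank lines."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(CSV_COLUMNS, row)) for row in reader if row]
```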
Saving SARIF Results
Save SARIF results to files for later analysis:
# Save to default directory (./codeql_results)
python scanipy.py --query "extractall" --language python --run-codeql
# Save to custom directory
python scanipy.py --query "extractall" --language python --run-codeql \
--codeql-output-dir ./my_sarif_results
SARIF files are saved with timestamped filenames:
my_sarif_results/
├── owner_repo1_20251229_120000.sarif
├── owner_repo2_20251229_120100.sarif
└── ...
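The naming scheme can be reproduced as below, assuming the slash in owner/repo is replaced with an underscore and the timestamp follows the YYYYMMDD_HHMMSS pattern shown in the listing. sarif_path is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def sarif_path(output_dir: str, repo_full_name: str, when: datetime) -> Path:
    """Build <output_dir>/<owner>_<repo>_<YYYYMMDD>_<HHMMSS>.sarif (hypothetical)."""
    safe_name = repo_full_name.replace("/", "_")
    return Path(output_dir) / f"{safe_name}_{when:%Y%m%d_%H%M%S}.sarif"


if __name__ == "__main__":
    print(sarif_path("my_sarif_results", "owner/repo1", datetime.now()))
```

Because the timestamp is zero-padded from year down to second, sorting one repository's filenames also sorts them chronologically. The mapping cannot be reversed reliably when owner or repository names themselves contain underscores.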
Resume Capability
CodeQL analysis can be interrupted and resumed from where it left off. This is useful for long-running analyses that may be interrupted by network issues, Ctrl+C, or system restarts.
Basic Resume
# Start analysis with a results database
python scanipy.py --query "extractall" --language python --run-codeql \
--codeql-results-db codeql_analysis.db
# If interrupted, resume from the same session
python scanipy.py --query "extractall" --language python --run-codeql \
--codeql-results-db codeql_analysis.db --codeql-resume
How It Works
- Session Tracking: Each run is tied to a session identified by its query, language, and query suite
- Incremental Saves: Results are saved to SQLite after each repository is analyzed
- Smart Resume: Repositories already recorded for the session are skipped automatically
- Survives Interruptions: Because results are committed per repository, a Ctrl+C, network error, or system crash loses at most the repository in progress
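The resume behavior can be sketched with Python's sqlite3 module: load the repositories already recorded for the session, skip them, and commit after each new result. The codeql_results table and its repo_name and success columns come from the query under Viewing Results; the session_id column, the schema, and analyze_with_resume are assumptions for illustration.

```python
import sqlite3
from typing import Callable, Iterable, List

# Hypothetical schema: only repo_name and success are documented; session_id is assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS codeql_results (
    session_id INTEGER NOT NULL,
    repo_name  TEXT    NOT NULL,
    success    INTEGER NOT NULL,
    PRIMARY KEY (session_id, repo_name)
)
"""


def analyze_with_resume(
    conn: sqlite3.Connection,
    session_id: int,
    repos: Iterable[str],
    analyze: Callable[[str], bool],
) -> List[str]:
    """Analyze repositories not yet recorded for this session; return those analyzed now."""
    conn.execute(SCHEMA)
    done = {
        name
        for (name,) in conn.execute(
            "SELECT repo_name FROM codeql_results WHERE session_id = ?", (session_id,)
        )
    }
    analyzed = []
    for repo in repos:
        if repo in done:
            continue  # smart resume: recorded by an earlier run of this session
        ok = analyze(repo)  # if this raises, nothing is saved for repo
        conn.execute(
            "INSERT INTO codeql_results (session_id, repo_name, success) VALUES (?, ?, ?)",
            (session_id, repo, int(ok)),
        )
        conn.commit()  # incremental save: commit after every repository
        done.add(repo)
        analyzed.append(repo)
    return analyzed
```

If analyze raises (for example on Ctrl+C), nothing is written for that repository, so the next run retries it. This sketch also skips repositories whose analysis failed; whether scanipy retries failures on resume is not specified here.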
Example Workflow
# Day 1: Start analyzing 100 repositories
python scanipy.py --query "path traversal" --language python --run-codeql \
--codeql-results-db path_traversal.db --pages 10
# Analysis interrupted after 40 repos...
# Ctrl+C
# Day 2: Resume analysis (skips first 40 repos)
python scanipy.py --query "path traversal" --language python --run-codeql \
--codeql-results-db path_traversal.db --pages 10 --codeql-resume
# Continue where you left off - remaining 60 repos analyzed
Session Matching
Resume works by matching:
- Query: The search query used
- Language: The programming language
- Query Suite: The CodeQL query suite (if specified)
If any of these change, a new session is created:
# Creates session 1
python scanipy.py --query "pickle.loads" --language python --run-codeql \
--codeql-results-db analysis.db
# Creates session 2 (different query)
python scanipy.py --query "eval" --language python --run-codeql \
--codeql-results-db analysis.db
# Creates session 3 (different query suite)
python scanipy.py --query "pickle.loads" --language python --run-codeql \
--codeql-results-db analysis.db --codeql-queries "security-extended"
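Session matching amounts to using (query, language, query suite) as a lookup key. In the sketch below an in-memory dict stands in for the session table, and SessionKey and find_or_create_session are hypothetical names. The calls in the usage mirror the three commands above.

```python
from typing import Dict, NamedTuple, Optional


class SessionKey(NamedTuple):
    """The three fields resume matches on."""
    query: str
    language: str
    query_suite: Optional[str]  # None when --codeql-queries is not given


def find_or_create_session(sessions: Dict[SessionKey, int], key: SessionKey) -> int:
    """Return the session id for key, creating a new session if none matches."""
    if key not in sessions:
        sessions[key] = len(sessions) + 1  # ids 1, 2, 3, ... in creation order
    return sessions[key]
```

Since --pages is not one of the matched fields, rerunning with a different --pages value would, under this model, continue the existing session rather than start a new one.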
Viewing Results
The database stores:
- Repository names and URLs
- Success/failure status
- Error messages (for failures)
- SARIF file paths (for successes)
- Analysis timestamps
You can query the database directly using SQLite:
sqlite3 codeql_analysis.db "SELECT repo_name, success FROM codeql_results"
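The same database can be read from Python with the standard sqlite3 module, for example to list failed repositories or count outcomes. The sketch relies only on the codeql_results table and the repo_name and success columns from the command above, and assumes success is stored as 0 or 1 (SQLite has no boolean type). The helper names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def failed_repos(conn: sqlite3.Connection) -> List[str]:
    """List repositories whose CodeQL analysis was recorded as unsuccessful."""
    rows = conn.execute(
        "SELECT repo_name FROM codeql_results WHERE success = 0 ORDER BY repo_name"
    ).fetchall()
    return [name for (name,) in rows]


def success_counts(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return (succeeded, failed); SUM over zero rows is NULL, hence COALESCE."""
    ok, total = conn.execute(
        "SELECT COALESCE(SUM(success), 0), COUNT(*) FROM codeql_results"
    ).fetchone()
    return ok, total - ok
```

Open the connection with sqlite3.connect("codeql_analysis.db") and pass it to either function.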
Best Practices
- Use Descriptive Database Names: Name databases after the vulnerability or pattern you're searching for, for example --codeql-results-db sql_injection_scan.db
- Always Use Resume Flag: When continuing an analysis, always specify --codeql-resume
- Match Parameters: Ensure the query, language, and query suite match the original analysis
- Check Session Info: The tool prints session information showing how many repositories were already analyzed
Performance Tips
Use Specific Queries
Running the full security suite can take a long time. For faster analysis, use specific queries:
# Full suite (slow)
python scanipy.py --query "extractall" --language python --run-codeql
# Specific query (fast)
python scanipy.py --query "extractall" --language python --run-codeql \
--codeql-queries "codeql/python-queries:Security/CWE-022/TarSlip.ql"
Limit Pages
Reduce the number of repositories to analyze:
python scanipy.py --query "extractall" --language python --run-codeql --pages 1
CodeQL Options Reference
| Option | Description | Default |
|---|---|---|
| --run-codeql | Enable CodeQL analysis | False |
| --codeql-queries | Query suite or path | Default suite |
| --codeql-format | Output format (sarif-latest, csv, text) | sarif-latest |
| --codeql-output-dir | Directory to save SARIF results | ./codeql_results |
| --codeql-results-db | Path to SQLite database for results | None |
| --codeql-resume | Resume from previous session | False |
Understanding Results
CodeQL results are displayed in a summary format:
--- CodeQL results for owner/repo ---
[ERROR] py/tarslip at src/file.py:42
This file extraction depends on a potentially untrusted source.
Total findings: 1
SARIF files contain detailed information including:
- Rule descriptions and severity
- Code locations (file, line, column)
- Code flow paths from source to sink (for path queries such as py/tarslip)
- Remediation suggestions
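The console summary can be approximated from a saved SARIF file. The sketch below reads results from runs[].results[] and takes the first location of each, following the SARIF 2.1.0 layout. summarize_sarif is a hypothetical helper, and scanipy's own formatting may differ.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def summarize_sarif(sarif: Dict[str, Any]) -> List[str]:
    """Format each result as '[LEVEL] rule-id at path:line'."""
    lines = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # When a result omits "level", SARIF falls back to the rule's default
            # configuration and then "warning"; this sketch skips the rule lookup.
            level = result.get("level", "warning").upper()
            rule_id = result.get("ruleId", "<unknown rule>")
            location = (result.get("locations") or [{}])[0].get("physicalLocation", {})
            uri = location.get("artifactLocation", {}).get("uri", "?")
            line = location.get("region", {}).get("startLine", "?")
            lines.append(f"[{level}] {rule_id} at {uri}:{line}")
    return lines


def summarize_sarif_file(path: Path) -> List[str]:
    """Load a SARIF file saved under --codeql-output-dir and summarize it."""
    return summarize_sarif(json.loads(path.read_text(encoding="utf-8")))
```

The length of the returned list corresponds to the Total findings line, and each result's message.text holds the explanation printed beneath it.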