Commit e6fee97
Author: neenza
Initial commit of LeetCode Scraper project (0 parents)

File tree: 7 files changed, +475 −0 lines changed

.github/workflows/python-app.yml

Lines changed: 29 additions & 0 deletions

```yaml
name: Python Application

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.9'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        pip install flake8
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
```
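A common extension of a single-version workflow like this one is testing across several Python versions with a build matrix. A minimal sketch (not part of this commit; the version list is illustrative):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Illustrative version list; pick versions the project actually supports
        python-version: ['3.9', '3.10', '3.11']
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
```

Each matrix entry runs the job once with that interpreter, so a lint or test failure on any supported version fails the build.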
.gitignore

Lines changed: 35 additions & 0 deletions

```
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# Virtual Environment
venv/
ENV/
env/

# IDE
.idea/
.vscode/
*.swp
*.swo

# Project specific
problems/
```

LICENSE

Lines changed: 21 additions & 0 deletions

MIT License

Copyright (c) 2025 neenza

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 86 additions & 0 deletions

# LeetCode Scraper

A Python tool to scrape problem details from LeetCode and save them in JSON format.

## Features

- Scrape LeetCode problems by slug (URL name)
- Extract problem title, description, examples, and constraints
- Extract hints, follow-ups, and solutions when available
- Save data in structured JSON format
- Get a list of available problems with filtering options

## Installation

1. Clone this repository
2. Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Scrape a Specific Problem

```python
from leetcode_scraper import LeetCodeScraper

scraper = LeetCodeScraper()
problem_data = scraper.scrape_problem("two-sum")
print(problem_data)
```

### Scrape Multiple Problems

```python
import time

from leetcode_scraper import LeetCodeScraper

scraper = LeetCodeScraper()
problem_list = scraper.scrape_problem_list(limit=5)  # Get 5 problems

for problem in problem_list:
    print(f"Scraping: {problem['title']}")
    scraper.scrape_problem(problem['slug'])
    time.sleep(2)  # Add delay between requests
```

## Output Format

The scraper saves each problem as a JSON file with the following structure:

```json
{
  "title": "Two Sum",
  "problem_id": "1",
  "frontend_id": "1",
  "difficulty": "Easy",
  "problem_slug": "two-sum",
  "topics": ["Array", "Hash Table"],
  "description": "Given an array of integers nums and an integer target...",
  "examples": [
    {
      "example_num": 1,
      "example_text": "Input: nums = [2,7,11,15], target = 9\nOutput: [0,1]"
    }
  ],
  "constraints": [
    "2 <= nums.length <= 10^4",
    "-10^9 <= nums[i] <= 10^9",
    "-10^9 <= target <= 10^9"
  ],
  "follow_ups": [
    "Follow-up: Can you come up with an algorithm that is less than O(n²) time complexity?"
  ],
  "hints": [
    "A really brute force way would be to search for all possible pairs of numbers but that would be too slow.",
    "Try to use the fact that the array is sorted and use two pointers to speed up the search."
  ],
  "code_snippets": {
    "python": "class Solution:\n    def twoSum(self, nums: List[int], target: int) -> List[int]:\n        "
  }
}
```
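A record with the structure above round-trips cleanly through Python's `json` module; a minimal sketch (the `problem` dict is a hypothetical excerpt of the documented fields, not output from the scraper itself):

```python
import json

# Hypothetical record matching the documented output structure
problem = {
    "title": "Two Sum",
    "difficulty": "Easy",
    "topics": ["Array", "Hash Table"],
}

# Serialize and parse back, as consumers of the saved JSON files would
serialized = json.dumps(problem)
loaded = json.loads(serialized)
```

In practice you would read a saved file (e.g. from the `problems/` directory listed in `.gitignore`) with `json.load` instead of parsing an in-memory string.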
## Notes

- Be respectful of LeetCode's servers and avoid making too many requests in a short period.
- The tool adds a delay between requests to avoid being rate-limited.
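The rate-limiting note above can be sketched as a small throttle helper that enforces a minimum gap between requests (illustrative only; the `Throttle` class is not part of this commit and the scraper's own delay logic is not shown here):

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls (sketch)."""

    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous call

    def wait(self):
        # Sleep only for whatever remains of the minimum interval
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()


# Demo with a short interval: the first call is free, later calls are spaced out
throttle = Throttle(min_interval=0.1)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # here you would call scraper.scrape_problem(...)
total = time.monotonic() - start
```

Calling `throttle.wait()` before each request keeps the request rate bounded without hard-coding `time.sleep` at every call site.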

example_usage.py

Lines changed: 72 additions & 0 deletions

```python
from leetcode_scraper import LeetCodeScraper
import json
import time


def print_problem_details(problem_data):
    """Print formatted problem details"""
    if not problem_data:
        print("No problem data available")
        return

    print("=" * 80)
    print(f"TITLE: {problem_data.get('title')}")
    print(f"DIFFICULTY: {problem_data.get('difficulty')}")
    print("-" * 80)
    print("DESCRIPTION:")
    print(problem_data.get('description', 'No description available'))
    print("-" * 80)

    # Print examples
    print("EXAMPLES:")
    for example in problem_data.get('examples', []):
        print(f"Example {example.get('example_num')}:")
        print(example.get('example_text'))
        print()

    # Print constraints
    print("CONSTRAINTS:")
    for constraint in problem_data.get('constraints', []):
        print(f"- {constraint}")

    # Print follow-ups if available
    follow_ups = problem_data.get('follow_ups', [])
    if follow_ups:
        print("-" * 80)
        print("FOLLOW-UPS:")
        for follow_up in follow_ups:
            print(f"- {follow_up}")

    # Print hints if available
    hints = problem_data.get('hints', [])
    if hints:
        print("-" * 80)
        print("HINTS:")
        for i, hint in enumerate(hints, 1):
            print(f"Hint {i}: {hint}")

    print("=" * 80)


if __name__ == "__main__":
    scraper = LeetCodeScraper()

    # Example 1: Scrape a single problem
    print("Scraping 'set-matrix-zeroes' problem...")
    problem_data = scraper.scrape_problem("set-matrix-zeroes")
    print_problem_details(problem_data)

    # Example 2: Get a list of problems and scrape the first 3
    print("\nGetting list of problems...")
    problem_list = scraper.scrape_problem_list(limit=3)

    print(f"Found {len(problem_list)} problems:")
    for i, problem in enumerate(problem_list, 1):
        print(f"{i}. {problem['title']} (Difficulty: {'Easy' if problem['difficulty'] == 1 else 'Medium' if problem['difficulty'] == 2 else 'Hard'})")

    # Uncomment to scrape all problems in the list
    """
    print("\nScraping all problems in the list...")
    for problem in problem_list:
        print(f"Scraping {problem['title']}...")
        scraper.scrape_problem(problem['slug'])
        time.sleep(2)  # Add delay between requests
    """
```
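The nested conditional that maps numeric difficulty codes to labels in the listing loop above is easier to read as a lookup table. A minimal sketch (the codes 1–3 are as used in `example_usage.py`; the helper name is illustrative):

```python
# Map LeetCode's numeric difficulty codes to labels (codes as used in the example)
DIFFICULTY_LABELS = {1: "Easy", 2: "Medium", 3: "Hard"}


def difficulty_label(code):
    """Return a human-readable difficulty label, falling back to 'Unknown'."""
    return DIFFICULTY_LABELS.get(code, "Unknown")


label = difficulty_label(2)
```

The loop body then becomes `print(f"{i}. {problem['title']} (Difficulty: {difficulty_label(problem['difficulty'])})")`, and unexpected codes degrade gracefully instead of being mislabeled "Hard".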
