Commit 922aa96

Merge branch 'pre/beta' into search_links_node
(2 parents: 8ab6032 + b481fd7)


42 files changed, +618 -133 lines changed

.github/workflows/release.yml

Lines changed: 79 additions & 0 deletions (new file)

```yaml
name: Release
on:
  push:
    branches:
      - main
      - pre/*

jobs:
  build:
    name: Build
    runs-on: ubuntu-latest
    steps:
      - name: Install git
        run: |
          sudo apt update
          sudo apt install -y git
      - name: Install Python Env and Poetry
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - run: pip install poetry
      - name: Install Node Env
        uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Checkout
        uses: actions/checkout@v4.1.1
        with:
          fetch-depth: 0
          persist-credentials: false
      - name: Build app
        run: |
          poetry install
          poetry build
        id: build_cache
        if: success()
      - name: Cache build
        uses: actions/cache@v2
        with:
          path: ./dist
          key: ${{ runner.os }}-build-${{ hashFiles('dist/**') }}
        if: steps.build_cache.outputs.id != ''

  release:
    name: Release
    runs-on: ubuntu-latest
    needs: build
    environment: development
    if: |
      github.event_name == 'push' && github.ref == 'refs/heads/main' ||
      github.event_name == 'push' && github.ref == 'refs/heads/pre/beta' ||
      github.event_name == 'pull_request' && github.event.action == 'closed' && github.event.pull_request.merged && github.event.pull_request.base.ref == 'main' ||
      github.event_name == 'pull_request' && github.event.action == 'closed' && github.event.pull_request.merged && github.event.pull_request.base.ref == 'pre/beta'
    permissions:
      contents: write
      issues: write
      pull-requests: write
      id-token: write
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4.1.1
        with:
          fetch-depth: 0
          persist-credentials: false
      - name: Semantic Release
        uses: cycjimmy/semantic-release-action@v4.1.0
        with:
          semantic_version: 23
          extra_plugins: |
            semantic-release-pypi@3
            @semantic-release/git
            @semantic-release/commit-analyzer@12
            @semantic-release/release-notes-generator@13
            @semantic-release/github@10
            @semantic-release/changelog@6
            conventional-changelog-conventionalcommits@7
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
```
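The `if:` gate on the release job above fires only for pushes to `main` or `pre/beta`, or for merged pull requests targeting those branches. A minimal Python sketch of that boolean (a hypothetical helper for illustration, not part of the workflow):

```python
def should_release(event_name, ref="", action="", merged=False, base_ref=""):
    """Mirror the release job's `if:` expression (illustrative only)."""
    push_ok = event_name == "push" and ref in ("refs/heads/main", "refs/heads/pre/beta")
    pr_ok = (
        event_name == "pull_request"
        and action == "closed"
        and merged
        and base_ref in ("main", "pre/beta")
    )
    return push_ok or pr_ok

print(should_release("push", ref="refs/heads/main"))       # True
print(should_release("push", ref="refs/heads/feature/x"))  # False
```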

.gitignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -35,3 +35,4 @@
 
 # lock files
 *.lock
+poetry.lock
```

.releaserc.yml

Lines changed: 56 additions & 0 deletions (new file)

```yaml
plugins:
  - - "@semantic-release/commit-analyzer"
    - preset: conventionalcommits
  - - "@semantic-release/release-notes-generator"
    - writerOpts:
        commitsSort:
          - subject
          - scope
      preset: conventionalcommits
      presetConfig:
        types:
          - type: feat
            section: Features
          - type: fix
            section: Bug Fixes
          - type: chore
            section: chore
          - type: docs
            section: Docs
          - type: style
            hidden: true
          - type: refactor
            section: Refactor
          - type: perf
            section: Perf
          - type: test
            section: Test
          - type: build
            section: Build
          - type: ci
            section: CI
  - "@semantic-release/changelog"
  - "semantic-release-pypi"
  - "@semantic-release/github"
  - - "@semantic-release/git"
    - assets:
        - CHANGELOG.md
        - pyproject.toml
      message: |-
        ci(release): ${nextRelease.version} [skip ci]

        ${nextRelease.notes}
branches:
  # child branches coming from tagged version for bugfix (1.1.x) or new features (1.x)
  # maintenance branch
  - name: "+([0-9])?(.{+([0-9]),x}).x"
    channel: "stable"
  # release a production version when merging towards main
  - name: "main"
    channel: "stable"
  # prerelease branch
  - name: "pre/beta"
    channel: "dev"
    prerelease: "beta"
debug: true
```
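The `presetConfig` above routes conventional-commit types into changelog sections, with `style` hidden. A small illustrative sketch of that mapping; semantic-release's commit-analyzer does this internally, so the helper below is hypothetical:

```python
# Mirror of the presetConfig type -> section table above (illustrative only).
SECTIONS = {
    "feat": "Features", "fix": "Bug Fixes", "chore": "chore", "docs": "Docs",
    "refactor": "Refactor", "perf": "Perf", "test": "Test",
    "build": "Build", "ci": "CI",
}
HIDDEN = {"style"}  # hidden: true -> omitted from the changelog

def changelog_section(subject):
    """Return the changelog section for a conventional-commit subject, or None."""
    # Strip an optional scope, e.g. "ci(release): ..." -> "ci".
    ctype = subject.split(":", 1)[0].split("(", 1)[0].strip()
    if ctype in HIDDEN:
        return None
    return SECTIONS.get(ctype)

print(changelog_section("feat: trigger new beta release"))  # Features
print(changelog_section("ci(release): fix plugin train"))   # CI
```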

CHANGELOG.md

Lines changed: 12 additions & 0 deletions (new file)

```markdown
## [0.3.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.2.8...v0.3.0-beta.1) (2024-04-26)

### Features

* trigger new beta release ([6f028c4](https://github.com/VinciGit00/Scrapegraph-ai/commit/6f028c499342655851044f54de2a8cc1b9b95697))

### CI

* add ci workflow to manage lib release with semantic-release ([92cd040](https://github.com/VinciGit00/Scrapegraph-ai/commit/92cd040dad8ba91a22515f3845f8dbb5f6a6939c))
* remove pull request trigger and fix plugin release train ([876fe66](https://github.com/VinciGit00/Scrapegraph-ai/commit/876fe668d97adef3863446836b10a3c00a2eb82d))
```

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -26,7 +26,7 @@
 
 ## Contributing Guidelines
 
-Please adhere to the following guidelines when contributing to AmazScraper:
+Please adhere to the following guidelines when contributing to ScrapeGraphAI:
 
 - Follow the code style and formatting guidelines specified in the [Code Style](#code-style) section.
 - Make sure your changes are well-documented and include any necessary updates to the project's documentation.
@@ -61,7 +61,7 @@
 
 ## License
 
-AmazScraper is licensed under the **Apache License 2.0**. See the [LICENSE](LICENSE) file for more information.
+ScrapeGraphAI is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for more information.
 By contributing to this project, you agree to license your contributions under the same license.
 
 Can't wait to see your contributions! :smile:
```

README.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -3,6 +3,7 @@
 [![Downloads](https://static.pepy.tech/badge/scrapegraphai)](https://pepy.tech/project/scrapegraphai)
 [![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen)](https://github.com/pylint-dev/pylint)
 [![Pylint](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/pylint.yml)
+[![CodeQL](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml/badge.svg)](https://github.com/VinciGit00/Scrapegraph-ai/actions/workflows/codeql.yml)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![](https://dcbadge.vercel.app/api/server/gkxQDAjfeX)](https://discord.gg/gkxQDAjfeX)
 
@@ -53,12 +54,11 @@
         "model": "ollama/mistral",
         "temperature": 0,
         "format": "json",  # Ollama needs the format to be specified explicitly
-        "base_url": "http://localhost:11434",  # set Ollama URL arbitrarily
+        "base_url": "http://localhost:11434",  # set Ollama URL
     },
     "embeddings": {
         "model": "ollama/nomic-embed-text",
-        "temperature": 0,
-        "base_url": "http://localhost:11434",  # set Ollama URL arbitrarily
+        "base_url": "http://localhost:11434",  # set Ollama URL
     }
 }
 
@@ -79,7 +79,7 @@
 Note: before using the local model remember to create the docker container!
 ```text
 docker-compose up -d
-docker exec -it ollama ollama run stablelm-zephyr
+docker exec -it ollama ollama pull stablelm-zephyr
 ```
 You can use which models avaiable on Ollama or your own model instead of stablelm-zephyr
 ```python
````
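The README change above drops the unused `temperature` from the embeddings block and keeps `format: json`, which Ollama requires explicitly. As an illustration of what a local-Ollama `graph_config` needs, here is a stdlib-only sketch; the validator and its exact checks are assumptions for this example, not a scrapegraphai API:

```python
def check_ollama_config(cfg):
    """Collect problems with a local-Ollama graph_config dict (illustrative checks)."""
    problems = []
    if cfg.get("llm", {}).get("format") != "json":
        problems.append("llm block should set format: json (Ollama needs it explicitly)")
    for block in ("llm", "embeddings"):
        if "base_url" not in cfg.get(block, {}):
            problems.append("missing base_url in " + block)
    return problems

# Config shaped like the corrected README example.
cfg = {
    "llm": {"model": "ollama/mistral", "temperature": 0, "format": "json",
            "base_url": "http://localhost:11434"},
    "embeddings": {"model": "ollama/nomic-embed-text",
                   "base_url": "http://localhost:11434"},
}
print(check_ollama_config(cfg))  # []
```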
docs/source/index.rst

Lines changed: 1 addition & 1 deletion

```diff
@@ -21,7 +21,7 @@
    :caption: Getting Started
 
    getting_started/installation
-   getting_started/examples
+   getting_started/examples
    modules/modules
 
 Indices and tables
```

The removed and added lines render identically; this is likely a whitespace-only fix.
Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-OPENAI_APIKEY="your openai api key"
+OPENAI_APIKEY="your openai key here"
```

examples/benchmarks/GenerateScraper/Readme.md

Lines changed: 11 additions & 9 deletions

```diff
@@ -9,12 +9,14 @@
 
 The model runned for this benchmark is Mistral on Ollama with nomic-embed-text
 
+In particular, is tested with ScriptCreatorGraph
+
 | Hardware               | Model                                   | Example 1 | Example 2 |
 | ---------------------- | --------------------------------------- | --------- | --------- |
 | Macbook 14' m1 pro     | Mistral on Ollama with nomic-embed-text | 30.54s    | 35.76s    |
-| Macbook m2 max         | Mistral on Ollama with nomic-embed-text |           |           |
-| Macbook 14' m1 pro<br> | Llama3 on Ollama with nomic-embed-text  | 27.82s    | 29.986s   |
-| Macbook m2 max<br>     | Llama3 on Ollama with nomic-embed-text  |           |           |
+| Macbook m2 max         | Mistral on Ollama with nomic-embed-text | 18.46s    | 19.59s    |
+| Macbook 14' m1 pro<br> | Llama3 on Ollama with nomic-embed-text  | 27.82s    | 29.98s    |
+| Macbook m2 max<br>     | Llama3 on Ollama with nomic-embed-text  | 20.83s    | 12.29s    |
 
 
 **Note**: the examples on Docker are not runned on other devices than the Macbook because the performance are to slow (10 times slower than Ollama).
@@ -23,17 +25,17 @@
 **URL**: https://perinim.github.io/projects
 **Task**: List me all the projects with their description.
 
-| Name                | Execution time (seconds) | total_tokens | prompt_tokens | completion_tokens | successful_requests | total_cost_USD |
-| ------------------- | ------------------------ | ------------ | ------------- | ----------------- | ------------------- | -------------- |
-| gpt-3.5-turbo       | 24.215268                | 1892         | 1802          | 90                | 1                   | 0.002883       |
-| gpt-4-turbo-preview | 6.614                    | 1936         | 1802          | 134               | 1                   | 0.02204        |
+| Name                | Execution time | total_tokens | prompt_tokens | completion_tokens | successful_requests | total_cost_USD |
+| ------------------- | -------------- | ------------ | ------------- | ----------------- | ------------------- | -------------- |
+| gpt-3.5-turbo       | 4.50s          | 1897         | 1802          | 95                | 1                   | 0.002893       |
+| gpt-4-turbo         | 7.88s          | 1920         | 1802          | 118               | 1                   | 0.02156        |
 
 ### Example 2: Wired
 **URL**: https://www.wired.com
 **Task**: List me all the articles with their description.
 
 | Name                | Execution time (seconds) | total_tokens | prompt_tokens | completion_tokens | successful_requests | total_cost_USD |
 | ------------------- | ------------------------ | ------------ | ------------- | ----------------- | ------------------- | -------------- |
-| gpt-3.5-turbo       |                          |              |               |                   |                     |                |
-| gpt-4-turbo-preview |                          |              |               |                   |                     |                |
+| gpt-3.5-turbo       | Error (text too long)    | -            | -             | -                 | -                   | -              |
+| gpt-4-turbo         | Error (TPM limit reached)| -            | -             | -                 | -                   | -              |
```

examples/benchmarks/GenerateScraper/benchmark_openai_gpt35.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -19,7 +19,7 @@
 # Define the configuration for the graph
 # ************************************************
 
-openai_key = os.getenv("GPT35_KEY")
+openai_key = os.getenv("OPENAI_APIKEY")
 
 graph_config = {
     "llm": {
```
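The benchmark scripts in this commit standardize on a single `OPENAI_APIKEY` environment variable instead of per-model keys like `GPT35_KEY` and `GPT4_KEY`. A small sketch of reading it with an explicit failure when it is unset; the `require_env` helper is hypothetical, not part of the repo:

```python
import os

def require_env(name):
    """Read an environment variable, failing loudly when it is unset."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(name + " is not set; export it or add it to your .env file")
    return value

os.environ["OPENAI_APIKEY"] = "sk-example"  # stand-in value for the demo
print(require_env("OPENAI_APIKEY"))         # sk-example
```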

examples/benchmarks/GenerateScraper/benchmark_openai_gpt4.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -19,12 +19,12 @@
 # Define the configuration for the graph
 # ************************************************
 
-openai_key = os.getenv("GPT4_KEY")
+openai_key = os.getenv("OPENAI_APIKEY")
 
 graph_config = {
     "llm": {
         "api_key": openai_key,
-        "model": "gpt-4-turbo-preview",
+        "model": "gpt-4-turbo-2024-04-09",
     },
     "library": "beautifoulsoup"
 }
```
Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-OPENAI_APIKEY="your openai api key"
+OPENAI_APIKEY="your openai key here"
```

examples/benchmarks/SmartScraper/Readme.md

Lines changed: 11 additions & 9 deletions

```diff
@@ -5,35 +5,37 @@
 
 Both are strored locally as txt file in .txt format because in this way we do not have to think about the internet connection
 
+In particular, is tested with SmartScraper
+
 | Hardware           | Moodel                                  | Example 1 | Example 2 |
 | ------------------ | --------------------------------------- | --------- | --------- |
 | Macbook 14' m1 pro | Mistral on Ollama with nomic-embed-text | 11.60s    | 26.61s    |
 | Macbook m2 max     | Mistral on Ollama with nomic-embed-text | 8.05s     | 12.17s    |
-| Macbook 14' m1 pro | Llama3 on Ollama with nomic-embed-text  | 29.871    | 35.32     |
-| Macbook m2 max     | Llama3 on Ollama with nomic-embed-text  |           |           |
+| Macbook 14' m1 pro | Llama3 on Ollama with nomic-embed-text  | 29.871s   | 35.32s    |
+| Macbook m2 max     | Llama3 on Ollama with nomic-embed-text  | 18.36s    | 78.32s    |
 
 
 **Note**: the examples on Docker are not runned on other devices than the Macbook because the performance are to slow (10 times slower than Ollama). Indeed the results are the following:
 
 | Hardware           | Example 1 | Example 2 |
 | ------------------ | --------- | --------- |
-| Macbook 14' m1 pro | 139.89    | Too long  |
+| Macbook 14' m1 pro | 139.89s   | Too long  |
 # Performance on APIs services
 ### Example 1: personal portfolio
 **URL**: https://perinim.github.io/projects
 **Task**: List me all the projects with their description.
 
-| Name                | Execution time (seconds) | total_tokens | prompt_tokens | completion_tokens | successful_requests | total_cost_USD |
-| ------------------- | ------------------------ | ------------ | ------------- | ----------------- | ------------------- | -------------- |
-| gpt-3.5-turbo       | 25.22                    | 445          | 272           | 173               | 1                   | 0.000754       |
-| gpt-4-turbo-preview | 9.53                     | 449          | 272           | 177               | 1                   | 0.00803        |
+| Name                | Execution time | total_tokens | prompt_tokens | completion_tokens | successful_requests | total_cost_USD |
+| ------------------- | -------------- | ------------ | ------------- | ----------------- | ------------------- | -------------- |
+| gpt-3.5-turbo       | 5.58s          | 445          | 272           | 173               | 1                   | 0.000754       |
+| gpt-4-turbo         | 9.76s          | 445          | 272           | 173               | 1                   | 0.00791        |
 
 ### Example 2: Wired
 **URL**: https://www.wired.com
 **Task**: List me all the articles with their description.
 
 | Name                | Execution time (seconds) | total_tokens | prompt_tokens | completion_tokens | successful_requests | total_cost_USD |
 | ------------------- | ------------------------ | ------------ | ------------- | ----------------- | ------------------- | -------------- |
-| gpt-3.5-turbo       | 25.89                    | 445          | 272           | 173               | 1                   | 0.000754       |
-| gpt-4-turbo-preview | 64.70                    | 3573         | 2199          | 1374              | 1                   | 0.06321        |
+| gpt-3.5-turbo       | 6.50                     | 2442         | 2199          | 243               | 1                   | 0.003784       |
+| gpt-4-turbo         | 76.07                    | 3521         | 2199          | 1322              | 1                   | 0.06165        |
```

examples/benchmarks/SmartScraper/benchmark_docker.py

Lines changed: 0 additions & 1 deletion

```diff
@@ -2,7 +2,6 @@
 Basic example of scraping pipeline using SmartScraper from text
 """
 
-import os
 from scrapegraphai.graphs import SmartScraperGraph
 from scrapegraphai.utils import prettify_exec_info
```

examples/benchmarks/SmartScraper/benchmark_openai_gpt35.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -19,7 +19,7 @@
 # Define the configuration for the graph
 # ************************************************
 
-openai_key = os.getenv("GPT35_KEY")
+openai_key = os.getenv("OPENAI_APIKEY")
 
 graph_config = {
     "llm": {
```

examples/benchmarks/SmartScraper/benchmark_openai_gpt4.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -20,12 +20,12 @@
 # Define the configuration for the graph
 # ************************************************
 
-openai_key = os.getenv("GPT4_KEY")
+openai_key = os.getenv("OPENAI_APIKEY")
 
 graph_config = {
     "llm": {
         "api_key": openai_key,
-        "model": "gpt-4-turbo-preview",
+        "model": "gpt-4-turbo",
     },
 }
```

examples/gemini/smart_scraper_gemini.py

Lines changed: 8 additions & 0 deletions

```diff
@@ -4,6 +4,7 @@
 
 import os
 from dotenv import load_dotenv
+from scrapegraphai.utils import prettify_exec_info
 from scrapegraphai.graphs import SmartScraperGraph
 load_dotenv()
 
@@ -34,3 +35,10 @@
 
 result = smart_scraper_graph.run()
 print(result)
+
+# ************************************************
+# Get graph execution info
+# ************************************************
+
+graph_exec_info = smart_scraper_graph.get_execution_info()
+print(prettify_exec_info(graph_exec_info))
```
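The new block above collects per-node execution statistics and prints them via `prettify_exec_info`. As a rough idea of what such pretty-printing involves, here is a stdlib-only sketch; the record field names (`node`, `time`, `tokens`) are illustrative and not scrapegraphai's actual schema:

```python
def prettify(records):
    """Render a list of {node, time, tokens} dicts as an aligned text table."""
    headers = ["node", "time", "tokens"]
    rows = [[str(r[h]) for h in headers] for r in records]
    # Column width = widest cell in each column, including the header.
    widths = [max(len(h), *(len(row[i]) for row in rows)) for i, h in enumerate(headers)]
    lines = [" | ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines += [" | ".join(c.ljust(w) for c, w in zip(row, widths)) for row in rows]
    return "\n".join(lines)

info = [
    {"node": "fetch", "time": 1.2, "tokens": 0},
    {"node": "generate_answer", "time": 3.4, "tokens": 450},
]
print(prettify(info))
```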
