This project involves processing and analyzing text data from articles. Below is a guide on how to run each file and the sequence in which they should be executed.
- `scrape_articles.py`: Scrapes articles from the web and saves them in `articles.txt`.
- `stopwords_processing.py`: Processes stopwords to clean the text data.
- `cleaned_articles.txt`: Contains the cleaned version of the articles after stopwords processing.
- `text_analysis.py`: Performs text analysis on the cleaned articles.
- `check_files.py`: Checks the integrity and correctness of the files generated.
- `create_input_csv.py`: Creates an input CSV file (`Input.csv`) from the cleaned articles.
- `csv_operations.py`: Performs various operations on the CSV file.
- `convert_excel.py`: Converts the CSV file (`Input.csv`) into an Excel file (`Input.xlsx`).
- Scrape Articles: Run `scrape_articles.py` to gather articles and save them in `articles.txt`.

  ```shell
  python scrape_articles.py
  ```
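The contents of `scrape_articles.py` are not shown here, but the core of such a script is extracting readable article text from fetched HTML. A minimal sketch using only the standard library might look like the following (the `ParagraphExtractor` class and `extract_article_text` function are hypothetical names; a real script may use a third-party parser instead):

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> tags, a common way to pull article body text."""
    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            # Append text that falls inside the current paragraph.
            self.paragraphs[-1] += data

def extract_article_text(html):
    """Return the paragraph text of a page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())
```

In a full script, the HTML would come from an HTTP request for each article URL, and the extracted text would be appended to `articles.txt`.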
- Process Stopwords: Run the stopwords processing script to clean the articles.

  ```shell
  python stopwords_processing.py
  ```
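Stopword cleaning typically means tokenizing the raw text and dropping common function words before analysis. A rough sketch of the idea (the stopword list here is a placeholder; the real script presumably loads its own list):

```python
import re

# Placeholder stopword set; the actual script likely loads a larger list from disk.
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "in"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Tokenize on alphabetic runs and drop tokens found in the stopword set
    (case-insensitive)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return " ".join(t for t in tokens if t.lower() not in stopwords)
```

Applied to each article in `articles.txt`, output like this would produce `cleaned_articles.txt`.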
- Text Analysis: Perform text analysis on the cleaned articles using `text_analysis.py`.

  ```shell
  python text_analysis.py
  ```
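"Text analysis" can cover many metrics; as an illustration of the kind of computation `text_analysis.py` might perform, here is a sketch that derives a few common statistics (the specific metrics and the `analyze_text` name are assumptions, not the script's actual output):

```python
import re
from collections import Counter

def analyze_text(text):
    """Compute simple metrics: word count, sentence count,
    average words per sentence, and the most frequent words."""
    words = re.findall(r"[A-Za-z']+", text)
    # Split sentences on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_words_per_sentence": len(words) / max(len(sentences), 1),
        "top_words": Counter(w.lower() for w in words).most_common(3),
    }
```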
- Check Files: Verify the integrity of the generated files using `check_files.py`.

  ```shell
  python check_files.py
  ```
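A file-integrity check of this kind usually verifies that each expected output exists and is non-empty. A minimal sketch of what `check_files.py` might do (the `check_files` function and its report format are assumptions):

```python
import os

def check_files(paths):
    """Return a status report mapping each path to 'missing', 'empty', or 'ok'."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            report[path] = "missing"
        elif os.path.getsize(path) == 0:
            report[path] = "empty"
        else:
            report[path] = "ok"
    return report
```

Running it over `articles.txt`, `cleaned_articles.txt`, and `Input.csv` would flag any step that failed silently.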
- Create Input CSV: Generate the input CSV file from the cleaned articles using `create_input_csv.py`.

  ```shell
  python create_input_csv.py
  ```
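Writing the cleaned articles out as `Input.csv` can be done with the standard-library `csv` module. A sketch of the core of such a script (the `URL_ID`/`TEXT` column layout is an assumption about the real file's schema):

```python
import csv

def write_input_csv(rows, path="Input.csv"):
    """Write (id, text) pairs to a CSV with a header row.
    The URL_ID/TEXT column names are assumed, not taken from the real script."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["URL_ID", "TEXT"])
        writer.writerows(rows)
```

Note `newline=""`, which the `csv` module requires to avoid blank lines on Windows.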
- CSV Operations: Perform any necessary operations on the CSV file using `csv_operations.py`.

  ```shell
  python csv_operations.py
  ```
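The exact operations in `csv_operations.py` are not specified; as one representative example, a script might read the CSV and append a derived column. The column names and the word-count operation below are illustrative assumptions:

```python
import csv

def add_word_count_column(in_path, out_path):
    """Read the input CSV and append a WORD_COUNT column computed from the
    TEXT field. Column names and the operation itself are assumed."""
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["WORD_COUNT"] = len(row["TEXT"].split())
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```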
- Convert to Excel: Convert the CSV file to an Excel file using `convert_excel.py`.

  ```shell
  python convert_excel.py
  ```
- Ensure that all required dependencies are installed before running the scripts.
- Modify the scripts as needed to suit your specific requirements.
- If any script fails, check the error message and debug accordingly.
- Make sure the input files exist before running the scripts that depend on them.