Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(youtube_downloader.py): Implement YoutubeDownloader class functionality #21

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open
2 changes: 1 addition & 1 deletion .coderabbit.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ version: 1
early_access: false
enable_free_tier: true
language: en
tone_instructions: 'You are an expert Python code reviewer in an enterprise team. Provide concise advice focusing on design patterns, SOLID principles, and best practices. Suggest relevant patterns briefly. Elaborate only when asked. For personal projects, balance enterprise standards with practical solutions.'
tone_instructions: 'You are an expert Python code reviewer in an enterprise team. Provide concise advice focusing on design patterns and best practices. Suggest relevant patterns briefly. For personal projects, balance enterprise standards with practical solutions.'
reviews:
profile: chill
high_level_summary: true
Expand Down
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,6 @@ __pycache__
.venv
poetry.lock
data/downloaded_videos/*
data/processed_videos/*
data/processed_videos/*
.env
logs/
2 changes: 2 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@ python = "^3.10"
tqdm = "^4.66.5"
instaloader = "^4.13.1"
bandit = "^1.8.2"
dotenv = "^0.9.9"
google-api-python-client = "^2.161.0"

[tool.poetry.dev-dependencies]
ruff = "^0.1.0"
Expand Down
267 changes: 267 additions & 0 deletions reelfeeder/core/youtube_downloader.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,267 @@
import concurrent.futures
import logging
import os
import subprocess
import sys
import time

from dotenv import load_dotenv
from googleapiclient.discovery import build


class YoutubeDownloader:
def __init__(
self, url: str, logname: str, output_path: str, channel_ids: list[str] = None
) -> None:
"""
Initialization of YoutubeDownloader class
Args:
url: Youtube video URL
logname: Path to log file
output_path: Path for downloaded content
channel_ids: List of channel IDs to process
Raises:
ValueError: If required parameters are invalide
EnvironmentError: If required environment variables are missing
"""
if not all([url, logname, output_path]):
raise ValueError("All parameters must be non-empty")
self.channel_ids = channel_ids or []
self.url = url
self.output_path = output_path
if not all([os.getenv("GOOGLE_API_KEY")]):
raise OSError("Missing required environment variables")
Comment on lines +67 to +68
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Validate all required environment variables.

The environment validation is incomplete. CHANNEL_ID is used later but not validated here.

-        if not all([os.getenv("GOOGLE_API_KEY")]):
+        if not all([os.getenv("GOOGLE_API_KEY"), os.getenv("CHANNEL_ID")]):
             raise OSError("Missing required environment variables")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not all([os.getenv("GOOGLE_API_KEY")]):
raise OSError("Missing required environment variables")
if not all([os.getenv("GOOGLE_API_KEY"), os.getenv("CHANNEL_ID")]):
raise OSError("Missing required environment variables")

self.logger = logging
self.logger.basicConfig(
filename=logname,
filemode="a",
format="%(asctime)s,%(msecs)03d %(name)s %(levelname)s %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
level=logging.DEBUG,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid global logging configuration.

Using basicConfig affects global logging state and could interfere with other modules.

-        self.logger = logging
-        self.logger.basicConfig(
-            filename=logname,
-            filemode="a",
-            format="%(asctime)s,%(msecs)03d %(name)s %(levelname)s %(message)s",
-            datefmt="%Y-%m-%d %H:%M:%S",
-            level=logging.DEBUG,
-        )
+        self.logger = logging.getLogger(__name__)
+        if not self.logger.handlers:
+            handler = logging.FileHandler(logname)
+            handler.setFormatter(
+                logging.Formatter(
+                    "%(asctime)s,%(msecs)03d %(name)s %(levelname)s %(message)s",
+                    datefmt="%Y-%m-%d %H:%M:%S"
+                )
+            )
+            self.logger.addHandler(handler)
+            self.logger.setLevel(logging.DEBUG)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
self.logger = logging
self.logger.basicConfig(
filename=logname,
filemode="a",
format="%(asctime)s,%(msecs)03d %(name)s %(levelname)s %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
level=logging.DEBUG,
)
self.logger = logging.getLogger(__name__)
if not self.logger.handlers:
handler = logging.FileHandler(logname)
handler.setFormatter(
logging.Formatter(
"%(asctime)s,%(msecs)03d %(name)s %(levelname)s %(message)s",
datefmt="%Y-%m-%d %H:%M:%S"
)
)
self.logger.addHandler(handler)
self.logger.setLevel(logging.DEBUG)

self.youtube = build("youtube", "v3", developerKey=os.getenv("GOOGLE_API_KEY"))

@staticmethod
def __measure_execution_time(func, *args, **kwargs):
"""
Measures the execution time of a function.

:param func: The function to measure.
:param args: Positional arguments to pass to the function.
:param kwargs: Keyword arguments to pass to the function.
:return: A tuple containing the result of the function and the time taken in
seconds.
"""
start_time = time.time()
result = func(*args, **kwargs)
end_time = time.time()
duration = end_time - start_time
return result, duration

def __run_yt_dlp_command(self, flag: str, timeout: int = 30) -> tuple[str, str]:
"""
Execute yt-dlp command with the given flag.
Args:
flag: yt-dlp command flag
timeout: Command execution timeout in seconds

Returns:
Tuple of (flag, command output)

Raises:
ValueError: If command execution fails
TimeoutError: If command execution times out
"""
if not isinstance(self.url, str) or not self.url.startswith(
("http://", "https://")
):
raise ValueError(f"Invalid URL: {self.url}")
command = ["yt-dlp", flag, self.url]
try:
result = subprocess.run(
command, capture_output=True, text=True, timeout=timeout, check=True
)
return flag, result.stdout.strip()
except subprocess.TimeoutExpired:
self.logger.error(f"Command timed out after {timeout} seconds: {command}")
raise TimeoutError(f"Command timed out: {command}") from None
except subprocess.CalledProcessError as e:
self.logger.error(
f"Command failed with exit code {e.returncode}: {command}"
)
self.logger.error(f"Error output: {e.stderr}")
raise ValueError(f"Command failed: {e.stderr}") from e

def __get_video_info(self, timeout: int = 60) -> dict:
"""
Retrieve video information using yt-dlp command concurrently.
Args:
timeout: Overall operation timeout in seconds
Returns:
Dictionary containing video information
Raises:
TimeoutError: If operation times out
"""
flags = {
"--get-id": "id",
"--get-title": "title",
"--get-duration": "duration",
"--get-thumbnail": "thumbnail",
"--get-format": "format",
"--get-url": "url",
}
video_info = {}
try:
with concurrent.futures.ThreadPoolExecutor(
max_workers=len(flags)
) as executor:
future_to_flag = {
executor.submit(self.__run_yt_dlp_command, flag): key
for flag, key in flags.items()
}

for future in concurrent.futures.as_completed(
future_to_flag, timeout=timeout
):
flag = future_to_flag[future]
try:
flag, result = future.result()
if flag in flags:
video_info[flags[flag]] = result
except Exception as exc:
print(f"{flag} generated an exception: {exc}")
Comment on lines +159 to +160
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use logger instead of print for consistency.

Replace print statement with logger for consistent error handling.

-                        print(f"{flag} generated an exception: {exc}")
+                        self.logger.error(f"{flag} generated an exception: {exc}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as exc:
print(f"{flag} generated an exception: {exc}")
except Exception as exc:
self.logger.error(f"{flag} generated an exception: {exc}")


return video_info
except Exception as exec:
self.logger.error(f"Failed to get video information: {str(exec)}")
raise
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix variable name typo in exception handling.

The variable name exec is a typo and shadows the built-in exec function.

-        except Exception as exec:
-            self.logger.error(f"Failed to get video information: {str(exec)}")
+        except Exception as exc:
+            self.logger.error(f"Failed to get video information: {str(exc)}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as exec:
self.logger.error(f"Failed to get video information: {str(exec)}")
raise
except Exception as exc:
self.logger.error(f"Failed to get video information: {str(exc)}")
raise


def get_channel_info(self, channel_id: str) -> dict:
"""
Retrieve ca channel information using the Youtube API.

Args:
channel_id: Youtube channel ID
Returns:
Dictionary containing channel information
Raises:
ValueError: If channel_id is invalid or channel not found
RuntimeError: If API request fails
"""
if not channel_id:
raise ValueError("channel_id must be non-empty")

self.logger.info(f"Fetching info for channel : {channel_id}")
try:
request = self.youtube.channels().list(
part="snippet,statistics", id=channel_id
)
response = request.execute()

if "items" in response:
channel = response["items"][0]
info = {
"title": channel["snippet"]["title"],
"description": channel["snippet"]["description"],
"subscriber_count": channel["statistics"]["subscriberCount"],
"view_count": channel["statistics"]["viewCount"],
"video_count": channel["statistics"]["videoCount"],
"published_at": channel["snippet"]["publishedAt"],
}
self.logger.info(f"Channel Info: {info}")
return info
else:
raise ValueError(f"Channel not found: {channel_id}")
except Exception as e:
self.logger.error(f"Failed to get channel info: {str(e)}")
raise RuntimeError(f"API request failed: {str(e)}") from e

def channel_info(self) -> tuple[dict, float]:
"""
Get video and execution time information.
Returns:
Tuple of (video information dict, execution time in seconds)
Raises:
RuntimeError: If video information cannot be retrieved.
"""
try:
video_info, duration = self.__measure_execution_time(self.__get_video_info)
self.logger.info("Video Info: %s", video_info)
self.logger.info("Execution Time: %s seconds", duration)
return video_info, duration
except Exception as e:
self.logger.error("Failed to get channel info: %s", str(e))
raise RuntimeError(f"Failed to get channel info: {str(e)}") from e


def logs_dir():
"""
Resolve and create logs directory.

Returns:
Absolute path to logs directory

Raises:
OSError: If directory creation fails
"""
try:
# Use a more reliable method to get the project root
project_root = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
logs_path = os.path.join(project_root, "logs")

# Create logs directory if it doesn't exist
os.makedirs(logs_path, exist_ok=True)

return logs_path
except OSError as e:
raise OSError(f"Failed to create logs directory: {str(e)}") from e


def load_environment() -> None:
"""Load environemnt variables from .env file."""
if not load_dotenv():
logging.warning("Failed to load environment variables from .env file")


def main() -> None:
"""
Main entry point for the Youtube downloader.

Environment variables required:
- Channel ID: CHANNEL_ID
- Google API Key: GOOGLE_API_KEY
"""
try:
# Load Environment Variables
load_environment()
directory = logs_dir()
logname = os.path.join(directory, "youtube_downloader.log")

# Validate environment variables
if not all([os.getenv("CHANNEL_ID"), os.getenv("GOOGLE_API_KEY")]):
raise OSError("Missing required environment variables")
downloader = YoutubeDownloader(
url=os.getenv("YOUTUBE_URL"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Validate YOUTUBE_URL environment variable.

The URL is retrieved from environment but not validated. This could cause silent failures.

-            url=os.getenv("YOUTUBE_URL"),
+            url=os.getenv("YOUTUBE_URL") or "https://youtube.com/example",  # Add default URL

Also add YOUTUBE_URL to the environment validation:

-        if not all([os.getenv("CHANNEL_ID"), os.getenv("GOOGLE_API_KEY")]):
+        if not all([os.getenv("CHANNEL_ID"), os.getenv("GOOGLE_API_KEY"), os.getenv("YOUTUBE_URL")]):

Committable suggestion skipped: line range outside the PR's diff.

logname=logname,
output_path=os.path.join(directory, "downloads"),
channel_ids=[os.getenv("CHANNEL_ID")],
)

# Get channel information
video_info, duration = downloader.channel_info()
logging.info(
"Successfully retrieved video information in %.2f seconds", duration
)
logging.info("Video Info: %s", video_info)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use class logger instead of root logger.

Using the root logger here is inconsistent with the rest of the code that uses the class logger.

-        logging.info(
+        downloader.logger.info(
             "Successfully retrieved video information in %.2f seconds", duration
         )
-        logging.info("Video Info: %s", video_info)
+        downloader.logger.info("Video Info: %s", video_info)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logging.info(
"Successfully retrieved video information in %.2f seconds", duration
)
logging.info("Video Info: %s", video_info)
downloader.logger.info(
"Successfully retrieved video information in %.2f seconds", duration
)
downloader.logger.info("Video Info: %s", video_info)


except Exception as e:
logging.error("Failed to execute main: %s", str(e))
raise


if __name__ == "__main__":
try:
main()
except Exception as e:
logging.error("Application Failed: %s", str(e))
sys.exit(1)