Introduction
As someone passionate about English education and blogging, I often found myself overwhelmed by the manual work required to turn YouTube videos into meaningful learning content. Extracting subtitles, generating quizzes, formatting HTML, and posting to WordPress—each step took time, effort, and was prone to error.
So, I decided to build a fully automated pipeline in Python that does everything for me.
The result? I now drop a YouTube URL into a Google Sheet, and the system does the rest: fetches the transcript, generates quiz content using OpenAI, formats it into blog-ready HTML, and publishes it to my WordPress blog—all automatically.
Goals of the Project
This project was born out of both necessity and curiosity. My primary objectives were:
- Automatically extract English subtitles from YouTube videos
- Use OpenAI to generate listening quizzes and vocabulary cards
- Format the content as styled HTML for WordPress
- Post it via the WordPress REST API
- Track everything in Google Sheets for record-keeping and troubleshooting
Key Technologies Used
- Python 3.11: Core scripting language
- YouTube Transcript API: For fetching video transcripts
- OpenAI GPT (gpt-3.5-turbo): For quiz and vocabulary generation
- Google Sheets API (via gspread): For metadata management and automation triggers
- WordPress REST API: For posting HTML to my blog
- dotenv: For environment variable and credential management
Major Challenges and How I Solved Them
🚧 Unicode File Path Errors on Windows
Problem: Windows file paths like C:\Users\username\...
caused unicodeescape
errors in Python.
Solution: Replaced all backslashes with forward slashes (/
) or escaped them (\\
).
❌ Google Sheets API 403 Error
Problem: My script failed with a 403 error because the Sheets API wasn’t enabled or the service account lacked permission.
Solution: Enabled Sheets API in Google Cloud Console and shared the sheet with the service account email.
❌ OpenAI NameError
Problem: I used OpenAI(...)
without importing the correct module.
Solution: Switched to import openai
and set the API key via openai.api_key = ...
. Much cleaner.
❌ Missing Configuration Files
Problem: post_settings.json
and .env
were missing or mislocated.
Solution: Created a template for post_settings.json
and ensured .env
was read correctly using load_dotenv()
.
❌ Mixing Colab and Local Paths
Problem: Original code used Colab-specific paths like /content/drive/...
, which broke in local Windows environment.
Solution: Refactored all path settings to use os.path.join()
and a defined base_dir
.
Sample Output
Once a video is processed, the system outputs an HTML page with:
- A YouTube embed
- An introductory paragraph
- A set of multiple-choice listening quiz questions
- Vocabulary cards styled as flip boxes with audio buttons
Everything is posted to WordPress via API and logged with a timestamp in my Google Sheet.
Lessons Learned
- Automating boring tasks makes creative work so much more enjoyable.
- Most problems are either path-related or authentication-related.
- The key to API-heavy workflows is good logging and modular functions.
- Avoid hardcoding anything: use
.env
, use JSON configs, and log everything.
What’s Next
- Add support for GPT-4 for higher quiz quality
- Introduce keyword tagging and category classification using AI
- Integrate Discord or LINE notification system on publish
- Schedule the script to run daily using Windows Task Scheduler
Conclusion
After many hours of debugging, trial-and-error, and a few “aha!” moments, I now have a robust, local Python system that can:
- Take a video URL from a spreadsheet
- Generate English quiz content using GPT
- Format it into beautiful HTML
- Post it automatically to my WordPress blog
This project has not only boosted my blog productivity but also taught me valuable lessons in working with APIs, automation, and error handling. I hope it inspires others to automate their own workflows—because if you can teach a computer to do it, why not?
Thanks for reading!
Write something…