# What is this?
I built this tool as a workaround for Instagram's strong anti-bot measures. It turns screen recordings of Instagram comments into structured data: it extracts frames from the video, runs OCR on each image, and filters the results into clean text. The system processes the video in chunks, identifies the comment regions, and compiles everything into a CSV file. You can use it to analyze trends and narratives, collect product feedback, or gather training data for sentiment analysis models.
This is more of a proof of concept than a production-ready tool. The idea is to sidestep Instagram's anti-bot technology and convert engagement into digestible data.
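To make the chunking and CSV steps concrete, here is a minimal standard-library sketch. The function names (`chunk_frame_indices`, `comments_to_csv`), the sampling stride `step`, and the `frame`/`comment` column names are hypothetical and chosen for illustration; they are not taken from the actual script, and the OCR step is left out entirely.

```python
import csv
import io


def chunk_frame_indices(total_frames, step, chunk_size):
    """Yield lists of frame indices to sample, chunk_size indices at a time.

    `step` is an assumed sampling stride: only every `step`-th frame is kept.
    """
    sampled = list(range(0, total_frames, step))
    for start in range(0, len(sampled), chunk_size):
        yield sampled[start:start + chunk_size]


def comments_to_csv(rows):
    """Serialize (frame_index, text) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["frame", "comment"])  # assumed column names
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `list(chunk_frame_indices(10, 3, 2))` groups the sampled frames 0, 3, 6, 9 into two chunks of two. Using the `csv` module rather than joining strings by hand means comments containing commas or quotes are escaped correctly.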
# Code
```python
import cv2
import pytesseract
import re
import argparse
import os
import csv
from PIL import Image
from datetime import datetime
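import numpy as np


def filter_line(line):
    """Return False for OCR lines that are UI noise rather than comment text."""
    line_stripped = line.strip().lower()
    # Skip empty lines.
    if not line_stripped:
        return False
    # Skip Instagram UI labels: anything starting with "view " and the bare "reply" link.
    if line_stripped.startswith("view ") or line_stripped == "reply":
        return False
    if re.match(r'^\d+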