Reactors info isn't scraping #1121

Open
muhammad-faizan087 opened this issue Oct 18, 2024 · 7 comments

Comments

@muhammad-faizan087

Below is the code, which produces an output file containing all the data about a particular post, but I'm getting a null value for reactors. I tried updating the module and setting up waits, but nothing changes.

```python
import sys
import json
import time
import argparse
import logging
import facebook_scraper as fs
import requests

# Set up logging for debugging
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)

logging.debug("Starting the Facebook scraper script...")

# Load custom headers from mbasicHeaders.json
headers = {}
try:
    with open(
        "/home/buyfans/domains/buyfans.pl/public_html/scraper2024-2/new_venv/bin/mbasicHeaders.json",
        "r",
    ) as file:
        headers = json.load(file)
    logging.debug("Headers loaded successfully.")
except FileNotFoundError:
    logging.error("mbasicHeaders.json file not found. Proceeding without headers.")
except json.JSONDecodeError as e:
    logging.error(f"Error decoding JSON in headers: {e}")

# Argument parsing
parser = argparse.ArgumentParser()
parser.add_argument("-pid", "--post-id", help="Post ID (URL)", required=True)
parser.add_argument("-f", "--output-file", help="Output file", required=True)
parser.add_argument("-c", "--cookies", help="Cookies file", required=True)
args = parser.parse_args()

logging.debug(f"Post ID (URL): {args.post_id}")
logging.debug(f"Output file: {args.output_file}")
logging.debug(f"Cookies file: {args.cookies}")

# Function to handle retries in case of connection issues or failures
def fetch_post_with_retries(post_url, options, cookies, headers, retries=3, delay=5):
    for attempt in range(retries):
        try:
            # Start scraping the Facebook post
            logging.debug(f"Starting scraping for post URL: {post_url}")

            # Scrape the post using facebook_scraper
            gen = fs.get_posts(post_urls=[post_url], options=options, cookies=cookies)
            post_data = next(gen)
            # print(post_data)
            logging.debug(f"Successfully scraped data: {post_data}")
            return post_data
        except requests.exceptions.RequestException as e:
            logging.error(
                f"Error fetching post: {e}, retrying ({attempt + 1}/{retries})..."
            )
            time.sleep(delay)  # Wait before retrying
        except StopIteration:
            logging.error(f"No data found for the post URL: {post_url}")
            return None
    logging.error(f"Failed to fetch the post after {retries} retries.")
    return None

# Options to ensure we retrieve complete data
options = {
    "reactors": True,  # Fetch reactors (people who reacted)
    "reactions": True,  # Fetch reactions data (like, love, etc.)
    "comments": True,  # Fetch comments
    "comments_full": True,  # Fetch the full comment thread
    "allow_extra_requests": True,  # Enable additional requests for more data (shares, etc.)
}

try:
    # Fetch the post data with retries
    post_data = fetch_post_with_retries(args.post_id, options, args.cookies, headers)

    if post_data:
        # Open output file and write the data in JSON format
        with open(args.output_file, "w") as json_file:
            logging.debug(f"Writing data to {args.output_file}")
            json.dump(post_data, json_file, default=str, indent=2)
        logging.info(f"Post data saved to {args.output_file}")
    else:
        logging.error(f"Failed to scrape the post: {args.post_id}")
except Exception as e:
    logging.error(f"An unexpected error occurred: {e}")
```
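Since the symptom is a null `reactors` field in the output, it can help to check whether only `reactors` is missing or whether all the extra-request data (comments, reactions) failed together. Below is a small hypothetical diagnostic helper (not part of `facebook_scraper`) you could run on the scraped dict before writing it out:

```python
def empty_fields(post, keys=("reactors", "reactions", "comments_full", "comments")):
    """Return the names of the given fields that came back empty.

    Treats None, [], and {} as empty so a null reactors list and a
    missing comment thread are both reported.
    """
    missing = []
    for key in keys:
        value = post.get(key)
        if value is None or value == [] or value == {}:
            missing.append(key)
    return missing

# Example with a stubbed post dict (stand-in for facebook_scraper output):
sample = {"reactors": None, "reactions": {"like": 3}, "comments": 5, "comments_full": []}
print(empty_fields(sample))  # -> ['reactors', 'comments_full']
```

If everything extra is empty at once, the issue is more likely the session (cookies, rate limiting, mbasic availability) than the `reactors` option itself.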

@kbalicki

Same problem here. Every try, always NULL next to reactors/reactions.

Has anybody succeeded here?

@boboliiii

Same for me. No reactors, no comments

@chunchiehdev

Currently, I am only able to retrieve posts from private groups, but the comments are empty.
Are you experiencing the same issue, or are you able to successfully retrieve comments?

@boboliiii

Working on the reactors at the moment; I'll take a look at comments in a week or two.

@kbalicki

> Working on the reactors at the moment; I'll take a look at comments in a week or two.

If you'd like to discuss something, email me at balkicki1981@gmail...

@bobolii

bobolii commented Nov 27, 2024

Well folks, I spent two weeks getting everything working again (using mbasic) and was just running diff to copy the parts to share here, but there's a major new problem: mbasic seems to be phased out. I can't find any official word on this, but all my accounts got warnings saying it would no longer be available, and then poof, it's gone.

I tried my hand at scraping with the mobile version of FB, but the response is always a blank page with the FB logo. It seems to load that first and then dynamically load the rest of the page (too late, and too complicated for the `requests` library, I'm assuming).

Any ideas how to get past this? I'd be happy to plug away at fixing the broken anchors again.
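One way to confirm the "blank page with the FB logo" diagnosis is to measure how much visible text the fetched HTML actually carries. Below is a rough stdlib-only sketch (a heuristic I'm assuming, not anything official): if the body text is tiny, the page is almost certainly a client-side-rendered shell that plain `requests` can't see through, and a real browser engine (Selenium, Playwright, etc.) would be needed instead.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes that appear outside <script>/<style> tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def looks_like_js_shell(html, min_chars=200):
    """Heuristic: a page with almost no visible text is probably an
    empty shell whose real content is rendered by JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    return len(" ".join(parser.chunks)) < min_chars

# A logo-only shell has essentially no visible text:
shell = "<html><body><script>var x=1;</script><img alt='logo'></body></html>"
print(looks_like_js_shell(shell))  # -> True
```

The `min_chars` threshold is arbitrary; tune it against pages you know are real content versus known shells.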

@526319491

> Well folks, I spent two weeks getting everything working again (using mbasic) and was just running diff to copy the parts to share here, but there's a major new problem: mbasic seems to be phased out. I can't find any official word on this, but all my accounts got warnings saying it would no longer be available, and then poof, it's gone.
>
> I tried my hand at scraping with the mobile version of FB, but the response is always a blank page with the FB logo. It seems to load that first and then dynamically load the rest of the page (too late, and too complicated for the `requests` library, I'm assuming).
>
> Any ideas how to get past this? I'd be happy to plug away at fixing the broken anchors again.

Sorry, but mbasic is gone. Do you have a better way?
