Reactors info isn't scraping #1121

Open
muhammad-faizan087 opened this issue Oct 18, 2024 · 7 comments

Comments

@muhammad-faizan087

Below is the code, which produces an output file containing all the data about a particular post, but I'm getting a null value for reactors. I tried updating the module and setting up waits, but nothing changes.

```python
import sys
import json
import time
import argparse
import logging
import facebook_scraper as fs
import requests

# Set up logging for debugging
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)

logging.debug("Starting the Facebook scraper script...")

# Load custom headers from mbasicHeaders.json
headers = {}
try:
    with open(
        "/home/buyfans/domains/buyfans.pl/public_html/scraper2024-2/new_venv/bin/mbasicHeaders.json",
        "r",
    ) as file:
        headers = json.load(file)
    logging.debug("Headers loaded successfully.")
except FileNotFoundError:
    logging.error("mbasicHeaders.json file not found. Proceeding without headers.")
except json.JSONDecodeError as e:
    logging.error(f"Error decoding JSON in headers: {e}")

# Argument parsing
parser = argparse.ArgumentParser()
parser.add_argument("-pid", "--post-id", help="Post ID (URL)", required=True)
parser.add_argument("-f", "--output-file", help="Output file", required=True)
parser.add_argument("-c", "--cookies", help="Cookies file", required=True)
args = parser.parse_args()

logging.debug(f"Post ID (URL): {args.post_id}")
logging.debug(f"Output file: {args.output_file}")
logging.debug(f"Cookies file: {args.cookies}")

# Function to handle retries in case of connection issues or failures
def fetch_post_with_retries(post_url, options, cookies, headers, retries=3, delay=5):
    for attempt in range(retries):
        try:
            # Start scraping the Facebook post
            logging.debug(f"Starting scraping for post URL: {post_url}")

            # Scrape the post using facebook_scraper
            gen = fs.get_posts(post_urls=[post_url], options=options, cookies=cookies)
            post_data = next(gen)
            # print(post_data)
            logging.debug(f"Successfully scraped data: {post_data}")
            return post_data
        except requests.exceptions.RequestException as e:
            logging.error(
                f"Error fetching post: {e}, retrying ({attempt + 1}/{retries})..."
            )
            time.sleep(delay)  # Wait before retrying
        except StopIteration:
            logging.error(f"No data found for the post URL: {post_url}")
            return None
    logging.error(f"Failed to fetch the post after {retries} retries.")
    return None

# Options to ensure we retrieve complete data
options = {
    "reactors": True,  # Fetch reactors (people who reacted)
    "reactions": True,  # Fetch reactions data (like, love, etc.)
    "comments": True,  # Fetch comments
    "comments_full": True,  # Fetch the full comment thread
    "allow_extra_requests": True,  # Enable additional requests for more data (shares, etc.)
}

try:
    # Fetch the post data with retries
    post_data = fetch_post_with_retries(args.post_id, options, args.cookies, headers)

    if post_data:
        # Open output file and write the data in JSON format
        with open(args.output_file, "w") as json_file:
            logging.debug(f"Writing data to {args.output_file}")
            json.dump(post_data, json_file, default=str, indent=2)
        logging.info(f"Post data saved to {args.output_file}")
    else:
        logging.error(f"Failed to scrape the post: {args.post_id}")
except Exception as e:
    logging.error(f"An unexpected error occurred: {e}")
```
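Since the symptom is a null `reactors` field in the output, it can help to check whether only `reactors` is missing or whether all the extra-request data (comments, reactions) failed together. Below is a small hypothetical diagnostic helper (not part of `facebook_scraper`) you could run on the scraped dict before writing it out:

```python
def empty_fields(post, keys=("reactors", "reactions", "comments_full", "comments")):
    """Return the names of the given fields that came back empty.

    Treats None, [], and {} as empty so a null reactors list and a
    missing comment thread are both reported.
    """
    missing = []
    for key in keys:
        value = post.get(key)
        if value is None or value == [] or value == {}:
            missing.append(key)
    return missing

# Example with a stubbed post dict (stand-in for facebook_scraper output):
sample = {"reactors": None, "reactions": {"like": 3}, "comments": 5, "comments_full": []}
print(empty_fields(sample))  # -> ['reactors', 'comments_full']
```

If everything extra is empty at once, the issue is more likely the session (cookies, rate limiting, mbasic availability) than the `reactors` option itself.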

@kbalicki

Same problem here. Every try, always NULL next to reactors/reactions.

Has anybody succeeded here?

@boboliiii

Same for me. No reactors, no comments

@chunchiehdev

Currently, I am only able to retrieve posts from private groups, but the comments are empty.
Are you experiencing the same issue, or are you able to successfully retrieve comments?

@boboliiii

Working on the reactors at the moment; I'll take a look at comments in a week or two.

@kbalicki

> Working on the reactors at the moment; I'll take a look at comments in a week or two.

If you'd like to discuss something, email me at balkicki1981@gmail...

@bobolii

bobolii commented Nov 27, 2024

Well folks, I spent two weeks getting everything working again (using mbasic) and was just running diff to copy the parts to share here, but there's a major new problem: mbasic seems to be phased out. I can't find any official word on this, but all my accounts got warnings saying it would no longer be available, and then poof, it's gone.

I tried my hand at scraping with the mobile version of FB, but the response is always a blank page with the FB logo. It seems to load that first and then dynamically load the rest of the page (too late, and too complicated for the `requests` library, I'm assuming).

Any ideas how to get past this? I'd be happy to plug away at fixing the broken anchors again.
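One way to confirm the "blank page with the FB logo" diagnosis is to measure how much visible text the fetched HTML actually carries. Below is a rough stdlib-only sketch (a heuristic I'm assuming, not anything official): if the body text is tiny, the page is almost certainly a client-side-rendered shell that plain `requests` can't see through, and a real browser engine (Selenium, Playwright, etc.) would be needed instead.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes that appear outside <script>/<style> tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def looks_like_js_shell(html, min_chars=200):
    """Heuristic: a page with almost no visible text is probably an
    empty shell whose real content is rendered by JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    return len(" ".join(parser.chunks)) < min_chars

# A logo-only shell has essentially no visible text:
shell = "<html><body><script>var x=1;</script><img alt='logo'></body></html>"
print(looks_like_js_shell(shell))  # -> True
```

The `min_chars` threshold is arbitrary; tune it against pages you know are real content versus known shells.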

@526319491

> Well folks, I spent two weeks getting everything working again (using mbasic) and was just running diff to copy the parts to share here, but there's a major new problem: mbasic seems to be phased out. I can't find any official word on this, but all my accounts got warnings saying it would no longer be available, and then poof, it's gone.
>
> I tried my hand at scraping with the mobile version of FB, but the response is always a blank page with the FB logo. It seems to load that first and then dynamically load the rest of the page (too late, and too complicated for the `requests` library, I'm assuming).
>
> Any ideas how to get past this? I'd be happy to plug away at fixing the broken anchors again.

Sorry, but mbasic is gone. Do you have a better way?
