
Firecrawl Tool:Expecting value: line 1 column 1 (char 0) #9337

Closed
5 tasks done
louisLiu1004 opened this issue Oct 14, 2024 · 7 comments
Labels
🐞 bug Something isn't working 🤔 cant-reproduce We can not reproduce it or the information is limited

Comments

@louisLiu1004

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.9.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Use the Firecrawl tool, pointed at a self-hosted Firecrawl running in Docker, to request any website from any node.

✔️ Expected Behavior

Crawl and read the returned webpage content.

❌ Actual Behavior

[Screenshot: the Firecrawl tool fails with "Expecting value: line 1 column 1 (char 0)"]

@dosubot dosubot bot added the 🐞 bug Something isn't working label Oct 14, 2024

dosubot bot commented Oct 14, 2024

@louisLiu1004 Hello there! I'm here to assist you with bug fixes, questions, and contributing to the project. Let's work together to resolve any issues you encounter while waiting for a human maintainer.

I found a similar discussion regarding the JSON parsing error "Expecting value: line 1 column 1 (char 0)" when using the Firecrawl tool [1].

To resolve this issue, ensure that the server response is not empty before parsing it as JSON. You can modify the _post_request and _get_request methods to check if the response content is not empty:

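import time

import requests


# Both functions are methods of the tool's Firecrawl client class (hence `self`).
def _post_request(self, url, data, headers, retries=3, backoff_factor=0.5):
    # Retry only on 502 Bad Gateway, backing off 0.5s, 1s, 2s with the defaults.
    for attempt in range(retries):
        response = requests.post(url, headers=headers, json=data)
        if response.status_code == 502:
            time.sleep(backoff_factor * (2**attempt))
        elif response.content:  # Non-empty body: the caller can try to parse it
            return response
        else:
            raise ValueError(f"Empty response received (HTTP {response.status_code})")
    # Every attempt returned 502; that body is typically a gateway error page, not JSON.
    raise ValueError(f"HTTP 502 from {url} after {retries} attempts")


def _get_request(self, url, headers, retries=3, backoff_factor=0.5):
    # Same retry and empty-body handling as _post_request, for GET requests.
    for attempt in range(retries):
        response = requests.get(url, headers=headers)
        if response.status_code == 502:
            time.sleep(backoff_factor * (2**attempt))
        elif response.content:  # Non-empty body: the caller can try to parse it
            return response
        else:
            raise ValueError(f"Empty response received (HTTP {response.status_code})")
    raise ValueError(f"HTTP 502 from {url} after {retries} attempts")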

This replaces the opaque JSON parsing error with an explicit error when the server returns an empty response. A non-empty body that is not JSON (for example an HTML error page) will still fail to parse [2][3].
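Python's `json` module raises `Expecting value: line 1 column 1 (char 0)` whenever the body is empty or does not start with a JSON value, for example an HTML error page returned by a reverse proxy, so the empty-body check covers only the first case. Below is a minimal sketch, assuming you have the status code, `Content-Type` and raw body at hand, of a decoder that reports what actually came back. `decode_json_or_explain` is a hypothetical helper for illustration, not part of Dify or Firecrawl.

```python
import json


def decode_json_or_explain(status: int, content_type: str, body: bytes):
    """Parse a JSON body, or raise a ValueError describing what came back instead.

    Hypothetical helper for illustration; not part of Dify or Firecrawl.
    """
    text = body.decode("utf-8", errors="replace")

    # An empty body is what json.loads reports as
    # "Expecting value: line 1 column 1 (char 0)".
    if not text.strip():
        raise ValueError(
            f"HTTP {status}: empty response body (Content-Type: {content_type!r})"
        )

    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # A non-JSON body, e.g. an HTML error page from a reverse proxy,
        # produces the same message when it starts with "<".
        raise ValueError(
            f"HTTP {status}: body is not JSON (Content-Type: {content_type!r}); "
            f"{exc.msg} at char {exc.pos}; body starts with {text[:200]!r}"
        ) from exc


# The original error message, reproduced with the standard library alone:
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```

Including the status code and the first part of the body in the error usually tells you immediately whether Firecrawl itself failed or something in between (a proxy, a wrong port, an auth page) answered instead.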

To continue talking to Dosu, mention @dosu.

@yusa-n

yusa-n commented Oct 14, 2024

I have the same issue. For now, I am using JinaReader instead.
It does not seem to reproduce on the Cloud version.

This could be related:
#8792

@hjlarry
Contributor

hjlarry commented Oct 15, 2024

Can't reproduce. I also use a self-hosted Firecrawl, and it works well.

@crazywoola crazywoola added the 🤔 cant-reproduce We can not reproduce it or the information is limited label Oct 15, 2024
@yusa-n

yusa-n commented Oct 16, 2024

@hjlarry So any suggestions?

@hjlarry
Contributor

hjlarry commented Oct 16, 2024

> @hjlarry So any suggestions?

I suggest trying Firecrawl's scrape tool first, because it is simpler than the crawl tool.
If that does not work, debug this line to check what the response is.
Firecrawl also provides an admin panel at http://localhost:3002/admin/@/queues, where you can get more information.
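To see the raw response without going through Dify at all, you can call the self-hosted Firecrawl server directly. Below is a minimal sketch using only the standard library. It assumes the v1 scrape endpoint at `/v1/scrape`, a Bearer-token `Authorization` header, and the port 3002 implied by the admin panel URL; none of these are confirmed in this thread, so adjust them to your Firecrawl version. `build_scrape_request` and `probe` are hypothetical names.

```python
import json
import urllib.error
import urllib.request


def build_scrape_request(base_url: str, api_key: str, target_url: str) -> urllib.request.Request:
    """Build a POST to a self-hosted Firecrawl scrape endpoint.

    Assumptions: the path is /v1/scrape and the key is sent as a Bearer token.
    """
    endpoint = base_url.rstrip("/") + "/v1/scrape"
    payload = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def probe(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and return (status, content_type, body_text) without parsing JSON.

    Connection failures (refused, DNS, timeout) propagate as urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a body worth reading.
        body = err.read().decode("utf-8", "replace")
        return err.code, err.headers.get("Content-Type", ""), body
```

For example, `status, content_type, body = probe(build_scrape_request("http://localhost:3002", "YOUR_KEY", "https://example.com"))`, then print `status`, `content_type` and `body[:500]`. Running it on the same host or container as Dify's api service, rather than on your own machine, shows what Dify itself would receive.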

@TChengZ

TChengZ commented Nov 4, 2024

> can't reproduce, I also use a self-host firecrawl, it works well

Do you self-host Firecrawl on your own computer or on a server? It works well when I self-host it on my computer, but once I deploy it to a server, it doesn't work.

@hjlarry
Contributor

hjlarry commented Nov 4, 2024

> do u self-host firecrawl in your computer or in the service, it works well when i self-hosted in my computer,but once i publish it on the service, it won't work

Dify just sends a simple HTTP request to the Firecrawl server, so the key to solving this issue is to check what response Dify's API server receives for that request.
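One way to see that response from inside Dify's API server is to turn on verbose HTTP logging in the process that calls Firecrawl. Below is a minimal sketch of a commonly used debugging recipe for code built on `requests`/`urllib3`, assuming you can run Python in the api container (for example, temporarily at startup). `enable_http_wire_logging` is a hypothetical helper. It prints what is sent, including request headers such as the API key, so use it only temporarily.

```python
import http.client
import logging


def enable_http_wire_logging() -> None:
    """Turn on verbose HTTP logging for code that uses requests/urllib3.

    Hypothetical helper for temporary debugging only.
    """
    # With debuglevel > 0, http.client prints the bytes it sends ("send:")
    # and the response status line and headers ("reply:", "header:") to stdout.
    # urllib3, which requests uses, builds on http.client connections.
    http.client.HTTPConnection.debuglevel = 1

    # urllib3 logs connection setup and lines like '"POST /path HTTP/1.1" 502 ...'.
    logging.basicConfig(level=logging.DEBUG)
    urllib3_logger = logging.getLogger("urllib3")
    urllib3_logger.setLevel(logging.DEBUG)
    urllib3_logger.propagate = True
```

This shows status codes and response headers but not response bodies, so pair it with logging `response.status_code` and the first few hundred characters of `response.text` where the Firecrawl request is made.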

Projects
None yet
Development

No branches or pull requests

5 participants