Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL Intro Course Exercise 3 not being set up #452

Open
semibroiled opened this issue Jul 30, 2023 · 5 comments
Open

SQL Intro Course Exercise 3 not being set up #452

semibroiled opened this issue Jul 30, 2023 · 5 comments

Comments

@semibroiled
Copy link

The setup imports are written to import hacker_news with the actual being hacker-news which may be the possible cause behind this.

@semibroiled
Copy link
Author

Collecting git+https://github.com/Kaggle/learntools.git
Cloning https://github.com/Kaggle/learntools.git to /tmp/pip-req-build-l71k0skd
Running command git clone --filter=blob:none --quiet https://github.com/Kaggle/learntools.git /tmp/pip-req-build-l71k0skd
fatal: unable to access 'https://github.com/Kaggle/learntools.git/': Could not resolve host: github.com
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/Kaggle/learntools.git class="ansi-green-fg"> /tmp/pip-req-build-l71k0skd did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/Kaggle/learntools.git class="ansi-green-fg"> /tmp/pip-req-build-l71k0skd did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Using Kaggle's public dataset BigQuery integration.

NotFound Traceback (most recent call last)
Cell In[4], line 9
4 binder.bind(globals())
5 ##import os
6 #if not os.path.exists('/kaggle/input/hacker_news/comments'):
7 # os.symlink('/input/hacker-news/comments','.../input/hacker_news/')
8 # pass
----> 9 from learntools.sql.ex3 import *
10 print("Setup Complete")

File /opt/conda/lib/python3.10/site-packages/learntools/sql/ex3.py:17
10 prolific_commenters_query = """
11 SELECT author, COUNT(id) AS NumPosts
12 FROM bigquery-public-data.hacker_news.comments
13 GROUP BY author
14 HAVING COUNT(id) > 10000
15 """
16 query_job = client.query(prolific_commenters_query)
---> 17 prolific_commenters_answer = query_job.to_dataframe()
19 # (2) NumDeletedPosts
20 deleted_posts_query = """
21 SELECT COUNT(1) AS num_deleted_posts
22 FROM bigquery-public-data.hacker_news.comments
23 WHERE deleted = True
24 """

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/job/query.py:1695, in QueryJob.to_dataframe(self, bqstorage_client, dtypes, progress_bar_type, create_bqstorage_client, date_as_object, max_results, geography_as_object)
1613 def to_dataframe(
1614 self,
1615 bqstorage_client: "bigquery_storage.BigQueryReadClient" = None,
(...)
1621 geography_as_object: bool = False,
1622 ) -> "pandas.DataFrame":
1623 """Return a pandas DataFrame from a QueryJob
1624
1625 Args:
(...)
1693 :mod:shapely library cannot be imported.
1694 """
-> 1695 query_result = wait_for_query(self, progress_bar_type, max_results=max_results)
1696 return query_result.to_dataframe(
1697 bqstorage_client=bqstorage_client,
1698 dtypes=dtypes,
(...)
1702 geography_as_object=geography_as_object,
1703 )

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/_tqdm_helpers.py:88, in wait_for_query(query_job, progress_bar_type, max_results)
84 progress_bar = get_progress_bar(
85 progress_bar_type, "Query is running", default_total, "query"
86 )
87 if progress_bar is None:
---> 88 return query_job.result(max_results=max_results)
90 i = 0
91 while True:

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/job/query.py:1499, in QueryJob.result(self, page_size, max_results, retry, timeout, start_index, job_retry)
1496 if retry_do_query is not None and job_retry is not None:
1497 do_get_result = job_retry(do_get_result)
-> 1499 do_get_result()
1501 except exceptions.GoogleAPICallError as exc:
1502 exc.message = _EXCEPTION_FOOTER_TEMPLATE.format(
1503 message=exc.message, location=self.location, job_id=self.job_id
1504 )

File /opt/conda/lib/python3.10/site-packages/google/api_core/retry.py:349, in Retry.call..retry_wrapped_func(*args, **kwargs)
345 target = functools.partial(func, *args, **kwargs)
346 sleep_generator = exponential_sleep_generator(
347 self._initial, self._maximum, multiplier=self._multiplier
348 )
--> 349 return retry_target(
350 target,
351 self._predicate,
352 sleep_generator,
353 self._timeout,
354 on_error=on_error,
355 )

File /opt/conda/lib/python3.10/site-packages/google/api_core/retry.py:191, in retry_target(target, predicate, sleep_generator, timeout, on_error, **kwargs)
189 for sleep in sleep_generator:
190 try:
--> 191 return target()
193 # pylint: disable=broad-except
194 # This function explicitly must deal with broad exceptions.
195 except Exception as exc:

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/job/query.py:1489, in QueryJob.result..do_get_result()
1486 self._retry_do_query = retry_do_query
1487 self._job_retry = job_retry
-> 1489 super(QueryJob, self).result(retry=retry, timeout=timeout)
1491 # Since the job could already be "done" (e.g. got a finished job
1492 # via client.get_job), the superclass call to done() might not
1493 # set the self._query_results cache.
1494 self._reload_query_results(retry=retry, timeout=timeout)

File /opt/conda/lib/python3.10/site-packages/google/cloud/bigquery/job/base.py:728, in _AsyncJob.result(self, retry, timeout)
725 self._begin(retry=retry, timeout=timeout)
727 kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
--> 728 return super(_AsyncJob, self).result(timeout=timeout, **kwargs)

File /opt/conda/lib/python3.10/site-packages/google/api_core/future/polling.py:261, in PollingFuture.result(self, timeout, retry, polling)
256 self._blocking_poll(timeout=timeout, retry=retry, polling=polling)
258 if self._exception is not None:
259 # pylint: disable=raising-bad-type
260 # Pylint doesn't recognize that this is valid in this case.
--> 261 raise self._exception
263 return self._result

NotFound: 404 Not found: Table bigquery-public-data:hacker_news.comments was not found in location US

Location: US
Job ID: 9c49c97a-7566-47e9-bdc5-cc58ce59a18a

@agamemnonc
Copy link

agamemnonc commented Oct 13, 2023

Any updates on this?

@semibroiled
Copy link
Author

No unfortunately my method didnt work as intended as the whole datasets path is changed. I did find a workaround from a comment in Kaglge itself. I'll post it here for your convenience

@semibroiled
Copy link
Author

Ex - 3 Solution steps

  1. Add this into the first cell :

`
from learntools.core import binder

binder.bind(globals())
with open("/opt/conda/lib/python3.10/site-packages/learntools/sql/ex3.py","r") as f :
ex3 = f.read()
ex3 = ex3.replace("SELECT author","SELECT by")
ex3 = ex3.replace("GROUP BY author","GROUP BY by")
ex3 = ex3.replace("'author'","'by'")
ex3 = ex3.replace("author","`by`")
ex3 = ex3.replace("bigquery-public-data.hacker_news.comments","bigquery-public-data.hacker_news.full")

with open("/opt/conda/lib/python3.10/site-packages/learntools/sql/ex3_v2.py","w") as f2:
f2.write(ex3)
from learntools.sql.ex3_v2 import *`
2. change the table name from comments to full.
table_ref = dataset_ref.table("full")

  1. This query works now :
    prolific_commenters_query = """SELECTby, COUNT(1) AS NumPosts FROMbigquery-public-data.hacker_news.full GROUP BYby HAVING COUNT(1) > 10000"""

@semibroiled
Copy link
Author

Solution comes from user Palak807 on Kaggle and worked for me. Turns out the dataset name itself has changed with respect to the _ and - characters

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants