KeyError: 'stats' #16

Open
eulereadgbe opened this issue May 8, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@eulereadgbe

@alanorth, when I tried running python -m dspace_statistics_api.indexer, I received this error:

  File "C:\Users\euler\.pyenv\pyenv-win\versions\3.7.9\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\euler\.pyenv\pyenv-win\versions\3.7.9\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\dspace-statistics-api\dspace_statistics_api\indexer.py", line 223, in <module>
    index_views("items", "id")
  File "D:\dspace-statistics-api\dspace_statistics_api\indexer.py", line 56, in index_views
    results_totalNumFacets = res.json()["stats"]["stats_fields"][facetField][
KeyError: 'stats'

I tried this on a repository with no shards and on another with sharded statistics. Both repositories run DSpace 6.3 on Windows Server 2019, and I tested with Python 3.7.9, 3.9.1, and 3.9.10. What could I be missing?

alanorth added the bug (Something isn't working) label on May 9, 2022
@alanorth
Member

alanorth commented May 9, 2022

@eulereadgbe, it seems there is something wrong with the current 1.4.4-dev. I just noticed the same bug in my test environment, but v1.4.3 works.

@eulereadgbe
Author

I downloaded the v1.4.3 tag, but I still get the same error as with the master and v6_x branches. I also tested the 1.2.0, 1.4.2, and 1.4.3 releases.

(venv) E:\dspace-statistics-api-1.4.3>python -m dspace_statistics_api.indexer
Traceback (most recent call last):
  File "C:\Users\Administrator\.pyenv\pyenv-win\versions\3.7.9\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Administrator\.pyenv\pyenv-win\versions\3.7.9\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\dspace-statistics-api-1.4.3\dspace_statistics_api\indexer.py", line 223, in <module>
    index_views("items", "id")
  File "E:\dspace-statistics-api-1.4.3\dspace_statistics_api\indexer.py", line 56, in index_views
    results_totalNumFacets = res.json()["stats"]["stats_fields"][facetField][
KeyError: 'stats'

I'm not sure whether my issue is specific to Windows. Sorry, I don't have a non-Windows instance where I can test this.

@alanorth
Member

alanorth commented May 9, 2022

Actually, I was mistaken: v1.4.4-dev is working here too (looking at the few git commits since v1.4.3, I haven't changed anything other than updating dependencies).

So back to your problem. Are you using the built-in Solr 4.10.x that comes with DSpace 6.x, or a standalone Solr? This is the HTTP request that the indexer makes to Solr:

http://localhost:8080/solr/statistics/select?q=type%3A2+AND+id%3A%2F.%7B36%7D%2F&fq=-isBot%3Atrue+AND+statistics_type%3Aview&fl=id&facet=true&facet.field=id&facet.mincount=1&facet.limit=1&facet.offset=0&stats=true&stats.field=id&stats.calcdistinct=true&shards=&rows=0&wt=json

Do you get a result if you paste this URL into your browser? Note that this assumes Solr is at http://localhost:8080/solr.
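
For reference, the same request can be rebuilt from its parameters with Python's standard library, which makes it easier to point at a different Solr base URL or fill in shard names. This is a sketch, assuming Solr at http://localhost:8080/solr; `build_select_url` and `fetch` are hypothetical helpers, not part of dspace-statistics-api. `fetch` also reads the body of an HTTP error, since an HTTP 500 reply from Solr can still carry a JSON body.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: Solr is reachable at this base URL (adjust for your install).
SOLR_BASE = "http://localhost:8080/solr"


def build_select_url(core="statistics", shards=""):
    """Rebuild the item-views query shown above as a URL (hypothetical helper)."""
    params = {
        "q": "type:2 AND id:/.{36}/",  # items whose id is 36 characters (a UUID)
        "fq": "-isBot:true AND statistics_type:view",  # drop bots, keep views
        "fl": "id",
        "facet": "true",
        "facet.field": "id",
        "facet.mincount": "1",
        "facet.limit": "1",
        "facet.offset": "0",
        "stats": "true",
        "stats.field": "id",
        "stats.calcdistinct": "true",  # ask Solr for countDistinct
        "shards": shards,
        "rows": "0",
        "wt": "json",
    }
    # urlencode uses quote_plus: spaces become "+", ":" becomes "%3A", etc.
    return f"{SOLR_BASE}/{core}/select?{urllib.parse.urlencode(params)}"


def fetch(url):
    """GET the URL and decode the JSON body, even for HTTP error statuses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        return json.load(err)  # Solr puts its error details in the body


print(build_select_url())
```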

@eulereadgbe
Author

I checked the Solr version: it is 4.10.4, the one bundled with the DSpace 6.3 installation. Below is the result of the Solr query:

{"responseHeader":{"status":500,"QTime":135,"params":{"stats.calcdistinct":"true","facet.field":"id","fl":"id","fq":"-isBot:true AND statistics_type:view","rows":"0","q":"type:2 AND id:/.{36}/","facet.limit":"1","shards":"","stats":"true","facet.mincount":"1","facet":"true","wt":"json","facet.offset":"0","stats.field":"id"}},"response":{"numFound":146786,"start":0,"docs":[]},"facet_counts":{"facet_queries":{},"facet_fields":{"id":["34d62239-a4bf-4f19-b662-64b1820b0adc",1529]},"facet_dates":{},"facet_ranges":{},"facet_intervals":{}},"error":{"msg":"Invalid shift value in prefixCoded bytes (is encoded value really an INT?)","trace":"java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)
	at org.apache.lucene.util.NumericUtils.getPrefixCodedIntShift(NumericUtils.java:209)
	at org.apache.lucene.util.NumericUtils$2.accept(NumericUtils.java:497)
	at org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:244)
	at org.apache.lucene.search.FieldCacheImpl$Uninvert.uninvert(FieldCacheImpl.java:309)
	at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:712)
	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:213)
	at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:600)
	at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:665)
	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:213)
	at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:600)
	at org.apache.lucene.queries.function.valuesource.IntFieldSource.getValues(IntFieldSource.java:57)
	at org.apache.solr.handler.component.AbstractStatsValues.setNextReader(StatsValuesFactory.java:220)
	at org.apache.solr.handler.component.SimpleStats.getFieldCacheStats(StatsComponent.java:368)
	at org.apache.solr.handler.component.SimpleStats.getStatsFields(StatsComponent.java:326)
	at org.apache.solr.handler.component.SimpleStats.getStatsCounts(StatsComponent.java:290)
	at org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:79)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.dspace.solr.filters.LocalHostRestrictionFilter.doFilter(LocalHostRestrictionFilter.java:50)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541)
	at org.apache.catalina.valves.RequestFilterValve.process(RequestFilterValve.java:348)
	at org.apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:53)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
	at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:235)
	at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:690)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)
","code":500}}

It returns an HTTP 500 error. Is there something wrong with my Solr instance? I tried this on the other DSpace 6.3 repositories I maintain and the results were the same; however, on an instance running 6.4-SNAPSHOT, the results were OK.
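
This also explains the original KeyError: when Solr fails, the JSON body has an `error` object and a non-zero `responseHeader.status`, but no `stats` key, so `res.json()["stats"]` raises. A minimal sketch of a guard that surfaces Solr's own message instead; `get_stats_field` and `SolrQueryError` are hypothetical names, not the indexer's actual code.

```python
import json


class SolrQueryError(RuntimeError):
    """Raised when Solr reports an error instead of results (hypothetical)."""


def get_stats_field(response, field):
    """Return the stats block for `field`, or raise a descriptive error.

    `response` is the decoded JSON body of a Solr select request.
    """
    status = response.get("responseHeader", {}).get("status")
    if status != 0 or "error" in response:
        msg = response.get("error", {}).get("msg", "unknown error")
        raise SolrQueryError(f"Solr returned status {status}: {msg}")
    try:
        return response["stats"]["stats_fields"][field]
    except KeyError as exc:
        raise SolrQueryError(f"no stats for field {field!r} in Solr response") from exc


# Trimmed copy of the failing DSpace 6.3 response above.
failing = json.loads(
    '{"responseHeader": {"status": 500}, "error": {"msg": "Invalid shift value in '
    'prefixCoded bytes (is encoded value really an INT?)", "code": 500}}'
)
try:
    get_stats_field(failing, "id")
except SolrQueryError as err:
    print(err)
```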

Results from an instance running 6.4-SNAPSHOT:

{
  "responseHeader": {
    "status": 0,
    "QTime": 15,
    "params": {
      "stats.calcdistinct": "true",
      "facet.field": "id",
      "fl": "id",
      "fq": "-isBot:true AND statistics_type:view",
      "rows": "0",
      "q": "type:2 AND id:/.{36}/",
      "facet.limit": "1",
      "shards": "",
      "stats": "true",
      "facet.mincount": "1",
      "facet": "true",
      "wt": "json",
      "facet.offset": "0",
      "stats.field": "id"
    }
  },
  "response": {
    "numFound": 523,
    "start": 0,
    "docs": []
  },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "id": [
        "650e8f6d-7b7a-48f8-9a2e-cb4a176a0a2d",
        58
      ]
    },
    "facet_dates": {},
    "facet_ranges": {},
    "facet_intervals": {}
  },
  "stats": {
    "stats_fields": {
      "id": {
        "min": "0b860c68-4bfe-4462-900b-283b9114f449",
        "max": "fd17ff2e-b892-460f-89ee-ad0b09aea0ac",
        "count": 523,
        "missing": 0,
        "distinctValues": [
          "0b860c68-4bfe-4462-900b-283b9114f449",
          "3d2528d2-adf7-43ac-bb21-844df1b6cd38",
          "41b18e82-f7b1-48d1-8a90-add8b778b064",
          "428a60e1-6af0-41ad-89d9-65b7116b00a2",
          "54ff34d8-61ba-4211-9d49-261f9d4458dc",
          "58ac556a-9b69-4546-a612-b3f80c442a17",
          "5add0527-1b67-461a-94de-18cb2bb9eb82",
          "650e8f6d-7b7a-48f8-9a2e-cb4a176a0a2d",
          "66e28449-8c0e-466b-a008-8b3f83b41139",
          "6aedd0b2-1e67-41df-9f94-307f8f2147f9",
          "8420fe5b-5adf-4cd7-99d3-5f498547b8ba",
          "85b53292-ff49-4e9d-8cf7-591cf615dc8e",
          "89b64227-fb99-4477-ae49-1f767eaa3093",
          "8c7bd6ad-ce10-4402-a88c-7182383b55c2",
          "92ac4ab2-5387-4452-859a-4d375246ed3c",
          "95ef3da4-4f90-4792-842a-7a368787b37b",
          "9ffdf464-a5ce-48bb-9985-c3111e3cc613",
          "b1630edf-3ce7-4856-b0f1-e5dd926da20f",
          "ee4b9cf1-1a98-43b3-8988-a2199ec9f33a",
          "fd17ff2e-b892-460f-89ee-ad0b09aea0ac"
        ],
        "countDistinct": 20,
        "facets": {}
      }
    }
  }
}

I'll try dspace-statistics-api on this repository, where the Solr query you sent works.
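
In the working response, `stats.stats_fields.id.countDistinct` is the number of distinct items with views (20 here), while `count` (523) matches `numFound`, the number of view documents. A sketch of how such a count could drive paging through the `id` facet with `facet.offset`; `count_pages`, `facet_offsets`, and `PAGE_SIZE` are hypothetical, and the sketch assumes each page requests `facet.limit` equal to the page size.

```python
import math

# Trimmed from the successful 6.4-SNAPSHOT response above.
response = {
    "responseHeader": {"status": 0},
    "response": {"numFound": 523},
    "stats": {"stats_fields": {"id": {"count": 523, "countDistinct": 20}}},
}

# Assumption: hypothetical page size; the indexer's real value may differ.
PAGE_SIZE = 10


def count_pages(resp, field, page_size=PAGE_SIZE):
    """Pages needed to walk every distinct value of `field` via the facet."""
    distinct = resp["stats"]["stats_fields"][field]["countDistinct"]
    return math.ceil(distinct / page_size)


def facet_offsets(resp, field, page_size=PAGE_SIZE):
    """facet.offset value to request for each page (0, page_size, ...)."""
    return [page * page_size for page in range(count_pages(resp, field, page_size))]


print(count_pages(response, "id"))    # 2
print(facet_offsets(response, "id"))  # [0, 10]
```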

@alanorth
Member

alanorth commented May 10, 2022

That's really strange. It seems to be something with Solr. I don't know, but this is weird:

java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)

I see some results on Google for that error, related to Elasticsearch and Solr, both of which are based on Lucene. Unfortunately, I am not a Solr expert, so this is beyond me. I manage two DSpace 6.3 installations and also test this locally in my dev environment, and it works on all of them.

@eulereadgbe
Author

@alanorth, I just tested this on a repository where your Solr query did not return an error. I had just upgraded this repository's Solr statistics to use UUIDs, and I made sure that all integer IDs were migrated to UUIDs:

Connecting to http://localhost:8080/solr/statistics

=================================================================
        *** Statistics Records with Legacy Id ***

                   0    Bistream View
                   0    Item View
                   0    Collection View
                   0    Community View
                   0    Collection Search
                   0    Community Search
        --------------------------------------
                   0    TOTAL
=================================================================

                   0      TOTAL... (     1 sec; 0:00:01; DB cache:      0/       0; Docs:      0)

However, when I run python -m dspace_statistics_api.indexer, I get this output:

(venv) C:\Users\Administrator\Documents\dspace-statistics-api>python -m dspace_statistics_api.indexer
items: indexing views (page 1 of 11)
items: indexing views (page 2 of 11)
items: indexing views (page 3 of 11)
items: indexing views (page 4 of 11)
items: indexing views (page 5 of 11)
items: indexing views (page 6 of 11)
items: indexing views (page 7 of 11)
items: indexing views (page 8 of 11)
items: indexing views (page 9 of 11)
items: indexing views (page 10 of 11)
items: indexing views (page 11 of 11)
communities: indexing views (page 1 of 2)
Traceback (most recent call last):
  File "C:\Users\Administrator\.pyenv\pyenv-win\versions\3.7.9\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Administrator\.pyenv\pyenv-win\versions\3.7.9\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Administrator\Documents\dspace-statistics-api\dspace_statistics_api\indexer.py", line 224, in <module>
    index_views("communities", "owningComm")
  File "C:\Users\Administrator\Documents\dspace-statistics-api\dspace_statistics_api\indexer.py", line 105, in index_views
    psycopg2.extras.execute_values(cursor, sql, data, template="(%s, %s)")
  File "C:\Users\Administrator\Documents\dspace-statistics-api\venv\lib\site-packages\psycopg2\extras.py", line 1270, in execute_values
    cur.execute(b''.join(parts))
  File "C:\Users\Administrator\Documents\dspace-statistics-api\venv\lib\site-packages\psycopg2\extras.py", line 146, in execute
    return super().execute(query, vars)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for uuid: "90-unmigrated"
LINE 1: ...9),('dbd382f1-962d-410f-bb7c-89ac687599d8', 451),('90-unmigr...
                                                             ^

I don't understand why it is complaining about an unmigrated community ID when all the IDs were migrated to UUIDs. I also tried reindexing the Solr statistics but the result is still the same.

I tested this in Python versions 3.7.9 and 3.9.6.
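
The psycopg2 error comes from PostgreSQL rejecting `90-unmigrated` as a value of type `uuid`. Purging those documents from Solr is the cleaner fix, but a sketch of a defensive pre-filter shows where such values could be caught before the INSERT; `split_valid_uuids` is a hypothetical helper, not part of dspace-statistics-api.

```python
import uuid


def split_valid_uuids(rows):
    """Split (id, views) rows into rows with a valid UUID and rejected ids.

    Hypothetical pre-filter: PostgreSQL's uuid type rejects values such as
    "90-unmigrated", so they are set aside instead of reaching the INSERT.
    """
    valid, rejected = [], []
    for row_id, views in rows:
        try:
            uuid.UUID(row_id)
        except ValueError:
            rejected.append(row_id)
        else:
            valid.append((row_id, views))
    return valid, rejected


# The UUID and its count come from the error message above; the view count
# for the unmigrated id is illustrative.
rows = [("dbd382f1-962d-410f-bb7c-89ac687599d8", 451), ("90-unmigrated", 1)]
valid, rejected = split_valid_uuids(rows)
print(valid)     # [('dbd382f1-962d-410f-bb7c-89ac687599d8', 451)]
print(rejected)  # ['90-unmigrated']
```

The `valid` rows could then be passed to `execute_values()` as before, and the rejected ids logged for cleanup.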

Anyway, I am curious and interested in trying this API, although I can't get past the indexing step. I also found out that gunicorn doesn't run on Windows.

@alanorth
Member

alanorth commented May 11, 2022

Oh yes, I dealt with this issue of unmigrated IDs a few years ago when we upgraded to DSpace 6. It's a known issue according to the DSpace 6 docs:

> If a UUID value cannot be found for a legacy id, the legacy id will be converted to the form "xxxx-unmigrated" where xxxx is the legacy id.
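
Values in this form can be recognized, and the legacy id recovered, with a simple pattern. A sketch, assuming the legacy id is always a plain integer; `legacy_id` is a hypothetical helper.

```python
import re

# Per the DSpace 6 docs quoted above: "<legacy id>-unmigrated".
UNMIGRATED = re.compile(r"(\d+)-unmigrated")


def legacy_id(value):
    """Return the legacy integer id for an "xxxx-unmigrated" value, else None."""
    match = UNMIGRATED.fullmatch(value)  # whole string must match
    return int(match.group(1)) if match else None


print(legacy_id("90-unmigrated"))                         # 90
print(legacy_id("dbd382f1-962d-410f-bb7c-89ac687599d8"))  # None
```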

I purged them all like this (run it against each statistics core if your statistics are sharded):

$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:/.*unmigrated.*/</query></delete>'

That uses a regular expression to match unmigrated IDs: /.*unmigrated.*/
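
For a sharded setup the same delete has to be sent to every statistics core. A sketch of the curl call above using Python's standard library, assuming the port 8081 base URL from that command; the shard core names are hypothetical placeholders, and the requests are only built here, not sent.

```python
import urllib.request

# Assumption: Solr base URL as in the curl command above (port 8081).
SOLR_BASE = "http://localhost:8081/solr"
DELETE_XML = b"<delete><query>id:/.*unmigrated.*/</query></delete>"


def build_delete_request(core):
    """Build (but do not send) the delete-by-query POST for one core."""
    return urllib.request.Request(
        f"{SOLR_BASE}/{core}/update?softCommit=true",
        data=DELETE_XML,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Hypothetical core names for a sharded install; list your own cores here.
cores = ["statistics", "statistics-2019", "statistics-2020"]
requests_to_send = [build_delete_request(core) for core in cores]

# To actually purge, send each one: urllib.request.urlopen(req)
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```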

> I also found out that gunicorn doesn't run on Windows.

Oh! 😮 I haven't used Windows in twenty years, so I have no idea. You will have to search for a WSGI server that runs on Windows. Sorry...
