Census 2020 example #430

Azaya89 · 2024-10-17T00:34:24Z

Created a new example using the 2020 US census dataset.

minor updates on the 2010 census example - Moved to Update census 2010 example #459

maximlt · 2024-10-17T07:06:33Z

I suspect it is due to #429 but I'm not sure how to resolve it.

You need to re-create the conda environment locally following the contributing guide.

The test file added is a 0.1% sample of the full dataset but it is still about 8MB in size. I don't know if that is too large and should be reduced further.

It's still way too large. You should aim for the minimum dataset size possible, it's fine if it's just a few KB as long as it contains data that is representative of the whole dataset. For instance, if the code expects some data category, then it should be in the sample dataset to let the notebook run entirely.

maximlt

Is there an absolute need to rename the original census project census_one? Without doing anything else, this is going to break all the links to its web page and deployment.

I would also not call the new one census_two but census2020.

Azaya89 · 2024-10-17T09:20:48Z

Is there an absolute need to rename the original census project census_one? Without doing anything else, this is going to break all the links to its web page and deployment.

I would also not call the new one census_two but census2020.

I imagine renaming the original from census to something else makes sense seeing as there are now more than one census notebooks in the examples gallery (and possibly more in the future). However, I tried renaming both to census2010 and census2020 but the doit validate step emits a warning that only lower case characters and underscore allowed in the naming. I wasn't sure ignoring that warning was ideal that is why I now renamed both to the current names.

maximlt · 2024-10-17T10:39:18Z

However, I tried renaming both to census2010 and census2020 but the doit validate step emits a warning that only lower case characters and underscore allowed in the naming

Sounds like a bug in the validation code, something like census2020 should be allowed.

Azaya89 · 2024-10-17T11:03:40Z

You need to re-create the conda environment locally following the contributing guide.

Done. Thanks

It's still way too large. You should aim for the minimum dataset size possible, it's fine if it's just a few KB as long as it contains data that is representative of the whole dataset. For instance, if the code expects some data category, then it should be in the sample dataset to let the notebook run entirely.

Reduced it to <1MB now.

maximlt · 2024-10-21T06:58:52Z

Replying to your comment elsewhere:

Thank you. I'm still in favor of renaming the first one to census2010 though.

If you intend to rename it, then redirect links have to be set up:

Full link: https://examples.holoviz.org/gallery/census/census.html to https://examples.holoviz.org/gallery/census2010/census2010.html
Shortcut link: https://examples.holoviz.org/census to https://examples.holoviz.org/census2010
Unfortunately, it's not super easy to set a redirect link for the deployment itself (https://census-notebook.holoviz-demo.anaconda.com/notebooks/census.ipynb), so renaming would break it. We have recently broken them all (new subdomain) and no one complained as far as I know so it seems it wouldn't be too bad.

Alternatively, we could just:

Change the title property in the project YAML to Census 2010
Change the notebook top-level heading to Census 2010

Azaya89 · 2024-10-21T11:17:53Z

Alternatively, we could just:

Change the title property in the project YAML to Census 2010

Change the notebook top-level heading to Census 2010

I already did these in this PR. Would that be enough to differentiate both examples eventually?

maximlt · 2024-10-21T11:35:50Z

Would that be enough to differentiate both examples eventually?

I think so?

Azaya89 · 2024-10-21T14:34:31Z

I think so?

OK. I will revert the other renaming then

census2020/census2020.ipynb

maximlt · 2024-12-05T22:04:27Z

@Azaya89 just to let you know that I won't have the bandwidth to review this before the end of the year, in case it's urgent.

Azaya89 · 2024-12-05T22:21:30Z

@Azaya89 just to let you know that I won't have the bandwidth to review this before the end of the year, in case it's urgent.

No problem. I can't say it's urgent :)

maximlt · 2024-12-08T11:29:00Z

@jbednar a thought I had after your comment in #477 (comment), wouldn't it be more appropriate to extend the original census example instead of adding a new one? I haven't yet run Census 2020 (this PR), from having a quick look at the code it doesn't look so different so I have the impression maintaining the two notebooks with the same environment should be possible.

jbednar · 2024-12-09T00:46:03Z

I think there are three things to consider when deciding whether examples should be grouped into a single directory:

Are the examples thematically linked, such that you'd reasonably want to learn about one once you click on the other? (✔️ )
Do they share essentially the same environment, such that it makes sense to set up a single environment for both? (✔️ )
Can they use the same datasets? If different, does adding a new notebook add only minimal file or download size? (❌ ).

Here census2010 and census2020 clearly meet the first two criteria, but for 3, they have large and equal file sizes, and most people are only likely to want one or the other. So unless we have a way to download the right data per notebook rather than per example (which we might well end up with as we move away from anaconda_project), then combining 2010 and 2020 would cause most users to require double the disk space, which seems problematic.

maximlt · 2024-12-09T08:10:34Z

they have large and equal file sizes

I launched these two ZIP downloads and was surprised by the difference in file sizes, with 1.3GB for census2010 and 4GB for census2020.

We need to check why the files differ so much, are there more columns in census2020? If so, are they used? Or is the data stored differently (dtype, compression)?

@jbednar I'm asking this question as it is quite clear, to me at least, that examples are meant to be kept up-to-date, i.e. this site is not a blog. We often hear that we have too many APIs (last was during the last steering committee meeting on Friday), we should try to expose our users to the best practices only. It means that this growing number of examples needs to be maintained; this year, we have updated about half of them and it wasn't trivial. So I'm going to challenge adding census 2020 if it's just a slight variation of census 2010 with a slightly different dataset? To illustrate this, census 2010 (which was priority 4 on our NF SDG list, we almost got to it) has a bug in the legend and might be able to use hvPlot directly.

Azaya89 · 2024-12-09T15:31:48Z

We need to check why the files differ so much, are there more columns in census2020? If so, are they used? Or is the data stored differently (dtype, compression)?

I can point out here that the 2010 dataset contains ~306 million rows while the 2020 dataset contains over 334 million. That's an additional ~28 million rows. I'm sure that alone is not the reason for the large difference in file size but I think it also play a role?

maximlt · 2024-12-09T15:52:49Z

We need to check why the files differ so much, are there more columns in census2020? If so, are they used? Or is the data stored differently (dtype, compression)?

I can point out here that the 2010 dataset contains ~306 million rows while the 2020 dataset contains over 334 million. That's an additional ~28 million rows. I'm sure that alone is not the reason for the large difference in file size but I think it also play a role?

That's not enough to make it 3x bigger.

Azaya89 · 2024-12-09T16:06:29Z

We need to check why the files differ so much, are there more columns in census2020? If so, are they used? Or is the data stored differently (dtype, compression)?

I can point out here that the 2010 dataset contains ~306 million rows while the 2020 dataset contains over 334 million. That's an additional ~28 million rows. I'm sure that alone is not the reason for the large difference in file size but I think it also play a role?

That's not enough to make it 3x bigger.

Sure!

Edit: I think I have found the issue: The 2010 dataset 'x' and 'y' columns are in Float32 dtype while the 2020 dataset are in Float64 dtype. I will update the 2020 dataset to use Float32 and have it re-uploaded.

jbednar · 2024-12-09T18:37:40Z

Right; we explicitly chose 32 bits (and would use 16 bits if that were an easy option) to reduce file size.

Census 2020 is a special case of people expecting the data to be more up to date. The code should not be different from 2010, so the extra maintenance should not be prohibitive.

Azaya89 · 2024-12-10T10:07:11Z

The new dataset has been uploaded now. It's about 2.15 GB on disk.

jbednar · 2024-12-10T12:23:01Z

Is the race field a categorical? Category should be much smaller than string

maximlt · 2024-12-10T14:35:12Z

So I downloaded the two ZIP files (census 2010 and 2020) and compared them.

File sizes (zipped): old 1.44GB vs new 1.62GB
Parquet files in the folder: old 36 vs new 141
Race values:

Schemas:

Compression: both use SNAPPY

So while the file sizes are now closer, there are still a few differences:

way more partitions for the new one, what is the impact on performance?
I'm not sure why the x,y values aren't displayed as Float 32 in the new one (read with DuckDB)
~~race is not stored as categorical in any of them, I'm not sure why?~~ see messages below
race values are different, shortened to one letter in the old file while complete in the new one

I'd also note that this example installs fastparquet. I think that in the end it's pyarrow that is used as the engine as it's installed (see the lockfile) and it has become the default engine in Dask (https://docs.dask.org/en/stable/changelog.html#fastparquet-engine-deprecated). Worth checking though.

I would still prefer if we had a single census example, the above shows that it's not hard for two similar examples to diverge enough to cause maintenance pain. How about updating the original census to use the 2020 dataset and dropping entirely the 2010 one? Pretty sure the text in the text in the original example still applies to the new dataset.

Azaya89 · 2024-12-10T15:23:35Z

Is the race field a categorical? Category should be much smaller than string

Yeah (before I changed the 'x' and 'y' types)

maximlt · 2024-12-10T16:11:48Z

Yeah (before I changed the 'x' and 'y' types)

Ok then it's just that I don't know how to find out whether a column is categorical using DuckDB :) Interestingly, the only place where I can see there might be some categories is in the old file. So, yes clearly my approach was wrong, at least for finding out whether a column is categorical or not.

maximlt · 2024-12-10T16:16:31Z

@Azaya89 I see you closed #178. Tbh I'm not sure it should be closed. This issue is about implementing spatial indexing with spatialpandas for census. You wrote you updated the example in #459 but the changes this PR contains were unrelated to spatial indexing. @jbednar shouldn't we try to re-add spatial indexing to the census example? It seems like it could greatly benefit from it?

jbednar · 2024-12-10T22:15:43Z

Yes, very much so! Spatial indexing is very important.

BTW, I've had a hard time being sure when it is in use and when not. I seem to recall there being some app we made that showed how many partitions were used in the current view, which would be really valuable for showing the effect of spatial indexing.

jbednar · 2024-12-11T00:34:26Z

How about updating the original census to use the 2020 dataset and dropping entirely the 2010 one? Pretty sure the text in the text in the original example still applies to the new dataset.

I haven't seen the actual detailed results, but if you're right that the same conclusions apply, that might be a reasonable approach. I'd still want a separate notebook looking at the differences between the two, but that would be a bonus add-on that could load the older data and massage the newer data to match the older. If someone could render the old notebook with the new data and point me to a copy of both I'd be happy to make that call.

hoxbro · 2024-12-13T08:57:44Z

I propose having the 2010 census as a historical (static) document, with an admonition referring to the newer census, 2020.

Because the examples are not versioned, we lose some valuable knowledge by always pushing to the latest and greatest API we have decided to recommend. Census 2020 uses hvplot and panel, but I find it interesting that you can see how our tools developed in the current Census 2010, going from "pure" Datashader to using it along with HoloViews.

jbednar · 2024-12-14T00:54:59Z

Because the examples are not versioned, we lose some valuable knowledge by always pushing to the latest and greatest API we have decided to recommend.

Well, not versioned apart from Git, that is! :-)

maximlt

I discussed briefly with Simon about this example this week and realized that you all (Jim, Simon and Isaiah) agreed it'd be a worthwhile example to add, so I was a little embarrassed getting into your way with my recent messages. Even more so as I've been aware this was in preparation for a while too. Though I have to say the timing of adding a new example while we were trying to wrap up the NF SDG work and I was reviewing the new examples from Demetris was unfortunate, I had no time to dive into this example more before now.

I gave the whole thing a bit more thought and reviewed the example, and, in fact, I have to say I am not so embarrassed about my view 🙃

Any new example comes with some non-negligible maintenance cost, even if it doesn't look like so:

Except a couple examples when we moved from pyviz.org to holoviz.org, examples are neither archived nor removed, they're all exposed equally and are meant to be maintained over a long time. We have updated this year ~15 examples, some of them being very old (before I knew any Python!)
It can have deployments which have to be monitored and fixed when something goes wrong; Jean-Luc and I are basically unofficially on-call for this
We already know that all the examples will have to be migrated from anaconda-project to a new tool (TBD). And, we also know they will all have to be re-deployed on a new platform. While we have some testing in place, our tests are rather simple and only cover "does the notebook run without error?" which is often not enough to fully test an example. As such, during these migrations, the examples will have to be run manually one by one. I'm well aware of that as I've had to do exactly that end of last year when we had to re-deploy all the examples and had to migrate to the new and stricter validation put in place to ensure the notebook/site/deployment were in sync. This easily made it to the podium of my most miserable professional experiences, and I'm already sad for the person who will have to take care of that (even more so if it's me!).
Fix potential bugs, e.g. example not working on Windows, bugs in the 3k lines dodo.py or the 9 Github workflows or the custom Sphinx extensions that could affect this particular example.

Of course, I'm not against adding new examples and I think Isaiah's FIFA example, Demetris' neuroscience ones, and the planned Grabcut and Stable Diffusion, are all great additions to the gallery as they all add something new. In its own right there's nothing wrong with this example, but I have a hard time seeing what it adds to the gallery:

The Geospatial domain is already overly represented
The approach used is already shown in other examples
The dataset itself is simply an updated version of a dataset already used in another example (original census)

On the last point, I realized that updating the dataset is a task we didn't integrate with @droumis in the Example Modernization Checklist that was guiding Isaiah and Jason while updating the examples (https://github.com/holoviz-topics/examples/wiki/Example-Modernization-Checklist-(2024)). I don't recall if it was an explicit decision or not. There are many examples that could be updated to use a newer dataset (too many to list them here!), I'd be concerned with the precedent Census 2020 would set.

To conclude:

I'm in favor of having a single census example
I have no preference for the dataset used (2010, 2020, 2010 and 2020)
It should follow our best API practices:
- I have some questions about some of the choices made in this example (see review comments)
- Note that I think that, in the original Census example, the usage of Datashader only is still worthwhile to demonstrate as being able to easily generate a static image of a large dataset is a common need

(Note that the site cannot be built as this notebook when turned into HTML is too heavy - 170MB - exceeds Github's file size limit of 100MB)

maximlt · 2024-12-15T21:46:42Z

census2020/census2020.ipynb

+    "\n",
+    "x_range, y_range = CITIES['USA']\n",
+    "\n",
+    "df.hvplot.points('x', 'y', rasterize=True, cnorm='linear', xlim=x_range, ylim=y_range, cmap=gray)"


I think cmap accepts colorcet cmaps referenced as strings, i.g. cmap='gray'. Can you try that? And if it works, update your hvplot reference notebook on cmap/color_key/color to also include that?

It worked. Thanks

maximlt · 2024-12-15T21:49:16Z

census2020/census2020.ipynb

+    "\n",
+    "def plot_city(longitude_range, latitude_range, w=plot_width, h=plot_height, bgcolor='black'):\n",
+    "    \"\"\"Plots a map of the pop density aggregated by race within specified longitude and latitude ranges\"\"\"\n",
+    "    return df.hvplot.points('x', 'y', rasterize=True, dynspread=True, cnorm='eq_hist',\n",


Instead of instantiating an hv.Tiles element from an xyz layer, and overlaying it with the points plot, you can directly set tiles=xyz.<> in the hvplot call. This is documented there https://hvplot.holoviz.org/user_guide/Geographic_Data.html (it was implemented this year, motivated by one the examples that was updated part of the NF SDG work of this year).

Thanks. This worked.

Edit: On second thoughts, it does not work well because the labels are now under the data points so they don't show well when zoomed in.

Oops! Plus people often want both (background) tiles and (foreground) labels in the same plot. I guess that could be handled using a new option label_tiles to handle tiles that are meant to go on top, but at some point we have to stop adding more options to hvPlot. :-) Up to @maximlt .

maximlt · 2024-12-15T21:50:56Z

census2020/census2020.ipynb

+   "outputs": [],
+   "source": [
+    "%%time\n",
+    "display(plot_city(*CITIES['New York City']))"


Why timing these ones and not the ones before too? I'm not asking you to do it, I would just like to understand why it's like this :)

My initial thinking then was that since the plots are now being called from a pre-defined function, it makes sense to time how long it takes to run but thinking about it further, there is no reason why the previous plots cannot also be timed. I guess I could also time them as well :)

Something to think about mabe. Serving the app with panel serve ... executes the notebook content every time a visitor visits the page. Calling display(plot(...)) forces the HoloViews plot to be generated, making the app longer to be served. In the streaming example of Demetris I mentioned it not pn.state.served: ... which could be used here inside a function to only execute display() when the app is not served.

In the streaming example of Demetris I mentioned if not pn.state.served: ... which could be used here inside a function to only execute display() when the app is not served.

Can you share a link to this example so I can see what it looks like?

In these examples I've generally put timing information on any cell that takes a surprising amount of time. Sometimes the surprise is "how can Datashader be so fast", and other times it's answering "how long does that dataset take to load".

Here's the streaming state bit: https://github.com/holoviz-topics/examples/pull/469/files#r1880352906

Something to think about mabe. Serving the app with panel serve ... executes the notebook content every time a visitor visits the page. Calling display(plot(...)) forces the HoloViews plot to be generated, making the app longer to be served.

In these examples I've generally put timing information on any cell that takes a surprising amount of time. Sometimes the surprise is "how can Datashader be so fast", and other times it's answering "how long does that dataset take to load".

Looking at both these comments, I am now reconsidering timing the plots at all. The plots actually display in good time (<2.5 secs for all of them), so while it would have been nice to showcase how fast it is, I don't think it's worth the extra overhead of making the app slow when served.

maximlt · 2024-12-15T21:55:44Z

census2020/census2020.ipynb

+    "    \"\"\"Plots a map of the pop density aggregated by race within specified longitude and latitude ranges\"\"\"\n",
+    "    return df.hvplot.points('x', 'y', rasterize=True, dynspread=True, cnorm='eq_hist',\n",
+    "                            xlim=longitude_range, ylim=latitude_range,\n",
+    "                            cmap=color_key, bgcolor=bgcolor, colorbar=False, hover=False,\n",


I'm wondering, why is hover set to False? And when hover=False, is there an advantage of using rasterize=True instead of datashade=True? cc @jbednar

I assume both hover and colorbar should be True here, because indeed, without at least one of those, I can't think of a reason to use rasterize.

Yes true I forgot about the colorbar.

Though I sort of understand why Isaiah would have disabled them, as they're both hard to parse.

Though I sort of understand why Isaiah would have disabled them, as they're both hard to parse.

Yes, that was exactly why.

The hover information is useful, but I agree it should be formatted better. I think @mattpap indicated to me long ago that this approach was a stopgap, if I'm remembering correctly, and that we might be able to format that as a set of key value pairs like x and y are instead. I think that's an issue to file separately.

The colorbar is not just hard to parse, but incorrect -- the brightness level indicates increasing population, but the colorbar suggest that it indicates decreasing population. That's either a bug in this notebook or another issue to file.

It would be great if we could easily select SI units for the tick formatting, so that 5.000e+5 would show up as 500K, but I don't know if anyone's implemented something like that.

The colorbar is not just hard to parse, but incorrect -- the brightness level indicates increasing population, but the colorbar suggest that it indicates decreasing population. That's either a bug in this notebook or another issue to file.

The colorbar only becomes incorrect when I add the color_key options to the plot. The previous plots using uniform colors showed more accurate, even if still hard to parse, colorbars. I don't know if it's a bug or what but it was definitely inaccurate/confusing which informed my decision to remove it entirely.

It would be great if we could easily select SI units for the tick formatting, so that 5.000e+5 would show up as 500K, but I don't know if anyone's implemented something like that.

I think this is possible in bokeh using a NumeralTickFormatter but I don't know if it is simple to do in hvPlot. I could go some levels lower to implement that in these plots but I fell it's just introducing unnecessary complexity to the code.
See also https://docs.bokeh.org/en/latest/docs/first_steps/first_steps_4.html#formatting-axis-ticks

For the colorbar colors, there's not a question of degree of accuracy; the colorbar is simply completely wrong. It is not actually reporting how the data is mapping onto brightness or value; it's reversed. Unless there's some obvious mistake in the example code, please file a bug report rather than working around the issue.

The colorbar ticks are not very useful right now, so if it is possible to provide a simple option like .opts(colorbar_tick_format=‘0 a’) and get ticks like '500k', it would be great to do so. If we have to jump through tons of hoops to get there, like defining custom plot hooks, then I agree with you that doing so is too complex, and then I agree with Maxime that we should disable the colorbar as not being very useful.

The hover information is useful, but I agree it should be formatted better. I think @mattpap indicated to me long ago that this approach was a stopgap, if I'm remembering correctly, and that we might be able to format that as a set of key value pairs like x and y are instead. I think that's an issue to file separately.

I recall there was a discussion about this probably in a meeting, but don't think any conclusions were ever recorded (in an issue or so). It would be best to start such issue/discussion with an example dataset (shape specifically) and expected output shape.

@mattpap , I agree a separate issue would be helpful here. What I'd personally like to see in such an issue is that instead of what the hover shows in the screenshot above, it shows:

lat: 30.2460° N lon: 87.7008° W Black: 157 Hispanic/Other: 3 Mixed: 12 Native American: 2 White: 91

(where I've omitted anything with zero count, since the more you zoom in the more sparsely the categories are populated and the less likely that most categories will need to be listed.)

Can you maybe take a look at this PR's code and set up a similar example in pure Bokeh? This one depends on the entire HoloViz stack, and is pretty unwieldy.

maximlt · 2024-12-15T22:02:28Z

census2020/census2020.ipynb

+    "dashboard.main.append(background)\n",
+    "dashboard.main.append(pop_map)\n",
+    "\n",
+    "dashboard.servable();"


When updating one of the widgets the map size briefly changes from the full window width to the original width. I recorded a GIF but it's too heavy for sharing on Github.

Is it similar to what I talked about here in point 2?

#430 (comment)

Yes, sorry I missed/forgot about this message!

maximlt · 2024-12-15T22:30:37Z

census2020/census2020.ipynb

+    "        \n",
+    "        self._update_plot()\n",
+    "        \n",
+    "    @param.depends('map_tile', 'map_alpha', 'city', watch=True)\n",


By re-creating the plot everytime map_alpha is updated in the app, we have an app that is slow at updating the alpha compared to e.g. NYC Taxi (https://examples.holoviz.org/gallery/nyc_taxi/dashboard.html). Not sure this is the kind of approach we want to promote? cc @jbednar

Agreed. The datashader_dashboard code shows how to avoid that update hit for alpha (e.g. using .apply.opts) and for tiles (which would be done as a separate callback, not folded into a single one). I'm only seeing these notes here; I'm not sure what this app is, as the original census doesn't have an app.

For plots that are quick to render, I don't mind a Panel-based approach that just re-renders the plot when a widget is updated, as this API (Panel's) is a bit easier to apprehend. In this case, we're rendering a large dataset and should indeed be a bit more careful when updating the plot, relying on HoloViews' API more.

maximlt · 2024-12-15T22:31:43Z

census2020/census2020.ipynb

+    "    def plot_view(self):\n",
+    "        return self.plot\n",
+    "    \n",
+    "    def __panel__(self, **kwargs):\n",


I think you could remove plot_view in favor of the following (to be tested):

@param.depends('plot') def __panel__(self): return self.plot

maximlt · 2024-12-15T22:33:18Z

census2020/census2020.ipynb

+    "\n",
+    "dashboard = pn.template.BootstrapTemplate(\n",
+    "    title=\"US population density map\",\n",
+    "    sidebar = [widgets],\n",


You can instantiate the template with main=[background, pop_map], instead of the two calls further down to append.

Thanks, I was following what was in the docs

maximlt · 2024-12-15T22:33:47Z

census2020/census2020.ipynb

+    "You can also zoom in and out or pan across the map to explore regions of interest in more detail.\n",
+    "\"\"\"\n",
+    "    \n",
+    "background = pn.Accordion(('Background', pn.pane.Markdown(intro_text, renderer_options={'breaks': False})\n",


We need to improve Panel to be able to simplify this really ugly code :)

maximlt · 2024-12-15T22:37:04Z

census2020/census2020.ipynb

+   "source": [
+    "pop_map = USPopulation()\n",
+    "\n",
+    "widgets = pn.WidgetBox(\n",


Since the widgets aren't customized in this app, instead of WidgetBox(pop_map.param.<>, ...), you could instead call pn.Param(pop_map) and set the precedence value of each Parameter in the class to tweak how they're supposed to be displayed (see https://panel.holoviz.org/how_to/param/examples/precedence.html and other pages in the docs)

jbednar · 2024-12-16T05:01:08Z

There are many examples that could be updated to use a newer dataset (too many to list them here!), I'd be concerned with the precedent Census 2020 would set.
I'm in favor of having a single census example
I have no preference for the dataset used (2010, 2020, 2010 and 2020)

I think the census example is in a different category; it's so iconic that it's nearly a logo for Datashader. We wouldn't take the time to update e.g. the gull_tracking example if it had new data, because it wouldn't change anything. But at least in the USA, people want to explore the actual data involved, e.g. to see how segregated their own neighborhood is. Having that information be 14 years out of date makes it much less effective in reaching people's actual interest. I don't think updating census will set any sort of precedent that would apply to other examples.

Also note that originally I was proposing that census 2020 be very basic compared to the 2010, as I haven't seen whether the data is suitable for doing all the sort of analyses we did for 2010. But if all the same analyses can be done and have qualitatively similar conclusions, then I'm happy for 2020 to replace 2010 for most purposes. Still, 2010 has been so widely promoted and referred to that it may be best to keep it at that same address. Anyway, I haven't seen any of the 2020 results myself, so I can't say anything about them!

maximlt · 2024-12-16T08:18:47Z

Anyway, I haven't seen any of the 2020 results myself, so I can't say anything about them!

FWIW the example is pretty easy to run :) I also recommend running it directly instead of waiting for the dev site to be built as all the plots require a live kernel for full interactivity.

Interestingly, the original census example is the most visited of all over the last 6 months (for reference, that's about 10% of the pageviews of the Tabulator component). I'm not sure where the visitors come from, the highest referrer being a redirect link.

jbednar · 2024-12-17T02:58:17Z

Interestingly, the original census example is the most visited of all over the last 6 months

Some fraction of those are me! I show those results to people all the time. But not 170 times a month, no. Anyway, yes, this is a popular example, and we should keep having that link work.

FWIW the example is pretty easy to run :)

I'm sure. I'm afraid to run the example until it's ready, because I either get stuck in some conda environment hell, or it works, and I notice a ton of stuff I want to update, and then I'm late for my next meeting or my family is asking where dinner is. :-) But if it's ready, I'll check it out.

maximlt · 2024-12-17T19:12:34Z

Anyway, yes, this is a popular example, and we should keep having that link work.

Oh yeah keeping the link working is a must (I've tried to carefully set redirects to any page we've removed), the question is what content it should display.

I either get stuck in some conda environment hell

It worked perfectly for me! I can't help you with your other problems 🙃

jbednar · 2024-12-18T21:16:23Z

FWIW the example is pretty easy to run :)

Ok, I did get it to run. I didn't have any dependency hell this time, but I did have nasty git merge conflicts to deal with thanks to the unpleasant interactions Jupyter has with revision control, for all the random notebooks in examples that I've run and happened to modify as I use them. I've dealt with all that, and now I can see that the 2020 data is not ready to replace 2010.

Here's Chicago in 2010:

and in 2020:

I don't know if any of the extra detail in 2020 is meaningful, but there are definitely patterns with the racial coloring that look entirely like artifacts rather than meaningful data. E.g. each little block seems to be sorted by race, with Asians living in the more Northern portion of the block, which surely isn't really true:

I would guess that we have bugs in how the individual points are being synthesized from the block data, e.g. maybe the data is sorted by race and then has locations generated in order? In any case the 2020 data is definitely not ready to replace the 2010 data.

Azaya89 requested a review from hoxbro October 17, 2024 00:34

maximlt reviewed Oct 17, 2024

View reviewed changes

Azaya89 added 17 commits October 19, 2024 09:51

minor updates

7890e28

renamed 2010 project

91ae8cd

created 2020 example dir

06fa9d8

further WIP

0057220

re-lock files

08c9413

more WIP

5ed6311

add thumbnails image and tag

e5cbae6

more WIP

a6751b2

renamed 2010 test data dir

5cc1021

added 2020 test data files

fcf8b95

renamed both census files

5175db5

re-lock files

55a1536

reduced test data size

d15fba4

renamed 2020 example

abdb229

renamed 2010 example

ea8a141

reduced 2020 test files

a53fa76

more renaming

b18baf4

Azaya89 force-pushed the azaya/census branch from 5bd8da9 to b18baf4 Compare October 19, 2024 10:32

revert census2010 renaming

5e7610e

Azaya89 commented Oct 22, 2024

View reviewed changes

census2020/census2020.ipynb Outdated Show resolved Hide resolved

maximlt reviewed Dec 15, 2024

View reviewed changes

Merge branch 'main' into azaya/census

43de7e6

Azaya89 added 3 commits December 19, 2024 16:36

update dashboard

6daba50

update lock files

2af2716

Merge branch 'main' into azaya/census

7cd897e

maximlt mentioned this pull request Jan 10, 2025

Doc build failing when project renamed #490

Open

Census 2020 example #430

Are you sure you want to change the base?

Census 2020 example #430

Conversation

Azaya89 commented Oct 17, 2024 • edited Loading

maximlt commented Oct 17, 2024

maximlt left a comment

Choose a reason for hiding this comment

Azaya89 commented Oct 17, 2024

maximlt commented Oct 17, 2024

Azaya89 commented Oct 17, 2024

maximlt commented Oct 21, 2024

Azaya89 commented Oct 21, 2024

maximlt commented Oct 21, 2024

Azaya89 commented Oct 21, 2024

maximlt commented Dec 5, 2024

Azaya89 commented Dec 5, 2024

maximlt commented Dec 8, 2024

jbednar commented Dec 9, 2024

maximlt commented Dec 9, 2024

Azaya89 commented Dec 9, 2024

maximlt commented Dec 9, 2024

Azaya89 commented Dec 9, 2024 • edited Loading

jbednar commented Dec 9, 2024

Azaya89 commented Dec 10, 2024

jbednar commented Dec 10, 2024

maximlt commented Dec 10, 2024 • edited Loading

Azaya89 commented Dec 10, 2024

maximlt commented Dec 10, 2024

maximlt commented Dec 10, 2024

jbednar commented Dec 10, 2024

jbednar commented Dec 11, 2024

hoxbro commented Dec 13, 2024

jbednar commented Dec 14, 2024

maximlt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Azaya89 Dec 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Azaya89 Dec 16, 2024 • edited by jbednar Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbednar Dec 17, 2024 • edited Loading

Choose a reason for hiding this comment

Azaya89 Dec 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbednar Dec 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maximlt Dec 15, 2024 • edited by jbednar Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbednar commented Dec 16, 2024

maximlt commented Dec 16, 2024

jbednar commented Dec 17, 2024

maximlt commented Dec 17, 2024

Azaya89 commented Oct 17, 2024 •

edited

Loading

Azaya89 commented Dec 9, 2024 •

edited

Loading

maximlt commented Dec 10, 2024 •

edited

Loading

Azaya89 Dec 16, 2024 •

edited

Loading

Azaya89 Dec 16, 2024 •

edited by jbednar

Loading

jbednar Dec 17, 2024 •

edited

Loading

Azaya89 Dec 18, 2024 •

edited

Loading

jbednar Dec 16, 2024 •

edited

Loading

maximlt Dec 15, 2024 •

edited by jbednar

Loading

jbednar commented Dec 18, 2024 •

edited

Loading