Releases: MuMiN-dataset/mumin-build
v1.10.0
Added
- Added `n_jobs` and `chunksize` arguments to `MuminDataset`, allowing these to be customised.
Changed
- Lowered the default value of `chunksize` from 50 to 10, which also lowers the memory requirements when processing articles and images, as fewer of these are kept in memory at a time.
- Now stores all images as `uint8` NumPy arrays rather than `int64`, reducing the memory usage of images significantly.
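The saving from the `uint8` switch is easy to see with a standalone NumPy sketch (illustrative only, not mumin's code; the image shape is a made-up example):

```python
import numpy as np

# A hypothetical 224x224 RGB image stored as int64 vs uint8.
image_int64 = np.zeros((224, 224, 3), dtype=np.int64)
image_uint8 = image_int64.astype(np.uint8)

print(image_int64.nbytes)  # 1204224 bytes
print(image_uint8.nbytes)  # 150528 bytes, i.e. 8x smaller
```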
v1.9.0
Added
- Added a checkpoint after rehydration. If compilation fails for any reason after this point, the next compilation will resume after the rehydration process.
- Added more unit tests.
Fixed
- Fixed bug on Windows where some tweet IDs were negative.
- Fixed another bug on Windows where the timeout decorator did not work, due to its use of signals, which are not available on Windows machines.
- Fixed bug on macOS causing Python to crash during parallel extraction of articles and images.
Changed
- Refactored the repository to use the more modern `pyproject.toml` with `poetry`.
v1.8.0
Changed
- Now allows instantiation of `MuminDataset` without any Twitter bearer token, neither as an explicit argument nor as an environment variable, which is useful for pre-compiled datasets. If the dataset needs to be compiled, then a `RuntimeError` will be raised when calling the `compile` method.
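A minimal sketch of this behaviour (the class and helper below are hypothetical stand-ins, not mumin's actual internals): the token resolves to `None` when neither source provides it, and only `compile` raises.

```python
import os
from typing import Optional

def resolve_bearer_token(explicit: Optional[str] = None) -> Optional[str]:
    """Return the token from the argument or the environment, else None."""
    return explicit or os.environ.get("TWITTER_API_KEY")

class Dataset:
    def __init__(self, twitter_bearer_token: Optional[str] = None):
        # Instantiation succeeds even without a token.
        self.token = resolve_bearer_token(twitter_bearer_token)

    def compile(self):
        # Compilation needs Twitter access, so only here do we fail.
        if self.token is None:
            raise RuntimeError("A Twitter bearer token is required to compile.")
```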
v1.7.0
Added
- Now allows setting `twitter_bearer_token=None` in the constructor of `MuminDataset`, in which case the environment variable `TWITTER_API_KEY` is used instead; this can be stored in a separate `.env` file. This is now the default value of `twitter_bearer_token`.
Changed
- Replaced `DataFrame.append` calls with `pd.concat`, as the former is deprecated and will be removed from `pandas` in the future.
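The migration is mechanical; a minimal pandas example of the replacement (the data here is made up):

```python
import pandas as pd

df = pd.DataFrame({"tweet_id": [1, 2]})
new_rows = pd.DataFrame({"tweet_id": [3]})

# Deprecated, removed in pandas 2.0:
# df = df.append(new_rows, ignore_index=True)

# Replacement:
df = pd.concat([df, new_rows], ignore_index=True)
print(df.tweet_id.tolist())  # [1, 2, 3]
```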
v1.6.2
Fixed
- Now removes claims that are only connected to deleted tweets when calling `to_dgl`. This previously caused a bug due to a mismatch between nodes in the dataset (which include deleted ones) and nodes in the DGL graph (which do not contain the deleted ones).
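The filtering idea can be sketched with plain Python sets (a toy illustration with made-up identifiers, not mumin's actual code):

```python
# Hypothetical claim-tweet edges; 'deleted_tweets' marks tweets that
# are absent from the DGL graph.
edges = [("claim_1", "tweet_1"), ("claim_2", "tweet_2")]
deleted_tweets = {"tweet_2"}

# Keep only claims with at least one surviving tweet connection,
# so dataset nodes and DGL graph nodes stay in sync.
kept_claims = {claim for claim, tweet in edges if tweet not in deleted_tweets}
print(kept_claims)  # {'claim_1'}
```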
v1.6.1
Fixed
- Now correctly catches `JSONDecodeError` during rehydration.
v1.6.0
Changed
- Changed the download link from Git-LFS to the official data.bris data repository, with URI https://doi.org/10.5523/bris.23yv276we2mll25fjakkfim2ml.
v1.5.0
Changed
- Now using dicts rather than Series in `to_dgl`. This improved the wall time from 1.5 hours to 2 seconds!
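The speed-up comes from repeated scalar lookups being much cheaper on a plain dict than on a pandas Series; a small sketch of the pattern (illustrative, not mumin's actual code):

```python
import pandas as pd

series = pd.Series({"tweet_1": 0, "tweet_2": 1})

# Converting once up front turns millions of subsequent single-key
# lookups into plain hash-table accesses, bypassing pandas indexing.
mapping = series.to_dict()

assert mapping["tweet_2"] == series["tweet_2"] == 1
```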
Fixed
- There was a bug in the call to `dgl.data.utils.load_graphs` that caused `load_dgl_graph` to fail. This is fixed now.
v1.4.1
Changed
- Now only saves the dataset at the end of `add_embeddings` if any embeddings were added.
v1.4.0
Added
- The `to_dgl` method is now parallelised, speeding up export significantly.
- Added convenience functions `save_dgl_graph` and `load_dgl_graph`, which store the Boolean train/val/test masks as unsigned 8-bit integers and handle the conversion. Using the `dgl`-native `save_graphs` and `load_graphs` causes an error, as they cannot handle Boolean tensors. These two convenience functions can be imported simply as `from mumin import save_dgl_graph, load_dgl_graph`.
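The round-trip these helpers implement can be sketched with NumPy (the real functions operate on DGL graphs with torch tensors; this only shows the conversion idea):

```python
import numpy as np

train_mask = np.array([True, False, True])

# Store as uint8, since the serialisation format rejects Boolean tensors...
stored = train_mask.astype(np.uint8)

# ...and convert back to Boolean after loading.
restored = stored.astype(bool)
assert (restored == train_mask).all()
```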