-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Final launch TODOs #12
Comments
Thanks @Muennighoff! This is exciting!
I think (2) is some of the older results that accidentally got uploaded -- nothing currently gets sent to "side_by_side". I'll nuke the results though and refresh with the new format. To confirm this is what we want:
I'll remove
We could also do a subset if we want, although two is a fine start.
The GPU stuff seems concerning. Is there much documentation on how these Spaces with GPUZero work behind the scenes? Does it auto-provision AWS instances? Also worth including in this checklist the Git LFS issues (can't store the results as |
I think Battle & Side by Side are separate no? E.g. clustering_battle & clustering_side_by_side? We can also merge them if you think it is better. We will probably have to nuke the results once more right before the launch as there will still be a bit of testing we'll do I assume.
Will look into it & let you know 👍
Some details are here: https://huggingface.co/spaces/zero-gpu-explorers/README & it is attached to our space so you can play with it if you want to.
Oh I didn't notice this, sorry! Please feel free to merge the Git LFS PR & send me your bank account via Slack so I can wire you what you paid! |
Yeah this was confusing for me for a bit too. The Would we rather send them to separate results sections based on if the name is anonymous? I don't have a preference either way and now is a good time to switch if we want.
Nah it's really minor and it was me who forgot about the Git-LFS bandwidth. Really it's so inexpensive I was considering if we should just get a $60 sponsoring donation on Github for a year from some company and not worry about it... Could be an alternative if we want to keep Git LFS as it really is a trivial amount to be sponsored. Something to keep in mind. |
I think reusing the vote_last_response func is fine & we can separate them after? Is it a mistake with https://huggingface.co/datasets/mteb/arena-results then that it currently has side_by_side and battle? I do think it's nice to have them separated there into side_by_side & battle. I think that we should start with only anonymous results counting and later consider adding non-anon depending on the traction.
Yes, given we spent $5K+ already on the indices this should be easily doable, lmk! |
I've made a new space that utilizes local GPUs and is blazing fast: https://huggingface.co/spaces/mteb/arena-tmp . Unfortunately, I think this will break the GitHub syncing from this repository. Instead, I think I will have to just manually pull from the repo frequently. Are you okay with that @orionw ? Curious about your thoughts, too. If fine with you, maybe you can stop the syncing & rename the space. Or maybe we can sync it with my local clone somehow? Maybe just a crontab or something so I regularly push the results & sync the space 🤔 Also if someone has any cool idea for what we could do with the Zero GPU allocation, feel free to propose something / make use of them! |
🔥🔥🔥 Thanks @Muennighoff, works great!
Of course! Bummer the ZeroGPUs didn't work :/ Feel free to rename the other space and move this one in. We can stop the syncing by removing the .github actions file. |
Did you already implement this? Looks like I see it, thanks! I've updated the corpus without the newlines, you can see them here (although I guess you can't see without loading manually haha). Here's a screenshot:
I've separated them and made a PR in #13. The data issue will take some coordination, perhaps we do that over messaging (some notes in the PR). |
I removed the
Not yet! They are just the samples from Wikipedia atm 😅
Looks great! Are you confident it is better & good as is i.e. I can go ahead and recreate all arXiv indices? |
Sorry I'm a bit late to the party! a) occasional
|
I can take a look this weekend on adding bm25s (#6 / embeddings-benchmark/mteb#990) |
Yes I think they are usually the first ones in a while. Indeed maybe we should try increase the minimum replica count and see if it goes away.
That looks about right. We've spent around 6K USD atm.
Amazing! There's also results in the paper Table 3: https://arxiv.org/pdf/2407.03618 ; I think we are using the 1.5 0.75 Lucene variant, so you can just pick the results from there and maybe put them in a result file like e.g. here, add them to the results repo so they will show up on the LB, and then we also add the average score here. |
Yup, did some validation and it all looks great! Make sure it's the new one though here which has the whitespace normalized from abstract and titles: https://huggingface.co/datasets/orionweller/arxiv_7_2_24 |
Nice will recreate those indices. Also fixed the broken performance of SFR & Grit via the changes here; screenshot attached. The problem was that no query instruction was being used, which really matters a lot for them. |
For the random sample, I think we want to make it such that it is also random whether it is arXiv or Wikipedia? I.e. you press random sample and then a random corpus is selected & an accompanying random sample. Similar to how for Clustering the random sample button also changes the number of clusters. Currently, you select a corpus and then it will provide a random sample (which are all wikipedia-targeted nq samples atm i.e. still missing arxiv). |
@orionw can you create a subsampled version of Stack Exchange that is similar in size to Wikipedia & arXiv? (they are about the same). Maybe we can be smart about the subsampling & remove exchanges that are likely less interesting to our expected users 🤔 I think having Stack Exchange is would be great - it is a really nice & different source. After a few weeks we can think about upsizing it / adding other corpora depending on user feedback & monitoring corpus usage. |
@orionw it seems like the StackExchange corpus often contains multiple answers? Maybe only keeping the top answer would make it much smaller likely suffice. I think most voters won't have the patience to read through all answers anyways.
|
Also it seems like there are no titles for the Stack Exchange corpus --- Shouldn't we be able to get the question titles? |
Also if we could add the Exchange that each sample is from that'd be great I think. Could put it in the title or sth; would likely improve results a lot. |
For sure, these are good suggestions @Muennighoff! Probably the weirdness of the data comes from how RedPajamas/Olmo did the formatting. There’s probably an original sample that they got it from if we want to do more modifications. Unfortunately I won’t have cycles for this until Friday, currently at the SIGIR conference. Sorry! If that’s too late feel free to have someone else take a look at it. |
@Muennighoff added the google API here |
I think that's fine! We can delay launch to next week. Also, some of the Wikipedia passages are really long and seem to go beyond sections, e.g. I got the below corresponding to this Wikipedia page. I think this is problematic as people don't want to spend a few mins to vote but be quick thus the texts need to be short. Don't we want to
|
Everything that should be really easy to do with Wikipedia is unfortunately surprisingly difficult :( Their main way of providing data dumps is in a special format called MediaWiki which has caused so much pain that the main parser is called
This is solvable, depending on what you mean by passages. If you mean the word count, we currently allow up to 500 words. This is 426 ish words. We can definitely re-run and reduce the passage size if you think it's too big. Do you have a size in mind @Muennighoff?
These are un-intuitively actually really hard. It's not possible to easily tell which is a section vs which is a list item or just a short sentence - they come in formats with newlines only and sections are often only demarcated by double newlines which also exist in other places. I can separate on double newlines or short sentence heuristics but it will definitely break on list items and other edge cases and lead to a bunch of other problems. Similarly, for the trailing ending header we also have no way of telling if it's a list item, a short sentence, or a heading.
I agree, but worth noting that this is a trade-off, if we do shorter texts they won't be attached to their headings in as many cases as we will have more chunks. Existing retrieval datasets use Wiki (DPR, KILT), do 100 word passages, and aren't connected to the headings unless it is the first heading. Overall unless we have a Wikipedia parsing expert this is a really hard problem (weeks to months to build IMO and I don't have bandwidth to spend that much time on a parser). I would suggest we adopt DPR's/KILT's Wikipedia set if we don't want to be deal with these issues and just be okay with the short context and lack of contextualization, otherwise we will have to deal with the other end of the problem which is the multiple sections cases we have here. For the trailing section headers, I think we will have that in any option we choose. If we wait another few months, one of my labmates will have a version that maintains the hierarchy with headers for MegaWika v2 and we can adapt it but otherwise we are stuck with many bad heuristic options :/ |
@Muennighoff feel free to check out the KILT version here: https://huggingface.co/datasets/facebook/kilt_wikipedia |
Great explanation - And the current MegaWika is also worse? Else I see why it makes sense to stick with the existing setup, but maybe let's reduce to 200 words, what do you think? You can try a few samples in the arena if you want. I think they are a bit too long atm. arXiv meanwhile is good length-wise I think. Also cc @isaac-chung @KennethEnevoldsen for opinions. A few examples:
|
Hmm for me it is not the length in the examples but rather the formatting: E.g. why denote the passage/title (it should be self evident for the structure) e.g.
Works just fine (for me at least). Additionally, I would also take a look at the oddities:
Which seems like a faulty scrape/parsing. This makes it hard to reason about the quality of the retrieval as the document itself seems to be low quality. I would probably either wait for MegaWiki v2 or use KILT re length: I probably do agree that the retrieved document is too long as well, but I am unsure whether a better structure would solve it. |
Fixed the Connection reset error by just retrying; usually after 1 retry it works fine:
Also added arXiv random samples, lmk if thoughts. So only Stack Exchange corpus, fixing the result files & some final indices left I think 🙌 |
I've pushed all changes to main - I've added a new TODO which is fixing the scrolling bug in the below video. Are you getting it too? Would be amazing if someone has bandwidth to take a look, maybe @isaac-chung 🙌 scrollbug.mov |
@Muennighoff yeah unfortunately v1 was when we realized that a standard scrape didn't preserve the hierarchical structure of sections. Sam (the first author) has spend a lot of time getting it ready, although also adding a lot of things that not helpful to us here, like better citation scraping and better multilingual Wikipedia support.
I'll start one processing with 200 words and upload it.
I'll also upload a version of KILT we can look at in the HF viewer for comparison.
@KennethEnevoldsen do you mind clarifying? Are you referring to sections that are empty? Those are due to stripping out tables since tables look really weird in text Wikipedia form (they have a fun Wiki table template). I'm assuming we don't want to deal with table reasoning for this project. |
Sorry, coming back to the StackExchange now: So the Dolma one (taken from RedPajamas here) looks like:
We can grab the We can also definitely just take the first answer (with the assumption that was the best one) agree that will be better and keep the text more concise.
What do you mean by this @Muennighoff? Isn't that the "Q: What is the difference between Intel and PPC? What is the hardware and software differences between Intel and PPC Macs?" portion? |
Yes I think that'd be good!
Yes that's probably enough information, but maybe let's make it a separate column in the HF dataset so we can still revert to the older one if we want to (without having to recreate the HF dataset).
Oh yes seems like they just concatenate the question title and the question text. It may look a bit nicer if we could separate them but else also fine as is. |
How does this stackexchange one look for the We could filter based on the stackexchange source?
For the Wiki apparently it's a lot slower to process smaller chunks so the 200 word version of what I had before is almost done, but the KILT version looks like this. It has some weird quirks (section and bullet tags like The cons to using this dataset are:
The pros are:
Thoughts on this tradeoff @Muennighoff @KennethEnevoldsen? I'll post the 200 word version that's from the most recent Wikipedia using our creation method tomorrow when it's done running overnight. |
Amazing work! StackExchange:
tmp = '''Q: How to convert this to /summon? Well i need some help with this command /summon Item ~ ~ ~ {Item:{id:wheat_seeds,Count:1}},{HideFlags:16,display:{Name:"Dirt"},CanPlaceOn:["minecraft:stone","minecraft:grass","minecraft:dirt","minecraft:cobblestone","minecraft:planks","minecraft:sapling","minecraft:flowing_water","minecraft:water","minecraft:flowing_lava","minecraft:lava","minecraft:sand","minecraft:gravel","minecraft:gold_ore","minecraft:iron_ore","minecraft:coal_ore","minecraft:log","minecraft:leaves","minecraft:sponge","minecraft:glass","minecraft:lapis_ore","minecraft:dispenser","minecraft:sandstone","minecraft:noteblock","minecraft:bed","minecraft:golden_rail","minecraft:detector_rail","minecraft:sticky_piston","minecraft:web","minecraft:tallgrass","minecraft:deadbush","minecraft:piston","minecraft:piston_head","minecraft:wool","minecraft:yellow_flower","minecraft:red_flower","minecraft:brown_mushroom","minecraft:red_mushroom","minecraft:gold_block,","minecraft:iron_block","minecraft:double_stone_slab","minecraft:stone_slab","minecraft:brick_block","minecraft:tnt","minecraft:bookshelf","minecraft:mossy_cobblestone","minecraft:obsidian","minecraft:torch","minecraft:fire","minecraft:mob_spawner","minecraft:oak_stairs","minecraft:chest","minecraft:redstone_wire","minecraft:diamond_ore","minecraft:diamond_block","minecraft:crafting_table","minecraft:wheat","minecraft:farmland","minecraft:furnace","minecraft:lit_furnace,","minecraft:standing_sign","minecraft:wooden_door","minecraft:ladder","minecraft:rail","minecraft:stone_stairs","minecraft:wall_sign","minecraft:lever","minecraft:stone_pressure_plate","minecraft:iron_door","minecraft:wooden_pressure_plate","minecraft:redstone_ore","minecraft:lit_redstone_ore","minecraft:unlit_redstone_torch","minecraft:redstone_torch","minecraft:stone_button","minecraft:snow_layer","minecraft:ice","minecraft:snow","minecraft:cactus","minecraft:clay","minecraft:reeds","minecraft:jukebox","minecraft:fence","minecraft:pumpkin","minecraft:netherrack","minecraft:soul_sand","minecraft:glowstone","minecraft:portal","minecraft:lit_pumpkin","minecraft:cake","minecraft:unpowered_repeater","minecraft:powered_repeater","minecraft:stained_glass","minecraft:trapdoor","minecraft:monster_egg","minecraft:stonebrick","minecraft:brown_mushroom_block","minecraft:iron_bars","minecraft:glass_pane","minecraft:melon_block","minecraft:pumpkin_stem","minecraft:melon_stem","minecraft:vines","minecraft:fence_gate","minecraft:brick_stairs","minecraft:stone_brick_stairs","minecraft:mycelium","minecraft:waterlily","minecraft:nether_brick","minecraft:nether_brick_fence","minecraft:nether_brick_stairs","minecraft:nether_wart","minecraft:enchanting","minecraft:brewing_stand","minecraft:cauldron","minecraft:end_portal","minecraft:end_portal_frame","minecraft:end_stone","minecraft:dragon_egg","minecraft:redstone_lamp","minecraft:lit_redstone_lamp","minecraft:double_wooden_slab","minecraft:wooden_slab","minecraft:cocoa","minecraft:sandstone_stairs","minecraft:emerald_ore","minecraft:ender_chest","minecraft:tripwire_hook","minecraft:emerald_block","minecraft:spruce_stairs","minecraft:birch_stairs","minecraft:jungle_stairs","minecraft:command_block","minecraft:beacon","minecraft:cobblestone_wall","minecraft:flower_pot","minecraft:carrots","minecraft:potatoes","minecraft:wooden_button","minecraft:skull","minecraft:anvil","minecraft:trapped_chest","minecraft:light_weighted_pressure_plate","minecraft:heavy_weighted_pressure_plate","minecraft:unpowered_comparator","minecraft:powered_comparator","minecraft:daylight_detector","minecraft:redstone_block","minecraft:quartz_ore","minecraft:hopper","minecraft:quartz_block","minecraft:quartz_stairs","minecraft:activator_rail","minecraft:dropper","minecraft:stained_hardened_clay","minecraft:stained_glass_pane","minecraft:leaves2","minecraft:log2","minecraft:acacia_stairs","minecraft:dark_oak_stairs","minecraft:slime","minecraft:barrier","minecraft:iron_trapdoor","minecraft:prismarine","minecraft:sea_lantern","minecraft:hay_block","minecraft:carpet","minecraft:hardened_clay","minecraft:coal_block","minecraft:packed_ice","minecraft:double_plant","minecraft:standing_banner","minecraft:wall_banner","minecraft:daylight_detector_inverted","minecraft:red_sandstone","minecraft:double_sandstone_stairs","minecraft:double_stone_slab2","minecraft:spruce_fence_gate","minecraft:birch_fence_gate","minecraft:jungle_fence_gate","minecraft:dark_oak_fence_gate","minecraft:acacia_fence_gate","minecraft:spruce_fence","minecraft:birch_fence","minecraft:jungle_fence","minecraft:dark_oak_fence","minecraft:acacia_fence","minecraft:spruce_door","minecraft:birch_door","minecraft:jungle_door","minecraft:acacia_door","minecraft:dark_oak_door","minecraft:end_rod","minecraft:chorus_plant","minecraft:chorus_flower","minecraft:purpur_block","minecraft:purpur_pillar","minecraft:purpur_stairs","minecraft:purpur_double_slab","minecraft:purpur_slab","minecraft:end_bricks","minecraft:beetroots","minecraft:grass_path","minecraft:end_gateway","minecraft:frosted_ice","minecraft:magma","minecraft:nether_wart_block","minecraft:red_nether_brick","minecraft:bone_block","minecraft:observer","minecraft:white_shulker_box","minecraft:orange_shulker_box","minecraft:magenta_shulker_box","minecraft:light_blue_shulker_box","minecraft:yellow_shulker_box","minecraft:lime_shulker_box","minecraft:pink_shulker_box","minecraft:gray_shulker_box","minecraft:silver_shulker_box","minecraft:cyan_shulker_box","minecraft:purple_shulker_box","minecraft:blue_shulker_box","minecraft:brown_shulker_box","minecraft:green_shulker_box","minecraft:red_shulker_box","minecraft:black_shulker_box","minecraft:white_glazed_terracotta","minecraft:orange_glazed_terracotta","minecraft:magenta_glazed_terracotta","minecraft:light_blue_glazed_terracotta","minecraft:yellow_glazed_terracotta","minecraft:lime_glazed_terracotta","minecraft:pink_glazed_terracotta","minecraft:gray_glazed_terracotta","minecraft:light_gray_glazed_terracotta","minecraft:cyan_glazed_terracotta","minecraft:purple_glazed_terracotta","minecraft:blue_glazed_terracotta","minecraft:brown_glazed_terracotta","minecraft:green_glazed_terracotta","minecraft:red_glazed_terracotta","minecraft:black_glazed_terracotta","minecraft:concrete","minecraft:concrete_powder","minecraft:structure_block","FakeBlock","bedrock"]} I want this command: /replaceitem entity @A[score_block4_min=1] slot.weapon.mainhand minecraft:dirt 1 0 {HideFlags:16,display:{Name:"Dirt"},CanPlaceOn:["minecraft:stone","minecraft:grass","minecraft:dirt","minecraft:cobblestone","minecraft:planks","minecraft:sapling","minecraft:flowing_water","minecraft:water","minecraft:flowing_lava","minecraft:lava","minecraft:sand","minecraft:gravel","minecraft:gold_ore","minecraft:iron_ore","minecraft:coal_ore","minecraft:log","minecraft:leaves","minecraft:sponge","minecraft:glass","minecraft:lapis_ore","minecraft:dispenser","minecraft:sandstone","minecraft:noteblock","minecraft:bed","minecraft:golden_rail","minecraft:detector_rail","minecraft:sticky_piston","minecraft:web","minecraft:tallgrass","minecraft:deadbush","minecraft:piston","minecraft:piston_head","minecraft:wool","minecraft:yellow_flower","minecraft:red_flower","minecraft:brown_mushroom","minecraft:red_mushroom","minecraft:gold_block,","minecraft:iron_block","minecraft:double_stone_slab","minecraft:stone_slab","minecraft:brick_block","minecraft:tnt","minecraft:bookshelf","minecraft:mossy_cobblestone","minecraft:obsidian","minecraft:torch","minecraft:fire","minecraft:mob_spawner","minecraft:oak_stairs","minecraft:chest","minecraft:redstone_wire","minecraft:diamond_ore","minecraft:diamond_block","minecraft:crafting_table","minecraft:wheat","minecraft:farmland","minecraft:furnace","minecraft:lit_furnace,","minecraft:standing_sign","minecraft:wooden_door","minecraft:ladder","minecraft:rail","minecraft:stone_stairs","minecraft:wall_sign","minecraft:lever","minecraft:stone_pressure_plate","minecraft:iron_door","minecraft:wooden_pressure_plate","minecraft:redstone_ore","minecraft:lit_redstone_ore","minecraft:unlit_redstone_torch","minecraft:redstone_torch","minecraft:stone_button","minecraft:snow_layer","minecraft:ice","minecraft:snow","minecraft:cactus","minecraft:clay","minecraft:reeds","minecraft:jukebox","minecraft:fence","minecraft:pumpkin","minecraft:netherrack","minecraft:soul_sand","minecraft:glowstone","minecraft:portal","minecraft:lit_pumpkin","minecraft:cake","minecraft:unpowered_repeater","minecraft:powered_repeater","minecraft:stained_glass","minecraft:trapdoor","minecraft:monster_egg","minecraft:stonebrick","minecraft:brown_mushroom_block","minecraft:iron_bars","minecraft:glass_pane","minecraft:melon_block","minecraft:pumpkin_stem","minecraft:melon_stem","minecraft:vines","minecraft:fence_gate","minecraft:brick_stairs","minecraft:stone_brick_stairs","minecraft:mycelium","minecraft:waterlily","minecraft:nether_brick","minecraft:nether_brick_fence","minecraft:nether_brick_stairs","minecraft:nether_wart","minecraft:enchanting","minecraft:brewing_stand","minecraft:cauldron","minecraft:end_portal","minecraft:end_portal_frame","minecraft:end_stone","minecraft:dragon_egg","minecraft:redstone_lamp","minecraft:lit_redstone_lamp","minecraft:double_wooden_slab","minecraft:wooden_slab","minecraft:cocoa","minecraft:sandstone_stairs","minecraft:emerald_ore","minecraft:ender_chest","minecraft:tripwire_hook","minecraft:emerald_block","minecraft:spruce_stairs","minecraft:birch_stairs","minecraft:jungle_stairs","minecraft:command_block","minecraft:beacon","minecraft:cobblestone_wall","minecraft:flower_pot","minecraft:carrots","minecraft:potatoes","minecraft:wooden_button","minecraft:skull","minecraft:anvil","minecraft:trapped_chest","minecraft:light_weighted_pressure_plate","minecraft:heavy_weighted_pressure_plate","minecraft:unpowered_comparator","minecraft:powered_comparator","minecraft:daylight_detector","minecraft:redstone_block","minecraft:quartz_ore","minecraft:hopper","minecraft:quartz_block","minecraft:quartz_stairs","minecraft:activator_rail","minecraft:dropper","minecraft:stained_hardened_clay","minecraft:stained_glass_pane","minecraft:leaves2","minecraft:log2","minecraft:acacia_stairs","minecraft:dark_oak_stairs","minecraft:slime","minecraft:barrier","minecraft:iron_trapdoor","minecraft:prismarine","minecraft:sea_lantern","minecraft:hay_block","minecraft:carpet","minecraft:hardened_clay","minecraft:coal_block","minecraft:packed_ice","minecraft:double_plant","minecraft:standing_banner","minecraft:wall_banner","minecraft:daylight_detector_inverted","minecraft:red_sandstone","minecraft:double_sandstone_stairs","minecraft:double_stone_slab2","minecraft:spruce_fence_gate","minecraft:birch_fence_gate","minecraft:jungle_fence_gate","minecraft:dark_oak_fence_gate","minecraft:acacia_fence_gate","minecraft:spruce_fence","minecraft:birch_fence","minecraft:jungle_fence","minecraft:dark_oak_fence","minecraft:acacia_fence","minecraft:spruce_door","minecraft:birch_door","minecraft:jungle_door","minecraft:acacia_door","minecraft:dark_oak_door","minecraft:end_rod","minecraft:chorus_plant","minecraft:chorus_flower","minecraft:purpur_block","minecraft:purpur_pillar","minecraft:purpur_stairs","minecraft:purpur_double_slab","minecraft:purpur_slab","minecraft:end_bricks","minecraft:beetroots","minecraft:grass_path","minecraft:end_gateway","minecraft:frosted_ice","minecraft:magma","minecraft:nether_wart_block","minecraft:red_nether_brick","minecraft:bone_block","minecraft:observer","minecraft:white_shulker_box","minecraft:orange_shulker_box","minecraft:magenta_shulker_box","minecraft:light_blue_shulker_box","minecraft:yellow_shulker_box","minecraft:lime_shulker_box","minecraft:pink_shulker_box","minecraft:gray_shulker_box","minecraft:silver_shulker_box","minecraft:cyan_shulker_box","minecraft:purple_shulker_box","minecraft:blue_shulker_box","minecraft:brown_shulker_box","minecraft:green_shulker_box","minecraft:red_shulker_box","minecraft:black_shulker_box","minecraft:white_glazed_terracotta","minecraft:orange_glazed_terracotta","minecraft:magenta_glazed_terracotta","minecraft:light_blue_glazed_terracotta","minecraft:yellow_glazed_terracotta","minecraft:lime_glazed_terracotta","minecraft:pink_glazed_terracotta","minecraft:gray_glazed_terracotta","minecraft:light_gray_glazed_terracotta","minecraft:cyan_glazed_terracotta","minecraft:purple_glazed_terracotta","minecraft:blue_glazed_terracotta","minecraft:brown_glazed_terracotta","minecraft:green_glazed_terracotta","minecraft:red_glazed_terracotta","minecraft:black_glazed_terracotta","minecraft:concrete","minecraft:concrete_powder","minecraft:structure_block","FakeBlock","bedrock"]} To be a summon command! So i tried by doing this /summon Item ~ ~ ~ {Item:{id:wheat_seeds,Count:1}},{HideFlags:16,display:{Name:"Dirt"},CanPlaceOn:["minecraft:stone","minecraft:grass","minecraft:dirt","minecraft:cobblestone","minecraft:planks","minecraft:sapling","minecraft:flowing_water","minecraft:water","minecraft:flowing_lava","minecraft:lava","minecraft:sand","minecraft:gravel","minecraft:gold_ore","minecraft:iron_ore","minecraft:coal_ore","minecraft:log","minecraft:leaves","minecraft:sponge","minecraft:glass","minecraft:lapis_ore","minecraft:dispenser","minecraft:sandstone","minecraft:noteblock","minecraft:bed","minecraft:golden_rail","minecraft:detector_rail","minecraft:sticky_piston","minecraft:web","minecraft:tallgrass","minecraft:deadbush","minecraft:piston","minecraft:piston_head","minecraft:wool","minecraft:yellow_flower","minecraft:red_flower","minecraft:brown_mushroom","minecraft:red_mushroom","minecraft:gold_block,","minecraft:iron_block","minecraft:double_stone_slab","minecraft:stone_slab","minecraft:brick_block","minecraft:tnt","minecraft:bookshelf","minecraft:mossy_cobblestone","minecraft:obsidian","minecraft:torch","minecraft:fire","minecraft:mob_spawner","minecraft:oak_stairs","minecraft:chest","minecraft:redstone_wire","minecraft:diamond_ore","minecraft:diamond_block","minecraft:crafting_table","minecraft:wheat","minecraft:farmland","minecraft:furnace","minecraft:lit_furnace,","minecraft:standing_sign","minecraft:wooden_door","minecraft:ladder","minecraft:rail","minecraft:stone_stairs","minecraft:wall_sign","minecraft:lever","minecraft:stone_pressure_plate","minecraft:iron_door","minecraft:wooden_pressure_plate","minecraft:redstone_ore","minecraft:lit_redstone_ore","minecraft:unlit_redstone_torch","minecraft:redstone_torch","minecraft:stone_button","minecraft:snow_layer","minecraft:ice","minecraft:snow","minecraft:cactus","minecraft:clay","minecraft:reeds","minecraft:jukebox","minecraft:fence","minecraft:pumpkin","minecraft:netherrack","minecraft:soul_sand","minecraft:glowstone","minecraft:portal","minecraft:lit_pumpkin","minecraft:cake","minecraft:unpowered_repeater","minecraft:powered_repeater","minecraft:stained_glass","minecraft:trapdoor","minecraft:monster_egg","minecraft:stonebrick","minecraft:brown_mushroom_block","minecraft:iron_bars","minecraft:glass_pane","minecraft:melon_block","minecraft:pumpkin_stem","minecraft:melon_stem","minecraft:vines","minecraft:fence_gate","minecraft:brick_stairs","minecraft:stone_brick_stairs","minecraft:mycelium","minecraft:waterlily","minecraft:nether_brick","minecraft:nether_brick_fence","minecraft:nether_brick_stairs","minecraft:nether_wart","minecraft:enchanting","minecraft:brewing_stand","minecraft:cauldron","minecraft:end_portal","minecraft:end_portal_frame","minecraft:end_stone","minecraft:dragon_egg","minecraft:redstone_lamp","minecraft:lit_redstone_lamp","minecraft:double_wooden_slab","minecraft:wooden_slab","minecraft:cocoa","minecraft:sandstone_stairs","minecraft:emerald_ore","minecraft:ender_chest","minecraft:tripwire_hook","minecraft:emerald_block","minecraft:spruce_stairs","minecraft:birch_stairs","minecraft:jungle_stairs","minecraft:command_block","minecraft:beacon","minecraft:cobblestone_wall","minecraft:flower_pot","minecraft:carrots","minecraft:potatoes","minecraft:wooden_button","minecraft:skull","minecraft:anvil","minecraft:trapped_chest","minecraft:light_weighted_pressure_plate","minecraft:heavy_weighted_pressure_plate","minecraft:unpowered_comparator","minecraft:powered_comparator","minecraft:daylight_detector","minecraft:redstone_block","minecraft:quartz_ore","minecraft:hopper","minecraft:quartz_block","minecraft:quartz_stairs","minecraft:activator_rail","minecraft:dropper","minecraft:stained_hardened_clay","minecraft:stained_glass_pane","minecraft:leaves2","minecraft:log2","minecraft:acacia_stairs","minecraft:dark_oak_stairs","minecraft:slime","minecraft:barrier","minecraft:iron_trapdoor","minecraft:prismarine","minecraft:sea_lantern","minecraft:hay_block","minecraft:carpet","minecraft:hardened_clay","minecraft:coal_block","minecraft:packed_ice","minecraft:double_plant","minecraft:standing_banner","minecraft:wall_banner","minecraft:daylight_detector_inverted","minecraft:red_sandstone","minecraft:double_sandstone_stairs","minecraft:double_stone_slab2","minecraft:spruce_fence_gate","minecraft:birch_fence_gate","minecraft:jungle_fence_gate","minecraft:dark_oak_fence_gate","minecraft:acacia_fence_gate","minecraft:spruce_fence","minecraft:birch_fence","minecraft:jungle_fence","minecraft:dark_oak_fence","minecraft:acacia_fence","minecraft:spruce_door","minecraft:birch_door","minecraft:jungle_door","minecraft:acacia_door","minecraft:dark_oak_door","minecraft:end_rod","minecraft:chorus_plant","minecraft:chorus_flower","minecraft:purpur_block","minecraft:purpur_pillar","minecraft:purpur_stairs","minecraft:purpur_double_slab","minecraft:purpur_slab","minecraft:end_bricks","minecraft:beetroots","minecraft:grass_path","minecraft:end_gateway","minecraft:frosted_ice","minecraft:magma","minecraft:nether_wart_block","minecraft:red_nether_brick","minecraft:bone_block","minecraft:observer","minecraft:white_shulker_box","minecraft:orange_shulker_box","minecraft:magenta_shulker_box","minecraft:light_blue_shulker_box","minecraft:yellow_shulker_box","minecraft:lime_shulker_box","minecraft:pink_shulker_box","minecraft:gray_shulker_box","minecraft:silver_shulker_box","minecraft:cyan_shulker_box","minecraft:purple_shulker_box","minecraft:blue_shulker_box","minecraft:brown_shulker_box","minecraft:green_shulker_box","minecraft:red_shulker_box","minecraft:black_shulker_box","minecraft:white_glazed_terracotta","minecraft:orange_glazed_terracotta","minecraft:magenta_glazed_terracotta","minecraft:light_blue_glazed_terracotta","minecraft:yellow_glazed_terracotta","minecraft:lime_glazed_terracotta","minecraft:pink_glazed_terracotta","minecraft:gray_glazed_terracotta","minecraft:light_gray_glazed_terracotta","minecraft:cyan_glazed_terracotta","minecraft:purple_glazed_terracotta","minecraft:blue_glazed_terracotta","minecraft:brown_glazed_terracotta","minecraft:green_glazed_terracotta","minecraft:red_glazed_terracotta","minecraft:black_glazed_terracotta","minecraft:concrete","minecraft:concrete_powder","minecraft:structure_block","FakeBlock","bedrock"]} might help: /summon Item ~ ~ ~ {Item:{id:wheat_seeds,Count:1}} /replaceitem entity @A[score_block4_min=1] slot.weapon.mainhand minecraft:dirt 1 0 {HideFlags:16,display:{Name:"Dirt"},CanPlaceOn:["minecraft:stone","minecraft:grass","minecraft:dirt","minecraft:cobblestone","minecraft:planks","minecraft:sapling","minecraft:flowing_water","minecraft:water","minecraft:flowing_lava","minecraft:lava","minecraft:sand","minecraft:gravel","minecraft:gold_ore","minecraft:iron_ore","minecraft:coal_ore","minecraft:log","minecraft:leaves","minecraft:sponge","minecraft:glass","minecraft:lapis_ore","minecraft:dispenser","minecraft:sandstone","minecraft:noteblock","minecraft:bed","minecraft:golden_rail","minecraft:detector_rail","minecraft:sticky_piston","minecraft:web","minecraft:tallgrass","minecraft:deadbush","minecraft:piston","minecraft:piston_head","minecraft:wool","minecraft:yellow_flower","minecraft:red_flower","minecraft:brown_mushroom","minecraft:red_mushroom","minecraft:gold_block,","minecraft:iron_block","minecraft:double_stone_slab","minecraft:stone_slab","minecraft:brick_block","minecraft:tnt","minecraft:bookshelf","minecraft:mossy_cobblestone","minecraft:obsidian","minecraft:torch","minecraft:fire","minecraft:mob_spawner","minecraft:oak_stairs","minecraft:chest","minecraft:redstone_wire","minecraft:diamond_ore","minecraft:diamond_block","minecraft:crafting_table","minecraft:wheat","minecraft:farmland","minecraft:furnace","minecraft:lit_furnace,","minecraft:standing_sign","minecraft:wooden_door","minecraft:ladder","minecraft:rail","minecraft:stone_stairs","minecraft:wall_sign","minecraft:lever","minecraft:stone_pressure_plate","minecraft:iron_door","minecraft:wooden_pressure_plate","minecraft:redstone_ore","minecraft:lit_redstone_ore","minecraft:unlit_redstone_torch","minecraft:redstone_torch","minecraft:stone_button","minecraft:snow_layer","minecraft:ice","minecraft:snow","minecraft:cactus","minecraft:clay","minecraft:reeds","minecraft:jukebox","minecraft:fence","minecraft:pumpkin","minecraft:netherrack","minecraft:soul_sand","minecraft:glowstone","minecraft:portal","minecraft:lit_pumpkin","minecraft:cake","minecraft:unpowered_repeater","minecraft:powered_repeater","minecraft:stained_glass","minecraft:trapdoor","minecraft:monster_egg","minecraft:stonebrick","minecraft:brown_mushroom_block","minecraft:iron_bars","minecraft:glass_pane","minecraft:melon_block","minecraft:pumpkin_stem","minecraft:melon_stem","minecraft:vines","minecraft:fence_gate","minecraft:brick_stairs","minecraft:stone_brick_stairs","minecraft:mycelium","minecraft:waterlily","minecraft:nether_brick","minecraft:nether_brick_fence","minecraft:nether_brick_stairs","minecraft:nether_wart","minecraft:enchanting","minecraft:brewing_stand","minecraft:cauldron","minecraft:end_portal","minecraft:end_portal_frame","minecraft:end_stone","minecraft:dragon_egg","minecraft:redstone_lamp","minecraft:lit_redstone_lamp","minecraft:double_wooden_slab","minecraft:wooden_slab","minecraft:cocoa","minecraft:sandstone_stairs","minecraft:emerald_ore","minecraft:ender_chest","minecraft:tripwire_hook","minecraft:emerald_block","minecraft:spruce_stairs","minecraft:birch_stairs","minecraft:jungle_stairs","minecraft:command_block","minecraft:beacon","minecraft:cobblestone_wall","minecraft:flower_pot","minecraft:carrots","minecraft:potatoes","minecraft:wooden_button","minecraft:skull","minecraft:anvil","minecraft:trapped_chest","minecraft:light_weighted_pressure_plate","minecraft:heavy_weighted_pressure_plate","minecraft:unpowered_comparator","minecraft:powered_comparator","minecraft:daylight_detector","minecraft:redstone_block","minecraft:quartz_ore","minecraft:hopper","minecraft:quartz_block","minecraft:quartz_stairs","minecraft:activator_rail","minecraft:dropper","minecraft:stained_hardened_clay","minecraft:stained_glass_pane","minecraft:leaves2","minecraft:log2","minecraft:acacia_stairs","minecraft:dark_oak_stairs","minecraft:slime","minecraft:barrier","minecraft:iron_trapdoor","minecraft:prismarine","minecraft:sea_lantern","minecraft:hay_block","minecraft:carpet","minecraft:hardened_clay","minecraft:coal_block","minecraft:packed_ice","minecraft:double_plant","minecraft:standing_banner","minecraft:wall_banner","minecraft:daylight_detector_inverted","minecraft:red_sandstone","minecraft:double_sandstone_stairs","minecraft:double_stone_slab2","minecraft:spruce_fence_gate","minecraft:birch_fence_gate","minecraft:jungle_fence_gate","minecraft:dark_oak_fence_gate","minecraft:acacia_fence_gate","minecraft:spruce_fence","minecraft:birch_fence","minecraft:jungle_fence","minecraft:dark_oak_fence","minecraft:acacia_fence","minecraft:spruce_door","minecraft:birch_door","minecraft:jungle_door","minecraft:acacia_door","minecraft:dark_oak_door","minecraft:end_rod","minecraft:chorus_plant","minecraft:chorus_flower","minecraft:purpur_block","minecraft:purpur_pillar","minecraft:purpur_stairs","minecraft:purpur_double_slab","minecraft:purpur_slab","minecraft:end_bricks","minecraft:beetroots","minecraft:grass_path","minecraft:end_gateway","minecraft:frosted_ice","minecraft:magma","minecraft:nether_wart_block","minecraft:red_nether_brick","minecraft:bone_block","minecraft:observer","minecraft:white_shulker_box","minecraft:orange_shulker_box","minecraft:magenta_shulker_box","minecraft:light_blue_shulker_box","minecraft:yellow_shulker_box","minecraft:lime_shulker_box","minecraft:pink_shulker_box","minecraft:gray_shulker_box","minecraft:silver_shulker_box","minecraft:cyan_shulker_box","minecraft:purple_shulker_box","minecraft:blue_shulker_box","minecraft:brown_shulker_box","minecraft:green_shulker_box","minecraft:red_shulker_box","minecraft:black_shulker_box","minecraft:white_glazed_terracotta","minecraft:orange_glazed_terracotta","minecraft:magenta_glazed_terracotta","minecraft:light_blue_glazed_terracotta","minecraft:yellow_glazed_terracotta","minecraft:lime_glazed_terracotta","minecraft:pink_glazed_terracotta","minecraft:gray_glazed_terracotta","minecraft:light_gray_glazed_terracotta","minecraft:cyan_glazed_terracotta","minecraft:purple_glazed_terracotta","minecraft:blue_glazed_terracotta","minecraft:brown_glazed_terracotta","minecraft:green_glazed_terracotta","minecraft:red_glazed_terracotta","minecraft:black_glazed_terracotta","minecraft:concrete","minecraft:concrete_powder","minecraft:structure_block","FakeBlock","bedrock"]} Okay let me clerify. I just want to be able to summon a block with a CanPlaceOn tag. Is it possible? A: You have to use the tag compound in order to add CanPlaceOn and HideFlags as they are part of the item's dataTag. Also, in the CanPlaceOn list you had FakeBlock which I removed and bedrock which I changed to minecraft:bedrock. I am guessing the summon seeds command you tried came from the summon item command you posted second. I didn't think you actually wanted seeds which were named dirt so this command will summon a dirt block with the data: /summon Item ~ ~ ~ {Item:{id:"minecraft:dirt",Count:1b,tag:{HideFlags:16,CanPlaceOn:["minecraft:stone","minecraft:grass","minecraft:dirt","minecraft:cobblestone","minecraft:planks","minecraft:sapling","minecraft:flowing_water","minecraft:water","minecraft:flowing_lava","minecraft:lava","minecraft:sand","minecraft:gravel","minecraft:gold_ore","minecraft:iron_ore","minecraft:coal_ore","minecraft:log","minecraft:leaves","minecraft:sponge","minecraft:glass","minecraft:lapis_ore","minecraft:dispenser","minecraft:sandstone","minecraft:noteblock","minecraft:bed","minecraft:golden_rail","minecraft:detector_rail","minecraft:sticky_piston","minecraft:web","minecraft:tallgrass","minecraft:deadbush","minecraft:piston","minecraft:piston_head","minecraft:wool","minecraft:yellow_flower","minecraft:red_flower","minecraft:brown_mushroom","minecraft:red_mushroom","minecraft:gold_block,","minecraft:iron_block","minecraft:double_stone_slab","minecraft:stone_slab","minecraft:brick_block","minecraft:tnt","minecraft:bookshelf","minecraft:mossy_cobblestone","minecraft:obsidian","minecraft:torch","minecraft:fire","minecraft:mob_spawner","minecraft:oak_stairs","minecraft:chest","minecraft:redstone_wire","minecraft:diamond_ore","minecraft:diamond_block","minecraft:crafting_table","minecraft:wheat","minecraft:farmland","minecraft:furnace","minecraft:lit_furnace,","minecraft:standing_sign","minecraft:wooden_door","minecraft:ladder","minecraft:rail","minecraft:stone_stairs","minecraft:wall_sign","minecraft:lever","minecraft:stone_pressure_plate","minecraft:iron_door","minecraft:wooden_pressure_plate","minecraft:redstone_ore","minecraft:lit_redstone_ore","minecraft:unlit_redstone_torch","minecraft:redstone_torch","minecraft:stone_button","minecraft:snow_layer","minecraft:ice","minecraft:snow","minecraft:cactus","minecraft:clay","minecraft:reeds","minecraft:jukebox","minecraft:fence","minecraft:pumpkin","minecraft:netherrack","minecraft:soul_sand","minecraft:glowstone","minecraft:portal","minecraft:lit_pumpkin","minecraft:cake","minecraft:unpowered_repeater","minecraft:powered_repeater","minecraft:stained_glass","minecraft:trapdoor","minecraft:monster_egg","minecraft:stonebrick","minecraft:brown_mushroom_block","minecraft:iron_bars","minecraft:glass_pane","minecraft:melon_block","minecraft:pumpkin_stem","minecraft:melon_stem","minecraft:vines","minecraft:fence_gate","minecraft:brick_stairs","minecraft:stone_brick_stairs","minecraft:mycelium","minecraft:waterlily","minecraft:nether_brick","minecraft:nether_brick_fence","minecraft:nether_brick_stairs","minecraft:nether_wart","minecraft:enchanting","minecraft:brewing_stand","minecraft:cauldron","minecraft:end_portal","minecraft:end_portal_frame","minecraft:end_stone","minecraft:dragon_egg","minecraft:redstone_lamp","minecraft:lit_redstone_lamp","minecraft:double_wooden_slab","minecraft:wooden_slab","minecraft:cocoa","minecraft:sandstone_stairs","minecraft:emerald_ore","minecraft:ender_chest","minecraft:tripwire_hook","minecraft:emerald_block","minecraft:spruce_stairs","minecraft:birch_stairs","minecraft:jungle_stairs","minecraft:command_block","minecraft:beacon","minecraft:cobblestone_wall","minecraft:flower_pot","minecraft:carrots","minecraft:potatoes","minecraft:wooden_button","minecraft:skull","minecraft:anvil","minecraft:trapped_chest","minecraft:light_weighted_pressure_plate","minecraft:heavy_weighted_pressure_plate","minecraft:unpowered_comparator","minecraft:powered_comparator","minecraft:daylight_detector","minecraft:redstone_block","minecraft:quartz_ore","minecraft:hopper","minecraft:quartz_block","minecraft:quartz_stairs","minecraft:activator_rail","minecraft:dropper","minecraft:stained_hardened_clay","minecraft:stained_glass_pane","minecraft:leaves2","minecraft:log2","minecraft:acacia_stairs","minecraft:dark_oak_stairs","minecraft:slime","minecraft:barrier","minecraft:iron_trapdoor","minecraft:prismarine","minecraft:sea_lantern","minecraft:hay_block","minecraft:carpet","minecraft:hardened_clay","minecraft:coal_block","minecraft:packed_ice","minecraft:double_plant","minecraft:standing_banner","minecraft:wall_banner","minecraft:daylight_detector_inverted","minecraft:red_sandstone","minecraft:double_sandstone_stairs","minecraft:double_stone_slab2","minecraft:spruce_fence_gate","minecraft:birch_fence_gate","minecraft:jungle_fence_gate","minecraft:dark_oak_fence_gate","minecraft:acacia_fence_gate","minecraft:spruce_fence","minecraft:birch_fence","minecraft:jungle_fence","minecraft:dark_oak_fence","minecraft:acacia_fence","minecraft:spruce_door","minecraft:birch_door","minecraft:jungle_door","minecraft:acacia_door","minecraft:dark_oak_door","minecraft:end_rod","minecraft:chorus_plant","minecraft:chorus_flower","minecraft:purpur_block","minecraft:purpur_pillar","minecraft:purpur_stairs","minecraft:purpur_double_slab","minecraft:purpur_slab","minecraft:end_bricks","minecraft:beetroots","minecraft:grass_path","minecraft:end_gateway","minecraft:frosted_ice","minecraft:magma","minecraft:nether_wart_block","minecraft:red_nether_brick","minecraft:bone_block","minecraft:observer","minecraft:white_shulker_box","minecraft:orange_shulker_box","minecraft:magenta_shulker_box","minecraft:light_blue_shulker_box","minecraft:yellow_shulker_box","minecraft:lime_shulker_box","minecraft:pink_shulker_box","minecraft:gray_shulker_box","minecraft:silver_shulker_box","minecraft:cyan_shulker_box","minecraft:purple_shulker_box","minecraft:blue_shulker_box","minecraft:brown_shulker_box","minecraft:green_shulker_box","minecraft:red_shulker_box","minecraft:black_shulker_box","minecraft:white_glazed_terracotta","minecraft:orange_glazed_terracotta","minecraft:magenta_glazed_terracotta","minecraft:light_blue_glazed_terracotta","minecraft:yellow_glazed_terracotta","minecraft:lime_glazed_terracotta","minecraft:pink_glazed_terracotta","minecraft:gray_glazed_terracotta","minecraft:light_gray_glazed_terracotta","minecraft:cyan_glazed_terracotta","minecraft:purple_glazed_terracotta","minecraft:blue_glazed_terracotta","minecraft:brown_glazed_terracotta","minecraft:green_glazed_terracotta","minecraft:red_glazed_terracotta","minecraft:black_glazed_terracotta","minecraft:concrete","minecraft:concrete_powder","minecraft:structure_block","minecraft:bedrock"]}}} Just in case you actually wanted it, this command summons wheat seeds named Dirt with the data: /summon Item ~ ~ ~ {Item:{id:"minecraft:wheat_seeds",Count:1b,tag:{display:{Name:"Dirt"},HideFlags:16,CanPlaceOn:["minecraft:stone","minecraft:grass","minecraft:dirt","minecraft:cobblestone","minecraft:planks","minecraft:sapling","minecraft:flowing_water","minecraft:water","minecraft:flowing_lava","minecraft:lava","minecraft:sand","minecraft:gravel","minecraft:gold_ore","minecraft:iron_ore","minecraft:coal_ore","minecraft:log","minecraft:leaves","minecraft:sponge","minecraft:glass","minecraft:lapis_ore","minecraft:dispenser","minecraft:sandstone","minecraft:noteblock","minecraft:bed","minecraft:golden_rail","minecraft:detector_rail","minecraft:sticky_piston","minecraft:web","minecraft:tallgrass","minecraft:deadbush","minecraft:piston","minecraft:piston_head","minecraft:wool","minecraft:yellow_flower","minecraft:red_flower","minecraft:brown_mushroom","minecraft:red_mushroom","minecraft:gold_block,","minecraft:iron_block","minecraft:double_stone_slab","minecraft:stone_slab","minecraft:brick_block","minecraft:tnt","minecraft:bookshelf","minecraft:mossy_cobblestone","minecraft:obsidian","minecraft:torch","minecraft:fire","minecraft:mob_spawner","minecraft:oak_stairs","minecraft:chest","minecraft:redstone_wire","minecraft:diamond_ore","minecraft:diamond_block","minecraft:crafting_table","minecraft:wheat","minecraft:farmland","minecraft:furnace","minecraft:lit_furnace,","minecraft:standing_sign","minecraft:wooden_door","minecraft:ladder","minecraft:rail","minecraft:stone_stairs","minecraft:wall_sign","minecraft:lever","minecraft:stone_pressure_plate","minecraft:iron_door","minecraft:wooden_pressure_plate","minecraft:redstone_ore","minecraft:lit_redstone_ore","minecraft:unlit_redstone_torch","minecraft:redstone_torch","minecraft:stone_button","minecraft:snow_layer","minecraft:ice","minecraft:snow","minecraft:cactus","minecraft:clay","minecraft:reeds","minecraft:jukebox","minecraft:fence","minecraft:pumpkin","minecraft:netherrack","minecraft:soul_sand","minecraft:glowstone","minecraft:portal","minecraft:lit_pumpkin","minecraft:cake","minecraft:unpowered_repeater","minecraft:powered_repeater","minecraft:stained_glass","minecraft:trapdoor","minecraft:monster_egg","minecraft:stonebrick","minecraft:brown_mushroom_block","minecraft:iron_bars","minecraft:glass_pane","minecraft:melon_block","minecraft:pumpkin_stem","minecraft:melon_stem","minecraft:vines","minecraft:fence_gate","minecraft:brick_stairs","minecraft:stone_brick_stairs","minecraft:mycelium","minecraft:waterlily","minecraft:nether_brick","minecraft:nether_brick_fence","minecraft:nether_brick_stairs","minecraft:nether_wart","minecraft:enchanting","minecraft:brewing_stand","minecraft:cauldron","minecraft:end_portal","minecraft:end_portal_frame","minecraft:end_stone","minecraft:dragon_egg","minecraft:redstone_lamp","minecraft:lit_redstone_lamp","minecraft:double_wooden_slab","minecraft:wooden_slab","minecraft:cocoa","minecraft:sandstone_stairs","minecraft:emerald_ore","minecraft:ender_chest","minecraft:tripwire_hook","minecraft:emerald_block","minecraft:spruce_stairs","minecraft:birch_stairs","minecraft:jungle_stairs","minecraft:command_block","minecraft:beacon","minecraft:cobblestone_wall","minecraft:flower_pot","minecraft:carrots","minecraft:potatoes","minecraft:wooden_button","minecraft:skull","minecraft:anvil","minecraft:trapped_chest","minecraft:light_weighted_pressure_plate","minecraft:heavy_weighted_pressure_plate","minecraft:unpowered_comparator","minecraft:powered_comparator","minecraft:daylight_detector","minecraft:redstone_block","minecraft:quartz_ore","minecraft:hopper","minecraft:quartz_block","minecraft:quartz_stairs","minecraft:activator_rail","minecraft:dropper","minecraft:stained_hardened_clay","minecraft:stained_glass_pane","minecraft:leaves2","minecraft:log2","minecraft:acacia_stairs","minecraft:dark_oak_stairs","minecraft:slime","minecraft:barrier","minecraft:iron_trapdoor","minecraft:prismarine","minecraft:sea_lantern","minecraft:hay_block","minecraft:carpet","minecraft:hardened_clay","minecraft:coal_block","minecraft:packed_ice","minecraft:double_plant","minecraft:standing_banner","minecraft:wall_banner","minecraft:daylight_detector_inverted","minecraft:red_sandstone","minecraft:double_sandstone_stairs","minecraft:double_stone_slab2","minecraft:spruce_fence_gate","minecraft:birch_fence_gate","minecraft:jungle_fence_gate","minecraft:dark_oak_fence_gate","minecraft:acacia_fence_gate","minecraft:spruce_fence","minecraft:birch_fence","minecraft:jungle_fence","minecraft:dark_oak_fence","minecraft:acacia_fence","minecraft:spruce_door","minecraft:birch_door","minecraft:jungle_door","minecraft:acacia_door","minecraft:dark_oak_door","minecraft:end_rod","minecraft:chorus_plant","minecraft:chorus_flower","minecraft:purpur_block","minecraft:purpur_pillar","minecraft:purpur_stairs","minecraft:purpur_double_slab","minecraft:purpur_slab","minecraft:end_bricks","minecraft:beetroots","minecraft:grass_path","minecraft:end_gateway","minecraft:frosted_ice","minecraft:magma","minecraft:nether_wart_block","minecraft:red_nether_brick","minecraft:bone_block","minecraft:observer","minecraft:white_shulker_box","minecraft:orange_shulker_box","minecraft:magenta_shulker_box","minecraft:light_blue_shulker_box","minecraft:yellow_shulker_box","minecraft:lime_shulker_box","minecraft:pink_shulker_box","minecraft:gray_shulker_box","minecraft:silver_shulker_box","minecraft:cyan_shulker_box","minecraft:purple_shulker_box","minecraft:blue_shulker_box","minecraft:brown_shulker_box","minecraft:green_shulker_box","minecraft:red_shulker_box","minecraft:black_shulker_box","minecraft:white_glazed_terracotta","minecraft:orange_glazed_terracotta","minecraft:magenta_glazed_terracotta","minecraft:light_blue_glazed_terracotta","minecraft:yellow_glazed_terracotta","minecraft:lime_glazed_terracotta","minecraft:pink_glazed_terracotta","minecraft:gray_glazed_terracotta","minecraft:light_gray_glazed_terracotta","minecraft:cyan_glazed_terracotta","minecraft:purple_glazed_terracotta","minecraft:blue_glazed_terracotta","minecraft:brown_glazed_terracotta","minecraft:green_glazed_terracotta","minecraft:red_glazed_terracotta","minecraft:black_glazed_terracotta","minecraft:concrete","minecraft:concrete_powder","minecraft:structure_block","minecraft:bedrock"]}}}'''
Wiki: |
ahh right. Yea, those were essentially the problem. I am not sure how to best fix that issue though. I don't assume the viewer support tables? Another option would be to just remove the whole section with the table (I am not sure how that influences the content)
Hmm this seems hard to expand upon so I agree with @Muennighoff let's not go with KILT |
Done but unfortunately it's still ~14M passages. Is that too many @Muennighoff @isaac-chung? I can also downsample, perhaps sample 2M of the 11.3M stackexchanges ones and keep the rest for a total of 4-5M. Or other downsampling strategies someone prefers. For Wikipedia here's the new version with 200 words for a total of 16M passages. I did some postprocessing to move short sentences over to increase the probability of headings going to the right place, but it does make some of them larger than 200 words (~250 ish). I'll create a PR with these changes once we are done iterating. |
Sounds good, how about we select the top ~4M across all according to the question score? Maybe setting a threshold of say question score bigger than 0 or bigger than 5 would be enough? I think this would largely remove samples that are uninteresting anyways & would hardly ever get retrieved, as they are probably not interesting to many people or poor questions. A total of 4-5M sounds good to me then it would be ~2x as big as Wikipedia/arXiv which is doable.
Looks great. The main downside is that the indices will be twice as expensive but I think it is fine. I will create the indices 👍 |
Investigated some samples of the new wikipedia dump - I think it's probably better but curious if @orionw also thinks so. Example 1 400 word:
200 word:
&
but the middle paragraph ( Example 2 400 word:
200 word
&
Example 3 400 word
200 word
&
|
I just realized the new Wikipedia dataset has |
This is a good question. By default it's at least 2.5x more because 200 vs 500. Then some more is the packing: it accumulates passages until it gets to 200, so if it's at 100 and the next one is 150, it doesn't combine them. So in reality it's gonna be larger since the packing will make them average less than 200 words. I would've guessed beforehand 3-4x though so 6x is a bit weird. But I also don't see any repeats, which is also odd.
I like the 500 word ones for longer-doc search but I agree with your earlier comment that 200 words is more manageable for humans to read :) I'd go with 200 over 500. The missing sections are a good catch... I think this has something to do with the packing code but am not sure. Perhaps I dropped sections in the 500 word version that are more often picked up in the 200? I will have to take a closer look later today. For stackexchange at >= question score of 5 we get 1M docs, >= 3 is 2M, >=2 is 7M, and >= 1 is 10M. I'd say go for >= 3 but happy to go higher/lower! cc @Muennighoff |
Makes sense let's go with the 200 then. I think there's no obvious way to filter it down as we don't have popularity metrics or similar - maybe we can afford the 6x increase.
Agreed, let's go with >=3 so 2M samples in total right? I.e. similar to arxiv |
There's no metadata or sth that we easily get for the Wikipedia corpus right? Maybe we could exclude some categories likely not interesting to our users |
I think this then could be the final version of the Stackexchange one: https://huggingface.co/datasets/orionweller/stackexchange_200_words_2000_chars_en_only_3_score/settings Has only 3+ question score, the one answer, and the subdomain prepended. There's no document more than ~2k chars or 200 words. |
I think for arXiv we could have probably only done CS papers to save some costs but don't want to reencode everything right now - maybe later if it becomes expensive |
Okay here is the final Wikipedia version from 07/15/24 with 3,811,232 instances. It takes the top 500k Wikipedia articles by popularity and then chunks them into 200 words but allows them to be grouped together as long as they don't get too large (to avoid chunking paragraphs when possible and to preserve headings). It's also post-processed to remove any with long chars (3k+) but low words. As an FYI, the top 500k articles account for ~74% of Wikipedia page views (1M is 81%, 750k is 79%, 250k is 64%). Some stats:
|
What random samples would you use for StackExchange @orionw ? I think you suggested CQA previously, but does that make sense given many of the q's will be one-to-one in the dataset i.e. BM25 might win on all. |
Hmm, I haven't seen much CQA comparisons with BM25 so I'm not sure if it always wins. If you're worried about overlap Lotte could be a good option. I'm unsure if people will use the random sample button for the first example and then manually create their own queries, or whether they will continue to cycle through the samples. If they just use the first and move on to their own then it won't matter too much. |
Nice idea on LoTTe - Let's take all Also not sure - will be something we can inspect via the logs after a few days! |
Is it executing arbitrary HTML? |
@isaac-chung @orionw @KennethEnevoldsen nobody of you is getting the scrolling bug i put here #12 (comment) right? If so then let's mark that one as done & close the issue 👍 |
@Muennighoff using Safari I do get issues with scrolling: Screen.Recording.2024-07-30.at.09.58.25.movI don't get it with chrome. It, however, seems like (in Chrome) that the scroll bar is "snappy" (it snaps to the bottom). I guess that this causes the jittery behavior on Safari. |
Yeah that's the problem I am also seeing. Other arenas do not seem to have it 🤔 We should fix this as soon as we can 😬 |
Amazing updating gradio fixed it! This issue is done just in time then - amazing job everyone 😁 |
You can play with the space & retrieval models here: https://b3246e5ab28482f60e.gradio.live - Not all models & indices are cached yet so some first runs may be slow but once cached it should be blazing fast. Some TODOs below - would be great if we can get them done as fast as possible! 🚀
*individual
there is bothmodel
&model_name
(2)*side_by_side
seems to only include one model (3) Should addcorpus
for retrieval - maybe @orionw ?Connection reset by peer
error see below (In the UI it will just say Error). I think this mostly happens when two queries come in at the same time / closely after on another. It may be because I gave all indices the same endpoint rather than one endpoint per index but it could also be that I set max replica node count to 1 which means it cannot autoscale. It could be sth else.We're almost there! ❤️
The text was updated successfully, but these errors were encountered: