The integration of large language models such as BERT, GPT, and ChatGPT into search engine applications is revolutionizing the way we search for information. This themed project aims to help you understand, engage with, and advance this technology.
Through this project, you will develop an in-depth understanding of these language models and their applications in powering search engines. You will have the opportunity to explore one of several research directions that we have identified. For instance, you may choose to investigate the effectiveness of these methods under specific conditions, such as studying possible biases and robustness issues, or you may design, develop, and evaluate new solutions to address known problems that affect these methods.
While completion of the INFS7410 course at UQ, or a similar Information Retrieval and Web Search course at another university, is desirable, we will provide background information and study material in the initial weeks of the project so that you can explore these methods in depth. Therefore, if you have a strong understanding of key artificial intelligence concepts but lack specific information retrieval knowledge, you are still encouraged to undertake this project.
- Reproduce the paper Wu, C., Zhang, R., Guo, J., Fan, Y. and Cheng, X., 2022. Are neural ranking models robust? ACM Transactions on Information Systems, 41(2), pp. 1-36.
- Pre-trained Language Model-based rankers for Product Search. You will work with the Amazon Shopping Queries Dataset, which is publicly available. There are multiple directions here:
  - Query Generation. In product search, rankers are very effective when they model user behaviour; however, new products have no behavioural features and, importantly, no relevant queries associated with them. You will set up a ranking pipeline (at varying levels of complexity) and implement query generation methods (a minimal sketch of one such method follows this list).
  - Neural features, such as those generated by cross-encoder rankers, have been shown to be very effective when used in a learning-to-rank pipeline for product search (e.g. one based on gradient boosted trees). However, cross-encoder features are expensive to generate and can only be produced for historic queries, i.e. queries observed in a query log, offline rather than in real time. In this direction you will study the effect that not generating cross-encoder features has on the rankers, and you will investigate the effectiveness of weaker but computationally feasible neural features, e.g. those generated by bi-encoders (dense retrievers); a bi-encoder sketch also follows this list.
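For the query generation direction, here is a minimal sketch of doc2query-style generation. It assumes the Hugging Face `transformers` library and the public `castorini/doc2query-t5-base-msmarco` checkpoint; any T5-based doc2query model could be substituted, and the example product text is made up.

```python
# Hedged sketch: generate synthetic queries for a new product with a
# doc2query-style T5 model. Model choice and product text are illustrative.
from transformers import T5ForConditionalGeneration, T5Tokenizer

MODEL_NAME = "castorini/doc2query-t5-base-msmarco"  # assumed public checkpoint
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def generate_queries(product_text: str, n_queries: int = 5) -> list[str]:
    """Sample n_queries synthetic queries for one product description."""
    inputs = tokenizer(product_text, return_tensors="pt",
                       truncation=True, max_length=512)
    outputs = model.generate(
        **inputs,
        max_length=64,
        do_sample=True,           # sampling yields diverse queries
        top_k=10,
        num_return_sequences=n_queries,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# The generated queries can stand in for the missing behavioural features,
# e.g. by indexing them alongside the product text.
print(generate_queries("Stainless steel insulated water bottle, 750 ml."))
```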
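For the neural features direction, the sketch below illustrates why bi-encoder features are computationally feasible online: products are encoded independently of the query, so their embeddings can be precomputed offline and scoring reduces to a dot product. It assumes the `sentence-transformers` library and the public `msmarco-distilbert-base-v4` checkpoint; both are illustrative choices, and the query and products are toy data.

```python
# Hedged sketch: bi-encoder (dense retriever) features as a cheaper
# alternative to cross-encoder scores in a learning-to-rank pipeline.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("msmarco-distilbert-base-v4")  # assumed checkpoint

query = "insulated water bottle"          # toy data for illustration
products = [
    "Stainless steel bottle, keeps drinks cold for 24 hours.",
    "Plastic cup, 300 ml, dishwasher safe.",
]

# Product embeddings can be precomputed offline; at query time a single
# dot product per product yields the learning-to-rank feature.
product_embeddings = encoder.encode(products, convert_to_tensor=True)
query_embedding = encoder.encode(query, convert_to_tensor=True)
features = util.dot_score(query_embedding, product_embeddings)  # 1 x len(products)
```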
- Participate in a TREC competition (only available to students with GPA >= 6). This includes all aspects of the competition: creation of the pipeline, baselines, implementation of our methods, and result analysis. Competitions of interest:
  - TREC Product Search: this competition uses the [Amazon Shopping Queries Dataset](https://arxiv.org/abs/2206.06588), which is publicly available. Task 1 (Product Ranking Task): the first task focuses on product ranking. In this task we provide an initial ranking of 100 documents from a BM25 baseline, and you are expected to re-rank the products by their relevance to the user's intent. This provides a focused task where the candidate sets are fixed and there is no need to implement complex end-to-end systems, which makes experimentation quick and runs easy to compare (a minimal re-ranking sketch follows this list). Task 2 (Product Retrieval Task): the second task focuses on end-to-end product retrieval. In this task we provide a large collection of products, and participants need to design end-to-end retrieval systems that leverage whatever information they find relevant or useful. Unlike the ranking task, the focus here is on understanding the interplay between retrieval and re-ranking systems.
  - NeuCLIR Track: this track focuses on the application of modern neural computing techniques to cross-language information retrieval. NeuCLIR topics are written in English, and the track has three target-language collections in Chinese, Persian, and Russian (a minimal cross-language retrieval sketch also follows this list).
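As a starting point for Task 1 of TREC Product Search, here is a minimal re-ranking sketch. It assumes the `sentence-transformers` library and the public `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint; the query and candidates are toy stand-ins for the 100 BM25 candidates provided by the task.

```python
# Hedged sketch: re-rank a fixed set of BM25 candidates with a cross-encoder.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed model

query = "noise cancelling headphones"     # toy stand-ins for the task's data
candidates = [
    "Wireless over-ear headphones with active noise cancellation.",
    "Wired earbuds with inline microphone, no noise cancellation.",
]

# The cross-encoder reads query and product text jointly, so it is more
# accurate than a bi-encoder but must score every pair at query time.
scores = reranker.predict([(query, product) for product in candidates])
reranked = [p for _, p in sorted(zip(scores, candidates), reverse=True)]
```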
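For NeuCLIR, one common neural approach is multilingual dense retrieval: English topics and non-English documents are embedded in a shared space and ranked by similarity. The sketch below assumes the `sentence-transformers` library and the public `paraphrase-multilingual-MiniLM-L12-v2` checkpoint; the model choice and documents are illustrative, not the track's prescribed setup.

```python
# Hedged sketch: cross-language retrieval by embedding an English topic and
# non-English documents in one multilingual embedding space.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed

english_topic = "effects of climate change on agriculture"
documents = [
    "气候变化对农业的影响",               # Chinese, on-topic
    "Новые смартфоны выходят на рынок",  # Russian, off-topic
]

topic_embedding = encoder.encode(english_topic, convert_to_tensor=True)
doc_embeddings = encoder.encode(documents, convert_to_tensor=True)
similarities = util.cos_sim(topic_embedding, doc_embeddings)  # rank by similarity
```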
Week | Meeting Date (every Wed 2-4pm) | Deliverables this week | Meeting activity | Work plan | Due |
---|---|---|---|---|---|
1 | | | | | |
2 | | | | read related works; get familiar with your experimental environment (e.g. computation resources, datasets, code) | |
3 | Aug 9 | | check-up meetings | read related works; get familiar with your experimental environment (e.g. computation resources, datasets, code) | |
4 | | [writing] Have a draft of the related work chapter of the thesis | no meeting | write the related work chapter and submit the draft to the teaching team; start investigating your project | |
5 | Aug 23 | [slides] Make slides showing your plans for related methods and experiment settings before this week's meeting | meeting for feedback on the experiment plans | make slides for your research plan (i.e. related methods and experiment settings); start experimentation | |
6 | Aug 30 | | QA | experimentation | |
7 | Sep 6 | | feedback on the related work chapter | experimentation; revise the related work chapter based on the feedback | |
8 | Sep 13 | [writing] Have a skeleton of the conference paper (masters only) | feedback on the paper skeleton | draft the skeleton based on current results and plans; experimentation | |
9 | Sep 20 | | QA | experimentation | |
10 | Oct 4 | [slides] Make slides showing all experiment results so far | feedback and discussion on experiment results | make slides for current experiment results (figures and tables that will be used in your paper or thesis) | |
11 | Oct 11 | | feedback on the paper draft | (masters only) write the conference paper and submit | Oct 12: conference paper (masters only) |
12 | Oct 18 | | feedback on posters | work on posters and submit | Oct 20: poster & demonstration |
13 | Oct 25 | | feedback on the thesis draft | write the thesis draft | |
 | | | | | Nov 6: thesis report |
- Week 5: Video: Welcome and Project Proposal Draft Assessment (slides)
- Week 6: Q/A session on proposal draft (recording on Zoom)
- Week 6: A Framework for Generating Research Ideas: slides, original course material from Pranav Rajpurkar’s Harvard CS197 course
- Week 11: Progress Seminar for year-long students (schedule)
Links to videos:
- BERT
- BERT For Ranking
- BERT Limitations
- Handling Length by Scores
- Handling Length by Representations with PARADE
- duoBERT
- doc2query
- DPR
- ANCE
- RepBERT
- CLEAR
- EPIC
- DRs Performance
- TILDE
- TILDEv2
Readings: