From e38e90a2f883d91226fd05fb4c0d313dbac2d65f Mon Sep 17 00:00:00 2001
From: GiulioRossetti <giulio.rossetti@gmail.com>
Date: Wed, 20 Nov 2024 11:14:10 +0100
Subject: [PATCH] :arrow_up: Pages, dynamic interests

---
 _config.yml              |   4 +-
 _pages/scenario.markdown | 107 +++++++++++++++++++++++++++++++--------
 _pages/yclient.markdown  |  42 +++++++++------
 _pages/yserver.markdown  |   7 ++-
 4 files changed, 121 insertions(+), 39 deletions(-)

diff --git a/_config.yml b/_config.yml
index 7d961bb..127c6ca 100644
--- a/_config.yml
+++ b/_config.yml
@@ -67,8 +67,8 @@ navbar:
       child:
         - title: 🤯 y/politics #Label
           url: /politics  #url
-        - title: 🤺 y/olympics  #Label
-          url: /olympics #url
+        #- title: 🤺 y/olympics  #Label
+        #  url: /olympics #url
 
     - title: 📙 Resources #Label
       url: /resources
diff --git a/_pages/scenario.markdown b/_pages/scenario.markdown
index 1efa9f2..4de2758 100644
--- a/_pages/scenario.markdown
+++ b/_pages/scenario.markdown
@@ -43,41 +43,47 @@ The configuration parameters are stored in a `config.json` file having the follo
 {
   "servers": {
     "llm": "http://127.0.0.1:11434/v1",
-    "api": "http://127.0.0.1:5000/"
-  },
+    "llm_api_key": "NULL",
+    "llm_v": "http://127.0.0.1:11434/v1",
+    "llm_v_api_key": "NULL",
+    "api": "http://127.0.0.1:5010/"},
   "simulation": {
     "name": "simulation_name",
     "client": "YClientBase",
     "days": 365,
     "slots": 24,
-    "starting_agents": 1000,
-    "new_agents_per_iteration": 10
+    "starting_agents": 180,
+    "percentage_new_agents_iteration": 0.07,
+    "percentage_removed_agents_iteration": 0.014,
     "hourly_activity": {...},
     "actions_likelihood": {
       "post": 0.2,
-      "comment": 0.3,
-      "read": 0.1,
-      "share": 0.1,
-      "reply": 0.1,
-      "search": 0.05,
+      "image": 0.1,
       "news": 0.1,
-      "cast": 0.05
+      "comment": 0.5,
+      "read": 0.2,
+      "share": 0.05,
+      "search": 0.1,
+      "cast": 0.0
     }
   },
   "agents": {
+    "llm_agents": ["llama3"],
+    "llm_v_agent": ["minicpm-v"],
     "education_levels": ["high school", "bachelor", "master", "phd"],
     "languages": ["english"],
     "max_length_thread_reading": 5,
     "reading_from_follower_ratio": 0.6,
     "political_leanings": ["Democrat", "Republican", ...],
-    "age": {"min": 18, "max": 80},
+    "age": {"min": 18, "max": 50},
     "round_actions": { "min": 1,"max": 3},
     "nationalities": ["Italian", "French", "German", ...],
-    "llm_agents": ["llama3", "mistral"],
+    "probability_of_daily_follow": 0.1,
     "n_interests": {"min": 4, "max": 10},
     "interests": [...],
+    "attention_window": 336,
     "big_five": {...},
-    "attention_window": 336
+    "toxicity_levels": ["no", "low", "average"]
   },
   "posts": {
     "visibility_rounds": 36,
@@ -94,11 +100,14 @@ The `simulation` section contains the parameters that define the simulation:
 - `days`: the number of days the simulation will last;
 - `slots`: the number of slots in a day;
 - `starting_agents`: the number of agents that will be created at the beginning of the simulation by the `YClient`;
-- `new_agents_per_iteration`: the number of agents that will be created during each day of the simulation;
+- `percentage_new_agents_iteration`: the percentage of new agents (relative to the daily active ones) that will be created during each day of the simulation;
+- `percentage_removed_agents_iteration`: the percentage of agents that will be removed from the simulation during each day of the simulation. Churned agents are selected based on their last activity (i.e., the more rounds an agent has recently been inactive, the more likely it is to be removed);
 - `hourly_activity`: a dictionary that specifies the hourly activity of the agents.
-- `actions_likelihood`: a dictionary that specifies the likelihood of each action that an agent can select in a round. During each agent-iteration, the system will sample from this distribution to identify the set of candidate actions the agent will be asked to choose from. Setting individual action likelihood to 0 will prevent the agent from performing that action.
+- `actions_likelihood`: a dictionary that specifies the likelihood of each action that an agent can select in a round. During each agent iteration, the system samples from this distribution to identify the set of candidate actions the agent will be asked to choose from. Setting an individual action's likelihood to 0 prevents the agent from performing that action. Values are automatically normalized to sum to 1.
 
 The `agents` section contains the parameters that will be used to generate the agents profiles:
+- `llm_agents`: a list of Large Language Models that the YClient can assign to the agents;
+- `llm_v_agent`: a list of Large Language Models that the YClient will use to annotate images (default is `minicpm-v`);
 - `education_levels`: the education levels of the agents;
 - `languages`: the languages spoken by the agents;
 - `max_length_thread_reading`: the maximum number of posts of a given threads that an agent can read to build a context before commenting;
@@ -107,11 +116,12 @@ The `agents` section contains the parameters that will be used to generate the a
 - `age`: the age range of the agents;
 - `round_actions`: the number of actions that an agent can perform in a round;
 - `nationalities`: the nationalities of the agents (they will impact the locales used to generate synthetic data);
-- `llm_agents`: a list of Large Language Models that the YClient can assign to the agents;
 - `n_interests`: the number of interests that the agents can have;
 - `interests`: the topics among witch each agent can sample (at creation time) in order to define their interests;
 - `big_five`: a dictionary that specifies the Big Five personality traits of the agents (which will be sampled at creation time);
 - `attention_window`: the posting/commenting/reacting history (in terms of rounds) the system will use to dynamically estimate the agent's topics of interests.
+- `probability_of_daily_follow`: probability that a daily active user will receive a follow recommendation from the system;
+- `toxicity_levels`: the expected toxicity levels of agents' posts (NB: depending on the selected LLM, high toxicity values might trigger guardrails that prevent content generation).
 
 Using such information, the `YClient` will create the agents population (leveraging the `faker` Python library).
 
@@ -126,9 +136,19 @@ The `posts` section contains the parameters that define the posts:
 
 <br>
 
-### RSS Feeds
+# Reality Anchors
+
+`Y Social` simulations can be anchored to real-world events or discussion topics by leveraging custom RSS feeds.
+
+### Pages 
+
+`Y Social` integrates a specific module (news) that allows the agents to access and share news from online sources leveraging RSS feeds.
+
+RSS feeds are not directly accessed by the agents but published by `Pages`: customized agents that act as news sources.
+
+Each `Page` is associated with a specific RSS feed and anchors the simulation to the real world by providing the agents with news headlines and summaries.
 
-The RSS feeds from which the agents can access and share news are stored in a `rss_feeds.json` file having the following structure:
+The RSS feeds used to instantiate `Pages` are specified in a `rss_feeds.json` file having the following structure:
 
 ```json
 [
@@ -155,8 +175,55 @@ The RSS feeds from which the agents can access and share news are stored in a `r
 
 The `category` field specifies the category of the news, the `leaning` field specifies the political leaning of the news source, the `name` field specifies its name, and the `feed_url` field specifies the URL of the related RSS feed.
 
-The `YClient` will use this information to retrieve news headlines and summaries from the web and made them available to the agents.
-
 To automatically generate the `rss_feeds.json` from a list of keywords (using Bing search), use the `populate_news_feeds.py` script available in the `YClient` repository.
 {: #myid .alert .alert-info .p-3 .mx-2 mb-3}
 
+In order to integrate pages within the simulation, the `config.json` file must be updated with the following field:
+
+```json
+{
+  "simulation": {
+    "client": "YClientWithPages",
+    ...
+    }
+}
+```
+
+Moreover, in the same file, "actions_likelihood" must be updated to include the "news" action with a value greater than 0.
+Finally, in the `YServer` configuration file, the "news" module must be included in the "modules" list.
+
+
+### Multimodal Content Sharing
+
+`Y Social` integrates a specific module (image) that allows the agents to share images obtained from the headlines of RSS feed items and comment on them.
+
+To enable the image module, "actions_likelihood" must be updated to include the "image" action with a value greater than 0 in the `config.json` file.
+Moreover, in the `YServer` configuration file, the "image" and "news" modules must be included in the "modules" list.
+
+`Y Social` uses a vision Large Language Model (default `minicpm-v`) to generate descriptions of the images extracted from RSS feed news.
+Such descriptions are then provided to the agents' LLM to generate posts.
+
+# User Interests
+
+As shown in the `config.json` file, the agents' interests are sampled from a list of topics specified in the `interests` field.
+
+However, to capture the evolution of agents' interests over time, the `YClient` integrates a dynamic model that leverages the agents' posting history.
+
+The model is based on the assumption that the agents' interests are influenced by the topics they post about and the topics they read about.
+
+The model is implemented as follows:
+
+- The system keeps track of the topics of each generated post (i.e., the sampled user interests used to generate it);
+
+- Each time an agent interacts with a discussion thread (e.g., by reacting to or commenting on it):
+  - the system leverages the original post metadata to identify the topics of the thread;
+  - the agent's interests are updated by adding the thread topics to the agent's interests;
+
+- Each time an agent initiates a new thread:
+  - the system computes the most frequent topics of interest of the agent within the last `attention_window` rounds;
+  - among such topics, the system samples the ones that will be used to generate the post starting the new thread.
+
+Such a mechanism allows the system to dynamically update the agents' interests based on their interaction histories.
+
+Additionally, if the `news` module is enabled, the original interest list will be enriched with the topics of the news items shared by the `Pages` (identified by LLM-based topic modeling).
+Such new topics will then propagate to the agents' interests following the same mechanism described above.
diff --git a/_pages/yclient.markdown b/_pages/yclient.markdown
index c5c7a85..5cf3975 100644
--- a/_pages/yclient.markdown
+++ b/_pages/yclient.markdown
@@ -1,4 +1,4 @@
----
+---
 # Feel free to add content and custom Front Matter to this file.
 # To modify the layout, see https://jekyllrb.com/docs/themes/#overriding-theme-defaults
 
@@ -107,18 +107,21 @@ Remember to modify the  `config.json` file to specify the LLM server address, po
 
 ## YClient Simulation Loop
 
-The following is a simplified and non-comprehensive pseudocode-version of the simulation loop implemented by `plain_y_client.py`:
+The following is a simplified, non-comprehensive pseudocode version of the simulation loop implemented by `y_client/clients/client_base.py`:
 
 ```python
 # Input: config: Simulation configuration Files
 # Input: feeds: RSS feeds
 
 # configuring agents and servers 
-agents = create_agents(config, feeds)
+agents = create_agents(config)
+pages = create_pages(config, feeds)  # see scenario documentation
+agents = agents + pages
+
 y_server = connect(config.servers.api)
 
-# simulation loop 
-for day in range(config.simulation.days):
+for day in range(config.simulation.days):
+    
     for slot in range(config.simulation.slots):
         #synchronize with the y_server clock 
         h = y_server.get_current_slot()
@@ -126,18 +129,27 @@ for day in range(config.simulation.days):
         # identify the active agents for the current slot 
         expected_active = int(len(agents) * config.simulation.hourly_activity[h])
         active = random.sample(agents, expected_active)
+        
+        # available actions
+        acts = [a for a, v in config.simulation.actions_likelihood.items() if v > 0]
+        
         for agent in active:
-            # evaluate agent’s actions (once per activity slot) 
-            agent.select_action(["NEWS", "POST","COMMENT", 
-                                 "REPLY", "SHARE", "READ", "SEARCH", "NONE"])
-
-    for agent in agents:
-        # evaluate following (once per day) 
-        agent.select_action(["FOLLOW", "NONE"])
-    #increase the agent population (if specified in config) 
+            for _ in round_actions:
+                # evaluate agent’s actions (once per activity slot) 
+                agent.select_action(acts)
+                
+                # reply to received mentions
+                agent.reply_mentions()
+
+    # evaluate following (once per day) 
+    evaluate_following(active)
+    
+    # increase the agent population 
     agents.add_new_agents()
+    
+    # evaluate churn
+    agents.churn()
 ```
 
-More complicated behaviors (allowing for more finegrained agents configurations) can be implemented by extending the `y_client.clients.YClientBase` class. 
-Alternative implementation will be released in the future.
+More complex behaviors (allowing for more fine-grained agent configurations) can be implemented by extending the `y_client.clients.YClientBase` class (see, for example, `y_client.clients.YClientWithPages`).
 {: #myid .alert .alert-info .p-3 .mx-2 mb-3}
\ No newline at end of file
diff --git a/_pages/yserver.markdown b/_pages/yserver.markdown
index f765692..d9a2995 100644
--- a/_pages/yserver.markdown
+++ b/_pages/yserver.markdown
@@ -61,8 +61,9 @@ Set the server preferences modifying the file `config_files/exp_config.json`:
   "name": "local_test",
   "host": "0.0.0.0",
   "port": 5010,
-  "reset_db": "True",
-  "modules": ["news", "voting"]
+  "debug": "True",
+  "reset_db": "False",
+  "modules": ["news", "voting", "image"]
 }
 ```
 where:
@@ -70,6 +71,7 @@ where:
 - `host` is the IP address of the server;
 - `port` is the port of the server;
 - `reset_db` is a flag to reset the database at each server start;
+- `debug` is a flag to enable the debug mode;
 - `modules` is a list of additional modules to be loaded by the server (e.g., news, voting). Please note that the YClient must be configured to use the same modules.
 
 Once the simulation is configured, start the YServer with the following command:
@@ -83,4 +85,5 @@ The server will be then ready to accept requests at `http://localhost:5010`.
 #### Available Modules
 - **News**: This module allows the server to access online news sources leveraging RSS feeds.
 - **Voting**: This module allows the agents to cast their voting intention after interacting with peers contents (designed to perform political debate simulation).
+- **Image**: This module allows agents to share images obtained from the headlines of RSS feed items (and therefore requires the News module to be active) and to comment on them.