-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update photo's caption #24
Conversation
WalkthroughThe changes in this pull request involve modifications across several files, primarily enhancing the documentation, reorganizing the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 15
🧹 Outside diff range and nitpick comments (15)
priv/typesense/reset.exs (1)
3-3
: LGTM! Improved error handling with bang function.The migration to
Migration.Typesense.Photo.reset!()
with the bang (!) operator provides clearer error handling expectations.Consider adding a warning comment about the destructive nature of this operation:
# mix run priv/typesense/reset.exs +# WARNING: This script will delete and recreate the photo collection in Typesense. +# All existing photo data will be lost. Migration.Typesense.Photo.reset!()docs/dev-logs/2024-10-25.md (2)
3-3
: Remove trailing punctuation from headingThe heading contains trailing punctuation which violates markdown style guidelines.
-## Req call typesense API alway :timeout, but typesense was updated. +## Req call typesense API alway :timeout, but typesense was updated🧰 Tools
🪛 Markdownlint
3-3: Punctuation: '.'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
3-9
: Enhance error documentation with more detailsThe error log would be more helpful with additional context:
- Typesense version before and after update
- Configuration settings (timeout values, etc.)
- Steps to reproduce
- Resolution or workaround if found
This information would help track similar issues in the future and provide valuable debugging context.
Would you like me to provide a template for documenting these additional details?
🧰 Tools
🪛 Markdownlint
3-3: Punctuation: '.'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
.gitignore (2)
29-31
: Consider consolidating related data ignore patterns.The
_local
anddata
entries appear to be related but are separated. Consider grouping them together for better clarity.# data -_local -data +data/ +_local/
33-38
: Document the purpose of environment-specific wildcards.The wildcard patterns for different environments (
_local*
,_dev*
, etc.) are quite broad. Consider adding comments to explain what types of files these patterns are intended to ignore to prevent accidentally ignoring important files.# scripts +# Environment-specific scripts and configurations _local* _dev* _stag* _prod* +# Background process output nohup.outlib/migration/typesense.ex (1)
1-42
: Consider adding resilience patterns for production use.The module makes external HTTP calls but lacks important resilience patterns:
- Request timeouts
- Retry mechanism for transient failures
- Rate limiting protection
Consider implementing:
- Add timeouts to Req.new:
Req.new( base_url: url, url: path, headers: headers, connect_options: [timeout: 5_000], receive_timeout: 15_000 )
Add retry logic using a library like
retry
or implement custom retry logic for transient failures.Consider implementing rate limiting protection using a token bucket algorithm or similar approach if the Typesense API has rate limits.
Would you like me to provide example implementations for any of these patterns?
lib/migration/typesense/photo.ex (1)
4-25
: Consider adding documentation and field constraints.While the schema structure is well-organized, consider the following improvements:
- Add
@moduledoc
and@doc
documentation explaining the purpose of each field- Consider adding field constraints for
file_id
format validation- Consider adding index definitions for searchable fields like
caption
- Consider adding size limits for the
image
field to prevent oversized uploadsExample documentation:
@moduledoc """ Defines the Typesense collection schema for storing photos with their embeddings and metadata. """ @doc """ Schema for photos collection: - image: Base64 encoded image data - image_embedding: CLIP model generated vector embeddings for similarity search - caption: Optional text description of the image - file_id: Unique identifier for the file - belongs_to_id: Reference to the parent entity - inserted_at: Timestamp of creation """ @photos_schema %{ # ... existing schema }README.md (2)
Line range hint
25-35
: Consider using more permanent demo links.The GitHub asset links used for demonstrations might not be permanently accessible. Consider:
- Using a more permanent hosting solution for demo media
- Adding text descriptions of expected behavior alongside the media
- Including example responses in text form
89-99
: Add file permission security measures.While setting execute permissions is necessary, the script contains sensitive data. Add these security measures:
2. execute permission ```sh -chmod +x start.sh +# Restrict read/write to owner only, add execute +chmod 700 start.sh + +# Restrict .env to owner read/write only +chmod 600 .env</blockquote></details> <details> <summary>zeabur/template.yaml (2)</summary><blockquote> `14-15`: **Remove empty coverImage field if not used.** The `coverImage` field is defined but empty. Consider removing it if it's not needed, or provide a value if it's required. --- `56-56`: **Remove empty configs arrays.** Both services have empty `configs: []` arrays. Consider removing these if they're not being used to keep the configuration concise. ```diff - configs: []
Also applies to: 83-83
lib/save_it/bot.ex (3)
Line range hint
198-202
: Consider extracting duplicate similar photo search logic.The similar photo search logic is duplicated with different thresholds. Consider extracting it into a private function:
+ defp search_similar_photos(photo_id, threshold, chat_id) do + PhotoService.search_similar_photos!( + photo_id, + distance_threshold: threshold, + belongs_to_id: chat_id + ) + endThen use it in both places:
photos = search_similar_photos(typesense_photo["id"], caption == "" && 0.4 || 0.1, chat_id)Also applies to: 208-212
311-313
: Consider implementing the TODO for search trigger.The TODO suggests re-triggering search on edited messages. This could enhance user experience by automatically updating search results.
Would you like me to help implement the search trigger functionality for edited messages?
Line range hint
399-435
: Consider consolidating photo creation logic.The photo creation logic here is similar to the message handlers. Consider reusing the suggested
create_photo/4
function here as well.Additionally, the file type handling could be simplified using pattern matching:
- case file_extension(file_name) do - ext when ext in [".png", ".jpg", ".jpeg"] -> - {:ok, msg} = ExGram.send_photo(...) - ".mp4" -> - ExGram.send_video(...) - ".gif" -> - ExGram.send_animation(...) - _ -> - ExGram.send_document(...) - end + send_file_by_type(file_extension(file_name), chat_id, content, caption) + defp send_file_by_type(ext, chat_id, content, caption) when ext in [".png", ".jpg", ".jpeg"] do + {:ok, msg} = ExGram.send_photo(chat_id, content, caption: caption) + # Photo creation logic here + msg + end + defp send_file_by_type(".mp4", chat_id, content, caption) do + ExGram.send_video(chat_id, content, supports_streaming: true, caption: caption) + end + defp send_file_by_type(".gif", chat_id, content, caption) do + ExGram.send_animation(chat_id, content, caption: caption) + end + defp send_file_by_type(_, chat_id, content, caption) do + ExGram.send_document(chat_id, content, caption: caption) + endlib/small_sdk/typesense.ex (1)
27-27
: Handle cases where the search response has no hitsIn the
search_documents!/2
function, if the search yields no results,data["hits"]
could be an empty list. While mapping over an empty list doesn't cause an error, it's good practice to ensure that the function handles this scenario gracefully.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (14)
- .gitignore (1 hunks)
- README.md (2 hunks)
- config/runtime.exs (0 hunks)
- docs/dev-logs/2024-10-25.md (1 hunks)
- docs/dev/typesense.md (1 hunks)
- lib/migration/typesense.ex (1 hunks)
- lib/migration/typesense/photo.ex (1 hunks)
- lib/save_it/bot.ex (11 hunks)
- lib/save_it/photo_service.ex (5 hunks)
- lib/small_sdk/typesense.ex (2 hunks)
- lib/small_sdk/typesense_admin.ex (0 hunks)
- priv/typesense/2024-10-24_create_photos_collection.exs (1 hunks)
- priv/typesense/reset.exs (1 hunks)
- zeabur/template.yaml (1 hunks)
💤 Files with no reviewable changes (2)
- config/runtime.exs
- lib/small_sdk/typesense_admin.ex
🧰 Additional context used
🪛 Markdownlint
docs/dev-logs/2024-10-25.md
3-3: Punctuation: '.'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
docs/dev/typesense.md
6-6: Column: 6
Hard tabs(MD010, no-hard-tabs)
7-7: Column: 6
Hard tabs(MD010, no-hard-tabs)
8-8: Column: 6
Hard tabs(MD010, no-hard-tabs)
9-9: Column: 6
Hard tabs(MD010, no-hard-tabs)
10-10: Column: 6
Hard tabs(MD010, no-hard-tabs)
11-11: Column: 6
Hard tabs(MD010, no-hard-tabs)
14-14: null
Bare URL used(MD034, no-bare-urls)
5-5: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
🔇 Additional comments (12)
priv/typesense/2024-10-24_create_photos_collection.exs (1)
1-1
: Verify the collection schema implementation.Let's ensure the
Migration.Typesense.Photo
module properly defines all required fields.✅ Verification successful
Schema implementation verified and properly defined
The
Migration.Typesense.Photo
module has a well-structured schema that includes all necessary fields:
- Image field with CLIP embedding support
- Caption (optional string field)
- File and relationship identifiers
- Timestamp field for sorting
- Proper configuration for image embeddings using CLIP model
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for the schema definition in Migration.Typesense.Photo module rg -l "Migration.Typesense.Photo" | xargs rg "create_collection|schema" # Verify if all required fields are defined in the schema rg -A 10 "schema.*=.*\["Length of output: 570
Script:
#!/bin/bash # Let's examine the content of the Photo module to see the schema definition cat lib/migration/typesense/photo.exLength of output: 1013
lib/migration/typesense/photo.ex (2)
1-3
: LGTM! Clean module structure.The module follows Elixir naming conventions and uses appropriate aliasing.
27-34
:⚠️ Potential issueAdd error handling and safety measures.
The current implementation has several potential issues:
- No error handling for failed collection operations
reset!/0
could lead to unintended data loss- No atomic transaction for the reset operation
Consider implementing these improvements:
def create_collection!() do - Typesense.create_collection!(@photos_schema) + case Typesense.create_collection!(@photos_schema) do + {:ok, collection} -> {:ok, collection} + {:error, %{"message" => message}} -> + raise "Failed to create photos collection: #{message}" + end end def reset!() do + # Add warning log + require Logger + Logger.warning("Resetting photos collection - all data will be lost!") + + # Backup existing data if needed + # Add appropriate backup logic here + Typesense.delete_collection!(@photos_schema["name"]) Typesense.create_collection!(@photos_schema) endLet's verify if this collection is used elsewhere in the codebase:
README.md (1)
5-9
: LGTM! Clear feature documentation.The new features section provides a concise overview of the bot's capabilities with clear checkboxes indicating completion status.
zeabur/template.yaml (2)
62-62
: 🛠️ Refactor suggestionConsider using SHA digest for container image.
While version pinning is good, using a SHA digest provides even better security and reproducibility. Consider updating the image reference to include the SHA256 digest.
Example format:
- image: typesense/typesense:27.1 + image: typesense/typesense@sha256:<digest>#!/bin/bash # Check if other yaml files in the repository use SHA digests rg "@sha256:[a-f0-9]{64}" -g "*.yaml" -g "*.yml"
41-55
: Validate required environment variables.Several environment variables have empty default values. This could cause runtime issues if not properly set during deployment. Consider:
- Adding validation checks in your application for required variables
- Documenting which variables are required vs optional
- Providing example values in comments
Affected variables:
- TELEGRAM_BOT_TOKEN
- GOOGLE_OAUTH_CLIENT_ID
- GOOGLE_OAUTH_CLIENT_SECRET
#!/bin/bash # Check if these environment variables are used in the codebase echo "Checking usage of environment variables..." rg "TELEGRAM_BOT_TOKEN|GOOGLE_OAUTH_CLIENT_ID|GOOGLE_OAUTH_CLIENT_SECRET" -llib/save_it/bot.ex (5)
8-8
: LGTM! Improved module naming.The rename from
TypesensePhoto
toPhotoService
better represents the module's responsibility as a service layer.
118-120
: LGTM! Enhanced user experience.Good addition of a helpful prompt when users don't provide search text.
131-133
: LGTM! Well-scoped photo search.The search implementation correctly scopes results to the specific chat using
belongs_to_id
.
317-319
: LGTM! Clean caption update implementation.The handler correctly updates the photo caption when an edited message contains photos.
345-347
: LGTM! Improved media presentation.Good addition of caption support with proper positioning above media for better readability.
lib/save_it/photo_service.ex (1)
22-23
: Verify thatphoto["id"]
exists before updatingIn
update_photo/1
, you're accessingphoto["id"]
. Ensure that thephoto
map contains the"id"
key to prevent aKeyError
. Ifphoto
uses atom keys, you might need to usephoto.id
orphoto[:id]
.Run the following script to check for the existence of the
"id"
key in allphoto
maps:
@@ -0,0 +1 @@ | |||
Migration.Typesense.Photo.create_collection!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add documentation and safety checks to the migration script.
While the basic functionality looks correct, consider enhancing the migration script with:
- Documentation explaining the purpose and schema of the photos collection
- Idempotency check to prevent duplicate collection creation
- Down migration capability for rollbacks
Here's a suggested implementation:
+# Creates the photos collection in Typesense with fields for:
+# - image data
+# - embeddings
+# - captions
+# - file identifiers
+# - timestamps
+
+# Check if collection already exists to ensure idempotency
+unless Migration.Typesense.Photo.collection_exists?() do
Migration.Typesense.Photo.create_collection!()
+end
+
+# Example down migration (in a separate script):
+# Migration.Typesense.Photo.drop_collection!()
Committable suggestion was skipped due to low confidence.
lib/migration/typesense.ex
Outdated
def create_collection!(schema) do | ||
req = build_request("/collections") | ||
{:ok, res} = Req.post(req, json: schema) | ||
|
||
res.body | ||
end | ||
|
||
def delete_collection!(collection_name) do | ||
req = build_request("/collections/#{collection_name}") | ||
{:ok, res} = Req.delete(req) | ||
|
||
res.body | ||
end | ||
|
||
def list_collections() do | ||
req = build_request("/collections") | ||
{:ok, res} = Req.get(req) | ||
|
||
res.body | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve error handling and add documentation for public API functions.
The public functions need several improvements for production readiness:
- Add proper error handling for HTTP requests
- Add response validation
- Add documentation and type specs
Here's a suggested implementation for better error handling and documentation:
+ @doc """
+ Creates a new collection in Typesense with the given schema.
+
+ ## Parameters
+ - schema: Map containing the collection schema
+
+ ## Returns
+ - {:ok, response_body} on success
+ - {:error, reason} on failure
+
+ ## Raises
+ - RuntimeError if the API request fails
+ """
+ @spec create_collection!(map()) :: map() | no_return()
def create_collection!(schema) do
req = build_request("/collections")
- {:ok, res} = Req.post(req, json: schema)
-
- res.body
+ case Req.post(req, json: schema) do
+ {:ok, %{status: status} = res} when status in 200..299 ->
+ res.body
+ {:ok, %{status: status, body: body}} ->
+ raise "Failed to create collection. Status: #{status}, Body: #{inspect(body)}"
+ {:error, error} ->
+ raise "Failed to create collection: #{inspect(error)}"
+ end
end
Similar improvements should be applied to delete_collection!/1
and list_collections/0
.
Committable suggestion was skipped due to low confidence.
lib/migration/typesense.ex
Outdated
defp get_env() do | ||
url = Application.fetch_env!(:save_it, :typesense_url) | ||
api_key = Application.fetch_env!(:save_it, :typesense_api_key) | ||
|
||
{url, api_key} | ||
end | ||
|
||
defp build_request(path) do | ||
{url, api_key} = get_env() | ||
|
||
Req.new( | ||
base_url: url, | ||
url: path, | ||
headers: [ | ||
{"Content-Type", "application/json"}, | ||
{"X-TYPESENSE-API-KEY", api_key} | ||
] | ||
) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance configuration handling and request building.
Consider the following improvements for the helper functions:
- Add URL validation
- Use module attributes for constants
- Add documentation for maintainability
Here's a suggested implementation:
+ @content_type "application/json"
+ @typesense_config_keys [:typesense_url, :typesense_api_key]
+
+ @doc false
defp get_env() do
- url = Application.fetch_env!(:save_it, :typesense_url)
- api_key = Application.fetch_env!(:save_it, :typesense_api_key)
+ url =
+ :save_it
+ |> Application.fetch_env!(:typesense_url)
+ |> validate_url!()
+
+ api_key = Application.fetch_env!(:save_it, :typesense_api_key)
{url, api_key}
end
+ @doc false
defp build_request(path) do
{url, api_key} = get_env()
Req.new(
base_url: url,
url: path,
headers: [
- {"Content-Type", "application/json"},
+ {"Content-Type", @content_type},
{"X-TYPESENSE-API-KEY", api_key}
]
)
end
+
+ defp validate_url!(url) do
+ uri = URI.parse(url)
+ unless uri.scheme in ["http", "https"] and uri.host do
+ raise ArgumentError, "Invalid Typesense URL: #{url}"
+ end
+ url
+ end
Committable suggestion was skipped due to low confidence.
README.md
Outdated
Pro Tips: create shell script for fast run app | ||
|
||
1. touch start.sh | ||
|
||
```sh | ||
#!/bin/sh | ||
|
||
export TELEGRAM_BOT_TOKEN=<YOUR_TELEGRAM_BOT_TOKEN> | ||
|
||
export TYPESENSE_URL=<YOUR_TYPESENSE_URL> | ||
export TYPESENSE_API_KEY=<YOUR_TYPESENSE_API_KEY> | ||
|
||
export GOOGLE_OAUTH_CLIENT_ID=<YOUR_GOOGLE_OAUTH_CLIENT_ID> | ||
export GOOGLE_OAUTH_CLIENT_SECRET=<YOUR_GOOGLE_OAUTH_CLIENT_SECRET> | ||
|
||
iex -S mix run --no-halt | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhance security guidance for configuration management.
The current setup instructions expose sensitive configuration details. Consider these security improvements:
- Add a warning about keeping
start.sh
private and not committing it to version control - Recommend using a
.env
file instead of direct exports - Add guidance about secure token management
Here's a suggested improvement:
Pro Tips: create shell script for fast run app
+⚠️ Security Notice:
+- Never commit the following files to version control
+- Store these files securely and restrict access
+
+1. Create a .env file
+
+```env
+TELEGRAM_BOT_TOKEN=your_token_here
+TYPESENSE_URL=your_url_here
+TYPESENSE_API_KEY=your_key_here
+GOOGLE_OAUTH_CLIENT_ID=your_client_id_here
+GOOGLE_OAUTH_CLIENT_SECRET=your_client_secret_here
+```
+
+2. Create start.sh
+
```sh
#!/bin/sh
-export TELEGRAM_BOT_TOKEN=<YOUR_TELEGRAM_BOT_TOKEN>
-export TYPESENSE_URL=<YOUR_TYPESENSE_URL>
-export TYPESENSE_API_KEY=<YOUR_TYPESENSE_API_KEY>
-export GOOGLE_OAUTH_CLIENT_ID=<YOUR_GOOGLE_OAUTH_CLIENT_ID>
-export GOOGLE_OAUTH_CLIENT_SECRET=<YOUR_GOOGLE_OAUTH_CLIENT_SECRET>
+# Load environment variables from .env file
+source .env
iex -S mix run --no-halt
<!-- This is an auto-generated comment by CodeRabbit -->
TYPESENSE_API_KEY: | ||
default: ${PASSWORD} | ||
expose: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security: Reconsider exposing TYPESENSE_API_KEY.
The TYPESENSE_API_KEY
is marked as expose: true
. This could lead to security risks as the API key might be visible in logs or monitoring systems. Consider setting expose: false
unless there's a specific requirement for exposure.
TYPESENSE_API_KEY:
default: ${PASSWORD}
- expose: true
+ expose: false
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
TYPESENSE_API_KEY: | |
default: ${PASSWORD} | |
expose: true | |
TYPESENSE_API_KEY: | |
default: ${PASSWORD} | |
expose: false |
lib/small_sdk/typesense.ex
Outdated
defp handle_response(res) do | ||
case res do | ||
%Req.Response{status: 200} -> | ||
res.body | ||
|
||
%Req.Response{status: 201} -> | ||
res.body | ||
|
||
%Req.Response{status: 400} -> | ||
Logger.error("Bad Request: #{inspect(res.body)}") | ||
raise "Bad Request" | ||
|
||
%Req.Response{status: 401} -> | ||
raise "Unauthorized" | ||
|
||
%Req.Response{status: 404} -> | ||
nil | ||
|
||
%Req.Response{status: 409} -> | ||
raise "Conflict" | ||
|
||
%Req.Response{status: 422} -> | ||
raise "Unprocessable Entity" | ||
|
||
%Req.Response{status: 503} -> | ||
raise "Service Unavailable" | ||
|
||
_ -> | ||
raise "Unknown error" | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Simplify error handling using pattern matching
You can streamline the handle_response/1
function by using pattern matching and guard clauses to handle success and error responses more succinctly.
Consider refactoring as follows:
defp handle_response(%Req.Response{status: status, body: body}) when status in [200, 201] do
body
end
-defp handle_response(%Req.Response{status: 400} = res) do
- Logger.error("Bad Request: #{inspect(res.body)}")
- raise "Bad Request"
-end
-
-defp handle_response(%Req.Response{status: 401}) do
- raise "Unauthorized"
-end
-
-defp handle_response(%Req.Response{status: 404}) do
- nil
-end
-
-defp handle_response(%Req.Response{status: 409}) do
- raise "Conflict"
-end
-
-defp handle_response(%Req.Response{status: 422}) do
- raise "Unprocessable Entity"
-end
-
-defp handle_response(%Req.Response{status: 503}) do
- raise "Service Unavailable"
-end
-
-defp handle_response(_) do
- raise "Unknown error"
+defp handle_response(%Req.Response{status: 404}) do
+ nil
end
+defp handle_response(%Req.Response{status: status, body: body}) do
+ Logger.error("HTTP #{status} Error: #{inspect(body)}")
+ raise "HTTP #{status} Error"
+end
This refactoring:
- Handles successful responses with statuses 200 and 201.
- Returns
nil
for 404 Not Found errors inget_document/2
. - Logs and raises an exception for all other error statuses.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
defp handle_response(res) do | |
case res do | |
%Req.Response{status: 200} -> | |
res.body | |
%Req.Response{status: 201} -> | |
res.body | |
%Req.Response{status: 400} -> | |
Logger.error("Bad Request: #{inspect(res.body)}") | |
raise "Bad Request" | |
%Req.Response{status: 401} -> | |
raise "Unauthorized" | |
%Req.Response{status: 404} -> | |
nil | |
%Req.Response{status: 409} -> | |
raise "Conflict" | |
%Req.Response{status: 422} -> | |
raise "Unprocessable Entity" | |
%Req.Response{status: 503} -> | |
raise "Service Unavailable" | |
_ -> | |
raise "Unknown error" | |
end | |
end | |
defp handle_response(%Req.Response{status: status, body: body}) when status in [200, 201] do | |
body | |
end | |
defp handle_response(%Req.Response{status: 404}) do | |
nil | |
end | |
defp handle_response(%Req.Response{status: status, body: body}) do | |
Logger.error("HTTP #{status} Error: #{inspect(body)}") | |
raise "HTTP #{status} Error" | |
end |
🛠️ Refactor suggestion
Log errors for all non-successful HTTP responses
Currently, only the 400 Bad Request response is being logged with Logger.error/1
. For better observability and debugging, consider logging all error responses before raising exceptions.
Apply this diff to add logging for other error statuses:
defp handle_response(res) do
case res do
%Req.Response{status: 200} ->
res.body
%Req.Response{status: 201} ->
res.body
%Req.Response{status: 400} ->
Logger.error("Bad Request: #{inspect(res.body)}")
raise "Bad Request"
%Req.Response{status: 401} ->
+ Logger.error("Unauthorized: #{inspect(res.body)}")
raise "Unauthorized"
%Req.Response{status: 404} ->
nil
%Req.Response{status: 409} ->
+ Logger.error("Conflict: #{inspect(res.body)}")
raise "Conflict"
%Req.Response{status: 422} ->
+ Logger.error("Unprocessable Entity: #{inspect(res.body)}")
raise "Unprocessable Entity"
%Req.Response{status: 503} ->
+ Logger.error("Service Unavailable: #{inspect(res.body)}")
raise "Service Unavailable"
_ ->
raise "Unknown error"
end
end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
defp handle_response(res) do | |
case res do | |
%Req.Response{status: 200} -> | |
res.body | |
%Req.Response{status: 201} -> | |
res.body | |
%Req.Response{status: 400} -> | |
Logger.error("Bad Request: #{inspect(res.body)}") | |
raise "Bad Request" | |
%Req.Response{status: 401} -> | |
raise "Unauthorized" | |
%Req.Response{status: 404} -> | |
nil | |
%Req.Response{status: 409} -> | |
raise "Conflict" | |
%Req.Response{status: 422} -> | |
raise "Unprocessable Entity" | |
%Req.Response{status: 503} -> | |
raise "Service Unavailable" | |
_ -> | |
raise "Unknown error" | |
end | |
end | |
defp handle_response(res) do | |
case res do | |
%Req.Response{status: 200} -> | |
res.body | |
%Req.Response{status: 201} -> | |
res.body | |
%Req.Response{status: 400} -> | |
Logger.error("Bad Request: #{inspect(res.body)}") | |
raise "Bad Request" | |
%Req.Response{status: 401} -> | |
Logger.error("Unauthorized: #{inspect(res.body)}") | |
raise "Unauthorized" | |
%Req.Response{status: 404} -> | |
nil | |
%Req.Response{status: 409} -> | |
Logger.error("Conflict: #{inspect(res.body)}") | |
raise "Conflict" | |
%Req.Response{status: 422} -> | |
Logger.error("Unprocessable Entity: #{inspect(res.body)}") | |
raise "Unprocessable Entity" | |
%Req.Response{status: 503} -> | |
Logger.error("Service Unavailable: #{inspect(res.body)}") | |
raise "Service Unavailable" | |
_ -> | |
raise "Unknown error" | |
end | |
end |
lib/save_it/photo_service.ex
Outdated
|
||
res.body["results"] |> typesense_results_to_documents() | ||
data["results"] |> hd() |> Map.get("hits") |> Enum.map(&Map.get(&1, "document")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent potential errors when accessing the first result
In search_photos!/2
and search_similar_photos!/1
, you use hd()
on data["results"]
. If data["results"]
is an empty list, this will raise a ArgumentError
. Consider handling the empty list case to prevent runtime errors.
Apply this diff to safely handle empty search results:
# For search_photos!/2
-case data["results"] |> hd() |> Map.get("hits") |> Enum.map(&Map.get(&1, "document"))
+results = data["results"]
+if results != [] do
+ results
+ |> hd()
+ |> Map.get("hits")
+ |> Enum.map(&Map.get(&1, "document"))
+else
+ []
+end
# For search_similar_photos!/1
-case data["results"] |> hd() |> Map.get("hits") |> Enum.map(&Map.get(&1, "document"))
+results = data["results"]
+if results != [] do
+ results
+ |> hd()
+ |> Map.get("hits")
+ |> Enum.map(&Map.get(&1, "document"))
+else
+ []
+end
Also applies to: 91-91
lib/save_it/photo_service.ex
Outdated
def get_photo!(file_id, belongs_to_id) do | ||
case Typesense.search_documents!("photos", | ||
q: "*", | ||
query_by: "caption", | ||
filter_by: "file_id:=#{file_id} && belongs_to_id:=#{belongs_to_id}" | ||
) do | ||
[photo | _] -> photo | ||
[] -> nil | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Ensure get_photo!/2
raises an exception when no photo is found
The bang function get_photo!/2
currently returns nil
if no photo is found. To adhere to Elixir conventions, it should raise an exception instead.
Apply this diff to raise an error when the photo is not found:
def get_photo!(file_id, belongs_to_id) do
case Typesense.search_documents!("photos",
q: "*",
query_by: "caption",
filter_by: "file_id:=#{file_id} && belongs_to_id:=#{belongs_to_id}"
) do
[photo | _] -> photo
- [] -> nil
+ [] ->
+ raise "Photo not found for file_id #{file_id} and belongs_to_id #{belongs_to_id}"
end
end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_photo!(file_id, belongs_to_id) do | |
case Typesense.search_documents!("photos", | |
q: "*", | |
query_by: "caption", | |
filter_by: "file_id:=#{file_id} && belongs_to_id:=#{belongs_to_id}" | |
) do | |
[photo | _] -> photo | |
[] -> nil | |
end | |
end | |
def get_photo!(file_id, belongs_to_id) do | |
case Typesense.search_documents!("photos", | |
q: "*", | |
query_by: "caption", | |
filter_by: "file_id:=#{file_id} && belongs_to_id:=#{belongs_to_id}" | |
) do | |
[photo | _] -> photo | |
[] -> | |
raise "Photo not found for file_id #{file_id} and belongs_to_id #{belongs_to_id}" | |
end | |
end |
lib/save_it/photo_service.ex
Outdated
def update_photo_caption!(file_id, belongs_to_id, caption) do | ||
get_photo!(file_id, belongs_to_id) | ||
|> Map.put("caption", caption) | ||
|> update_photo() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle the case when no photo is found in update_photo_caption!/3
In the function update_photo_caption!/3
, get_photo!/2
may return nil
if no photo is found. Passing nil
to Map.put/3
will cause a FunctionClauseError
. Consider adding a check to handle this scenario appropriately.
Apply this diff to handle the nil
case:
def update_photo_caption!(file_id, belongs_to_id, caption) do
- get_photo!(file_id, belongs_to_id)
- |> Map.put("caption", caption)
- |> update_photo()
+ case get_photo!(file_id, belongs_to_id) do
+ photo when is_map(photo) ->
+ photo
+ |> Map.put("caption", caption)
+ |> update_photo()
+ nil ->
+ raise "Photo not found for file_id #{file_id} and belongs_to_id #{belongs_to_id}"
+ end
end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def update_photo_caption!(file_id, belongs_to_id, caption) do | |
get_photo!(file_id, belongs_to_id) | |
|> Map.put("caption", caption) | |
|> update_photo() | |
def update_photo_caption!(file_id, belongs_to_id, caption) do | |
case get_photo!(file_id, belongs_to_id) do | |
photo when is_map(photo) -> | |
photo | |
|> Map.put("caption", caption) | |
|> update_photo() | |
nil -> | |
raise "Photo not found for file_id #{file_id} and belongs_to_id #{belongs_to_id}" | |
end | |
end |
lib/save_it/photo_service.ex
Outdated
defp handle_response(res) do | ||
case res do | ||
%Req.Response{status: 200} -> | ||
res.body | ||
|
||
%Req.Response{status: 201} -> | ||
res.body | ||
|
||
%Req.Response{status: 400} -> | ||
Logger.error("Bad Request: #{inspect(res.body)}") | ||
raise "Bad Request" | ||
|
||
%Req.Response{status: 401} -> | ||
raise "Unauthorized" | ||
|
||
%Req.Response{status: 404} -> | ||
nil | ||
|
||
%Req.Response{status: 409} -> | ||
raise "Conflict" | ||
|
||
%Req.Response{status: 422} -> | ||
raise "Unprocessable Entity" | ||
|
||
%Req.Response{status: 503} -> | ||
raise "Service Unavailable" | ||
|
||
_ -> | ||
raise "Unknown error" | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Simplify the handle_response/1
function using pattern matching
The handle_response/1
function can be simplified by leveraging pattern matching directly in the function clauses, improving readability and maintainability.
Apply this diff to refactor the function:
defp handle_response(%Req.Response{status: status, body: body} = res) do
- case res do
- %Req.Response{status: 200} ->
- res.body
-
- %Req.Response{status: 201} ->
- res.body
-
- %Req.Response{status: 400} ->
- Logger.error("Bad Request: #{inspect(res.body)}")
- raise "Bad Request"
-
- %Req.Response{status: 401} ->
- raise "Unauthorized"
-
- %Req.Response{status: 404} ->
- nil
-
- %Req.Response{status: 409} ->
- raise "Conflict"
-
- %Req.Response{status: 422} ->
- raise "Unprocessable Entity"
-
- %Req.Response{status: 503} ->
- raise "Service Unavailable"
-
- _ ->
- raise "Unknown error"
- end
+ case status do
+ 200 -> body
+ 201 -> body
+ 400 ->
+ Logger.error("Bad Request: #{inspect(body)}")
+ raise "Bad Request"
+ 401 -> raise "Unauthorized"
+ 404 -> nil
+ 409 -> raise "Conflict"
+ 422 -> raise "Unprocessable Entity"
+ 503 -> raise "Service Unavailable"
+ _ ->
+ Logger.error("Unhandled status code #{status}: #{inspect(body)}")
+ raise "Unknown error: #{status}"
+ end
end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
defp handle_response(res) do | |
case res do | |
%Req.Response{status: 200} -> | |
res.body | |
%Req.Response{status: 201} -> | |
res.body | |
%Req.Response{status: 400} -> | |
Logger.error("Bad Request: #{inspect(res.body)}") | |
raise "Bad Request" | |
%Req.Response{status: 401} -> | |
raise "Unauthorized" | |
%Req.Response{status: 404} -> | |
nil | |
%Req.Response{status: 409} -> | |
raise "Conflict" | |
%Req.Response{status: 422} -> | |
raise "Unprocessable Entity" | |
%Req.Response{status: 503} -> | |
raise "Service Unavailable" | |
_ -> | |
raise "Unknown error" | |
end | |
defp handle_response(%Req.Response{status: status, body: body} = res) do | |
case status do | |
200 -> body | |
201 -> body | |
400 -> | |
Logger.error("Bad Request: #{inspect(body)}") | |
raise "Bad Request" | |
401 -> raise "Unauthorized" | |
404 -> nil | |
409 -> raise "Conflict" | |
422 -> raise "Unprocessable Entity" | |
503 -> raise "Service Unavailable" | |
_ -> | |
Logger.error("Unhandled status code #{status}: #{inspect(body)}") | |
raise "Unknown error: #{status}" | |
end | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
lib/small_sdk/typesense.ex (2)
11-27
: Add documentation and type specs.While the implementation is solid, consider adding:
@doc
with usage examples and parameter descriptions@spec
to define the function's type signatureExample:
@doc """ Searches for documents in a collection. ## Parameters * collection_name - The name of the collection to search * opts - Keyword list of search options: * :q - Search query (default: "*") * :query_by - Fields to search in * :filter_by - Filter conditions ## Examples iex> search_documents!("photos", q: "beach", query_by: "caption") [%{"id" => "1", "caption" => "beach sunset"}, ...] """ @spec search_documents!(String.t(), Keyword.t()) :: [map()]
90-121
: Enhance error messages with response details.While the error handling is comprehensive, consider including response body details in the error messages for better debugging:
- raise "Bad Request" + raise "Bad Request: #{inspect(body)}" - raise "Unauthorized" + raise "Unauthorized: #{inspect(body)}" - raise "Conflict" + raise "Conflict: #{inspect(body)}" - raise "Unprocessable Entity" + raise "Unprocessable Entity: #{inspect(body)}" - raise "Service Unavailable" + raise "Service Unavailable: #{inspect(body)}"lib/save_it/photo_service.ex (1)
138-169
: Consider using error level logging for 400 status.The function provides comprehensive status code handling, but a 400 Bad Request typically indicates a client error that should be logged at the error level rather than warning.
400 -> - Logger.warning("Bad Request: #{inspect(body)}") + Logger.error("Bad Request: #{inspect(body)}") raise "Bad Request"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- lib/save_it/photo_service.ex (5 hunks)
- lib/small_sdk/typesense.ex (2 hunks)
🔇 Additional comments (7)
lib/small_sdk/typesense.ex (3)
8-8
: LGTM! Consistent error handling.The centralization of response handling improves code consistency and maintainability.
Also applies to: 34-34
37-42
: LGTM! Proper function naming.The
!
suffix correctly indicates that this function may raise exceptions, following Elixir conventions.
44-48
: LGTM with existing concerns.The implementation follows the module's patterns.
Note: The previous review comment about raising an exception for 404 responses in bang functions is still applicable here.
lib/save_it/photo_service.ex (4)
1-1
: LGTM! Improved error handling with bang version.The module rename better reflects its responsibility, and using
update_document!
is consistent with the module's error handling approach.Also applies to: 22-23
25-34
: LGTM! Well-structured caption update function.The function follows Elixir conventions for bang functions and properly handles the case when a photo is not found.
71-82
: LGTM! Improved handling of empty results.The changes properly handle the case when no results are found.
104-115
: LGTM! Consistent with search_photos! improvements.
lib/save_it/photo_service.ex
Outdated
def get_photo(file_id, belongs_to_id) do | ||
case Typesense.search_documents!("photos", | ||
q: "*", | ||
query_by: "caption", | ||
filter_by: "file_id:=#{file_id} && belongs_to_id:=#{belongs_to_id}" | ||
) do | ||
[photo | _] -> photo | ||
[] -> nil | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Optimize search query parameters.
The function searches using the caption
field but filters by file_id
and belongs_to_id
. Since you're doing an exact match filter, the caption search is unnecessary.
def get_photo(file_id, belongs_to_id) do
case Typesense.search_documents!("photos",
q: "*",
- query_by: "caption",
filter_by: "file_id:=#{file_id} && belongs_to_id:=#{belongs_to_id}"
) do
[photo | _] -> photo
[] -> nil
end
end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_photo(file_id, belongs_to_id) do | |
case Typesense.search_documents!("photos", | |
q: "*", | |
query_by: "caption", | |
filter_by: "file_id:=#{file_id} && belongs_to_id:=#{belongs_to_id}" | |
) do | |
[photo | _] -> photo | |
[] -> nil | |
end | |
end | |
def get_photo(file_id, belongs_to_id) do | |
case Typesense.search_documents!("photos", | |
q: "*", | |
filter_by: "file_id:=#{file_id} && belongs_to_id:=#{belongs_to_id}" | |
) do | |
[photo | _] -> photo | |
[] -> nil | |
end | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (9)
lib/tj/url_helper.ex (2)
1-11
: LGTM! Consider adding documentation.
The module structure and naming follow Elixir conventions. However, adding module and function documentation would improve maintainability.
Add documentation using @moduledoc
and @doc
:
defmodule Tj.UrlHelper do
+ @moduledoc """
+ Helper module for URL validation and manipulation.
+ """
+
+ @doc """
+ Validates a URL string, ensuring it has a valid HTTP/HTTPS scheme and host.
+
+ ## Examples
+
+ iex> Tj.UrlHelper.validate_url!("https://example.com")
+ "https://example.com"
+
+ iex> Tj.UrlHelper.validate_url!("invalid-url")
+ ** (ArgumentError) Invalid URL: invalid-url
+ """
def validate_url!(url) do
2-10
: Add unit tests for URL validation.
Since this is a critical validation function used across modules, it should have comprehensive test coverage.
Would you like me to help create a test module with various test cases for URL validation? This would include tests for:
- Valid HTTP/HTTPS URLs
- Invalid schemes
- Missing hosts
- Malformed URLs
- Edge cases with whitespace and special characters
lib/small_sdk/telegram.ex (1)
Consider adding module documentation and improving error handling
The verification confirms the need for improvements as:
- The module lacks documentation while other modules in the codebase (e.g.,
GoogleDrive
) have proper@moduledoc
- No error handling patterns are present in the Telegram module, while error handling exists in other parts (e.g., Typesense module)
- While Tesla is configured globally to use Hackney adapter, the module lacks timeout configuration unlike other Tesla-based modules which have more complete middleware setups
Suggested improvements:
- Add module documentation as shown in the original suggestion
- Add timeout middleware (the global Tesla config doesn't handle timeouts)
- Implement error handling similar to other modules in the codebase
- Consider adding JSON middleware for consistent response handling
The original review comment's suggestions are valid and align with the patterns found in the codebase.
🔗 Analysis chain
Line range hint 1-8
: Consider adding module documentation and improving error handling.
The module would benefit from:
- Module documentation explaining its purpose and usage
- Explicit handling of HTTP errors and timeouts
- Configurable timeout settings
+ @moduledoc """
+ Provides functionality for interacting with the Telegram Bot API.
+ Handles file downloads and related operations.
+ """
+
+ @request_timeout 30_000
+
use Tesla
- plug(Tesla.Middleware.BaseUrl, "https://api.telegram.org")
+ plug Tesla.Middleware.BaseUrl, "https://api.telegram.org"
+ plug Tesla.Middleware.Timeout, timeout: @request_timeout
Let's check if there are any timeout-related issues reported:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for timeout-related issues or error handling patterns
rg -l "timeout|error.*telegram" --type elixir
Length of output: 47
Script:
#!/bin/bash
# Let's first check if there are any existing Tesla configurations or error handling patterns
rg -A 5 "use Tesla" --type elixir
# Also check for any existing documentation patterns in the module
rg "@moduledoc|@doc" --type elixir
# Check for any existing error handling patterns
rg "rescue|catch|with" --type elixir
# Look for any timeout configurations in config files
fd "config.exs|runtime.exs" --exec cat {}
Length of output: 2784
README.md (2)
5-9
: Add the new caption feature to the Features list.
Since this PR implements photo caption functionality (as mentioned in the PR description), please add it to the features list for completeness.
## Features
- [x] Save photos via a link
- [x] Search photos using semantic search
- [x] Find similar photos by photo
+- [x] View and manage photo captions
58-64
: Enhance Typesense setup documentation.
Since this PR includes Typesense SDK changes, please provide more detailed setup instructions:
- Add Typesense version requirements
- Document the required collections and schema
- Include migration steps mentioned in the PR description
Here's a suggested addition:
# Start typesense
docker compose up
+# Typesense Setup
+1. Required version: X.Y.Z
+2. Collections:
+ - photos
+ - <other_collections>
+
+# Migration
+Follow these steps to migrate existing data:
+1. <migration_step_1>
+2. <migration_step_2>
lib/save_it/photo_service.ex (2)
43-51
: Consider adding documentation for the return value.
The function's behavior (returning nil for not found) should be documented to help API consumers.
+@doc """
+Gets a photo by file_id and belongs_to_id.
+
+Returns nil if no photo is found.
+"""
def get_photo(file_id, belongs_to_id) do
120-120
: Consider validating URL during application initialization.
The URL validation could fail at runtime. Consider moving this validation to application startup to fail fast.
lib/small_sdk/typesense.ex (2)
13-29
: Consider adding documentation for the search options and field exclusion.
While the implementation is good, it would be helpful to:
- Document the available search options (
q
,query_by
,filter_by
). - Explain why
image_embedding
is excluded from results.
Add documentation like this:
+ @doc """
+ Searches for documents in a collection.
+
+ ## Options
+ * `:q` - Search query string (default: "*")
+ * `:query_by` - Comma-separated list of fields to search by
+ * `:filter_by` - Filter conditions for search results
+
+ Note: The `image_embedding` field is excluded from results to reduce response size.
+ """
def search_documents!(collection_name, opts) do
Line range hint 57-70
: Consider renaming to create_search_key!
and parameterizing the key configuration.
- The function should be renamed to include
!
as it can raise exceptions. - The hardcoded values for
actions
andcollections
limit reusability.
Consider refactoring like this:
- def create_search_key() do
+ def create_search_key!(description \\ "Search-only key", actions \\ ["documents:search"], collections \\ ["photos"]) do
{url, _} = get_env()
req = build_request("/keys")
res =
Req.post(req,
json: %{
- "description" => "Search-only photos key",
- "actions" => ["documents:search"],
- "collections" => ["photos"]
+ "description" => description,
+ "actions" => actions,
+ "collections" => collections
}
)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
- .env.template (1 hunks)
- README.md (2 hunks)
- lib/migration/typesense.ex (1 hunks)
- lib/save_it/photo_service.ex (4 hunks)
- lib/small_sdk/telegram.ex (2 hunks)
- lib/small_sdk/typesense.ex (3 hunks)
- lib/tj/url_helper.ex (1 hunks)
- start.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .env.template
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/migration/typesense.ex
🧰 Additional context used
🪛 Shellcheck
start.sh
[warning] 3-3: In POSIX sh, 'source' in place of '.' is undefined.
(SC3046)
🔇 Additional comments (12)
start.sh (2)
5-5
: LGTM: Application startup command.
The Elixir application startup command looks correct with the --no-halt
flag ensuring continuous operation.
1-5
: Verify script permissions.
Ensure the script has proper execution permissions.
✅ Verification successful
Script has proper execution permissions
The start.sh
script has the correct execution permissions (-rwxr-xr-x
), allowing it to be executed by the owner, group, and others.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if start.sh has execution permissions
# Expected: File should have execution bit set
ls -l start.sh
Length of output: 72
🧰 Tools
🪛 Shellcheck
[warning] 3-3: In POSIX sh, 'source' in place of '.' is undefined.
(SC3046)
README.md (1)
63-64
: Security concerns about environment variables.
The previous review comment about secure configuration management is still applicable. Please refer to the earlier suggestions about using a .env
file and adding security notices.
lib/save_it/photo_service.ex (4)
1-6
: LGTM! Module rename improves clarity.
The rename from TypesensePhoto
to PhotoService
better reflects the module's responsibility as a service layer. The import is appropriately scoped.
27-36
: LGTM! Robust error handling implemented.
The function properly handles the case when no photo is found and includes a descriptive error message.
71-83
: LGTM! Improved error handling for empty results.
The changes properly handle empty search results, preventing potential runtime errors.
24-25
: Verify error handling for update_document! failures.
The change to use update_document!
is appropriate, but ensure that callers are prepared to handle potential exceptions.
✅ Verification successful
Let me analyze the error handling by checking the broader context of the call chain. I'll search for any try/rescue blocks or error handling patterns.
Let me check how the Typesense module handles errors and if there's any global error handling in the application.
Let me check the implementation of the Typesense module's update_document! and the application's error handling setup.
Let me check the handle_response implementation to understand how errors are handled in Typesense operations.
Based on the analysis of the codebase, I can now provide the final response:
The error handling for update_document! is appropriate and well-structured
The change is safe because:
- Typesense.update_document! has proper error handling that raises exceptions for non-2xx responses
- The calling chain (Bot -> update_photo_caption! -> update_photo) maintains the bang-style convention consistently
- The supervisor configuration with
:one_for_one
strategy ensures proper process isolation and restart capabilities
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for update_photo calls to verify error handling
rg -A 3 "update_photo" --type elixir
Length of output: 970
Script:
#!/bin/bash
# Search for try/rescue blocks in bot.ex and photo_service.ex
rg -B 3 -A 5 "try|rescue" lib/save_it/bot.ex lib/save_it/photo_service.ex
# Also check the broader context of the update_photo_caption! function
rg -B 5 -A 10 "update_photo_caption!" lib/save_it/photo_service.ex
Length of output: 583
Script:
#!/bin/bash
# Check Typesense module for error handling patterns
rg -B 2 -A 2 "update_document!"
# Look for any application-wide error handling
rg -l "use Application|defexception|handle_error" --type elixir
Length of output: 713
Script:
#!/bin/bash
# Check the implementation of update_document! in Typesense
rg -B 5 -A 10 "def update_document!" lib/small_sdk/typesense.ex
# Check application.ex for error handling setup
cat lib/save_it/application.ex
Length of output: 1134
Script:
#!/bin/bash
# Check handle_response implementation
rg -B 3 -A 5 "handle_response" lib/small_sdk/typesense.ex
# Also check if there's any error handling in the Bot module
rg -B 3 -A 5 "rescue|catch|handle_error" lib/save_it/bot.ex
Length of output: 1898
lib/small_sdk/typesense.ex (5)
4-4
: LGTM! Good practice to validate URLs.
The specific import of validate_url!/1
helps ensure URL validity and follows good practices by importing only what's needed.
8-10
: LGTM! Good refactor to centralize error handling.
The change improves code maintainability by using the centralized handle_response/1
function.
39-44
: LGTM! Consistent naming convention for functions that can raise.
The rename to include !
correctly indicates that this function can raise exceptions, following Elixir conventions.
46-50
: LGTM! Consistent implementation with other document operations.
75-75
: LGTM! Good addition of URL validation.
Early validation of the URL configuration helps catch issues sooner.
start.sh
Outdated
#!/bin/sh | ||
|
||
source .env | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve script robustness and POSIX compatibility.
The script needs better error handling and POSIX compliance:
source
command is not POSIX-compliant- Missing check for
.env
file existence - No validation of required environment variables
Apply this diff to improve the script:
#!/bin/sh
-source .env
+if [ ! -f .env ]; then
+ echo "Error: .env file not found"
+ exit 1
+fi
+
+. .env
+
+# Validate required environment variables
+for var in BOT_TOKEN TYPESENSE_API_KEY; do
+ if [ -z "$(eval echo \$$var)" ]; then
+ echo "Error: $var is not set"
+ exit 1
+ fi
+done
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#!/bin/sh | |
source .env | |
#!/bin/sh | |
if [ ! -f .env ]; then | |
echo "Error: .env file not found" | |
exit 1 | |
fi | |
. .env | |
# Validate required environment variables | |
for var in BOT_TOKEN TYPESENSE_API_KEY; do | |
if [ -z "$(eval echo \$$var)" ]; then | |
echo "Error: $var is not set" | |
exit 1 | |
fi | |
done |
🧰 Tools
🪛 Shellcheck
[warning] 3-3: In POSIX sh, 'source' in place of '.' is undefined.
(SC3046)
def validate_url!(url) do | ||
uri = URI.parse(url) | ||
|
||
if uri.scheme in ["http", "https"] and uri.host do | ||
url | ||
else | ||
raise ArgumentError, "Invalid URL: #{url}" | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider enhancing URL validation.
While the current implementation covers basic URL validation, it could be more comprehensive to handle edge cases.
Consider these improvements:
def validate_url!(url) do
+ url = String.trim(url)
uri = URI.parse(url)
- if uri.scheme in ["http", "https"] and uri.host do
+ if uri.scheme in ["http", "https"] and
+ not is_nil(uri.host) and
+ String.contains?(uri.host, ".") and
+ not String.contains?(uri.host, "..") do
url
else
raise ArgumentError, "Invalid URL: #{url}"
end
end
The enhanced validation:
- Trims whitespace from input
- Ensures host contains at least one dot (for TLD)
- Prevents double dots in host names
- More explicitly checks for nil host
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def validate_url!(url) do | |
uri = URI.parse(url) | |
if uri.scheme in ["http", "https"] and uri.host do | |
url | |
else | |
raise ArgumentError, "Invalid URL: #{url}" | |
end | |
end | |
def validate_url!(url) do | |
url = String.trim(url) | |
uri = URI.parse(url) | |
if uri.scheme in ["http", "https"] and | |
not is_nil(uri.host) and | |
String.contains?(uri.host, ".") and | |
not String.contains?(uri.host, "..") do | |
url | |
else | |
raise ArgumentError, "Invalid URL: #{url}" | |
end | |
end |
bot_token = get_env() | ||
url = "/file/bot#{bot_token}/#{file_path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Good refactoring, but consider enhancing the token retrieval.
The extraction of token retrieval logic is a good improvement. However, consider these enhancements:
- bot_token = get_env()
+ bot_token = get_bot_token()
url = "/file/bot#{bot_token}/#{file_path}"
Also consider adding token format validation to prevent runtime errors with malformed tokens.
Committable suggestion was skipped due to low confidence.
defp get_env() do | ||
bot_token = Application.fetch_env!(:save_it, :telegram_bot_token) | ||
|
||
bot_token | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance the environment configuration function.
The function needs several improvements for better maintainability and reliability:
- defp get_env() do
- bot_token = Application.fetch_env!(:save_it, :telegram_bot_token)
-
- bot_token
- end
+ @doc """
+ Retrieves the Telegram bot token from application configuration.
+ Raises if the token is not configured.
+ """
+ @spec get_bot_token() :: String.t()
+ defp get_bot_token() do
+ token = Application.fetch_env!(:save_it, :telegram_bot_token)
+ validate_bot_token!(token)
+ end
+
+ @spec validate_bot_token!(String.t()) :: String.t()
+ defp validate_bot_token!(token) when is_binary(token) do
+ if String.match?(token, ~r/^\d+:[A-Za-z0-9_-]{35}$/) do
+ token
+ else
+ raise ArgumentError, "Invalid Telegram bot token format"
+ end
+ end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
defp get_env() do | |
bot_token = Application.fetch_env!(:save_it, :telegram_bot_token) | |
bot_token | |
end | |
@doc """ | |
Retrieves the Telegram bot token from application configuration. | |
Raises if the token is not configured. | |
""" | |
@spec get_bot_token() :: String.t() | |
defp get_bot_token() do | |
token = Application.fetch_env!(:save_it, :telegram_bot_token) | |
validate_bot_token!(token) | |
end | |
@spec validate_bot_token!(String.t()) :: String.t() | |
defp validate_bot_token!(token) when is_binary(token) do | |
if String.match?(token, ~r/^\d+:[A-Za-z0-9_-]{35}$/) do | |
token | |
else | |
raise ArgumentError, "Invalid Telegram bot token format" | |
end | |
end |
res = Req.post(req, json: req_body) | ||
data = Typesense.handle_response(res) | ||
|
||
results = data["results"] | ||
|
||
if results != [] do | ||
results | ||
|> hd() | ||
|> Map.get("hits") | ||
|> Enum.map(&Map.get(&1, "document")) | ||
else | ||
[] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider extracting common results handling logic.
The results handling logic is duplicated between search_photos!
and search_similar_photos!
.
Extract the common logic into a private function:
+defp handle_search_results(data) do
+ results = data["results"]
+ if results != [] do
+ results
+ |> hd()
+ |> Map.get("hits")
+ |> Enum.map(&Map.get(&1, "document"))
+ else
+ []
+ end
+end
def search_photos!(q, opts) do
# ... existing code ...
- results = data["results"]
- if results != [] do
- results
- |> hd()
- |> Map.get("hits")
- |> Enum.map(&Map.get(&1, "document"))
- else
- []
- end
+ handle_search_results(data)
end
def search_similar_photos!(photo_id, opts \\ []) do
# ... existing code ...
- results = data["results"]
- if results != [] do
- results
- |> hd()
- |> Map.get("hits")
- |> Enum.map(&Map.get(&1, "document"))
- else
- []
- end
+ handle_search_results(data)
end
Committable suggestion was skipped due to low confidence.
def handle_response({:ok, %{status: status, body: body}}) do | ||
case status do | ||
status when status in 200..209 -> | ||
body | ||
|
||
400 -> | ||
Logger.warning("Bad Request: #{inspect(body)}") | ||
raise "Bad Request" | ||
|
||
401 -> | ||
raise "Unauthorized" | ||
|
||
404 -> | ||
nil | ||
|
||
409 -> | ||
raise "Conflict" | ||
|
||
422 -> | ||
raise "Unprocessable Entity" | ||
|
||
503 -> | ||
raise "Service Unavailable" | ||
|
||
_ -> | ||
Logger.error("Unhandled status code #{status}: #{inspect(body)}") | ||
raise "Unknown error: #{status}" | ||
end | ||
end | ||
|
||
def handle_response({:error, reason}) do | ||
Logger.error("Request failed: #{inspect(reason)}") | ||
raise "Request failed" | ||
end | ||
|
||
def handle_response!(%{status: status, body: body}) do | ||
case status do | ||
status when status in 200..209 -> | ||
body | ||
|
||
status -> | ||
Logger.warning("Request failed with status #{status}: #{inspect(body)}") | ||
raise "Request failed with status #{status}" | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider consolidating error handling and adjusting log levels.
- There's duplication between
handle_response/1
andhandle_response!/1
. - Using
Logger.warning
for a 400 status might be inappropriate as it's typically a client error that should be logged as info.
Consider consolidating the error handling:
- def handle_response({:ok, %{status: status, body: body}}) do
+ def handle_response(response) do
+ case response do
+ {:ok, %{status: status, body: body}} ->
+ handle_status(status, body)
+
+ {:error, reason} ->
+ Logger.error("Request failed: #{inspect(reason)}")
+ raise "Request failed"
+ end
+ end
- def handle_response({:error, reason}) do
- Logger.error("Request failed: #{inspect(reason)}")
- raise "Request failed"
- end
- def handle_response!(%{status: status, body: body}) do
+ defp handle_status(status, body) do
case status do
status when status in 200..209 ->
body
+ 400 ->
+ Logger.info("Bad Request: #{inspect(body)}")
+ raise "Bad Request"
# ... rest of status handling ...
end
end
Committable suggestion was skipped due to low confidence.
SaveIt.Migration.Typesense.Photo.migrate_photos_2024_10_29!()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (6)
docs/dev-logs/2024-10-25.md (2)
3-3
: Remove trailing period from heading
The heading contains trailing punctuation which violates markdown style guidelines.
-## Req call typesense API alway :timeout, but typesense was updated.
+## Req call typesense API alway :timeout, but typesense was updated
🧰 Tools
🪛 Markdownlint
3-3: Punctuation: '.'
Trailing punctuation in heading
(MD026, no-trailing-punctuation)
3-7
: Enhance error documentation with more context
The current documentation of the timeout error would be more helpful with additional details such as:
- Steps to reproduce the issue
- Configuration settings (timeout values, API endpoint)
- Any temporary workarounds if known
- Related Typesense version information
Would you like me to help create a template for documenting these additional details?
🧰 Tools
🪛 Markdownlint
3-3: Punctuation: '.'
Trailing punctuation in heading
(MD026, no-trailing-punctuation)
lib/save_it/migration/typesense.ex (2)
1-5
: Add module documentation.
Consider adding @moduledoc
documentation to describe the module's purpose, responsibilities, and usage examples. This will help other developers understand how to use this module effectively.
defmodule SaveIt.Migration.Typesense do
+ @moduledoc """
+ Provides functions for managing Typesense collections through HTTP requests.
+
+ This module encapsulates the core CRUD operations for Typesense collections,
+ handling authentication and request formatting.
+
+ ## Examples
+
+ schema = %{name: "photos", fields: [...]}
+ SaveIt.Migration.Typesense.create_collection!(schema)
+ """
alias SmallSdk.Typesense
1-47
: Consider adding logging for critical operations.
Given that this module handles critical database operations, it would be beneficial to add structured logging to track collection operations, especially in production environments. This will help with monitoring and debugging.
Consider using Logger to track operations:
+ require Logger
+
def delete_collection!(collection_name) do
+ Logger.info("Deleting Typesense collection", collection: collection_name)
req = build_request("/collections/#{collection_name}")
res = Req.delete!(req)
lib/small_sdk/typesense.ex (1)
64-68
: Consider supporting soft delete functionality.
The delete_document!/2
function performs a hard delete. In many applications, it's beneficial to support soft deletes to maintain data history and enable recovery.
Consider adding a soft_delete_document!/2
function that updates a document's status instead of removing it:
def soft_delete_document!(collection_name, document_id) do
req = build_request("/collections/#{collection_name}/documents/#{document_id}")
res = Req.patch(req, json: %{deleted_at: DateTime.utc_now()})
handle_response(res)
end
lib/save_it/migration/typesense/photo.ex (1)
46-46
: Implement pagination to handle collections with more than 10,000 documents
The current code retrieves up to 10,000 documents. If the collection contains more, some documents will not be processed. Consider implementing pagination to ensure all documents are migrated.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
- docs/dev-logs/2024-10-25.md (1 hunks)
- lib/save_it/migration/typesense.ex (1 hunks)
- lib/save_it/migration/typesense/photo.ex (1 hunks)
- lib/small_sdk/typesense.ex (3 hunks)
- priv/typesense/2024-10-24_create_photos_collection.exs (1 hunks)
- priv/typesense/2024-10-29_photos_url_to_file_id.ex (1 hunks)
- priv/typesense/reset.exs (1 hunks)
- test/save_it_test.exs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- priv/typesense/2024-10-24_create_photos_collection.exs
- priv/typesense/reset.exs
🧰 Additional context used
🪛 Markdownlint
docs/dev-logs/2024-10-25.md
3-3: Punctuation: '.'
Trailing punctuation in heading
(MD026, no-trailing-punctuation)
🔇 Additional comments (7)
priv/typesense/2024-10-29_photos_url_to_file_id.ex (1)
1-1
: Verify the migration's impact on the application.
Since this migration modifies the photo collection schema, we should verify its effects on the application.
Let's check for any direct usages of the photo collection that might be affected:
✅ Verification successful
Migration's impact has been verified and is safe to proceed
The migration safely transitions from URL to file_id field:
- Adds new
file_id
field - Migrates existing data by extracting file_id from URLs
- Drops the
url
field only after migration
The codebase is already using file_id
consistently:
- PhotoService uses
file_id
for all photo operations - Bot implementation handles photos via Telegram's file_id system
- No direct URL dependencies found
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for photo collection usages
echo "Searching for photo collection usages..."
rg -l "photos" --type elixir
# Search for url field usages that might need updating
echo "Searching for url field usages..."
rg -l "url.*photo" --type elixir
Length of output: 453
Script:
#!/bin/bash
# Let's examine the content of these files to understand the impact
echo "=== Checking photo collection creation ==="
cat priv/typesense/2024-10-24_create_photos_collection.exs
echo -e "\n=== Checking current migration ==="
cat priv/typesense/2024-10-29_photos_url_to_file_id.ex
echo -e "\n=== Checking photo service implementation ==="
cat lib/save_it/photo_service.ex
echo -e "\n=== Checking Typesense implementation ==="
cat lib/small_sdk/typesense.ex
Length of output: 7786
Script:
#!/bin/bash
# Let's check the migration implementation and schema changes
echo "=== Checking Photo migration module ==="
cat lib/save_it/migration/typesense/photo.ex
# Let's also check the bot implementation for any direct URL usages
echo -e "\n=== Checking Bot implementation ==="
cat lib/save_it/bot.ex
Length of output: 16080
test/save_it_test.exs (1)
5-5
: Verify documentation completeness for migration module.
Ensure that the SaveIt.Migration.Typesense.Photo
module has comprehensive documentation for all migration-related functions.
#!/bin/bash
# Check for documented functions in the migration module
ast-grep --pattern 'def $func $args do
@doc """
$$$
"""
$$$
end'
# Look for any TODO or FIXME comments in the documentation
rg -i "(@doc|@moduledoc).*?(todo|fixme)"
lib/save_it/migration/typesense.ex (1)
27-46
: 🛠️ Refactor suggestion
Consider adding request timeouts and retries.
The HTTP requests lack timeout and retry configurations, which are important for reliability in a distributed system.
defp build_request(path) do
{url, api_key} = get_env()
Req.new(
base_url: url,
url: path,
+ retry: :exponential,
+ max_retries: 3,
+ connect_timeout: 5000,
+ receive_timeout: 15000,
headers: [
{"Content-Type", "application/json"},
{"X-TYPESENSE-API-KEY", api_key}
]
)
end
Validate API key format.
Consider adding validation for the API key format to fail fast if misconfigured.
lib/small_sdk/typesense.ex (3)
4-4
: LGTM: Good addition of URL validation!
The addition of URL validation through validate_url!/1
improves the security and reliability of the API interactions.
Also applies to: 93-93
57-62
: LGTM: Consistent use of bang suffix.
The renaming of update_document/3
to update_document!/3
maintains consistency with other functions that may raise exceptions.
112-156
: Refer to previous review comments about error handling.
The suggestions from previous reviews about consolidating error handling and adjusting log levels are still valid and should be addressed.
lib/save_it/migration/typesense/photo.ex (1)
9-33
: The create_photos_20241024!/0
function correctly defines the collection schema
The schema for the "photos" collection is properly defined, and the collection is created as expected.
@@ -0,0 +1 @@ | |||
SaveIt.Migration.Typesense.Photo.migrate_photos_2024_10_29!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling and validation to the migration script.
The current implementation might be too simplistic for a production migration. Consider enhancing it with proper error handling, validation, and logging.
Here's a suggested improvement:
-SaveIt.Migration.Typesense.Photo.migrate_photos_2024_10_29!()
+# Wrap migration in a try-catch block with proper logging
+try do
+ IO.puts("Starting photo migration to add file_id field...")
+
+ # Validate environment and prerequisites
+ unless System.get_env("ALLOW_DESTRUCTIVE_MIGRATION") == "true" do
+ raise "Migration requires ALLOW_DESTRUCTIVE_MIGRATION=true"
+ end
+
+ # Run the migration
+ case SaveIt.Migration.Typesense.Photo.migrate_photos_2024_10_29!() do
+ {:ok, count} ->
+ IO.puts("Successfully migrated #{count} photos")
+ {:error, reason} ->
+ IO.puts("Migration failed: #{inspect(reason)}")
+ exit({:shutdown, 1})
+ end
+rescue
+ e ->
+ IO.puts("Migration failed: #{inspect(e)}")
+ exit({:shutdown, 1})
+end
Committable suggestion was skipped due to low confidence.
@@ -2,6 +2,8 @@ defmodule SaveItTest do | |||
use ExUnit.Case | |||
doctest SaveIt | |||
|
|||
doctest SaveIt.Migration.Typesense.Photo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Migration module requires dedicated test coverage beyond doctests
The SaveIt.Migration.Typesense.Photo
module contains critical data migration operations including collection creation, schema updates, and data transformations. The current test coverage consisting of only doctests is insufficient for such complex operations. You should:
- Create a dedicated test file
test/migration/typesense/photo_test.exs
- Add test cases for:
- Collection creation with proper schema validation
- Data migration with URL to file_id transformation
- Field updates and drops
- Reset functionality
- Error handling scenarios
The module's complexity and its data-altering nature demand comprehensive test coverage to ensure reliability and data integrity during migrations.
🔗 Analysis chain
Consider expanding test coverage for migration operations.
While adding doctests is good practice, migration operations typically require more comprehensive testing scenarios. Consider adding explicit test cases that verify:
- Success and failure scenarios of migration
- Data integrity before and after migration
- Error handling during migration
Let's verify the migration module's test coverage:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for additional test files related to migration
fd -e exs migration test/
# Look for existing migration-related tests
rg -l "Migration.Typesense.Photo" test/
Length of output: 119
Script:
#!/bin/bash
# Let's try a different approach to find migration-related tests and documentation
# Search for any test files containing "typesense" or "migration"
rg -l -i "typesense|migration" test/
# Look for module documentation in lib directory
rg -l "SaveIt.Migration.Typesense.Photo" lib/
# Search for @doc or @moduledoc strings in potential migration files
rg -A 5 "@(doc|moduledoc)" lib/ test/
Length of output: 1491
Script:
#!/bin/bash
# Let's examine the content of the migration module and its tests
# Check the content of the migration module
cat lib/save_it/migration/typesense/photo.ex
# Check the content of the test file
cat test/save_it_test.exs
Length of output: 2172
lib/save_it/migration/typesense.ex
Outdated
def create_collection!(schema) do | ||
req = build_request("/collections") | ||
res = Req.post!(req, json: schema) | ||
|
||
Typesense.handle_response!(res) | ||
end | ||
|
||
def update_collection!(collection_name, schema) do | ||
req = build_request("/collections/#{collection_name}") | ||
res = Req.patch!(req, json: schema) | ||
|
||
Typesense.handle_response!(res) | ||
end | ||
|
||
def delete_collection!(collection_name) do | ||
req = build_request("/collections/#{collection_name}") | ||
res = Req.delete!(req) | ||
|
||
Typesense.handle_response!(res) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add documentation and typespecs for public functions.
Each public function should have @doc
and @spec
annotations to document its behavior, parameters, and return types.
+ @doc """
+ Creates a new Typesense collection with the given schema.
+
+ ## Parameters
+ * schema - Map containing the collection configuration
+
+ ## Returns
+ * Map with collection details on success
+ * Raises an error if the operation fails
+ """
+ @spec create_collection!(map()) :: map()
def create_collection!(schema) do
URI encode collection names in URLs.
The collection name should be URI encoded to prevent potential injection attacks and handle special characters correctly.
def update_collection!(collection_name, schema) do
- req = build_request("/collections/#{collection_name}")
+ req = build_request("/collections/#{URI.encode(collection_name)}")
Committable suggestion was skipped due to low confidence.
lib/small_sdk/typesense.ex
Outdated
def list_documents(collection_name, opts \\ []) do | ||
page = Keyword.get(opts, :page, 1) | ||
per_page = Keyword.get(opts, :per_page, 10) | ||
req = build_request("/collections/#{collection_name}/documents") | ||
|
||
res = | ||
Req.get(req, | ||
params: %{ | ||
q: "*", | ||
query_by: "", | ||
page: page, | ||
per_page: per_page | ||
} | ||
) | ||
|
||
handle_response(res) || [] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding sorting and filtering to list_documents.
While the pagination implementation is good, the function could be more flexible by supporting sorting and filtering options.
Consider extending the options:
def list_documents(collection_name, opts \\ []) do
page = Keyword.get(opts, :page, 1)
per_page = Keyword.get(opts, :per_page, 10)
+ sort_by = Keyword.get(opts, :sort_by, "created_at:desc")
+ filter_by = Keyword.get(opts, :filter_by, "")
req = build_request("/collections/#{collection_name}/documents")
res =
Req.get(req,
params: %{
q: "*",
query_by: "",
page: page,
- per_page: per_page
+ per_page: per_page,
+ sort_by: sort_by,
+ filter_by: filter_by
}
)
handle_response(res) || []
end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def list_documents(collection_name, opts \\ []) do | |
page = Keyword.get(opts, :page, 1) | |
per_page = Keyword.get(opts, :per_page, 10) | |
req = build_request("/collections/#{collection_name}/documents") | |
res = | |
Req.get(req, | |
params: %{ | |
q: "*", | |
query_by: "", | |
page: page, | |
per_page: per_page | |
} | |
) | |
handle_response(res) || [] | |
end | |
def list_documents(collection_name, opts \\ []) do | |
page = Keyword.get(opts, :page, 1) | |
per_page = Keyword.get(opts, :per_page, 10) | |
sort_by = Keyword.get(opts, :sort_by, "created_at:desc") | |
filter_by = Keyword.get(opts, :filter_by, "") | |
req = build_request("/collections/#{collection_name}/documents") | |
res = | |
Req.get(req, | |
params: %{ | |
q: "*", | |
query_by: "", | |
page: page, | |
per_page: per_page, | |
sort_by: sort_by, | |
filter_by: filter_by | |
} | |
) | |
handle_response(res) || [] | |
end |
def search_documents!(collection_name, opts) do | ||
q = Keyword.get(opts, :q, "*") | ||
query_by = Keyword.get(opts, :query_by, "") | ||
filter_by = Keyword.get(opts, :filter_by, "") | ||
|
||
query_params = %{ | ||
q: q, | ||
query_by: query_by, | ||
filter_by: filter_by, | ||
exclude_fields: "image_embedding" | ||
} | ||
|
||
req = build_request("/collections/#{collection_name}/documents/search") | ||
res = Req.get(req, params: query_params) | ||
data = handle_response(res) | ||
|
||
data["hits"] |> Enum.map(&Map.get(&1, "document")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding search result metadata.
The search_documents!/2
function currently returns only the documents. Consider returning additional metadata like total hits, processing time, etc., which could be useful for the client.
Consider returning a map with additional metadata:
def search_documents!(collection_name, opts) do
# ... existing code ...
data = handle_response(res)
- data["hits"] |> Enum.map(&Map.get(&1, "document"))
+ %{
+ documents: data["hits"] |> Enum.map(&Map.get(&1, "document")),
+ found: data["found"],
+ processing_time_ms: data["search_time_ms"],
+ total_pages: ceil(data["found"] / (Keyword.get(opts, :per_page, 10)))
+ }
end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def search_documents!(collection_name, opts) do | |
q = Keyword.get(opts, :q, "*") | |
query_by = Keyword.get(opts, :query_by, "") | |
filter_by = Keyword.get(opts, :filter_by, "") | |
query_params = %{ | |
q: q, | |
query_by: query_by, | |
filter_by: filter_by, | |
exclude_fields: "image_embedding" | |
} | |
req = build_request("/collections/#{collection_name}/documents/search") | |
res = Req.get(req, params: query_params) | |
data = handle_response(res) | |
data["hits"] |> Enum.map(&Map.get(&1, "document")) | |
def search_documents!(collection_name, opts) do | |
q = Keyword.get(opts, :q, "*") | |
query_by = Keyword.get(opts, :query_by, "") | |
filter_by = Keyword.get(opts, :filter_by, "") | |
query_params = %{ | |
q: q, | |
query_by: query_by, | |
filter_by: filter_by, | |
exclude_fields: "image_embedding" | |
} | |
req = build_request("/collections/#{collection_name}/documents/search") | |
res = Req.get(req, params: query_params) | |
data = handle_response(res) | |
%{ | |
documents: data["hits"] |> Enum.map(&Map.get(&1, "document")), | |
found: data["found"], | |
processing_time_ms: data["search_time_ms"], | |
total_pages: ceil(data["found"] / (Keyword.get(opts, :per_page, 10))) | |
} |
def reset!() do | ||
drop_photos!() | ||
create_photos_20241024!() | ||
migrate_photos_2024_10_29!() | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reconsider the necessity of migrating data after resetting the collection
In the reset!/0
function, after dropping and recreating the collection, it will be empty. Running migrate_photos_2024_10_29!/0
may not be necessary, as there will be no documents to migrate. Consider removing this step or seeding the collection with data before migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- lib/save_it/migration/typesense/photo.ex (1 hunks)
- priv/typesense/2024-10-29_photos_url_to_file_id.ex (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- priv/typesense/2024-10-29_photos_url_to_file_id.ex
TypesenseDataClient.list_documents(@collection_name, per_page: 10000) | ||
|> Enum.each(fn doc -> | ||
id = doc["id"] | ||
|
||
file_id = | ||
doc["url"] | ||
|> String.split("/") | ||
|> List.last() | ||
|
||
TypesenseDataClient.update_document!(@collection_name, id, %{ | ||
"file_id" => file_id | ||
}) | ||
end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure proper handling for collections exceeding 10,000 documents
The migrate_photos_20241029!/0
function fetches documents with per_page: 10_000
. If the collection contains more than 10,000 documents, some documents will not be migrated, leading to incomplete data updates. Consider implementing pagination to process all documents in the collection.
Would you like assistance in modifying the code to handle pagination for collections larger than 10,000 documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (2)
lib/save_it/migration/typesense.ex (1)
1-5
: Add module documentation.
The module should have a @moduledoc
annotation describing its purpose, responsibilities, and usage examples.
defmodule SaveIt.Migration.Typesense do
+ @moduledoc """
+ Provides functions for managing Typesense collections.
+
+ This module handles the creation, updating, listing, and deletion of Typesense collections
+ through HTTP requests to the Typesense API.
+
+ ## Examples
+
+ schema = %{
+ name: "photos",
+ fields: [
+ %{name: "caption", type: "string"}
+ ]
+ }
+
+ # Create a new collection
+ SaveIt.Migration.Typesense.create_collection!(schema)
+ """
alias SmallSdk.Typesense
lib/small_sdk/typesense.ex (1)
Line range hint 77-86
: Consider making search key parameters configurable.
The function has hardcoded values for description, actions, and collections. Consider making these configurable to improve reusability.
- def create_search_key() do
+ def create_search_key(opts \\ []) do
+ description = Keyword.get(opts, :description, "Search-only photos key")
+ actions = Keyword.get(opts, :actions, ["documents:search"])
+ collections = Keyword.get(opts, :collections, ["photos"])
{url, _} = get_env()
req = build_request("/keys")
res =
Req.post(req,
json: %{
- "description" => "Search-only photos key",
- "actions" => ["documents:search"],
- "collections" => ["photos"]
+ "description" => description,
+ "actions" => actions,
+ "collections" => collections
}
)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- lib/save_it/migration/typesense.ex (1 hunks)
- lib/save_it/migration/typesense/photo.ex (1 hunks)
- lib/small_sdk/typesense.ex (3 hunks)
🔇 Additional comments (5)
lib/save_it/migration/typesense.ex (2)
34-53
: LGTM! Well-structured helper functions.
The helper functions are well-implemented:
- URL validation is properly handled
- Environment variables are safely accessed with fetch_env!
- Request headers are correctly set
1-54
: Verify integration with dependent modules.
Let's verify the integration with PhotoService and SaveIt.Bot modules mentioned in the summary.
✅ Verification successful
Let's gather more specific information about these integrations.
Let's check the content of the photo migration file to understand the complete integration.
Integration verification completed successfully
The Typesense module is correctly integrated within the application:
- PhotoService directly uses SmallSdk.Typesense for document operations
- SaveIt.Migration.Typesense provides collection management functionality used by Photo migrations
- Bot module correctly interacts with PhotoService for photo operations including search and creation
- The schema and migrations are properly defined with all required fields including image embeddings
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check integration with PhotoService and Bot modules
# Check for PhotoService usage of Typesense functions
echo "Checking PhotoService integration..."
rg -l "SaveIt.Migration.Typesense" "lib/save_it/photo_service.ex"
# Check for Bot module's photo-related functionality
echo "Checking Bot integration..."
rg -l "PhotoService" "lib/save_it/bot.ex"
Length of output: 270
Script:
#!/bin/bash
# Check the actual usage context in Bot module
echo "Checking Bot module's PhotoService usage..."
rg "PhotoService" -B 2 -A 2 "lib/save_it/bot.ex"
# Check for any Typesense-related code in PhotoService
echo "Checking PhotoService for Typesense references..."
rg "typesense" -i "lib/save_it/photo_service.ex"
# Check for any collection operations in PhotoService
echo "Checking PhotoService for collection operations..."
rg "collection" -i "lib/save_it/photo_service.ex"
# Look for other modules using Typesense
echo "Checking other Typesense usage..."
rg -l "SaveIt.Migration.Typesense" "lib/"
Length of output: 2468
Script:
#!/bin/bash
# Check the photo migration file content
echo "Checking photo migration implementation..."
cat "lib/save_it/migration/typesense/photo.ex"
# Check for any schema definitions
echo "Checking for schema definitions..."
rg "schema" -B 2 -A 5 "lib/save_it/migration/typesense/photo.ex"
Length of output: 2572
lib/small_sdk/typesense.ex (3)
4-4
: LGTM! Good addition of URL validation.
The import of validate_url!/1
enhances input validation for the Typesense URL.
95-95
: LGTM! Good use of URL validation.
The addition of URL validation using validate_url!/1
improves the robustness of the configuration handling.
114-158
: Skip: Response handling improvements already suggested.
The response handling functions have some inconsistencies and duplications that were already addressed in previous review comments.
def create_collection!(schema) do | ||
req = build_request("/collections") | ||
res = Req.post!(req, json: schema) | ||
|
||
Typesense.handle_response!(res) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for HTTP request.
The Req.post!/2
call could fail with network or server errors. Consider using Req.post/2
with proper error handling.
def create_collection!(schema) do
req = build_request("/collections")
- res = Req.post!(req, json: schema)
-
- Typesense.handle_response!(res)
+ case Req.post(req, json: schema) do
+ {:ok, res} -> Typesense.handle_response!(res)
+ {:error, error} -> raise "Failed to create collection: #{inspect(error)}"
+ end
end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def create_collection!(schema) do | |
req = build_request("/collections") | |
res = Req.post!(req, json: schema) | |
Typesense.handle_response!(res) | |
end | |
def create_collection!(schema) do | |
req = build_request("/collections") | |
case Req.post(req, json: schema) do | |
{:ok, res} -> Typesense.handle_response!(res) | |
{:error, error} -> raise "Failed to create collection: #{inspect(error)}" | |
end | |
end |
def update_collection!(collection_name, schema) do | ||
req = build_request("/collections/#{collection_name}") | ||
res = Req.patch!(req, json: schema) | ||
|
||
Typesense.handle_response!(res) | ||
end | ||
|
||
def list_collections!() do | ||
req = build_request("/collections") | ||
res = Req.get!(req) | ||
|
||
Typesense.handle_response!(res) | ||
end | ||
|
||
def delete_collection!(collection_name) do | ||
req = build_request("/collections/#{collection_name}") | ||
res = Req.delete!(req) | ||
|
||
Typesense.handle_response!(res) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URI encode collection names in URLs.
Collection names should be URI encoded to prevent injection attacks and handle special characters correctly.
def update_collection!(collection_name, schema) do
- req = build_request("/collections/#{collection_name}")
+ req = build_request("/collections/#{URI.encode(collection_name)}")
res = Req.patch!(req, json: schema)
Typesense.handle_response!(res)
end
def delete_collection!(collection_name) do
- req = build_request("/collections/#{collection_name}")
+ req = build_request("/collections/#{URI.encode(collection_name)}")
res = Req.delete!(req)
Typesense.handle_response!(res)
end
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def update_collection!(collection_name, schema) do | |
req = build_request("/collections/#{collection_name}") | |
res = Req.patch!(req, json: schema) | |
Typesense.handle_response!(res) | |
end | |
def list_collections!() do | |
req = build_request("/collections") | |
res = Req.get!(req) | |
Typesense.handle_response!(res) | |
end | |
def delete_collection!(collection_name) do | |
req = build_request("/collections/#{collection_name}") | |
res = Req.delete!(req) | |
Typesense.handle_response!(res) | |
end | |
def update_collection!(collection_name, schema) do | |
req = build_request("/collections/#{URI.encode(collection_name)}") | |
res = Req.patch!(req, json: schema) | |
Typesense.handle_response!(res) | |
end | |
def list_collections!() do | |
req = build_request("/collections") | |
res = Req.get!(req) | |
Typesense.handle_response!(res) | |
end | |
def delete_collection!(collection_name) do | |
req = build_request("/collections/#{URI.encode(collection_name)}") | |
res = Req.delete!(req) | |
Typesense.handle_response!(res) | |
end |
def list_documents(collection_name, opts \\ []) do | ||
page = Keyword.get(opts, :page, 1) | ||
per_page = Keyword.get(opts, :per_page, 10) | ||
req = build_request("/collections/#{collection_name}/documents/search") | ||
|
||
res = | ||
Req.get(req, | ||
params: %{ | ||
q: "*", | ||
query_by: "caption", | ||
page: page, | ||
per_page: per_page | ||
} | ||
) | ||
|
||
data = handle_response(res) | ||
|
||
data["hits"] |> Enum.map(&Map.get(&1, "document")) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider consolidating search functionality.
The list_documents/2
and search_documents!/2
functions share similar logic for searching documents. Consider extracting the common functionality into a private helper function to reduce code duplication.
+ defp search_documents_internal(collection_name, params) do
+ req = build_request("/collections/#{collection_name}/documents/search")
+ res = Req.get(req, params: params)
+ data = handle_response(res)
+ data["hits"] |> Enum.map(&Map.get(&1, "document"))
+ end
def list_documents(collection_name, opts \\ []) do
page = Keyword.get(opts, :page, 1)
per_page = Keyword.get(opts, :per_page, 10)
query_by = Keyword.get(opts, :query_by, "caption")
- req = build_request("/collections/#{collection_name}/documents/search")
- # ...
- data = handle_response(res)
- data["hits"] |> Enum.map(&Map.get(&1, "document"))
+ search_documents_internal(collection_name, %{
+ q: "*",
+ query_by: query_by,
+ page: page,
+ per_page: per_page
+ })
end
Also applies to: 33-49
Req.get(req, | ||
params: %{ | ||
q: "*", | ||
query_by: "caption", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider making query_by field configurable.
The query_by
parameter is hardcoded to "caption", which limits the function's reusability. Consider making it configurable through the opts
parameter.
def list_documents(collection_name, opts \\ []) do
page = Keyword.get(opts, :page, 1)
per_page = Keyword.get(opts, :per_page, 10)
+ query_by = Keyword.get(opts, :query_by, "caption")
# ...
params: %{
q: "*",
- query_by: "caption",
+ query_by: query_by,
Committable suggestion was skipped due to low confidence.
def migrate_photos_data_20241029 do | ||
Logger.info("migrating photos documents") | ||
|
||
docs = | ||
TypesenseDataClient.list_documents(@collection_name, per_page: 200) | ||
|
||
count = | ||
Enum.map(docs, fn doc -> | ||
id = doc["id"] | ||
|
||
file_id = | ||
doc["url"] | ||
|> String.split("/") | ||
|> List.last() | ||
|
||
TypesenseDataClient.update_document!(@collection_name, id, %{ | ||
"file_id" => file_id | ||
}) | ||
end) | ||
|> Enum.count() | ||
|
||
Logger.info("migrated #{count} photos") | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invoke migrate_photos_data_20241029
after updating the collection
The function migrate_photos_data_20241029
is defined but not called anywhere in the module. To ensure that the data migration occurs after the collection schema is updated, consider invoking this function in the reset!/0
function or at the appropriate place in your migration process.
You can modify the reset!/0
function to include the data migration:
def reset!() do
drop_photos!()
create_photos_20241024!()
migrate_photos_20241029!()
+ migrate_photos_data_20241029()
end
Committable suggestion was skipped due to low confidence.
TypesenseDataClient.update_document!(@collection_name, id, %{ | ||
"file_id" => file_id | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for document updates
The TypesenseDataClient.update_document!/3
function may raise exceptions if an update fails. To ensure robustness, consider adding error handling to manage failed updates and log any errors without interrupting the entire migration process.
You can modify the update logic to handle errors gracefully:
try do
TypesenseDataClient.update_document!(@collection_name, id, %{
"file_id" => file_id
})
+ rescue
+ exception ->
+ Logger.error("Failed to update document #{id}: #{Exception.message(exception)}")
+ :error
end
Committable suggestion was skipped due to low confidence.
count = | ||
Enum.map(docs, fn doc -> | ||
id = doc["id"] | ||
|
||
file_id = | ||
doc["url"] | ||
|> String.split("/") | ||
|> List.last() | ||
|
||
TypesenseDataClient.update_document!(@collection_name, id, %{ | ||
"file_id" => file_id | ||
}) | ||
end) | ||
|> Enum.count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use Enum.each
instead of Enum.map
for side effects
Since the primary purpose of iterating over docs
is to perform side effects (updating documents), using Enum.each
is more appropriate than Enum.map
. This improves code readability and clearly conveys intent.
Apply this diff to make the change:
- count =
- Enum.map(docs, fn doc ->
+ Enum.each(docs, fn doc ->
id = doc["id"]
file_id =
doc["url"]
|> String.split("/")
|> List.last()
TypesenseDataClient.update_document!(@collection_name, id, %{
"file_id" => file_id
})
- end)
- |> Enum.count()
-
- Logger.info("migrated #{count} photos")
+ end)
+
+ count = Enum.count(docs)
+ Logger.info("migrated #{count} photos")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
count = | |
Enum.map(docs, fn doc -> | |
id = doc["id"] | |
file_id = | |
doc["url"] | |
|> String.split("/") | |
|> List.last() | |
TypesenseDataClient.update_document!(@collection_name, id, %{ | |
"file_id" => file_id | |
}) | |
end) | |
|> Enum.count() | |
Enum.each(docs, fn doc -> | |
id = doc["id"] | |
file_id = | |
doc["url"] | |
|> String.split("/") | |
|> List.last() | |
TypesenseDataClient.update_document!(@collection_name, id, %{ | |
"file_id" => file_id | |
}) | |
end) | |
count = Enum.count(docs) | |
Logger.info("migrated #{count} photos") |
TypesenseDataClient.list_documents(@collection_name, per_page: 200) | ||
|
||
count = | ||
Enum.map(docs, fn doc -> | ||
id = doc["id"] | ||
|
||
file_id = | ||
doc["url"] | ||
|> String.split("/") | ||
|> List.last() | ||
|
||
TypesenseDataClient.update_document!(@collection_name, id, %{ | ||
"file_id" => file_id | ||
}) | ||
end) | ||
|> Enum.count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement pagination to migrate all documents
Currently, TypesenseDataClient.list_documents/2
is called with per_page: 200
, which limits the fetched documents to 200. If the collection contains more than 200 documents, some documents will not be migrated, leading to incomplete data updates. To ensure all documents are processed, implement pagination to fetch and migrate all documents.
Here's a possible implementation using recursion to handle pagination:
def migrate_photos_data_20241029 do
Logger.info("migrating photos documents")
- docs =
- TypesenseDataClient.list_documents(@collection_name, per_page: 200)
-
- count =
- Enum.map(docs, fn doc ->
- # update logic
- end)
- |> Enum.count()
+ migrate_all_documents(1, 0)
+
+ Logger.info("completed migrating all photos")
end
+
+defp migrate_all_documents(page, total_count) do
+ per_page = 200
+ docs = TypesenseDataClient.list_documents(@collection_name, page: page, per_page: per_page)
+
+ count = Enum.count(docs)
+
+ Enum.each(docs, fn doc ->
+ # update logic
+ end)
+
+ Logger.info("migrated page #{page} with #{count} photos")
+
+ if count == per_page do
+ migrate_all_documents(page + 1, total_count + count)
+ else
+ Logger.info("migrated total of #{total_count + count} photos")
+ end
+end
Committable suggestion was skipped due to low confidence.
Tests
合并前必须测试
Summary by CodeRabbit
Release Notes
New Features
Documentation
Bug Fixes
Chores
.gitignore
for better organization of ignored files.SmallSdk.TypesenseAdmin
module to streamline functionality.Tests