Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use md5 hash in file name for cache busting instead of the timestamp query parameter #12

Open
nickjj opened this issue Jan 11, 2020 · 4 comments
Labels

Comments

@nickjj
Copy link

nickjj commented Jan 11, 2020

Hi,

Have you given any thoughts on using an md5 hash to uniquely identify versions? This is a best practice when it comes to cache busting vs using query parameters.

I know this introduces a fair bit more complexity but it seems like a good long term solution. Phoenix Digest has all of the code related to pulling this off as well.

@achempion
Copy link
Member

Hi, thank you for the question.

Do you have an example when it would be useful and what issues do we have now with the current approach?

I think keeping the original file name is more user friendly approach than just generation some random string.

To address your question, you can define a version name function which allows you to pick the filename by yourself.

@nickjj
Copy link
Author

nickjj commented Jan 13, 2020

Certain browsers don't take the query string into account. Also certain proxies will ignore them when it comes to caching. In both cases that means the cache hit never happens.

Using query strings for versioning files has been openly not recommended for over 10 years. If you Google around for the topic you'll find dozens or more in depth answers on why not.

That's why Phoenix, Rails, Django and other frameworks all use md5 hashes for their cache busting strategy.

A think a reasonable work flow would be to hash the file on upload and then add the md5 hash as part of an attribute that gets saved to the database for that file so that it can be retrieved at run time. I didn't think the implementation through 100% but I don't know if there's any other way? The md5 hash would need to be stored in the database, otherwise there would be no way to know it later.

It's to my understanding we're doing a DB lookup at run time as is to look at the updated time stamp, so it shouldn't cause any change in run time performance.

Can you give a code example on how defining a version name function will work in practice?

For example, here's my current implementation for handling a user photo.

defmodule Hello.Files.File do
  defmacro __using__(types) do
    quote do
      use Waffle.Definition
      use Waffle.Ecto.Definition

      def __storage, do: Waffle.Storage.Local

      def default_url(_version, _scope), do: "/images/defaults/anonymous.png"

      def expanded_dir(path) do
        Application.fetch_env!(:waffle, :storage_dir) <> path
      end

      def validate({file, _}) do
        Enum.member?(unquote(types), file_type(file))
      end

      def mime_type(file) do
        case file_type(file) do
          :jpg -> "image/jpg"
          :jpeg -> "image/jpg"
          :png -> "image/png"
        end
      end

      defp file_type(file) do
        file.file_name
        |> Path.extname()
        |> String.replace(".", "")
        |> String.downcase()
        |> String.to_atom()
      end

      defp hashed(id), do: Hello.Hashid.encode(id)

      defp source(scope), do: scope.__meta__.source
    end
  end
end
defmodule Hello.Files.Profile do
  use Hello.Files.File, [:jpg, :jpeg, :png]

  @versions [:large, :medium, :small, :tiny]

  def storage_dir(_version, {_file, scope}),
    do: expanded_dir("/photos/#{source(scope)}/#{hashed(scope.id)}")

  def filename(version, _), do: "photo-#{version}"

  def transform(version, _scope) do
    {:convert, "-strip -resize #{dimensions(version)} -format png", :png}
  end

  defp dimensions(:large), do: "600x600"
  defp dimensions(:medium), do: "300x300"
  defp dimensions(:small), do: "150x150"
  defp dimensions(:tiny), do: "40x40"
end

This creates a photos/xxx/photo-tiny.png, photos/xxx/photo-small.png, and so on files. How could this be adjusted to include the md5 hash, and also adjust the Ecto type to support that along with writing that information to the DB on file upload and reading it out when I access the file?

@nickjj
Copy link
Author

nickjj commented Aug 8, 2020

Is this something still up for discussion?

If it won't be added to the library itself, what would the steps look like to implement something like the above for a specific project? I've kind of stepped away from the project I was working on for ~6 months, but I'm thinking about resuming it and this feature is kind of critical to the app. Not really sure how to move forward.

@achempion
Copy link
Member

achempion commented Aug 9, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants