Hello, guys!

I’m in the process of moving my notes from Joplin (which is also a great tool) to Emacs 30.1. I use denote for managing notes.

I found some strange behavior when using org-publish: almost every note I created and exported with org-publish can’t be read by a web server. It happens when the file name contains Cyrillic letters. I’ve tried nginx, Apache, Python’s http.server, and web-static-server. When I run a server and open an HTML file whose name uses only Latin letters, everything is OK, but when there are Cyrillic letters in the file name, the web server tells me it can’t find a file with a name like “%u…”. However, when I open the HTML files locally with Firefox, everything works just fine.

So after a couple of days of research I found that one possible reason for this behavior is wrong file-name encoding. Since I’m not an expert, maybe somebody can explain how to make Emacs, with org-publish, export notes under file names that any web server can read?

My Emacs config contains:

org-publish-project-alist '(("notes"
                             :base-directory "~/org/denotes/"
                             :recursive nil
                             :publishing-directory "~/public_notes"
                             :section-numbers nil
                             :with-toc nil
                             :with-author nil
                             :with-creator nil
                             :with-date nil
                             :html-preamble "<nav><a href='index.html'>Notes</a></nav>"
                             :html-postamble nil
                             :auto-sitemap t
                             :sitemap-filename "index.org"
                             :sitemap-title "Notes"
                             :sitemap-sort-files anti-chronologically))

The host is Debian 13, and UTF-8 is the only encoding enabled in locales. The servers I’ve tried so far also run on Debian 13 with UTF-8.

  • midribbon_action@lemmy.blahaj.zone · 11 days ago

    URIs can only contain ASCII characters, so the web server receives requests for URLs in ‘percent-encoded’ form, not in UTF-8, and has no way of knowing which file to respond with. Unfortunately, you’ll have to URL-encode the file names yourself so that they match the incoming requests. The tool jq can URL-encode Cyrillic characters:

    echo "људиа" | jq -rR @uri
    

    You could probably do this as part of the build process if you are clever enough.
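    A minimal sketch of that build step (my own assumption about how to wire it up, not a tested recipe): after org-publish runs, rename each published HTML file whose name contains characters outside the URI “unreserved” set to its percent-encoded equivalent. The directory matches the :publishing-directory above; jq is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical post-publish step: rename published HTML files so their
# names match incoming percent-encoded requests.
pub_dir="${1:-$HOME/public_notes}"   # matches :publishing-directory above

for f in "$pub_dir"/*.html; do
  [ -e "$f" ] || continue            # skip if the glob matched nothing
  base=$(basename "$f")
  # jq's @uri filter percent-encodes everything outside A-Za-z0-9-._~,
  # so "index.html" is left alone while Cyrillic names get %XX escapes
  encoded=$(printf '%s' "$base" | jq -rR @uri)
  if [ "$base" != "$encoded" ]; then
    mv "$f" "$pub_dir/$encoded"
  fi
done
```

    Links in the generated index keep the original UTF-8 names, but browsers percent-encode them when making the request, so they should resolve to the renamed files.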

    This is only for the file name itself; the exported document should share the source document’s encoding unless overridden by the org-export-coding-system option.
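    Alternatively, the renaming could happen inside Emacs through a custom :publishing-function. This is a hypothetical sketch (the my/… wrapper name is made up), but org-html-publish-to-html and url-hexify-string are standard functions:

```elisp
;; Hypothetical sketch: publish to HTML as usual, then rename the output
;; so any non-ASCII characters in the name become %XX escapes.
(require 'ox-publish)
(require 'url-util)

(defun my/org-html-publish-percent-encoded (plist filename pub-dir)
  "Publish FILENAME to HTML, percent-encoding the output file name."
  (let* ((out (org-html-publish-to-html plist filename pub-dir))
         (base (file-name-nondirectory out))
         ;; `url-hexify-string' leaves A-Za-z0-9-._~ alone, so ".html"
         ;; survives while Cyrillic letters are encoded byte by byte.
         (encoded (url-hexify-string base)))
    (unless (string= base encoded)
      (rename-file out
                   (expand-file-name encoded (file-name-directory out))
                   t))
    out))

;; Then point the project at it:
;; :publishing-function my/org-html-publish-percent-encoded
```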

    • midribbon_action@lemmy.blahaj.zone · 11 days ago

      One more note on this: some searching did turn up web servers that can decode URIs into UTF-8 before handling them, but I believe this is very unsafe for a public server and, in the worst case, could allow public access to your entire drive. Vulnerabilities arise because different systems, and even different services on a single system, can treat specific Unicode characters differently. My advice above, to URL-encode the file names while building or before serving them, avoids any need to decode requests as they come in.