Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initiating some documentation using Doxygen & Sphinx #25

Conversation

Lucas-C
Copy link
Contributor

@Lucas-C Lucas-C commented Jan 15, 2025

Initiating some HTML documentation generation.
Based on this tutorial: Clear, Functional C++ Documentation with Sphinx + Breathe + Doxygen + CMake

Remains to be done:

  • finish GitHub Actions pipeline in .github/workflows/unix.yml
  • improve logging docstrings & tutorial
  • (optional) integrate documentation generation in CMakeLists.txt

Preview

image

@felixguendling
Copy link
Member

Thank you! Until now we didn't use Doxgen (or any documentation generator at all). But if it helps you I am happy to merge it.

Just please don't write documentation comments like this:

/// \param msg  the message
void print(char const* msg);

I don't think that telling the reader that the message parameter is the message helps.

I would rather like to document the non-obvious stuff and leave the obvious stuff undocumented.

Having examples, usage instructions and documenting edge cases and non-obvious things is of course very welcome.

@felixguendling
Copy link
Member

Maybe as an acronym, MOTIS should be spelled uppercase (Motis -> MOTIS).

@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 15, 2025

Thank you! Until now we didn't use Doxgen (or any documentation generator at all). But if it helps you I am happy to merge it.

Yes, I think that having documentation on core utilities like logging is important for the dev contributors 🙂

Just please don't write documentation comments like this:

/// \param msg  the message
void print(char const* msg);

I don't think that telling the reader that the message parameter is the message helps.

I would rather like to document the non-obvious stuff and leave the obvious stuff undocumented.

Having examples, usage instructions and documenting edge cases and non-obvious things is of course very welcome.

I totally agree 💯

I like those articles on this topic:

  1. Comments should not duplicate the code.
  2. Good comments do not excuse unclear code.
  3. If you can’t write a clear comment, there may be a problem with the code.
  4. Comments should dispel confusion, not cause it.
  5. Explain unidiomatic code in comments.
  6. Provide links to the original source of copied code.
  7. Include links to external references where they will be most helpful.
  8. Add comments when fixing bugs.
  9. Use comments to mark incomplete implementations.

8 comment-writing guidelines to follow

  1. Make an effort to understand all the code you’re dealing with
  2. Never restate your code functionality in the comments
  3. Respect documentation comments
  4. You may write marvelous comments, but that does not allow you to write poor code
  5. Comments will become out of date only if you let them
  6. Never commit commented-out code
  7. Our comments should be a living form of our specifications
  8. Self-commenting code should live along with intent comments

Would you agree to merge this PR first, when it's ready,
or would you prefer that we first design a clean shared logging function/API?

@felixguendling
Copy link
Member

For this PR we can focus on merging docs generation.
Logging can be another PR.

@Lucas-C Lucas-C force-pushed the doc-with-doxygen-and-sphinx branch from 7f2b564 to a4fff0b Compare January 15, 2025 15:00
@Lucas-C Lucas-C force-pushed the doc-with-doxygen-and-sphinx branch 2 times, most recently from 2f0ba76 to e164c0c Compare January 15, 2025 15:21
@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 15, 2025

For this PR we can focus on merging docs generation. Logging can be another PR.

In the end I implemented both in this PR.

Is that OK with you?

The benefit for me is that I did not waste time documenting the old logging function 😅

@Lucas-C Lucas-C force-pushed the doc-with-doxygen-and-sphinx branch from e164c0c to e363b76 Compare January 15, 2025 15:23
@Lucas-C Lucas-C marked this pull request as ready for review January 15, 2025 15:29
@Lucas-C Lucas-C force-pushed the doc-with-doxygen-and-sphinx branch 3 times, most recently from 29255a7 to a5afa73 Compare January 15, 2025 15:35
@felixguendling
Copy link
Member

felixguendling commented Jan 15, 2025

Ok, we can merge it also together.

For logging we can merge it for now but in the end we should settle for a structured logging interface that makes use of additional fields. At least this would be in line with what OTEL offers.

Example:

utl::err(
    "nigiri.loader.gtfs.stops",
    {{"dataset", dataset},
     {"stop_id", stop_id}},
    "{}: parent stop ID {} not found",
    dataset, stop_id);

This way, you would be able to search in aggregation tools for messages regarding a specific stop or a specific dataset in this case (or whatever other fields if you add them in a structured way).

I would also try if std::source_location is able to replace those macros. The example on cppreference.com looks like its perfect for logging.

@Lucas-C Lucas-C force-pushed the doc-with-doxygen-and-sphinx branch from a5afa73 to 6acf5d1 Compare January 15, 2025 15:36
@felixguendling
Copy link
Member

Regarding the interface, I would add shorthands:

  • utl::info(...) = utl::log(utl::log_level::info, ...)
  • utl::dbg(...) = utl::log(utl::log_level::dbg, ...)
  • utl::err(...) = utl::log(utl::log_level::err, ...)

So we still have enum class log levels but also a convenient way to log with utl::info etc.

@felixguendling
Copy link
Member

Regarding the structured part (indexable fields), it could maybe be something like

using field_value_t = std::variant<std::string_view, std::uint64_t, std::int64_t, double>;
using fields_t = std::initializer_list<std::pair<std::string_view, field_value_t>>;

But I haven't looked deeply into OTEL if this is a good fit.

@Lucas-C Lucas-C force-pushed the doc-with-doxygen-and-sphinx branch 3 times, most recently from abdb361 to 91101c8 Compare January 15, 2025 16:03
@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 15, 2025

The CI pipeline is being a bit difficult to beat...
I need to find a way to target the all the clang-based builds using a C++ preprocessing instruction (#ifdef...).
Currently the compilation fails due to the fact that the syntax I used in logging.h is not the same between MSVC & Clang.

If that's OK with you, we could maybe merge this PR as it is,
and tackle your last comments in another PR?

Also, once this is merged, I'd like to volunteer to make nigiri use the new log() function instead of its own.

I'm out for today.
Have a nice evening.

@Lucas-C Lucas-C force-pushed the doc-with-doxygen-and-sphinx branch from 91101c8 to cfe6c9d Compare January 15, 2025 17:24
@felixguendling
Copy link
Member

is not the same between MSVC & Clang

The goal would be to eliminate all macro usage by using std::source_location and therefore compiler differences would not matter (hopefully).

@Lucas-C Lucas-C force-pushed the doc-with-doxygen-and-sphinx branch from cfe6c9d to b6ea79b Compare January 15, 2025 17:32
@felixguendling
Copy link
Member

felixguendling commented Jan 15, 2025

IMO macros have only one use case left where it's hard to replace them. That's when you want to avoid that arguments are created at all if the function should not be called.

Like here:

#define log_if_macro(condition, ...) \
  if (condition) { \
    // ... \
  }

template <typename... Args>
void log_if(bool const condition, Args... args) {
  if (condition) {
    // ...
  }
}

auto log = false;
log_if_macro(log, expensive_function_call());
log_if(log, expensive_function_call());

Here, the templated version doesn't prevent calling expensive_function_call() which can be a big disadvantage if you're in a hot loop.

Maybe a solution would be to hand a lambda and the lambda only gets called if the condition is met.

Condition could be the log level filter.

But my guess is that for >99% of the code where we want to log, the arguments are cheap to compute.

@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 15, 2025

IMO macros have only one use case left where it's hard to replace them. That's when you want to avoid that arguments are created at all if the function should not be called.

There's also the FILE_AND_LINE trick used in this PR (and used in the previous code) that is very handy to capture the current filename & line number.

But I agree that we should limit the use cases for macros to the minimum.

I think we maybe should adopt the builder pattern to add metadata to a log:

// Simple usage:
log(info, "Hello world!").end()
// Complex usage:
log(error, "String={} Int={}", "Hello", 42).ctx("http.get.resource").dataSet(data_set).stopId(stop_id).end()

What do you think of this syntax/usage?

@felixguendling felixguendling merged commit 609c85e into motis-project:master Jan 15, 2025
13 checks passed
@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 16, 2025

Thank you for accepting my contribution

@Lucas-C Lucas-C deleted the doc-with-doxygen-and-sphinx branch January 16, 2025 07:54
@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 16, 2025

@felixguendling
Copy link
Member

I did already yesterday:

image

Not sure what else I can do.

@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 16, 2025

The documentation is currently generated in the public/ folder, not docs/:
https://github.com/motis-project/utl/blob/master/.github/workflows/unix.yml#L168

Maybe that's what missing? 🙂

@felixguendling
Copy link
Member

That's not in the list:
image

I think another way would be to not use the gh-pages branch approach but GH Actions directly: https://github.com/motis-project/prima-mobile/blob/master/.github/workflows/gh-pages.yml

@felixguendling
Copy link
Member

felixguendling commented Jan 16, 2025

Regarding logging API this would work without macros.

https://gcc.godbolt.org/z/hfKhMTq8x

#include <source_location>
#include <iostream>
#include <iomanip>
#include <initializer_list>
#include <cinttypes>

#include "fmt/core.h"
#include "fmt/ostream.h"
#include "fmt/ranges.h"
#include "fmt/std.h"

namespace utl {

enum class log_level { debug, info, error };

log_level s_verbosity;

constexpr char const* to_str(log_level const level) {
  switch (level) {
    case log_level::debug: return "debug";
    case log_level::info: return "info";
    case log_level::error: return "error";
  }
  return "";
}

inline std::string now() {
  using clock = std::chrono::system_clock;
  auto const now = clock::to_time_t(clock::now());
  struct tm tmp {};
#if _MSC_VER >= 1400
  gmtime_s(&tmp, &now);
#else
  gmtime_r(&now, &tmp);
#endif

  std::stringstream ss;
  ss << std::put_time(&tmp, "%FT%TZ");
  return ss.str();
}


template <typename... Args>
struct log {    
    log(log_level const level, char const* ctx, fmt::format_string<Args...> fmt_str, Args&&... args,
        std::source_location const& loc = std::source_location::current())
            : level_{level},
              ctx_{ctx},
              args_{fmt_str, std::forward<Args>(args)...},
              loc_{loc}
    {}

    ~log() {
        if (level_ >= utl::s_verbosity) {
            fmt::print(std::clog, "{time} [{level}] [{file}:{line} {fn}] [{ctx}] {msg} {params}\n",
                    fmt::arg("file", loc_.file_name()),
                    fmt::arg("line", loc_.line()),
                    fmt::arg("fn", loc_.function_name()),
                    fmt::arg("time", now()),
                    fmt::arg("level", to_str(level_)),
                    fmt::arg("ctx", ctx_),
                    fmt::arg("msg", std::apply([](fmt::format_string<Args...> fmt, Args... args) { return fmt::format(fmt, args...); }, args_)),
                    fmt::arg("params", params_));
        }
    }

    void params(std::initializer_list<std::pair<std::string_view, std::string_view>>&& params) {
        params_ = std::move(params);
    }
    

    log_level level_;
    char const* ctx_;
    std::tuple<fmt::format_string<Args...> , Args...> args_;
    std::source_location loc_;
    std::initializer_list<std::pair<std::string_view, std::string_view>> params_;
};

template <typename... Args>
log(log_level, char const*, fmt::format_string<Args...>, Args&&... args) -> log<Args...>;


}  // namespace utl

int main() {
  auto dataset = "test";
  utl::log(utl::log_level::info, "nigiri.loader", "starting import: dataset={}", dataset).params({{"dataset", dataset}});
}

Having a global macro called log and logF is essentially asking for trouble (collisions with logarithm function, other logging functions, etc.). So for macros, they should be prefixed with something like UTL_ if we want to use them all over the codebase.

The log level severity probably has to be extern and not static to work. Otherwise each compilation unit has its own.

@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 16, 2025

That's not in the list:

I have it setup like this on fpdf2:
image

@MichaelKutzner
Copy link

I just had a look and the documentation looks promising. Just a few questions, as I haven't used GitHub Pages much, yet.

  • Is the gh-pages branch in the intended state? After checking fpdf2, the layout seems quite different.
    I also cannot find any public directory on that branch. Instead the files seem to be located at /html.
  • I realized some unicode symbols in the build pipeline. So I was wondering, if this is a common practice right now?

@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 16, 2025

Is the gh-pages branch in the intended state? After checking fpdf2, the layout seems quite different.
I also cannot find any public directory on that branch. Instead the files seem to be located at /html.

There is indeed something missing, the docs are hosted there for now:
https://motis-project.github.io/utl/html/

I'm drafting a PR to fix this right now.

I realized some unicode symbols in the build pipeline. So I was wondering, if this is a common practice right now?

You mean the emojis at the end of the step names?
https://github.com/motis-project/utl/blob/master/.github/workflows/unix.yml#L144

From my humble experience, yes they are common in several GitHub projects 🙂

@Lucas-C Lucas-C mentioned this pull request Jan 16, 2025
2 tasks
@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 16, 2025

PR submitted: #26

@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 16, 2025

Having a global macro called log and logF is essentially asking for trouble (collisions with logarithm function, other logging functions, etc.). So for macros, they should be prefixed with something like UTL_ if we want to use them all over the codebase.

Agreed.

If that's OK with you, I'll suggest an improved API in another PR tomorrow,
taking into account all the feedbacks made above in this PR.

@felixguendling
Copy link
Member

Except for the convenience functions, my implementation from #25 (comment) should be fine (refinements can of course be made if we hit problems in nigiri, osr, etc.).

Regarding the interface, I would add shorthands:

  • utl::info(...) = utl::log(utl::log_level::info, ...)
  • utl::dbg(...) = utl::log(utl::log_level::dbg, ...)
  • utl::err(...) = utl::log(utl::log_level::err, ...)

@MichaelKutzner
Copy link

Is the gh-pages branch in the intended state? After checking fpdf2, the layout seems quite different.
I also cannot find any public directory on that branch. Instead the files seem to be located at /html.

There is indeed something missing, the docs are hosted there for now: https://motis-project.github.io/utl/html/

I'm drafting a PR to fix this right now.

I realized some unicode symbols in the build pipeline. So I was wondering, if this is a common practice right now?

You mean the emojis at the end of the step names? https://github.com/motis-project/utl/blob/master/.github/workflows/unix.yml#L144

From my humble experience, yes they are common in several GitHub projects 🙂

Thank you for the reply. Great to see it working already.
Yes, that's what I was referring to. Thank you for the explanation.

@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 16, 2025

Except for the convenience functions, my implementation from #25 (comment) should be fine (refinements can of course be made if we hit problems in nigiri, osr, etc.).

Yes, this seems promising!
std::source_location::current() is nice!
And it's using the builder pattern, which will make extending it easier 👍

I have a few questions (some may be obvious to you, but I do not often code in C++ 😅) :

  • why using a struct and not a class? It's the convention in MOTIS?
  • what is this syntax? -> log<Args...>
  • why delaying the call to fmt::format() to the destructor? Wouldn't it be better to perform it in the constructor?
  • wouldn't it be better to have an explicit .end() method instead of relying on a call to the destructor to flush the logs? I'm not a C++ expert, and I'm unsure if it is enforced by the C++ spec that the object destructor will be destroyed immediately, and there is no risk of this to be delayed by the garbage collector...

@felixguendling
Copy link
Member

felixguendling commented Jan 16, 2025

  • Yes, this is a convention in MOTIS. The only difference between struct and class is that the default visibility is public for struct and private for class. Since we rarely do plain OOP, we mostly don't use public or private and just leave everything public. If we want to add abstractions (like getters + setters to hide implementation details), we can still add them later where needed. This way, the code stays minimalistic and doesn't get cluttered with OOP stuff where it's not useful.
  • This syntax is a user defined class template deduction guide which is necessary to help the compiler distinguish between the variadic template Args... and the next argument std::source_location which has a default value.
  • Delaying the call to print is required in order to allow the structured logging parameters later - if we already print before adding paramters with the builder pattern, then the extra parameters won't be printed
  • Having a end() method and printing there would also solve the problem from the previous point. However, I would prefer to not have the codebase cluttered with unnecessary end() calls if there's the simple solution to do the printing in the destructur. The destructor is called after the class stops to exist, which is immediately in case the return value of the constructor isn't assigned anywhere.

Don't worry. C++ is a bit tricky but you'll get used to it by reading and writing more and more of it (like with any other programming language).

@Lucas-C
Copy link
Contributor Author

Lucas-C commented Jan 16, 2025

Thank you for your answers 👍

Delaying the call to print is required in order to allow the structured logging parameters later - if we already print before adding paramters with the builder pattern, then the extra parameters won't be printed

Yes, I agree that the final log line needs to be assembled in the destructor.
But I was refering to this line in your code snippet:

fmt::arg("msg", std::apply([](fmt::format_string<Args...> fmt, Args... args) { return fmt::format(fmt, args...); }, args_)),

Couldn't be done in the constructor?

I opened PR #27, but I have trouble pleasing the C++ compiler...
I have issues like this:

error: class template argument deduction failed:
    9 |   log(llvl::info, "[{}] starting", name);
      |                                        ^
src/timer.cc:9:40: error: no matching function for call to ‘log(utl::llvl, const char [14], std::string&)’

include/utl/logging.h:51:5: note: candidate: ‘log(utl::llvl, fmt::v10::format_string<T ...>, const T&& ...)-> utl::log<T> [with T = {std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}; fmt::v10::format_string<T ...> = fmt::v10::basic_format_string<char, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >]’ (near match)
   51 |     log(llvl const level, fmt::format_string<T...> fmt_str, const T&&... args)
      |     ^~~
include/utl/logging.h:51:5: note:   conversion of argument 3 would be ill-formed:
src/timer.cc:9:36: error: cannot bind rvalue reference of type ‘const std::__cxx11::basic_string<char>&&’ to lvalue of type ‘std::string’ {aka ‘std::__cxx11::basic_string<char>’}
    9 |   log(llvl::info, "[{}] starting", name);
      |                                    ^~~~

@Lucas-C Lucas-C mentioned this pull request Jan 16, 2025
2 tasks
@felixguendling
Copy link
Member

Couldn't be done in the constructor?

As it's done now, probably yes. But the best solution would be to do only one formatting call with all parameters to not have any allocations in our code at all. Allocations (especially many small allocations) make programs slow and lead to memory fragmentation. We try to prevent them where ever possible. So the end goal would be to have only one formatting call and probably reuse the buffer used for formatting (thread local). But let's keep it simple for now. Still it's probably better to keep the formatting calls together, so at one point we can merge them into one (and add buffer reuse + OTEL).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants