Filling in state of art section #9

Merged
merged 35 commits into from
Mar 9, 2023
a4aea12
draft of first two subsections
staniewzki Feb 22, 2023
4ab20d7
existing tools subsection
staniewzki Feb 25, 2023
6f3726f
real life examples, some polishes
staniewzki Feb 28, 2023
87df4b4
remove todo comments
staniewzki Feb 28, 2023
2b839e7
remove trailing whitespace
staniewzki Feb 28, 2023
77e1d6a
Add another footnote
staniewzki Feb 28, 2023
de3cbd9
Added newlines in tex for easier vim navigation
tonowak Mar 1, 2023
98782db
Review adjustments
staniewzki Mar 3, 2023
b516c69
Review adjustments continued
staniewzki Mar 8, 2023
0c48d29
Minor changes, moving to cite from footnote
staniewzki Mar 8, 2023
eb04c98
Remove 'you'
staniewzki Mar 8, 2023
e1812e5
Resolved conversations
tonowak Mar 9, 2023
8b54ef6
Update thesis-en.tex
tonowak Mar 9, 2023
1e3a46d
Update thesis-en.tex
tonowak Mar 9, 2023
1a2e5f6
Remove all remaining uses of \footnote{}
staniewzki Mar 9, 2023
bfa83d6
Update thesis-en.tex
tonowak Mar 9, 2023
87f7d09
Update thesis-en.tex
tonowak Mar 9, 2023
10659e6
Update thesis-en.tex
tonowak Mar 9, 2023
66bcc1c
Update thesis-en.tex
tonowak Mar 9, 2023
0dd3b5c
Update thesis-en.tex
tonowak Mar 9, 2023
61b7df5
Update thesis-en.tex
tonowak Mar 9, 2023
bea7467
Update thesis-en.tex
tonowak Mar 9, 2023
5e8e4c7
Update thesis-en.tex
tonowak Mar 9, 2023
d717c31
Update thesis-en.tex
tonowak Mar 9, 2023
1e169ac
Update thesis-en.tex
tonowak Mar 9, 2023
f4150f7
Update thesis-en.tex
tonowak Mar 9, 2023
24bb968
Update thesis-en.tex
tonowak Mar 9, 2023
a26bb9b
Update thesis-en.tex
tonowak Mar 9, 2023
e0beeae
Update thesis-en.tex
tonowak Mar 9, 2023
c90a7b4
Update thesis-en.tex
tonowak Mar 9, 2023
e7a3132
Update thesis-en.tex
tonowak Mar 9, 2023
7a4c9ce
Update thesis-en.tex
tonowak Mar 9, 2023
a82e493
Update thesis-en.tex
tonowak Mar 9, 2023
b11e07d
Update thesis-en.tex
tonowak Mar 9, 2023
3f37094
Update thesis-en.tex
tonowak Mar 9, 2023
164 changes: 120 additions & 44 deletions thesis-en.tex
@@ -319,56 +319,106 @@ \chapter{State of the art}\label{r:chapter_stateoftheart}

\section{Problems with using semver in Rust}\label{r:section_usageofsemver}

TODO:
\begin{itemize}
\item explain why it is easy to break semver in Rust.
Do that by giving specific, non-obvious code examples.
\item search for sources from which to get examples
\item explain other reasons as to why people tend to break semver
\item don't give yet real-life examples (those will be in the sections under),
write in a general way
\item make it clear that using semver in Rust is hard
\end{itemize}
It might seem easy to maintain semver, but some violations are hard to notice
when not actively searched for. Consider the following example:
\vspace{-3pt}
\begin{verbatim}
struct Foo {
    x: String
}

pub struct Bar {
    y: Foo
}
\end{verbatim}
\vspace{-5pt}

Changing the type of {\ttfamily Foo.x} from {\ttfamily String} to {\ttfamily Rc<str>}
causes a semver break, even though it is a non-public field of a non-public struct.
That is because {\ttfamily Send} and {\ttfamily Sync} are auto traits:
the compiler implements them automatically for every struct whose fields all implement them.
{\ttfamily String} implements both, so {\ttfamily Foo} and {\ttfamily Bar} do as well.
In contrast, {\ttfamily Rc<str>} implements neither of them,
so the change results in the publicly visible struct {\ttfamily Bar} losing both traits.
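
The effect described above can be reproduced with a small sketch. The names \texttt{FooRc}, \texttt{BarRc}, \texttt{requires\_send\_sync} and \texttt{downstream\_check} are hypothetical and introduced only for illustration; they stand for the changed structs and for downstream code that relies on \texttt{Bar} being thread-safe.

```rust
use std::rc::Rc;

// The original pair of structs from the example.
#[allow(dead_code)]
struct Foo {
    x: String,
}

#[allow(dead_code)]
pub struct Bar {
    y: Foo,
}

// The same pair after the seemingly internal change to Rc<str>
// (hypothetical names, used only to keep both versions in one file).
#[allow(dead_code)]
struct FooRc {
    x: Rc<str>,
}

#[allow(dead_code)]
pub struct BarRc {
    y: FooRc,
}

/// Hypothetical helper: it only compiles for types that are Send and Sync.
pub fn requires_send_sync<T: Send + Sync>() -> bool {
    true
}

/// Stands for downstream code that relies on `Bar` being thread-safe.
pub fn downstream_check() -> bool {
    // Compiles, because String (and therefore Foo and Bar) is Send + Sync.
    requires_send_sync::<Bar>()
    // Writing requires_send_sync::<BarRc>() instead is rejected by the
    // compiler, because Rc<str> is neither Send nor Sync.
}
```

Calling \texttt{downstream\_check()} compiles against the original structs; swapping in \texttt{BarRc} turns the same call into a compilation error in the user's crate, which is exactly how such a break surfaces.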

Such a break is not only non-obvious, but also even harder to notice
in large codebases, where the two structs could be defined in completely different places.
In fact, a similar error crept into release v3.2.0 of {\ttfamily clap},
a well-known crate maintained by the Rust team.
More details about it can be found in section \ref{r:section_real_life_semver_breaks}.

The same issue almost occurred
(but was prevented thanks to our tool)
in another widely used library, \texttt{rust-libp2p},
where it is clear from the conversation \cite{issue-libp2p} that the maintainers
did not expect their type to stop being \texttt{UnwindSafe} and were likely not even aware that
it was publicly \texttt{UnwindSafe} to begin with.

\section{Consequences of breaking semver}

TODO:
\begin{itemize}
\item describe that breaking semver means that people's code stops compiling
\item describe the possible scale of catastrophes
\item don't give yet real-life examples, write in a general way
\end{itemize}
When a maintainer publishes a new version of their crate that breaks semver,
it causes a major inconvenience for the crate's users.
Their code might simply stop compiling once the offending version gets downloaded.
This can happen even if the crate containing the violation is not a direct dependency,
so a single semver break can leave many other crates broken.
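
The reason the offending version gets downloaded without any change to the user's code is that Cargo's default (caret) version requirements accept every release that semver declares compatible. The following is a minimal sketch of that rule for plain three-part versions; the type \texttt{Version} and the function \texttt{caret\_matches} are hypothetical names for this illustration, not Cargo's implementation, and pre-release tags are ignored.

```rust
/// Simplified (major, minor, patch) version; pre-release tags are ignored.
/// Field order matters: the derived ordering compares major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    pub const fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

/// Returns true if `candidate` satisfies the caret requirement `^req`:
/// it must not be older than `req`, and it must not change the
/// left-most non-zero component of `req`.
pub fn caret_matches(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        // ^3.1.0 accepts 3.x.y
        candidate.major == req.major
    } else if req.minor > 0 {
        // ^0.5.0 accepts 0.5.y only
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        // ^0.0.3 accepts exactly 0.0.3
        candidate.major == 0 && candidate.minor == 0 && candidate.patch == req.patch
    }
}
```

Under this rule a requirement on version 3.1.0 of some crate also accepts 3.2.0, so a semver-breaking 3.2.0 release is picked up on the next dependency update even though the requirement itself was never edited.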

\section{Real-life examples of semver breaks}
Debugging a cryptic compilation error that starts showing up one day,
without any change to the code, can be frustrating. In fact, we experienced it ourselves during our contributions
(one of the tool's users opened a GitHub Issue \cite{issue-compiling-fails}), as one of our dependencies broke semver. This is a major problem, as it might drive users to stop using such a crate.

TODO:
Because of that, maintainers have to yank
the incorrect releases as soon as possible
-- otherwise more users would encounter the problem and their trust
in this particular crate (and in crates using it as a dependency)
would decrease. Even though yanking a release seems easy, handling a semver break can still result in a lot of additional work for the maintainers -- they have to investigate the break when it is reported, inform the users about the yanked release and possibly help some of them move away from the faulty release.

\section{Real-life examples of semver breaks} \label{r:section_real_life_semver_breaks}

Some popular Rust crates with millions of downloads have accidentally broken semver:
\begin{itemize}
\item write (and cite) about cases our mentor mentioned in his blogs
\item write about cases users reported in the github issue
\item mention the paper describing that 43\% of yanked releases
are because of semver breaks and 3.7\% of all >300'000 releases are yanked
\item mention that we've developed
a script that scans all releases for the semver breaks
we can detect and the results are presented in some chapter
\item {\ttfamily pyo3 v0.5.1} accidentally changed a function signature \cite{pyo3-issue}
\item {\ttfamily clap v3.2.0} accidentally had a type stop implementing an auto-trait \cite{clap-issue}
\item multiple {\ttfamily block-buffer} versions accidentally broke their MSRV contract \cite{block-buffer-issue}
\item and many more. We have developed a script that scans all releases
for semver breaks we can detect. The results are covered in section \ref{r:section_scanning_script}
\end{itemize}

Those were examples of popular crates with experienced maintainers, but the problem is even more prominent in less-used crates,
where developers might not know the common semver pitfalls. A paper \cite{paper}
claims that among yanked (unpublished) releases,
semver breaks were the leading reason for yanking, accounting for 43\% of them.
It also mentions that 3.7\% of all releases (and there are already more than 300\,000 of them)
are yanked, which shows the scale of the problem -- thousands of detected semver breaks.
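
As a rough sanity check of the ``thousands'' estimate, the quoted figures can be combined directly. This assumes that the 43\% share applies to all yanked releases and that the rounded total of 300\,000 releases is used, so the result is only an order-of-magnitude estimate:
\[
0.037 \times 300\,000 \approx 11\,100 \quad \mbox{(yanked releases)}, \qquad
0.43 \times 11\,100 \approx 4\,770 \quad \mbox{(yanked because of semver breaks)}.
\]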

\section{Existing tools for detecting semver breaks}\label{r:section_existing_semver_tools}

TODO:
\begin{itemize}
\item list languages which have semver checking built-in,
explain that the language semantics were made for e.g. semver checking,
\item list current tools for detecting semver breaks in Rust:
cargo-breaking, rust-semverver, cargo-semver-checks
\item for the first two, explain a bit how they work and why they are no longer maintained.
Mieszko's slides have some info about that.
\item for cargo-semver-checks, explain a bit how it works (rustdoc, json, etc.)
and that contrary to the other two, it is maintained and it's made to be easily maintained.
Mention that this is the project we're working on.
\item research the current state of semver detection in other languages,
explain that it's hard to do in popular languages,
especially without features like rustdoc.
\end{itemize}
Few good tools for semver checking exist.
The main reason is that the semantics of popular languages
make complete and automatic verification practically impossible.
There are some initiatives to combat this. For example,
the Elm language~\cite{elm-lang} enforces semantic versioning by design.
Its type system enables automatic detection of all API changes.
Beyond that, tools for checking semver
in established languages like Python or C++ do not appear to be commonly used in the industry.

Unfortunately, the semantics of the Rust language were not designed with semver in mind either.
Despite this, there are some existing tools for semver checking.
The first of them, \texttt{cargo-breaking}, works on the abstract syntax tree (AST).
Although ASTs contain all the information needed for comparing API changes,
this approach has a major drawback -- two trees must be navigated at once.
This can get complex and tedious (especially when checking for moved or removed items), because the abstract syntax tree can change considerably
even without any public API changes.
Another issue is that both the language syntax and the structure of the abstract syntax tree
often change as the language develops, which makes maintenance time-consuming.
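
The point that a syntax tree can change without any public API change can be illustrated with a small sketch. The modules \texttt{v1} and \texttt{v2} below are hypothetical stand-ins for two releases of the same crate, and the function \texttt{parse} is an assumption made up for this example.

```rust
/// Hypothetical release 1: `parse` is defined directly in the crate root.
pub mod v1 {
    pub fn parse(input: &str) -> Option<u32> {
        input.trim().parse().ok()
    }
}

/// Hypothetical release 2: `parse` was moved into a private module and
/// re-exported, so the source layout differs but the public path does not.
pub mod v2 {
    mod imp {
        pub fn parse(input: &str) -> Option<u32> {
            input.trim().parse().ok()
        }
    }

    pub use self::imp::parse;
}
```

Both releases expose a function \texttt{parse} with the same signature at the same public path, yet the item sits at a different place in the tree, so a tool comparing the two trees has to resolve the re-export before it can conclude that nothing was removed.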

The second existing tool is \texttt{rust-semverver}, which relies on
the metadata stored in \texttt{rlib} files, Rust's own binary format for static libraries.
Collaborator

Suggested change
the metadata present in the rust-specific rlib binary static library format.
the metadata present in the rust-specific rlib binary static library format.

This line is a bit exhausting to understand, some simplifying maybe?

Owner

At this point, I just want to merge this PR.

Collaborator

@SmolSir SmolSir Mar 9, 2023

Oh my what have I done .ru

Because of that, the user experience is far from ideal:
the tool forces the user to use specific unstable versions of the language, and the quality of its error messages is limited.

In comparison, the approach of \texttt{cargo-semver-checks}, which is to write lints as queries, seems to work well.
Adding new queries is designed to be accessible, and maintenance comes down to
keeping up with rustdoc API changes, which seems to be about as low-effort as it could be.
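
To give an intuition for what ``a lint as a query'' means, here is a heavily simplified sketch. It is not the query language or data model used by \texttt{cargo-semver-checks}; the type \texttt{PublicItem} and the function \texttt{removed\_public\_functions} are hypothetical and only mimic the idea of asking a declarative question over the public items of two releases.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for one public item of a crate,
/// as it could be extracted from documentation output.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PublicItem {
    pub kind: &'static str, // e.g. "fn" or "struct"
    pub path: String,       // e.g. "demo::parse"
}

/// Lint sketch: report every public function that exists in `baseline`
/// but has no counterpart with the same path in `current`.
pub fn removed_public_functions(baseline: &[PublicItem], current: &[PublicItem]) -> Vec<String> {
    // Paths of all functions that are still public in the new release.
    let still_present: BTreeSet<&str> = current
        .iter()
        .filter(|item| item.kind == "fn")
        .map(|item| item.path.as_str())
        .collect();

    baseline
        .iter()
        .filter(|item| item.kind == "fn" && !still_present.contains(item.path.as_str()))
        .map(|item| item.path.clone())
        .collect()
}
```

The lint itself is just the filter condition; everything about how the items are obtained stays outside of it, which is why new checks can be added without touching the extraction layer.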

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Vision %
@@ -497,7 +547,7 @@ \section{Project baseline}
\item some existing lints had false-positives,
\item the codebase was not in a state where new contributors could easily begin making changes
to the project (which is crucial for the project to flourish in the long term).
For example, adding new lints and tests wasn't intuitive and required many manual steps,
For example, adding new lints and tests was not intuitive and required many manual steps,
the filenames and variable names were not always descriptive enough
and the code lacked comments that explained some of the logic and decisions behind it.
\end{itemize}
@@ -613,11 +663,12 @@ \section{Steady increase in tool's popularity}
\item list the maintainers of big libraries that started using the tool during our development
\end{itemize}

\section{Script}
\section{Script} \label{r:section_scanning_script}

TODO:
\begin{itemize}
\item show the results of the script that searches all existing releases for detected semver breaks
\item adjust the name of the subsection
\item show the results of the script that searches all existing releases for detected semver breaks
\item describe how our new lints can make an impact on the community based on the found semver breaks from the script
\end{itemize}

@@ -670,6 +721,8 @@ \section{Responsibilities}
% function}, Mathematica Absurdica, 117 (1965) 338--9.
\bibitem{issue-merge-cargo} \href{}{GitHub cargo-semver-checks issue \#61: Prepare for merging into cargo}
\bibitem{issue-cli-interface} \href{}{GitHub cargo-semver-checks issue \#86: What should the CLI look like?}
\bibitem{issue-compiling-fails} \href{}{GitHub cargo-semver-checks issue \#317: compiling semver-checks fails}
\bibitem{issue-libp2p} \href{}{GitHub rust-libp2p issue \#3312: feat: migrate to quick-protobuf}

\bibitem{Rust-1} Rust Team,
\textit{Rust Programming Language} (2023) \\
@@ -703,8 +756,31 @@ \section{Responsibilities}
\textit{Semantic Versioning 2.0.0} (2022) \\
https://semver.org/

\end{thebibliography}
\bibitem{paper} Hao Li, Filipe R. Cogo, Cor-Paul Bezemer, \\
\textit{An Empirical Study of Yanked Releases in the Rust Package Registry}
(2022) \\ https://arxiv.org/pdf/2201.11821.pdf

\bibitem{fearless-cargo-update} Predrag Gruevski,
\textit{Toward fearless cargo update} (2022) \\
https://predr.ag/blog/toward-fearless-cargo-update/

\bibitem{elm-lang} Evan Czaplicki,
\textit{Elm Programming Language} (2021) \\
https://elm-lang.org/

\bibitem{pyo3-issue}
\textit{GitHub PyO3 issue \#285} (2018) \\
https://github.com/PyO3/pyo3/issues/285

\bibitem{clap-issue}
\textit{GitHub clap issue \#3876} (2022) \\
https://github.com/clap-rs/clap/issues/3876

\bibitem{block-buffer-issue}
\textit{GitHub RustCrypto issue \#22} \\
https://github.com/RustCrypto/utils/issues/22

\end{thebibliography}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Attachments %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%