Update on Overleaf.
AlexGustafsson authored and overleaf committed May 9, 2021
1 parent 612a21e commit 89c268b
Showing 5 changed files with 15 additions and 41 deletions.
4 changes: 2 additions & 2 deletions abstract-english.tex
@@ -1,13 +1,13 @@
\abstract
% Introduction, as in the conclusion but even more condensed
% Current state, the problem, the solution (PQC)
\noindent\textbf{Background.} People use the Internet for communication, work, online banking and more. Public-key cryptography enables this use to be secure by providing confidentiality and trust online. Though these algorithms may be secure from attacks from classical computers, future quantum computers may break them using Shor's algorithm. Therefore \gls{post-quantum} algorithms are being developed to mitigate this issue. The \acrlong{nist} has started a standardization process for these algorithms.\newline
\noindent\textbf{Background.} People use the Internet for communication, work, online banking and more. Public-key cryptography enables this use to be secure by providing confidentiality and trust online. Though these algorithms may be secure from attacks from classical computers, future quantum computers may break them using Shor's algorithm. Post-quantum algorithms are therefore being developed to mitigate this issue. The \acrfull{nist} has started a standardization process for these algorithms.\newline
% Summarize our research questions. "We analyze performance etc.", do not mention the RQs
\textbf{Objectives.} In this work, we analyze what specialized features applicable for \gls{post-quantum} algorithms are available in the mainframe architecture \gls{ibmz}. Furthermore, we study the performance of these algorithms on various hardware in order to understand what techniques may increase their performance.\newline
% Literature study, experimental study - a short summary of the method? We have a paragraph on our method
\textbf{Methods.} We apply a literature study to identify the performance characteristics of \gls{post-quantum} algorithms as well as what features of \gls{ibmz} may accommodate and accelerate these. We further apply an experimental study to analyze the practical performance of the two prominent finalists \gls{ntru} and \gls{mceliece} on consumer, cloud and mainframe hardware.\newline
% Briefly from RQ1-3 in the conclusions
\textbf{Results.} \gls{ibmz} was found to be able to accelerate several key symmetric primitives such as \gls{sha3} and \gls{aes} via the \gls{cpacf}. Though the available \acrlong{hsm}s did not support any of the studied algorithms, they were found to be able to accelerate them via a \gls{fpga}. Based on our experimental study, we found that computers with support for the Advanced Vector Extensions (\gls{avx}) were able to significantly accelerate the execution of \gls{post-quantum} algorithms. Lastly, we identified that vector extensions, \glspl{asic} and \glspl{fpga} are key techniques for accelerating these algorithms.\newline
\textbf{Results.} \gls{ibmz} was found to be able to accelerate several key symmetric primitives such as \gls{sha3} and \gls{aes} via the \gls{cpacf}. Though the available \acrlong{hsm}s (\acrshort{hsm}s) did not support any of the studied algorithms, they were found to be able to accelerate them via a \gls{fpga}. Based on our experimental study, we found that computers with support for the Advanced Vector Extensions (\gls{avx}) were able to significantly accelerate the execution of \gls{post-quantum} algorithms. Lastly, we identified that vector extensions, \glspl{asic} and \glspl{fpga} are key techniques for accelerating these algorithms.\newline
% The Outlook paragraph?
\textbf{Conclusions.} When considering the readiness of hardware for the transition to \gls{post-quantum} algorithms, we find that the proposed algorithms do not perform nearly as well as classical algorithms. Though the algorithms are likely to improve until the \gls{post-quantum} transition occurs, improved hardware support via faster vector instructions, increased cache sizes and the addition of polynomial instructions may significantly help reduce the impact of the transition.

2 changes: 1 addition & 1 deletion abstract-swedish.tex
@@ -4,7 +4,7 @@
\textbf{Syfte.} I detta arbete analyserar vi vilka specialiserade funktioner för kvantsäkra algoritmer som finns i stordator-arkitekturen \gls{ibmz}. Vidare studerar vi prestandan av dessa algoritmer på olika hårdvara för att förstå vilka tekniker som kan öka deras prestanda.\newline
\textbf{Metod.} Vi utför en litteraturstudie för att identifiera vad som är karaktäristiskt för kvantsäkra algoritmers prestanda samt vilka funktioner i \gls{ibmz} som kan möta och accelerera dessa. Vidare applicerar vi en experimentell studie för att analysera den praktiska prestandan av de två framträdande finalisterna \gls{ntru} och \gls{mceliece} på konsument-, moln- och stordatormiljöer.\newline
\textbf{Resultat.} Vi fann att \gls{ibmz} kunde accelerera flera centrala symmetriska primitiver såsom \gls{sha3} och \gls{aes} via en hjälpprocessor för kryptografiska funktioner (\acrshort{cpacf}). Även om befintliga hårdvarusäkerhetsmoduler inte stödde några av de undersökta algoritmerna, fann vi att de kan accelerera dem via en på-plats-programmerbar grindmatris (\acrshort{fpga}). Baserat på vår experimentella studie fann vi att datorer med stöd för avancerade vektorfunktioner (\gls{avx}) möjliggjorde en signifikant acceleration av kvantsäkra algoritmer. Slutligen identifierade vi att vektorfunktioner, applikationsspecifika integrerade kretsar (\acrshort{asic}s) och \acrshort{fpga}s är centrala tekniker som kan nyttjas för att accelerera dessa algoritmer.\newline
\textbf{Slutsatser.} Gällande beredskapen hos hårdvara för en övergång till kvantsäkra krypton, finner vi att de föreslagna algoritmerna inte presterar närmelsevis lika bra som klassiska algoritmer. Trots att det är sannolikt att de kvantsäkra kryptona fortsatt förbättras innan övergången sker, kan förbättrat hårdvarustöd för snabbare vektorfunktioner, ökade cachestorlekar och tillägget av polynomoperationer signifikant bidra till att minska påverkan av övergången.
\textbf{Slutsatser.} Gällande beredskapen hos hårdvara för en övergång till kvantsäkra krypton, finner vi att de föreslagna algoritmerna inte presterar närmelsevis lika bra som klassiska algoritmer. Trots att det är sannolikt att de kvantsäkra kryptona fortsatt förbättras innan övergången sker, kan förbättrat hårdvarustöd för snabbare vektorfunktioner, ökade cachestorlekar och tillägget av polynomoperationer signifikant bidra till att minska påverkan av övergången till kvantsäkra krypton.

\vspace{1cm}
\noindent
37 changes: 5 additions & 32 deletions chapters/conclusions/main.tex
@@ -1,13 +1,7 @@
\chapter{Conclusions and Future Work}
\label{chapter:conclusion}

% 1. Reintroduction of the problem - 1 paragraph
% What is the problem, why do we need to switch to PQ.
% What NIST is doing. What we consider the problem to be - we want to measure ...
% Introduce // repeat RQ1-3 again.
% Easy to extend - include more aspects from the introduction
% Mention Shor, Grover, figures (90% of PCI?), mention the research gap - a bit more about mainframes. A little on the method, perhaps, more on what we did?
\noindent Web traffic, online banking, \glspl{vpn} and messaging applications are secured using public-key cryptography algorithms. The algorithms in use will continue to be secure from attacks from conventional computers for the foreseeable future. But the rise of quantum computers and algorithms, such as Shor's, threatens the security of the classical algorithms and may completely break the classical algorithms in the near future. These developments have been followed for a long time and progress has been made throughout the years to introduce \gls{post-quantum} algorithms. The \acrfull{nist} published an open call for \gls{post-quantum} \gls{kem} and signature algorithm submissions, thus starting their standardization process. Their third round of submissions provided the two prominent \gls{kem} finalists \gls{mceliece} and \gls{ntru}. Although research has been done on the performance of the submissions, we identified a gap in the research. We measured the performance on mainframe hardware. We also measured the performance on consumer and cloud hardware to provide context. With regards to these developments, we have answered the following research questions.
\noindent Web traffic, online banking, \glspl{vpn} and messaging applications are secured using public-key cryptography algorithms. The algorithms in use will continue to be secure from attacks from conventional computers for the foreseeable future. But the rise of quantum computers and algorithms, such as Shor's, threatens the security of the classical algorithms and may completely break them in the near future. These developments have been followed for a long time and progress has been made throughout the years to introduce \gls{post-quantum} algorithms. The \acrfull{nist} published an open call for \gls{post-quantum} \gls{kem} and digital signature algorithm submissions, thus starting their standardization process. Their third round of submissions provided the two prominent \gls{kem} finalists \gls{mceliece} and \gls{ntru}. Although research has been done on the performance of the submissions, we identified a gap in the research. To address it, we measured the performance on mainframe hardware. We also measured the performance on consumer and cloud hardware to provide context. With regard to these developments, we have answered the following research questions.

\begin{description}
\item \textbf{RQ1} What specialized instructions and features applicable for \gls{post-quantum} \acrlong{kem}s are available in \gls{ibmz}?
@@ -17,33 +11,12 @@ \chapter{Conclusions and Future Work}
\item \textbf{RQ3} What techniques may be used to increase the performance of \gls{post-quantum} \acrlong{kem}s?
\end{description}

\noindent\textbf{RQ1}. We focused our work on the \gls{z15}. Though we were unable to target the processor with target-specific optimizations for the \gls{post-quantum} \glspl{kem}, we believe our results provide us with enough information to state that considerable performance gains are to be made on \gls{z15} hardware. As the \gls{z15} supports \gls{cpacf}, several key symmetric primitives such as \gls{sha3} and \gls{aes} may be accelerated. Furthermore, the support for \glspl{hsm} with purpose-built hardware for accelerating these primitives allows for further performance increases. Though the \glspl{hsm} do not support the \gls{post-quantum} \glspl{kem} we studied, they do offer programmability and cryptographic agility, enabling future algorithms to be accelerated once standardized. The \gls{z15} further offers \gls{simd} instructions at sustained 5.2GHz, theoretically allowing for highly performant software implementations of \gls{post-quantum} algorithms.
\noindent\textbf{RQ1}. We focused our work on the \gls{z15}. Though we were unable to target the processor with target-specific optimizations for the \gls{post-quantum} \glspl{kem}, we believe our results provide us with enough information to state that considerable performance gains are to be made on \gls{z15} hardware. As the \gls{z15} features the \gls{cpacf}, several key symmetric primitives such as \gls{sha3} and \gls{aes} may be accelerated. Furthermore, the support for \glspl{hsm} with purpose-built hardware for accelerating these primitives allows for further performance increases. Though the \glspl{hsm} do not support the \gls{post-quantum} \glspl{kem} we studied, they do offer programmability and cryptographic agility, enabling future algorithms to be accelerated once standardized. The \gls{z15} further offers \gls{simd} instructions at a sustained 5.2~GHz, theoretically allowing for highly performant software implementations of \gls{post-quantum} algorithms.

\noindent\textbf{RQ2}. By researching the sequential performance of algorithms as well as their throughput, we have collected data on the performance characteristics. Based on our measurements, it is clear that the performance of \gls{post-quantum} \glspl{kem} varies largely between architectures and environments. Modern computers running on \gls{x86} hardware utilizing the \gls{avx2} instruction set, performs considerably better than older \gls{x86} computers without support for \gls{avx}. The performance on the non-vectorized implementations on \gls{z15} was similar to that on \gls{x86} hardware.\todo{Further information here?}
\noindent\textbf{RQ2}. By researching the sequential performance of algorithms as well as their throughput, we have collected data on their performance characteristics. Based on our measurements, it was clear that the performance of \gls{post-quantum} \glspl{kem} varied greatly between architectures and environments. Modern computers running on \gls{x86} hardware utilizing the \gls{avx2} instruction set performed considerably better than older \gls{x86} computers without support for \gls{avx}. The performance of the non-vectorized implementations on \gls{z15} was similar to that on \gls{x86} hardware.\todo{Further information here?}

\noindent\textbf{RQ3}. Based on our measurements, we identified that vectorization of \gls{post-quantum} algorithms makes a key difference in performance. Though not all algorithms saw a significant performance increase when vectorized, hardware implementations in either \glspl{asic} or \glspl{fpga} were found to significantly increase performance, based on our literature study. Our measurements further provide evidence that the semi-systematic form of \gls{mceliece} outperforms the non-systematic variant. To increase the throughput of the \glspl{kem}, more threads may be used. We found that \gls{ntru} scaled the best with regard to the number of threads used.

% 2. Clearly show how the research questions are answered - 3 paragraphs
% RQ1 - differs clearly. Modern x86 workstations etc. perform very well - much due to AVX2. Older x86 performs clearly worse. The mainframe has the potential to benefit from CPACF once support exists. FPGA etc. in the HSM.
% RQ2 - CPACF, HSM, SIMD, (Memory safety), (Enormous cache?)
% RQ3 - Vectorization - SIMD, Hardware implementations (ASIC / FPGA), algorithmic changes - semi-systematic mceliece. Make more cache available, more threads?
\noindent\textbf{Outlook}. When considering the readiness of hardware for the transition to \gls{post-quantum} \glspl{kem}, we found that the \glspl{kem} submitted to the \gls{nist} in their current state do not perform nearly as well as the classical algorithms. A transition today would therefore result in a noticeable overhead for clients and have an even greater impact on servers. As the algorithms and implementations have improved significantly in a short period of time, it is likely that the software will continue to improve and lower the overhead before the \gls{post-quantum} transition occurs. To further help the transition, processors should see increased performance in \gls{simd} instructions and an increase in cache sizes. Furthermore, processor designers should commit to bringing constant-time vector instructions as well as wider and additional vector registers to help improve the speed of the \gls{post-quantum} \glspl{kem}. With hardware-accelerated polynomial multiplication, current and future lattice-based cryptosystems may be further optimized. Hardware-based \gls{kem} implementations were found in literature to be practical and performant, making them a prominent solution for lowering the potential performance impact of the transition.

%3. General summary - based on our measurements, etc.
%Tie back to the introduction?

%We will see a performance hit, but platforms may manage to keep up. Development is fast and has been going on for a long time. IBM, Microsoft and others have invested in order to be well prepared. We have a number of years left, depending on the source.

% Outlook, what the future looks like, 3-10 years? The hardware is ready - but there is a large performance impact - especially for servers. Clarify each part that already exists?

\noindent\textbf{Outlook}. When considering the readiness of hardware for the transition to \gls{post-quantum} \glspl{kem}, we found that the \glspl{kem} submitted to \gls{nist} in their current state do not perform nearly as well as the classical algorithms. A transition today would therefore result in a noticeable overhead for clients and have an even greater impact on servers. As the algorithms and implementations have improved significantly in a short period of time, it is likely that the software will continue to improve and lower the overhead before the \gls{post-quantum} transition occurs. To further help the transition, processors should see increased performance in \gls{simd} instructions and an increase in cache sizes. Furthermore, processor designers should commit to bringing constant-time vector instructions as well as wider and additional vector registers to help improve the speed of the \gls{post-quantum} \glspl{kem}. With hardware-accelerated polynomial multiplication, current and future lattice-based cryptosystems may be further optimized. Hardware-based \gls{kem} implementations were found in literature to be practical and performant, making them a prominent solution for lowering the potential performance impact of the transition.

% 4. Future work - "based on what we have learned now, one can investigate this A, this B and this C..."
% TODO: Look at how latency is affected - future work? SaberX4 high-throughput software ...
% Analyze latency, not just throughput.
% Apply the same method to all submissions.
% Develop perf-like tools for z15 for hardware-based measurement.
% Dedicated hardware for mainframes, HSMs, CPACF - major focus?
% In total roughly 3-5 (concl.) + 3-5 paragraphs (future.)
% Go into more depth

\noindent\textbf{Future work}. We have identified several potential topics for which future work may be warranted. For one, our work measures throughput performance without any regard to how the latency of the algorithms is affected. By studying the latency alongside the throughput, one may have provided data to further understand how the two correlate. Furthermore, we decided to limit our work to two \gls{nist} submissions as the time constraints we were under would not enable us to study further algorithms. By applying the method presented in this thesis to further algorithms, we believe one could achieve a broader understanding of the performance of \gls{post-quantum} \glspl{kem} on various hardware. To further increase the amount of data collectable from the \gls{z15} platform, we believe a study of perf-like tools for the platform is warranted. Furthermore, a more complete study targeting the \gls{z15}, applying a more direct and practical method with new implementations for the platform, would yield a more representative result for the performance of the \gls{z15}.
\noindent\textbf{Future work}. Our work measured throughput performance without any regard to how the latency of the algorithms is affected. By studying the latency alongside the throughput, one may gather data to further understand how the two correlate. Furthermore, we decided to limit our work to two \gls{nist} submissions as the time constraint we were under did not allow us to study further algorithms. By applying the method presented in this thesis to further algorithms, we believe one could achieve a broader understanding of the performance of \gls{post-quantum} \glspl{kem} on various hardware. To further increase the amount of data collectable from the \gls{z15} platform, we believe a study of perf-like tools for the platform is warranted. Furthermore, a more complete study targeting \gls{z15}, applying a more direct and practical method with new implementations for the platform, could yield a more representative result for the performance of \gls{z15}.