Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for plain source code instead of XML when Loading files on Caché #1

Open
amirsamary opened this issue Aug 27, 2016 · 9 comments

Comments

@amirsamary
Copy link

I just realized that CacheGitHubCI will try to load classes from GitHub as XML. As I am using Atelier with the Eclipse GitHub plugin, the source code on GitHub is not stored as Caché XML format but as the plain source we see on Atelier when editing the class/csp/etc.

I can contribute to CacheGitHubCI, but before I do it, I wonder if anyone has implemented a new version of CacheGitHubCI that will load source as exported by Atelier...

@eduard93
Copy link
Contributor

CacheUpdater has a branch with atelier support. We are in a process of evaluating it. Maybe you can check it out too? Here's the discussion.

@amirsamary
Copy link
Author

Thank you!

@amirsamary
Copy link
Author

Hi again Eduard!

I have been using the UDL version and I have found some issues that I would like to share with you.

DownloadingFiles

When method CacheGitHubCI.Update() runs, it changes to the namespace where the files should be loaded and compiled and calls method DownloadFiles(). That method will download each file on the link list and compile it. But to decide on which method to use to compile (LoadStream or LoadUDLFile) the method will call a method on ANOTHER CLASS:

If ##class(CacheGitHubCI.UDL).IsUDLFile(stream) {
Set stload = ##class(CacheGitHubCI.UDL).LoadUDLFile(stream, link, .items, i)
}
Else {
Set stload = $system.OBJ.LoadStream(stream,"",.error,.items,,,,"UTF8")
}

This will fire an error since we are already on another namespace and this class (CacheGitHubCI.UDL) doesn't exist there. To fix the problem, I had to map the CacheGitHubCI package to my namespace.

I like the way you solved the problem though. It allows you to have code in any format on GitHub at the same time. My suggestion is to refactor this code to instead of adding the files to the Links list, add them to a global on CacheTemp, say:

^CacheTempCGCIFiles("<NAMESPACE>", sequence)=$LB(a,b,c,d,...)

The nice thing about using a global is that the list can be very big and you will not get a error. Also, prefixing the global with CacheTemp will create it on the CacheTemp namespace which is shared among all namespaces. So, when you change to the destination namespace, all the file data will be available! Finally, use a sequence and a $LB() to store the data on the global. The idea of using a $LB() is for you to call IsUDLFile() inside the ProcessDirectory() method and store its result with the file name so you don't have to call it when you are loading the classes on the destination namespace. Don't forget to Kill the global first!

I can make these changes if you want to. But I wanted to discuss them with you first.

*Downloading from a Branch other than "master" *

I think downloading from a branch other than master is not working for some reason... The problem I am facing is that when I load CacheGitHubCI classes on %SYS and call CacheGitHubCI.Install.setup(), it will run the manifest and try to download the latest code from GitHub:

ClassMethod DownloadFromGitHub(pVars, pLogLevel, tInstaller) As %Status {
Set Namespace=tInstaller.Evaluate("${Namespace}") Do tInstaller.PushNS("%SYS")
Set tSC = ..Update(Namespace, "intersystems-ru", "CacheGitHubCI", "udl") Do tInstaller.PopNS()
If $$$ISERR(tSC) Throw ##class(%Installer.Exception).CreateFromStatus(tSC) quit $$$OK
`}

The branch is there ("udl"), but the code that ends up on CGCI namespace is the code from "master" branch, not "udl" branch...

Have you faced this problem already? I can't help you with this one without reading GitHub API' specification.

** Timeout **
The hook works great. The problem is that I have too many files on GitHub and I didn't know that when the web hook is fired, you simply download everything on the branch again. I thought you would only download what was in the commit.

I have no problem with this since I will not be using web hooks anymore. My patching will be controlled by Ansible. But I thought you should know that when you have many files, ProcessDirectory() will spend a lot of time and GitHub will timeout saying that the web hook call failed.

There are two options to solve this:

  • Do it asynchronously - Looks like the cleanest approach. But it would be nice to have some e-mail alerts if something goes wrong. The e-mail could be sent to the e-mail of the person who did the last commit plus some another e-mail of an environment admin.
  • Do it synchronously but parallelising ProcessDirectory(). Using CacheTempCGCI global to hold the files would help you with that since you could have multiple jobs parsing the directory tree and populating this shared global. Say you have a parameter somewhere that will limit the number of jobs to fire. You $Increment a node inside CacheTempCGCI("","Jobs") to know if you are under that limit and fire a job with ProcessDirectory() for that subtree. You would do that on a loop for each directory entry you will find until you reach the maximum number of jobs. If you reach the maximum number of jobs, you will issue a Do instead of a Job to do the work on the current process instead of spawning a new one. That would allow you to keep the maximum number of jobs working even when jobs finishes (before finishing a job, you have to $Increment(, -1) on the global node. It is more error prone but maybe that would make it a lot faster and prevent the timeout. If a new timeout occurs, the administrator would have to increase the number of jobs. That is why I prefer option number 1 that will always report OK to GitHub in the sense that the integration worked but the work to be done will be done asynchronously and take time...

@eduard93
Copy link
Contributor

eduard93 commented Sep 7, 2016

Thank you!

@RustamIbragimov - what do you think?

@amirsamary
Copy link
Author

Hi @eduard93 and @RustamIbragimov !

I just found the problem with the Installer.

The method ProcessDirectory() on the CacheGitHubCI.Install class is missing the Branch parameter. The first request comes from "udl" Branch, but after the first directory it descends it will go without a Branch and change to "master".

I made CacheGitHubCI.Install.ProcessDirectory() identical to CacheGitHubCI.GitHub.ProcessDirectory() and now it is working. I guess you duplicated the code to make it possible to ship only CacheGitHubCI.Install class and that this single class would do all the work independently, right?

If that is the case, why not refactor this to put the Installer Manifest inside CacheGitHubCI.GitHub class and have the Install() and setup() methods there? We could eliminate CacheGitHubCI.Install class and a lot of duplicated code.

I am new to Git. I think I must create a new branch for the fixes I am working on and, when I am done, I should do a pull request to you so you can peer review what I did, right?

Respectfully,
Amir Samary

@amirsamary
Copy link
Author

The refactor I mentioned will be problematic since DownloadFiles() also has more dependencies:

  • ##class(CacheGitHubCI.UDL).IsUDLFile
  • ##class(CacheGitHubCI.UDL).LoadUDLFile

It looks like the best thing to do is to assume Install class must always be shipped with GitHub and UDL classes in order to use it to install/upgrade CacheGitHubCI on a system... I suggest refactoring Install class to call ProcessDirectory() and DownloadFiles() on CacheGitHubCI.GitHub class and keep Install class smaller, without repeated code.

Remember the dependency problem on DownloadFiles() that made me do a package mapping of CacheGitHubCI to the target namespace so that DownloadFiles could call IsUDLFile() on UDL class? I missed another dependency: LoadUDLFile() on UDL class too. So, even if we do the CacheTemp thing, this will break too. My suggestion here is to make GitHub class inherit from UDL class so we can ZN to the target namespace and have all the code available on a single class (no dependency on a second class). That would solve the orignal problem and make the changes on your code minimal (no need of CacheTemp global anymore). Although I would recommend a CacheTemp global anyway to avoid problems... But it is unlikely we have a problem with source code, right?

Please, let me know your thoughts about all this.

Respectfully,
AS

@eduard93
Copy link
Contributor

eduard93 commented Sep 8, 2016

The plan is for Installer to be able to download any branch regardless of. So CacheGitHubCI.Install.ProcessDirectory() needs to be modified. Thank you for finding and fixing this bug (pull request please?).

I am new to Git. I think I must create a new branch for the fixes I am working on and, when I am
done, I should do a pull request to you so you can peer review what I did, right?

You can either do this or create a fork of this repository and send pull request from there. We welcome pull requests here.

Overall I currently have a two step plan:

  • fix immediate bugs you described here
  • I'm not particularly concerned about web hook timeout error, I think we can just job ##class(CacheGitHubCI.Hook).Update(Owner,Repository,Namespace) call and be done with that. What do you think? I get my build analytics in DeepSee. It seems tio me that GitHub webhook status is more of a delivery status.

@amirsamary
Copy link
Author

amirsamary commented Sep 23, 2016

Hi @eduard93 and @RustamIbragimov,

I can see that you decided to make GitHub inherit from UDL. Cool. I am working on some other improvements:

  • I found out that we are not compiling CSP pages after loading them. I have fixed this since my CSP application has the Autocompile flag set to false for security reasons.
  • I have also implemented a local hashed registry of what has been loaded from GitHub. During ProcessDirectory() processing, I compare the SHA hash of the files on each directory to the hashes I have on my local registry (from the last load). If the hashes are equal, I won't add the file to the list of files to be downloaded. That made the update process 8 times faster to me since I am not loading everything from GitHub over and over again.
  • I have found that the way we were dealing with the character streams was breaking long javascript files. For instance, the ones that are minimized. They have the entire javascript code on a single line in the file and that line was being broken. I have fixed that.

Am am still facing a bug though where my accented characters such as é or á are being corrupted during download. They are fine in Git and on Atelier. But they are appearing as garbage on the integration server. I will be looking into this soon.

As I said, I am new to GitHub. So I will try to pull your changes from your repo to my fork of it and sort it all out. :)

Kind regards,
Amir Samary

@evshvarov
Copy link
Member

Wow, Amir!
This sounds great! Thanks for this update!

On Fri, Sep 23, 2016 at 5:23 PM, Amir Samary notifications@github.com
wrote:

Hi @eduard93 https://github.com/eduard93 and @RustamIbragimov
https://github.com/RustamIbragimov,

I am working on some other improvements. I can see that you decided to
make GitHub inherit from UDL. Cool.

I found out that we are not compiling CSP pages after loading them. I have
fixed this since my CSP application has the Autocompile flag set to false
for security reasons.

I have also implemented a local hashed registry of what has been loaded
from GitHub. During ProcessDirectory() processing, I compare the SHA hash
of the files on each directory to the hashes I have on my local registry
(from the last load). If the hashes are equal, I won't add the file to the
list of files to be downloaded. That made the update process 8 times faster
to me since I am not loading everything from GitHub over and over again.

As I said, I am new to GitHub. So I will try to pull your changes from
your repo to my fork of it and sort it all out. :)

Kind regards,
Amir Samary


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ACpyPwGAKiRAhw2-Mq-Cpnk2O9aFyMXtks5qs-DWgaJpZM4Juv2N
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants