Commit
Merge pull request #2989 from chaoss/remove-update-weight-hotfix
Remove update weight hotfix
Showing 6 changed files with 103 additions and 8 deletions.
@@ -0,0 +1,56 @@

## Idea: Enhance Conversational Topic Modelling Capabilities in CHAOSS Software

**Hours: 350**

[Micro-tasks and place for questions](https://github.com/chaoss/augur/issues/1640)

This project will add Gensim logic and other capabilities to the Clustering Worker inside Augur, and extend the result into a generalized Open Source Software Conversational Topic Modeling Instrument.

CHAOSS/augur has several workers that store machine learning information derived from computational linguistic analysis of data in the `message` table. The `message` table includes issue, pull request, pull request review, and email messages, each related to its origin through bridge tables such as `pull_request_message_ref`. The ML/CL workers all run against every message, regardless of origin:

1. Clustering Worker (creates clusters and models topics)
2. Message Analysis Worker (sentiment and novelty analysis)
3. Discourse Analysis Worker (speech act classification: question, answer, approval, etc.)
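
As a rough illustration of the bridge-table relationship, the sketch below labels each message with its origin using an in-memory SQLite database. Only the table names (`message`, `issues_msg_ref`, `pull_request_message_ref`) and the `msg_id` column appear in these notes; the other columns, the sample rows, and the helper names are hypothetical, and Augur's actual PostgreSQL schema will differ.

```python
import sqlite3

# Hypothetical, heavily simplified schema. Only the table names and msg_id
# come from the notes; issue_id, pull_request_id, msg_text are assumptions.
SCHEMA = """
CREATE TABLE message (msg_id INTEGER PRIMARY KEY, msg_text TEXT);
CREATE TABLE issues_msg_ref (msg_id INTEGER, issue_id INTEGER);
CREATE TABLE pull_request_message_ref (msg_id INTEGER, pull_request_id INTEGER);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with one message of each kind."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO message VALUES (?, ?)",
        [(1, "LGTM"), (2, "Is this a bug?"), (3, "Mailing list post")],
    )
    conn.execute("INSERT INTO pull_request_message_ref VALUES (1, 10)")
    conn.execute("INSERT INTO issues_msg_ref VALUES (2, 20)")
    return conn


def messages_with_origin(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Label every message with its origin by LEFT JOINing the bridge tables.

    A message with no bridge-table row is labelled 'other' (e.g. email).
    """
    query = """
        SELECT m.msg_id,
               CASE
                   WHEN pr.msg_id IS NOT NULL THEN 'pull_request'
                   WHEN im.msg_id IS NOT NULL THEN 'issue'
                   ELSE 'other'
               END AS origin
        FROM message AS m
        LEFT JOIN pull_request_message_ref AS pr ON pr.msg_id = m.msg_id
        LEFT JOIN issues_msg_ref AS im ON im.msg_id = m.msg_id
        ORDER BY m.msg_id
    """
    return conn.execute(query).fetchall()
```

With the demo data, `messages_with_origin(build_demo_db())` labels message 1 as a pull request message, message 2 as an issue message, and message 3 as `'other'`. In a real schema, a message with several bridge-table rows would appear more than once and would need de-duplication.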

Clustering Worker Notes:

Clustering Worker: 2 Models.
- Models:
  - Topic modeling, which needs a better way of estimating the number of topics.
    - Tables
      - `repo_topic`
      - `topic_words`
  - Computational linguistic clustering
    - Tables
      - `repo_cluster_messages`
- Key Needs
  - Add Gensim algorithms to the topic modeling section: https://github.com/chaoss/augur/issues/1199
  - The topics and their associated topic words need to be persisted after each run. At the moment, the topic words are overwritten by each topic modeling run.
  - Description and optimization of the parameters used to create the computational linguistic clusters.
  - Periodic deletion and rebuilding of models (heuristic: if 3 months pass, OR there is a 10% increase in the messages, issues, or PRs in a repo, rebuild the models).
  - Establish some kind of model archiving with appropriate metadata (lower priority).
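
The rebuild heuristic above can be expressed as a small predicate. This is a minimal sketch, not Augur code: it assumes "3 months" means 90 days and that per-repo counts of messages, issues, and PRs are available both at training time and now; `RepoCounts`, `grew_by_threshold`, and `should_rebuild` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "3 months" is approximated as 90 days.
MAX_MODEL_AGE = timedelta(days=90)
GROWTH_THRESHOLD = 0.10  # 10% increase, per the heuristic in the notes


@dataclass
class RepoCounts:
    """Hypothetical snapshot of per-repo activity counts."""
    messages: int
    issues: int
    pull_requests: int


def grew_by_threshold(old: int, new: int, threshold: float = GROWTH_THRESHOLD) -> bool:
    """True if `new` is at least `threshold` (as a fraction) larger than `old`."""
    if old == 0:
        return new > 0  # any activity counts as growth from zero
    return (new - old) / old >= threshold


def should_rebuild(trained_at: datetime, now: datetime,
                   at_training: RepoCounts, current: RepoCounts) -> bool:
    """Rebuild if the model is at least 3 months old OR any count grew by 10%+."""
    if now - trained_at >= MAX_MODEL_AGE:
        return True
    pairs = [
        (at_training.messages, current.messages),
        (at_training.issues, current.issues),
        (at_training.pull_requests, current.pull_requests),
    ]
    return any(grew_by_threshold(old, new) for old, new in pairs)
```

Keeping the check as a pure function of timestamps and counts makes it easy to call from a scheduler and to test without a database. Whether the threshold should apply per count (as here) or to a combined total is a design choice the notes leave open.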

Discourse Analysis Worker Notes:

`discourse_insights` table (select `max(data_collection_date)` for each `msg_id`)
- The sequence is reassembled from the timestamp in the `message` table (see `msg_timestamp`).
- Bridge tables: `issues_msg_ref`, `pull_request_message_ref`, `pull_request_review_msg_ref`
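
The two notes above (latest classification per message, ordered by message timestamp) can be combined into one query. The sketch below uses an in-memory SQLite database. `discourse_insights`, `message`, `msg_id`, `data_collection_date`, and `msg_timestamp` come from the notes; the `discourse_act` and `msg_text` columns, the ISO-8601 text timestamps, and the sample rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical, simplified tables; discourse_act and msg_text are assumptions.
# Timestamps are ISO-8601 strings so lexical order matches chronological order.
SCHEMA = """
CREATE TABLE message (msg_id INTEGER PRIMARY KEY, msg_text TEXT, msg_timestamp TEXT);
CREATE TABLE discourse_insights (
    msg_id INTEGER, discourse_act TEXT, data_collection_date TEXT
);
"""

LATEST_INSIGHTS_IN_SEQUENCE = """
    SELECT m.msg_id, m.msg_timestamp, d.discourse_act
    FROM discourse_insights AS d
    JOIN (
        -- keep only the most recent classification run for each message
        SELECT msg_id, MAX(data_collection_date) AS latest
        FROM discourse_insights
        GROUP BY msg_id
    ) AS newest
      ON newest.msg_id = d.msg_id AND newest.latest = d.data_collection_date
    JOIN message AS m ON m.msg_id = d.msg_id
    -- reassemble the conversation order from the message timestamps
    ORDER BY m.msg_timestamp
"""


def build_demo_db() -> sqlite3.Connection:
    """Two messages; message 1 was classified twice, on different dates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO message VALUES (?, ?, ?)", [
        (1, "Does this fix #12?", "2025-03-01T10:00:00"),
        (2, "Yes, it does.", "2025-03-01T11:00:00"),
    ])
    conn.executemany("INSERT INTO discourse_insights VALUES (?, ?, ?)", [
        (1, "statement", "2025-03-02"),  # older run, superseded below
        (1, "question", "2025-03-09"),
        (2, "answer", "2025-03-09"),
    ])
    return conn


def latest_insights_in_sequence(conn: sqlite3.Connection) -> list[tuple[int, str, str]]:
    """Return (msg_id, msg_timestamp, discourse_act) in conversation order."""
    return conn.execute(LATEST_INSIGHTS_IN_SEQUENCE).fetchall()
```

If two rows for the same message share the maximum `data_collection_date`, this join returns both; in PostgreSQL, `SELECT DISTINCT ON (msg_id) ... ORDER BY msg_id, data_collection_date DESC` is one way to keep exactly one row per message.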

Message Analysis Worker
- `message_analysis`
- `message_analysis_summary`

<img width="1159" alt="augur-tech" src="https://user-images.githubusercontent.com/379847/124799236-f440dc80-df19-11eb-84ce-302cf274884f.png">

The aims of the project are as follows:
- Advance topic modeling of open source software conversations captured in GitHub.
- Integrate this information into clearer, more parsimonious CHAOSS metrics.
- Automate the management of machine learning insights and topic models over time.
- (Stretch Goal) Improve the operation of the overall machine learning insights pipeline in CHAOSS/augur, and generalize these capabilities.

* _Difficulty:_ Medium
* _Requirements:_ Interest in software analytics, Python programming, a conceptual understanding of machine learning and an eagerness to learn more about it, and SQL knowledge.
* _Recommended:_ Experience with Python
* _Mentors:_ Sean Goggins, Andrew Brain, Isaac Milarsky
@@ -0,0 +1,37 @@
# Google Summer of Code 2025 Interested Candidates

Hi potential GSoC students,

You can ask questions and meet the community in the `wg-augur-8knot` channel on Slack: https://join.slack.com/t/chaoss-workspace/shared_invite/zt-289zxh6tu-3oQaFlutPFY039MjKpnWcA

A few details regarding the application process specific to the CHAOSS project:

1) You must complete one micro-task related to the idea you are interested in. You can find the micro-tasks on the GSoC Ideas page: [gsoc-ideas.md](./gsoc-ideas.md)

2) Once you have completed one micro-task, create a pull request against this file to add yourself, your information, and a link to the repository containing your completed micro-task. **NOTE:** This repository requires [Developer Certificate of Origin](https://developercertificate.org/) (DCO) sign-off; see [CONTRIBUTING.md](https://github.com/chaoss/governance/blob/master/CONTRIBUTING.md#code-or-document-change-contributions-github-interface) for details on how to sign your commits.

3) You are welcome to include other information of interest in your repositories, such as open issues or pull requests you have submitted to the project you intend to contribute to during GSoC, contributions to other projects, skills, and other related information.

4) Using and submitting other people's work as your own is not allowed. If you use other people's work, be sure to acknowledge it in your submission.

5) Documentation of all code contributions is critical and is expected from all CHAOSS GSoC students.

You must complete these steps by the GSoC deadline. Make sure to also [submit the information required by GSoC for applicants](https://summerofcode.withgoogle.com/) (i.e., your project proposal), and link to it from your pull request to this file. Here is a [Proposal Template](https://docs.google.com/document/d/1YZez6_hgp2dBybEsMZoQ-ONB9IawK4_OPISLHe9Tjew/edit) to get you started.

Regards,
GSoC Mentors

---

## Applicants

**The applicants section will be completed as applicants are added here. At the moment, we are at the very beginning!**

**UPDATE:** This repository requires [Developer Certificate of Origin](https://developercertificate.org/) (DCO) sign-off; see [CONTRIBUTING.md](https://github.com/chaoss/governance/blob/master/CONTRIBUTING.md#code-or-document-change-contributions-github-interface) for details on how to sign your commits.

| Name | Email | Idea | Micro-Task Repository | Project Proposal | Submitted on GSoC |
| --- | --- | --- | --- | --- | --- |
| Your Name Here | Your Email Here | Idea You Are Hoping to Work On | Link to Your Micro-Task Repo | Link to Your Proposal | YES/NO |