Merge pull request #2989 from chaoss/remove-update-weight-hotfix
Remove update weight hotfix
sgoggins authored Feb 12, 2025
2 parents 9933ee7 + c199f85 commit 8302e87
Showing 6 changed files with 103 additions and 8 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -1,4 +1,4 @@
-# Augur NEW Release v0.76.7
+# Augur NEW Release v0.80.1

Augur is primarily a data engineering tool that makes it possible for data scientists to gather open source software community data - less data carpentry for everyone else!
The primary way of looking at Augur data is through [8Knot](https://github.com/oss-aspen/8knot), a public instance of 8Knot is available [here](https://metrix.chaoss.io) - this is tied to a public instance of [Augur](https://ai.chaoss.io).
@@ -11,8 +11,7 @@ We follow the [First Timers Only](https://www.firsttimersonly.com/) philosophy o
## NEW RELEASE ALERT!
**If you want to jump right in, the updated docker, docker-compose and bare metal installation instructions are available [here](docs/new-install.md)**.


-Augur is now releasing a dramatically improved new version to the ```main``` branch. It is also available [here](https://github.com/chaoss/augur/releases/tag/v0.76.7).
+Augur is now releasing a dramatically improved new version to the ```main``` branch. It is also available [here](https://github.com/chaoss/augur/releases/tag/v0.80.1).


- The `main` branch is a stable version of our new architecture, which features:
4 changes: 2 additions & 2 deletions augur/tasks/init/celery_app.py
@@ -226,8 +226,8 @@ def setup_periodic_tasks(sender, **kwargs):
logger.info(f"Scheduling refresh materialized view every night at 1am CDT")
sender.add_periodic_task(datetime.timedelta(days=mat_views_interval), refresh_materialized_views.s())

-logger.info(f"Scheduling update of collection weights on midnight each day")
-sender.add_periodic_task(crontab(hour=0, minute=0),augur_collection_update_weights.s())
+# logger.info(f"Scheduling update of collection weights on midnight each day")
+# sender.add_periodic_task(crontab(hour=0, minute=0),augur_collection_update_weights.s())

logger.info(f"Setting 404 repos to be marked for retry on midnight each day")
sender.add_periodic_task(crontab(hour=0, minute=0),retry_errored_repos.s())
5 changes: 4 additions & 1 deletion augur/tasks/util/collection_util.py
@@ -154,11 +154,14 @@ def get_valid_repos(self,session):
def get_newly_added_repos(session, limit, hook):

condition_string = ""
+order_by_field = ""
if hook in ["core", "secondary", "ml"]:
condition_string += f"""{hook}_status='{str(CollectionState.PENDING.value)}'"""
+order_by_field = "issue_pr_sum"

elif hook == "facade":
condition_string += f"""facade_status='{str(CollectionState.UPDATE.value)}'"""
+order_by_field = "commit_sum"

if hook == "secondary":
condition_string += f""" and core_status='{str(CollectionState.SUCCESS.value)}'"""
@@ -168,7 +171,7 @@ def get_newly_added_repos(session, limit, hook):
from augur_operations.collection_status x, augur_data.repo y
where x.repo_id=y.repo_id
and {condition_string}
-order by repo_added
+order by {order_by_field}
limit :limit_num
""").bindparams(limit_num=limit)

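Reviewer note on the `collection_util.py` hunk: the query now sorts by a column chosen per hook instead of `repo_added`. Because a column name cannot be passed as a bound parameter, the patch interpolates it into the SQL string. The sketch below shows one possible way to keep that interpolation restricted to known column names. It is an illustration under stated assumptions, not the Augur implementation: it uses the standard-library `sqlite3` module instead of PostgreSQL and SQLAlchemy, a single simplified table, and the hypothetical names `ORDER_BY_FIELD`, `build_query`, `newly_added_repo_ids`, and `make_demo_db`.

```python
import sqlite3

# Hypothetical whitelist: hook name -> ordering column, mirroring the hunk above.
ORDER_BY_FIELD = {
    "core": "issue_pr_sum",
    "secondary": "issue_pr_sum",
    "ml": "issue_pr_sum",
    "facade": "commit_sum",
}


def build_query(hook):
    """Build the ordered query; only whitelisted column names are interpolated."""
    try:
        field = ORDER_BY_FIELD[hook]
    except KeyError:
        raise ValueError(f"unknown collection hook: {hook!r}")
    # The limit stays a bound parameter, as in the original query.
    return f"select repo_id from collection_status order by {field} limit :limit_num"


def newly_added_repo_ids(conn, hook, limit):
    """Return up to `limit` repo ids, smallest value of the ordering column first."""
    rows = conn.execute(build_query(hook), {"limit_num": limit}).fetchall()
    return [row[0] for row in rows]


def make_demo_db():
    """Create a simplified, in-memory stand-in for the collection status table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table collection_status "
        "(repo_id integer, issue_pr_sum integer, commit_sum integer)"
    )
    conn.executemany(
        "insert into collection_status values (?, ?, ?)",
        [(1, 500, 20), (2, 10, 900), (3, 120, 5)],
    )
    return conn
```

With the demo data, `newly_added_repo_ids(make_demo_db(), "core", 2)` returns the two repos with the smallest `issue_pr_sum`. An unknown hook raises `ValueError` instead of producing a query with an empty `order by` clause.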
56 changes: 56 additions & 0 deletions gsoc-ideas.md
@@ -0,0 +1,56 @@

## Idea: Enhance Conversational Topic Modelling Capabilities in CHAOSS Software

**Hours: 350**

[Micro-tasks and place for questions](https://github.com/chaoss/augur/issues/1640)

This project will add Gensim logic and other capabilities to the Clustering Worker inside Augur, and extend it into a generalized Open Source Software Conversational Topic Modeling Instrument.

CHAOSS/augur has several workers that store machine learning information derived from computational linguistic analysis of data in the `message` table. The `message` table includes issue, pull request, pull request review, and email messages. Each message is related to its origin through bridge tables like `pull_request_message_ref`. The ML/CL workers all run against every message, regardless of origin.

1. Clustering Worker (clusters created and topics modeled)
2. Message Analysis Worker (sentiment and novelty analysis)
3. Discourse Analysis Worker (speech act classification: question, answer, approval, etc.)

Clustering Worker Notes:

Clustering Worker: 2 Models.
- Models:
  - Topic modeling, but it needs a better way of estimating the number of topics.
    - Tables
      - repo_topic
      - topic_words
  - Computational linguistic clustering
    - Tables
      - repo_cluster_messages
- Key Needs
  - Add Gensim algorithms to the topic modeling section: https://github.com/chaoss/augur/issues/1199
  - The topics and their associated topic words need to be persisted after each run. At the moment, the topic words get overwritten on each topic modeling run.
  - Description/optimization of the parameters used to create the computational linguistic clusters.
  - Periodic deletion of models (heuristic: if 3 months pass, OR there is a 10% increase in the messages, issues, or PRs in a repo, rebuild the models)
  - Establish some kind of model archiving with appropriate metadata (lower priority)
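
The rebuild heuristic in the Key Needs list can be sketched as a small predicate. This is only an illustration: the function name `should_rebuild`, the count dictionaries, and the choice to approximate "3 months" as 90 days are assumptions, not part of Augur.

```python
from datetime import datetime, timedelta

MAX_MODEL_AGE = timedelta(days=90)  # assumption: "3 months" approximated as 90 days
GROWTH_THRESHOLD = 0.10             # 10% increase, taken from the heuristic above


def should_rebuild(trained_at, counts_at_training, current_counts, now):
    """Return True if the models are too old or any tracked count grew by 10% or more.

    Both count arguments are hypothetical dicts such as
    {"messages": 100, "issues": 50, "prs": 20}.
    """
    if now - trained_at >= MAX_MODEL_AGE:
        return True
    for kind, old in counts_at_training.items():
        new = current_counts.get(kind, old)
        if old == 0:
            # Any new activity in a previously empty category counts as growth.
            if new > 0:
                return True
            continue
        if (new - old) / old >= GROWTH_THRESHOLD:
            return True
    return False
```

A scheduled task could call this per repo and delete or retrain the stored models only when it returns `True`.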

Discourse Analysis Worker Notes:

discourse_insights table (select max(data_collection_date) for each msg_id)
- sequence is reassembled from the timestamp in the message table (look at msg_timestamp)
- issues_msg_ref, pull_request_message_ref, pull_request_review_msg_ref
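
The notes above (latest `data_collection_date` per `msg_id`, sequence taken from `msg_timestamp`) can be expressed as a single query. The sketch below runs it against an in-memory SQLite database with simplified tables. The `discourse_act` column, the demo rows, and the helper names are assumptions for illustration, and the real Augur schema may differ.

```python
import sqlite3

# For each message, keep only the most recently collected discourse row,
# then order the conversation by the message timestamp.
LATEST_ACTS_SQL = """
select m.msg_id, d.discourse_act
from discourse_insights d
join message m on m.msg_id = d.msg_id
where d.data_collection_date = (
    select max(d2.data_collection_date)
    from discourse_insights d2
    where d2.msg_id = d.msg_id
)
order by m.msg_timestamp
"""


def make_demo_db():
    """Create simplified stand-ins for the message and discourse_insights tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table message (msg_id integer, msg_timestamp text)")
    conn.execute(
        "create table discourse_insights "
        "(msg_id integer, discourse_act text, data_collection_date text)"
    )
    conn.executemany(
        "insert into message values (?, ?)",
        [(1, "2021-01-02T09:00:00"), (2, "2021-01-01T09:00:00")],
    )
    conn.executemany(
        "insert into discourse_insights values (?, ?, ?)",
        [
            (1, "question", "2021-02-01"),
            (1, "answer", "2021-03-01"),  # newer run for message 1
            (2, "approval", "2021-02-01"),
        ],
    )
    return conn


def latest_acts_in_order(conn):
    """Return (msg_id, discourse_act) pairs in conversation order."""
    return conn.execute(LATEST_ACTS_SQL).fetchall()
```

Joining the result to `issues_msg_ref`, `pull_request_message_ref`, or `pull_request_review_msg_ref` would then restrict the sequence to one issue or pull request.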

Message Analysis Worker
- message_analysis
- message_analysis_summary

<img width="1159" alt="augur-tech" src="https://user-images.githubusercontent.com/379847/124799236-f440dc80-df19-11eb-84ce-302cf274884f.png">

The aims of the project are as follows:
- Advance topic modeling of open source software conversations captured in GitHub.
- Integrate this information into clearer, more parsimonious CHAOSS metrics.
- Automate the management of machine learning insights and topic models over time.
- (Stretch Goal) Improve the operation of the overall machine learning insights pipeline in CHAOSS/augur, and generalize these capabilities.


* _Difficulty:_ Medium
* _Requirements:_ Interest in software analytics. Python programming and SQL knowledge. A conceptual understanding of machine learning and an eagerness to learn more about it.
* _Recommended:_ Experience with Python
* _Mentors:_ Sean Goggins, Andrew Brain, Isaac Milarsky
37 changes: 37 additions & 0 deletions gsoc-interest.md
@@ -0,0 +1,37 @@
# Google Summer of Code 2025 Interested Candidates

Hi potential GSoC students,

You can ask questions and meet the community on Slack here: https://join.slack.com/t/chaoss-workspace/shared_invite/zt-289zxh6tu-3oQaFlutPFY039MjKpnWcA ... look for the `wg-augur-8knot` channel.

A few details regarding the application process specific to the CHAOSS project:

1) You must complete one micro-task related to the idea you are interested in. You can find the micro-tasks on the GSoC ideas page: [gsoc-ideas.md](./gsoc-ideas.md)

2) Once you have completed one micro-task, create a pull request against this file to add yourself, your information, and a link to the repository containing your completed micro-task to the table below. **NOTE:** This repository requires [Developer Certificate of Origin](https://developercertificate.org/) (DCO) sign-off; see [CONTRIBUTING.md](https://github.com/chaoss/governance/blob/master/CONTRIBUTING.md#code-or-document-change-contributions-github-interface) for details on how to sign your commits.

3) You are welcome to include in your repositories other information that could be of interest, such as open issues or pull requests submitted to the project to which you intend to contribute during GSoC, contributions to other projects, skills, and other related information.

4) Using and submitting other people's work as your own is not allowed. If you use other people's work, be sure to acknowledge their work in your submission.

5) Documentation of all code contributions is critical, and expected from all CHAOSS GSoC Students.

You must complete these things by the GSoC deadline. Make sure to also [submit the information required by GSoC for applicants](https://summerofcode.withgoogle.com/) (i.e., project proposal), linking to it from your pull request to this file. Here is a [Proposal Template](https://docs.google.com/document/d/1YZez6_hgp2dBybEsMZoQ-ONB9IawK4_OPISLHe9Tjew/edit) to get you started.

Regards,
GSoC Mentors

---

## Applicants

**The applicants section will be completed as applicants are added here. At the moment, we are at the very beginning!**


**UPDATE:** This repository requires [Developer Certificate of Origin](https://developercertificate.org/) (DCO) sign-off; see [CONTRIBUTING.md](https://github.com/chaoss/governance/blob/master/CONTRIBUTING.md#code-or-document-change-contributions-github-interface) for details on how to sign your commits.


| Name | Email | Idea | Micro-Task Repository | Project Proposal | Submitted on GSOC |
| --- | --- | --- | --- | --- | --- |
| Your Name Here | Your Email Here | Idea You Are Hoping to Work On | Link to Your Micro-task Repo | Link to Your Proposal | YES/NO |

4 changes: 2 additions & 2 deletions metadata.py
@@ -5,8 +5,8 @@

__short_description__ = "Python 3 package for free/libre and open-source software community metrics, models & data collection"

-__version__ = "0.76.7"
-__release__ = "v0.76.7 (Captain Tuttle)"
+__version__ = "0.80.1"
+__release__ = "v0.80.1 (Data Monster)"

__license__ = "MIT"
__copyright__ = "University of Missouri, University of Nebraska-Omaha, CHAOSS, Brian Warner & Augurlabs 2112"
