Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: rewrite in node for less tool clutter #7

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .dockerignore
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
test.py
.github
README.md
12 changes: 7 additions & 5 deletions .github/workflows/unit_test.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: Python package
name: Node tests

on:
push:
Expand All @@ -11,9 +11,11 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python 3.12
uses: actions/setup-python@v5
- name: Set up Node 20
uses: actions/setup-node@v4
with:
python-version: 3.12
node-version: 20
- name: Run unit tests
run: ./test.py
run: |
npm ci
npm t
3 changes: 1 addition & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1 @@
*.json
__pycache__
node_modules
7 changes: 3 additions & 4 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,9 +1,8 @@
FROM node:20-alpine

RUN apk add --no-cache curl=8.9.0-r0 python3=3.12.3-r1 && \
npm install -g wikibase-cli@18.0.3
RUN apk add --no-cache curl=8.9.0-r0

COPY --chmod=755 ./transferbot.sh /usr/bin/transferbot
COPY --chmod=755 ./mangle_data.py /usr/bin/mangle_data
COPY . /pkg
RUN npm i -g /pkg

ENTRYPOINT ["transferbot"]
73 changes: 73 additions & 0 deletions bin/cmd.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
#!/usr/bin/env node

import WikibaseRepo from './../lib/repo.js'
import { pickLanguages, pickKeys } from './../lib/util.js'
import { execCmd } from './../lib/exec.js'
import { trap } from './../lib/signals.js'

const abortController = new AbortController()

Promise.race([
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: I needed to quickly look up how this fancy thing worked and that's very cool. Looks suitably tidy!

(async () => {
const args = process.argv.slice(2)
if (args.length < 3) {
throw new Error(
`Expected at least 3 positional arguments to be given, but got ${args.length}, cannot continue.`
)
}
const [source, target, ...entities] = args

const oauth = {
consumer_key: process.env.TARGET_WIKI_OAUTH_CONSUMER_TOKEN,
consumer_secret: process.env.TARGET_WIKI_OAUTH_CONSUMER_SECRET,
token: process.env.TARGET_WIKI_OAUTH_ACCESS_TOKEN,
token_secret: process.env.TARGET_WIKI_OAUTH_ACCESS_SECRET
}
if (!Object.values(oauth).every(v => v)) {
throw new Error(
'Unable to find full set of OAuth credentials in environment variables, cannot continue.'
)
}

const sourceRepo = new WikibaseRepo(source, { abortController })
const targetRepo = new WikibaseRepo(target, { oauth, abortController })

const contentLanguages = await targetRepo.getContentLanguages()

let data = await sourceRepo.getEntities(...entities)
data = data
.map(e => pickKeys(e, 'type', 'labels', 'descriptions', 'aliases', 'datatype'))
.map(e => pickLanguages(e, ...contentLanguages))

await targetRepo.createEntities(...data)

console.log(
`Sucessfully transferred ${entities.length} entities from ${source} to ${target}.`
)
})(),
(async () => {
const signal = await trap('SIGINT', 'SIGTERM', 'SIGQUIT')
throw new Error(`Received ${signal}, program will exit before finishing.`)
})()
])
.then((result) => {
process.exitCode = 0
})
.catch((err) => {
console.error('An error occurred executing the command:')
console.error(err)
process.exitCode = 1
abortController.abort()
})
.then(() => {
if (process.env.CALLBACK_ON_SUCCESS && process.exitCode === 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running the callbacks from in here now means we don't catch SIGINT, but I guess that's ok for our use case?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this might leave us with a larger (but still small) chance of wiki's being left in a partially imported state. In any case I don't think k8s is likely to SIGINT but rather SIGTERM. If we wanted to be super belt and braces we could trap this and then CALLBACK_ON_FAILURE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a cleanup Laravel job that sweeps stalled imports every hour or so.

We could leave the trap in bash as is if we wanted to, but then we'd have a bit of mish-mash still. Which is what we are trying to leave behind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some Node based trapping in 214be4d that seems to work when double CRL+Cing the thing running in Docker

➜  transferbot git:(fr/nodeify) ✗ docker run -e TARGET_WIKI_OAUTH_CONSUMER_TOKEN="xxx" -e TARGET_WIKI_OAUTH_CONSUMER_SECRET="xxx" -e TARGET_WIKI_OAUTH_ACCESS_TOKEN="xxx" -e TARGET_WIKI_OAUTH_ACCESS_SECRET="xxx" -e CALLBACK_ON_FAILURE="echo 'failing'" --rm 4ab54cf0400349e5e0afbf06e1965046c1b7cf0f8890c505afd8b733efac3a07 https://www.wikidata.org https://updatewiki.wikibase.dev Q107819273
^CAn error occurred executing the command:
Error: Received SIGINT, program will exit before finishing.
    at process.<anonymous> (file:///pkg/lib/signals.js:5:16)
    at process.emit (node:events:519:28)
failing
^C%

Single Ctrl+C would have Node waiting for its event loop to become empty and would trigger both the transfer as well as the failure callback. I would think this is a problem we'd also have in bash.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up needing to do a load of reading the docs about process and this then inspired me to ask: "should we consider using beforeExit for all this stuff?" see https://nodejs.org/api/process.html#event-beforeexit

Copy link
Contributor Author

@m90 m90 Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a bit ugly. When using beforeExit, a forceful exit (i.e. Ctrl+C) or an unhandled exception will not be considered:

The 'beforeExit' event is not emitted for conditions causing explicit termination, such as calling process.exit() or uncaught exceptions.

In theory on('exit') would help us, however this (as opposed to beforeExit) only allows for synchronous code to be run in the callback:

Listener functions must only perform synchronous operations. The Node.js process will exit immediately after calling the 'exit' event listeners causing any additional work still queued in the event loop to be abandoned.

meaning we can't shell out to the command the user is passing.

Maybe we're already at a point where this is "as good as it gets" if we want to keep this in Node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's good to remind ourselves to not overcomplicate this and aim for a solution that ensures the given commands will ALWAYS run as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented some cancellation mechanism in 49f77f1

This works as intended now, but I am not 100% sure we should use it as it makes the code kind of hard to follow. This is mostly because the wikibase-x librarys do not expose the fetch they are using, so we cannot use the plain signal but have to add some custom wrapper, when fetch calls can use the signal as intended. I'm not really sure if the extra complexity justifies the outcome.

return execCmd(process.env.CALLBACK_ON_SUCCESS)
}
if (process.env.CALLBACK_ON_FAILURE && process.exitCode !== 0) {
return execCmd(process.env.CALLBACK_ON_FAILURE)
}
})
.catch((err) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting question: if the transfer itself worked, but the callback failed, the script will now still exit 0. Is this what we want? I'm unsure.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After trying to think through the steps that would happen in this case I believe it's probably no worse than returning 1.

In either case because of how we specified the spec the job won't retry.

This means that it may be marginally less clear to the debugging operator but on the other hand the "main work" done by the job did succeed.

console.error('An error occurred executing the given callbacks:')
console.error(err)
})
15 changes: 15 additions & 0 deletions lib/exec.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
import { exec } from 'node:child_process'

export const execCmd = (cmd) => {
return new Promise((resolve, reject) => {
exec(cmd, (error, stdout, stderr) => {
process.stdout.write(stdout)
process.stderr.write(stderr)
if (error) {
reject(error)
return
}
resolve()
})
})
}
71 changes: 71 additions & 0 deletions lib/repo.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
import WBEdit from 'wikibase-edit'
import { WBK } from 'wikibase-sdk'

export default class WikibaseRepository {
constructor (origin, opts = { abortController: new AbortController() }) {
this.origin = origin
this.abortController = opts.abortController

this.read = new WBK({
instance: origin
})

if (opts.oauth) {
this.edit = new WBEdit({
instance: origin,
credentials: {
oauth: opts.oauth
}
})
}
}

async withCancellationContext (thunk) {
this.abortController.signal.throwIfAborted()
return thunk()
}

getContentLanguages () {
return fetch(
`${this.origin}/w/api.php?action=query&meta=wbcontentlanguages&format=json`,
{ signal: this.abortController.signal }
)
.then(r => r.json())
.then(body => Object.keys(body.query.wbcontentlanguages))
}

createEntities (...entities) {
if (!this.edit) {
return Promise.reject(new Error('Cannot edit a read only instance.'))
}
return Promise.allSettled(entities.map(async entity => {
return this.withCancellationContext(() => this.edit.entity.create(entity))
}))
.then((results) => {
for (const result of results) {
if (result.status === 'rejected') {
throw new Error(result.reason)
}
}
return results.map(v => v.value)
})
}

getEntities (...identifiers) {
return Promise.all(identifiers.map(async identifier => {
const [entityId, revision] = identifier.split('@')
const url = revision
? await this.withCancellationContext(() => this.read.getEntityRevision({
id: entityId,
revision
}))
: await this.withCancellationContext(() => this.read.getEntities({
ids: [entityId]
}))
const { entities } = await fetch(
url, { signal: this.abortController.signal }
).then(res => res.json())
return entities[entityId]
}))
}
}
7 changes: 7 additions & 0 deletions lib/signals.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
export const trap = (...signals) => {
return Promise.race(signals.map((s) => {
return new Promise((resolve, reject) => {
process.on(s, () => resolve(s))
})
}))
}
19 changes: 19 additions & 0 deletions lib/util.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
export const pickKeys = (obj, ...keys) => {
return Object.entries(obj).reduce((acc, [key, value]) => {
if (keys.includes(key)) {
acc[key] = value
}
return acc
}, {})
}

export const pickLanguages = (obj, ...languages) => {
return Object.entries(obj).reduce((acc, [key, value]) => {
acc[key] = isObject(value) ? pickKeys(value, ...languages) : value
return acc
}, {})
}

function isObject (x) {
return Object.prototype.toString.call(x) === '[object Object]'
}
55 changes: 0 additions & 55 deletions mangle_data.py

This file was deleted.

Loading
Loading