-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: rewrite in node for less tool clutter #7
Changes from all commits
e0502d6
3e8641f
5b368bb
0a6e132
9a6ce98
9da79ce
214be4d
de0c48c
49f77f1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
test.py | ||
.github | ||
README.md |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1 @@ | ||
*.json | ||
__pycache__ | ||
node_modules |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,8 @@ | ||
FROM node:20-alpine | ||
|
||
RUN apk add --no-cache curl=8.9.0-r0 python3=3.12.3-r1 && \ | ||
npm install -g wikibase-cli@18.0.3 | ||
RUN apk add --no-cache curl=8.9.0-r0 | ||
|
||
COPY --chmod=755 ./transferbot.sh /usr/bin/transferbot | ||
COPY --chmod=755 ./mangle_data.py /usr/bin/mangle_data | ||
COPY . /pkg | ||
RUN npm i -g /pkg | ||
|
||
ENTRYPOINT ["transferbot"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
#!/usr/bin/env node | ||
|
||
import WikibaseRepo from './../lib/repo.js' | ||
import { pickLanguages, pickKeys } from './../lib/util.js' | ||
import { execCmd } from './../lib/exec.js' | ||
import { trap } from './../lib/signals.js' | ||
|
||
const abortController = new AbortController() | ||
|
||
Promise.race([ | ||
(async () => { | ||
const args = process.argv.slice(2) | ||
if (args.length < 3) { | ||
throw new Error( | ||
`Expected at least 3 positional arguments to be given, but got ${args.length}, cannot continue.` | ||
) | ||
} | ||
const [source, target, ...entities] = args | ||
|
||
const oauth = { | ||
consumer_key: process.env.TARGET_WIKI_OAUTH_CONSUMER_TOKEN, | ||
consumer_secret: process.env.TARGET_WIKI_OAUTH_CONSUMER_SECRET, | ||
token: process.env.TARGET_WIKI_OAUTH_ACCESS_TOKEN, | ||
token_secret: process.env.TARGET_WIKI_OAUTH_ACCESS_SECRET | ||
} | ||
if (!Object.values(oauth).every(v => v)) { | ||
throw new Error( | ||
'Unable to find full set of OAuth credentials in environment variables, cannot continue.' | ||
) | ||
} | ||
|
||
const sourceRepo = new WikibaseRepo(source, { abortController }) | ||
const targetRepo = new WikibaseRepo(target, { oauth, abortController }) | ||
|
||
const contentLanguages = await targetRepo.getContentLanguages() | ||
|
||
let data = await sourceRepo.getEntities(...entities) | ||
data = data | ||
.map(e => pickKeys(e, 'type', 'labels', 'descriptions', 'aliases', 'datatype')) | ||
.map(e => pickLanguages(e, ...contentLanguages)) | ||
|
||
await targetRepo.createEntities(...data) | ||
|
||
console.log( | ||
`Sucessfully transferred ${entities.length} entities from ${source} to ${target}.` | ||
) | ||
})(), | ||
(async () => { | ||
const signal = await trap('SIGINT', 'SIGTERM', 'SIGQUIT') | ||
throw new Error(`Received ${signal}, program will exit before finishing.`) | ||
})() | ||
]) | ||
.then((result) => { | ||
process.exitCode = 0 | ||
}) | ||
.catch((err) => { | ||
console.error('An error occurred executing the command:') | ||
console.error(err) | ||
process.exitCode = 1 | ||
abortController.abort() | ||
}) | ||
.then(() => { | ||
if (process.env.CALLBACK_ON_SUCCESS && process.exitCode === 0) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Running the callbacks from in here now means we don't catch SIGINT, but I guess that's ok for our use case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if this might leave us with a larger (but still small) chance of wiki's being left in a partially imported state. In any case I don't think k8s is likely to SIGINT but rather SIGTERM. If we wanted to be super belt and braces we could trap this and then CALLBACK_ON_FAILURE There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a cleanup Laravel job that sweeps stalled imports every hour or so. We could leave the trap in bash as is if we wanted to, but then we'd have a bit of mish-mash still. Which is what we are trying to leave behind. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added some Node based trapping in 214be4d that seems to work when double CRL+Cing the thing running in Docker
Single Ctrl+C would have Node waiting for its event loop to become empty and would trigger both the transfer as well as the failure callback. I would think this is a problem we'd also have in bash. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I ended up needing to do a load of reading the docs about There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this is a bit ugly. When using
In theory
meaning we can't shell out to the command the user is passing. Maybe we're already at a point where this is "as good as it gets" if we want to keep this in Node. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe it's good to remind ourselves to not overcomplicate this and aim for a solution that ensures the given commands will ALWAYS run as expected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I implemented some cancellation mechanism in 49f77f1 This works as intended now, but I am not 100% sure we should use it as it makes the code kind of hard to follow. This is mostly because the wikibase-x librarys do not expose the fetch they are using, so we cannot use the plain |
||
return execCmd(process.env.CALLBACK_ON_SUCCESS) | ||
} | ||
if (process.env.CALLBACK_ON_FAILURE && process.exitCode !== 0) { | ||
return execCmd(process.env.CALLBACK_ON_FAILURE) | ||
} | ||
}) | ||
.catch((err) => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting question: if the transfer itself worked, but the callback failed, the script will now still exit 0. Is this what we want? I'm unsure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After trying to think through the steps that would happen in this case I believe it's probably no worse than returning In either case because of how we specified the spec the job won't retry. This means that it may be marginally less clear to the debugging operator but on the other hand the "main work" done by the job did succeed. |
||
console.error('An error occurred executing the given callbacks:') | ||
console.error(err) | ||
}) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
import { exec } from 'node:child_process' | ||
|
||
export const execCmd = (cmd) => { | ||
return new Promise((resolve, reject) => { | ||
exec(cmd, (error, stdout, stderr) => { | ||
process.stdout.write(stdout) | ||
process.stderr.write(stderr) | ||
if (error) { | ||
reject(error) | ||
return | ||
} | ||
resolve() | ||
}) | ||
}) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
import WBEdit from 'wikibase-edit' | ||
import { WBK } from 'wikibase-sdk' | ||
|
||
export default class WikibaseRepository { | ||
constructor (origin, opts = { abortController: new AbortController() }) { | ||
this.origin = origin | ||
this.abortController = opts.abortController | ||
|
||
this.read = new WBK({ | ||
instance: origin | ||
}) | ||
|
||
if (opts.oauth) { | ||
this.edit = new WBEdit({ | ||
instance: origin, | ||
credentials: { | ||
oauth: opts.oauth | ||
} | ||
}) | ||
} | ||
} | ||
|
||
async withCancellationContext (thunk) { | ||
this.abortController.signal.throwIfAborted() | ||
return thunk() | ||
} | ||
|
||
getContentLanguages () { | ||
return fetch( | ||
`${this.origin}/w/api.php?action=query&meta=wbcontentlanguages&format=json`, | ||
{ signal: this.abortController.signal } | ||
) | ||
.then(r => r.json()) | ||
.then(body => Object.keys(body.query.wbcontentlanguages)) | ||
} | ||
|
||
createEntities (...entities) { | ||
if (!this.edit) { | ||
return Promise.reject(new Error('Cannot edit a read only instance.')) | ||
} | ||
return Promise.allSettled(entities.map(async entity => { | ||
return this.withCancellationContext(() => this.edit.entity.create(entity)) | ||
})) | ||
.then((results) => { | ||
for (const result of results) { | ||
if (result.status === 'rejected') { | ||
throw new Error(result.reason) | ||
} | ||
} | ||
return results.map(v => v.value) | ||
}) | ||
} | ||
|
||
getEntities (...identifiers) { | ||
return Promise.all(identifiers.map(async identifier => { | ||
const [entityId, revision] = identifier.split('@') | ||
const url = revision | ||
? await this.withCancellationContext(() => this.read.getEntityRevision({ | ||
id: entityId, | ||
revision | ||
})) | ||
: await this.withCancellationContext(() => this.read.getEntities({ | ||
ids: [entityId] | ||
})) | ||
const { entities } = await fetch( | ||
url, { signal: this.abortController.signal } | ||
).then(res => res.json()) | ||
return entities[entityId] | ||
})) | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
export const trap = (...signals) => { | ||
return Promise.race(signals.map((s) => { | ||
return new Promise((resolve, reject) => { | ||
process.on(s, () => resolve(s)) | ||
}) | ||
})) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
export const pickKeys = (obj, ...keys) => { | ||
return Object.entries(obj).reduce((acc, [key, value]) => { | ||
if (keys.includes(key)) { | ||
acc[key] = value | ||
} | ||
return acc | ||
}, {}) | ||
} | ||
|
||
export const pickLanguages = (obj, ...languages) => { | ||
return Object.entries(obj).reduce((acc, [key, value]) => { | ||
acc[key] = isObject(value) ? pickKeys(value, ...languages) : value | ||
return acc | ||
}, {}) | ||
} | ||
|
||
function isObject (x) { | ||
return Object.prototype.toString.call(x) === '[object Object]' | ||
} |
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: I needed to quickly look up how this fancy thing worked and that's very cool. Looks suitably tidy!