For some reason, tasks are getting terminated midway with no reason given (Qwen3.6, context at 32%).
Session reset: all active JSONL conversation logs purged for a completely fresh start on +0200.
I should explore creating a package to enable scheduled tasks in pi-dev
testing if new session creation succeeds and still allows for scheduled skill execution
Tool calling and context utilization, coupled with decent TPS on an M3 Pro MacBook Pro with 36 GB, is pretty impressive from Qwen3.6. It blew everything else out of the water.
New idea: auto-generating journal tags inside the spj hybrid script keeps social media clean while local logs get richer context!
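A minimal sketch of that split, assuming a simple keyword-to-tag table. TAG_KEYWORDS, auto_tags and split_outputs are hypothetical names, not taken from the actual spj script. Tags are appended only to the local log line, while the social post keeps the original text.

```python
# Hypothetical keyword-to-tag map; the real spj tagging rules are not shown in the log.
TAG_KEYWORDS = {
    "qwen": "#models",
    "gemma": "#models",
    "agents.md": "#steering",
    "skill": "#skills",
    "bluesky": "#social",
    "mastodon": "#social",
}


def auto_tags(entry: str) -> list:
    """Return sorted, de-duplicated tags whose keyword appears in the entry (case-insensitive)."""
    text = entry.lower()
    return sorted({tag for keyword, tag in TAG_KEYWORDS.items() if keyword in text})


def split_outputs(entry: str) -> tuple:
    """Return (local_log_line, social_post): tags go only into the local log."""
    tags = auto_tags(entry)
    local = f"{entry} {' '.join(tags)}".rstrip()  # rstrip drops the trailing space when there are no tags
    return local, entry
```

For example, split_outputs("Testing Qwen3.6 with a new skill") gives the local line "Testing Qwen3.6 with a new skill #models #skills" and leaves the social text untouched.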
All skills verified and working after repository restructuring — spj hybrid test!
Time to test out Qwen3.6. At 27B, it is 20 GB (compared with 9 GB for Qwen3.5).
Qwen3.6 27B model update - memory: 20 GB vs Qwen3.5’s 9 GB
Testing Qwen3.6 at 27B params - memory footprint: 20 GB (vs 9 GB for Qwen3.5)
Qwen3.6 testing at 27B parameters - memory footprint is 20 GB vs 9 GB for Qwen3.5
Time to test out Qwen3.6 at 27B - it is 20 GB (compared with 9 GB for Qwen3.5)
Time to test Qwen3.6 at 27B
time to test out qwen3.6. at 27B, it is 20Gigs (as compared 9Gigs for qwen 3.5)
Qwen3.5 combined with the AGENTS.md changes seems to have done the trick.
Testing the new spj tool: verifying all three actions work correctly from the tools directory!
Update AGENTS.md with /spj skill reference. One command now does journaling AND cross-posting to both BlueSky and Mastodon!
After cleaning AGENTS.md and fixing the skills directory, let's see if the harness picks things up as needed. Took a leap of faith with north-mini-code-1.0:mlx-nvfp4 (Vibethinker sticks to just thinking, no execution), but still no luck. Back to Gemma 4.
Somehow je works better than skills for posting on social media. I wonder what is different 🤔 but the amount of thinking it takes just to append to a file is bonkers. I guess I need to analyze it with a SOTA model.
Updated AGENTS.md using Gemini 3.5 Flash to steer the model better; let's see if this works.
Further refined AGENTS.md for better steering. This also raises the point of versioning the changes to this bot’s guardrails. What a missed opportunity.
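One lightweight way to get that versioning, sketched here as an assumption rather than anything pi-dev provides: snapshot AGENTS.md into a history folder whenever its content changes. Keeping AGENTS.md in a git repository would do the same job with proper diffs; snapshot_guardrails is a hypothetical helper for setups without git.

```python
import hashlib
import shutil
import time
from pathlib import Path
from typing import Optional


def snapshot_guardrails(agents_md: Path, history_dir: Path) -> Optional[Path]:
    """Copy agents_md into history_dir unless this exact content was saved before.

    Snapshots are named <timestamp>-<sha256 prefix>.md so they sort chronologically.
    Returns the new snapshot path, or None if identical content already exists.
    """
    digest = hashlib.sha256(agents_md.read_bytes()).hexdigest()[:12]
    history_dir.mkdir(parents=True, exist_ok=True)
    if any(history_dir.glob(f"*-{digest}.md")):
        return None  # this exact version is already in the history
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = history_dir / f"{stamp}-{digest}.md"
    shutil.copy2(agents_md, target)  # copy2 also preserves the file's modification time
    return target
```

Calling it after every edit session (or from a scheduled task) leaves a dated trail of guardrail versions to compare when steering gets better or worse.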
Steering seems to have improved with proper guidance and, of course, Qwen. Next, I will switch to Gemma in a new session to see if it works.
Before I switch the model, testing whether it works equally well from the Telegram bot.
Repository restructuring completed using Qwen 3.6: consolidated 5 Bluesky variants into a single bluesky.py, fixed the mastadon typo (now mastodon.py), rewrote a clean spj hybrid, updated all SKILL.md docs, and aligned AGENTS.md references. All old scripts were backed up to temps/. Qwen 3.6 proved significantly better than Qwen 3.5 on tool-calling fidelity, multi-turn coherence, and longer context management, even outperforming Gemma 4, which previously held the lead for pi-dev. The key advantage: Qwen 3.6 handles complex restructuring across dozens of files in a single focused run without losing context, whereas earlier models would drift or hallucinate after 5-7 steps.
Thinking mode setting seems to somehow nerf gemma4 as well. It is not using the skills at all.
I keep coming back to gemma4 despite its shortcomings; it still seems like the most performant model for pi-dev so far. Vibethinker3B blew me away with its TPS, but doesn’t vibe much other than that.
pi-dev is indeed self-improving. I was able to add a new model to its models.json, and a simple reload made it available in the same session.
Starting pi in the required directory did solve picking up AGENTS.md, but skill availability is better with gemma4, while North Mini is better only at speed and general chat.
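If the harness resolves AGENTS.md relative to the directory it was launched from, that would explain why the launch directory matters. The sketch below assumes a walk from the start directory up through its parents; find_agents_md is a hypothetical helper, and pi's actual lookup rules may differ (it may only check the current directory).

```python
from pathlib import Path
from typing import Optional


def find_agents_md(start: Path, filename: str = "AGENTS.md") -> Optional[Path]:
    """Walk from start up to the filesystem root and return the first match, else None.

    Note: the match is case-sensitive on most Linux filesystems, so "agents.md"
    and "AGENTS.md" are different files there (macOS defaults to case-insensitive).
    """
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Under this assumption, launching from a sibling directory outside the project would find nothing, which matches the symptom of the instructions not being picked up.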
pi-dev with gemma4 works pretty nicely for things like skill creation and file manipulation but falters when code updates take multiple turns.
Listening to Kevin Murphy’s lecture on agents while keeping an eye on Aviraj for the night.
Was able to work with Bluesky seamlessly. Mastodon turns out to be messy because gemma4 doesn't adhere well to the requested changes.
queueing of tasks also works nicely when context utilisation is under 50%.
context utilisation directly impacts agent adherence and quality.
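The 50% rule of thumb can be written as a simple gate. This is a sketch under assumptions: ContextState, can_queue and plan_queue are hypothetical names, not pi-dev's API, and the per-task token costs are estimates the caller would have to supply.

```python
from dataclasses import dataclass


@dataclass
class ContextState:
    """Token usage of the current session (hypothetical structure)."""
    used_tokens: int
    window_tokens: int

    @property
    def utilization(self) -> float:
        return self.used_tokens / self.window_tokens


def can_queue(state: ContextState, threshold: float = 0.5) -> bool:
    """Accept a new queued task only while utilization is strictly below the threshold."""
    return state.utilization < threshold


def plan_queue(state: ContextState, task_costs: list, threshold: float = 0.5) -> int:
    """Count how many queued tasks, in order, fit before projected utilization reaches the threshold."""
    used = state.used_tokens
    count = 0
    for cost in task_costs:
        if (used + cost) / state.window_tokens >= threshold:
            break  # this task would push the session past the threshold
        used += cost
        count += 1
    return count
```

With a hypothetical 128k-token window, plan_queue(ContextState(40_000, 128_000), [10_000, 10_000, 10_000]) returns 2: the third task would push utilization to about 55%, so it is better started in a fresh session.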
spost is a composite skill that uses mpost and bpost for cross-platform posting.
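The composite pattern boils down to a fan-out: one call hands the same text to each platform skill and collects the results. In this sketch mpost and bpost are stand-in functions (assumed to be the Mastodon and Bluesky posters respectively) that just echo the text; the real skills would call each platform's API. Catching errors per platform keeps one failing service from blocking the other post.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the real mpost (Mastodon) and bpost (Bluesky) skills.
def mpost(text: str) -> str:
    return f"mastodon:{text}"


def bpost(text: str) -> str:
    return f"bluesky:{text}"


def spost(text: str, posters: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send one message to every poster and collect a per-platform result."""
    results: Dict[str, str] = {}
    for name, post in posters.items():
        try:
            results[name] = post(text)
        except Exception as exc:  # record the failure and keep posting elsewhere
            results[name] = f"error: {exc}"
    return results
```

Usage: spost("hello", {"mastodon": mpost, "bluesky": bpost}). Adding a third platform later only means adding another entry to the dict.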
lfm2.5 is hallucinating a lot and making mistakes on simple tasks as well. It is faster than gemma4 but nowhere near as usable.
Telegram can take audio input, and while gemma4 models can work with audio data, the model can't transcribe it and use the text until the harness (pi-dev in this case) has a tool to enable it.
I feel more productive today when I use the bot to organize my thoughts.
It’s much easier to see everything on one page.