Gemma 4 E2B โ€” Fine-tuned for Caregivers, Q4_K_M GGUF

A fine-tuned derivative of Google Gemma 4 E2B, adapted via LoRA (rank 32) and quantized to Q4_K_M for on-device inference on mobile devices through any llama.cpp-compatible runtime.

This model is a Cognitive Decline Caregiver Support Assistant. It is designed to support people caring for a loved one with a neurodegenerative disease, dementia, or severe cognitive decline โ€” helping them navigate the psychological weight of ambiguous loss (mourning someone who is still physically present). It treats a caregiver's dark moments (rage, jealousy, exhaustion, wishing for an end) as biological exhaustion rather than moral failure, and responds with a fixed, gentle four-step rhythm rather than advice or solutions.

Attribution

This model is a fine-tuned derivative of Google Gemma 4 E2B, originally released by Google DeepMind under the Apache 2.0 License.

Lineage:

Modifications by Serjio42 (2026):

  • Fine-tuned with LoRA (rank 32) for caregiver-focused use cases
  • Merged adapter weights with the base model (16-bit)
  • Quantized to Q4_K_M via llama.cpp

Files

File Size Purpose
gemma4-e2b_r32-q4_k_m.gguf ~3.4 GB Quantized model weights
inference_config.json โ€” Sampling and generation parameters
system_prompt.txt โ€” Default system prompt (use verbatim)
LICENSE โ€” Apache 2.0 license text

Integrity

gemma4-e2b_r32-q4_k_m.gguf โ€” 3,427,863,872 bytes (~3.4 GB).

SHA-256:

81ce0ae4a3fb37040faf37c6eedc0985f0d7fa291e8d17a9820937ccdab4158b

Training

  • Base model: Google Gemma 4 E2B (instruction-tuned, ~2B effective parameters), accessed via unsloth/gemma-4-E2B-it
  • Method: LoRA fine-tuning, rank 32, then merged with base weights (16-bit)
  • Chat template: Gemma 4 non-thinking template (gemma-4)
  • Quantization: Q4_K_M via llama.cpp (convert_hf_to_gguf.py โ†’ llama-quantize)
  • Training data: Curated private dataset for caregiver-focused instruction following. Dataset access available on request โ€” please open a discussion on this repository.

Conversation protocol

This model is trained for a fixed four-turn conversation, not free-form chat. Each conversation follows the same rhythm:

  1. Mirror โ€” reflect the caregiver's moment back so they feel seen.
  2. Normalize โ€” explain why their reaction is a universal human response.
  3. Self-compassion โ€” invite one small act of kindness toward themselves.
  4. Close โ€” a soft landing, no advice, no new task.

Flow:

  • The first user message is the caregiver's hard moment โ€” a story, a dark thought, a raw feeling.
  • For each of the next three turns, send the literal string Continue.
  • The model produces exactly one response per user turn, four responses total.
  • Each response is 1โ€“3 sentences (usually two), never more than ~60 words.

Stop tokens ([1, 106, 50]) are baked into the GGUF metadata โ€” no extra stop-token configuration is needed in the app.

Intended use

On-device inference in a mobile application (Android primary, iOS planned), loaded with any llama.cpp-compatible runtime. Designed for offline, privacy-preserving text generation after a one-time model download. Target use case: emotional support for caregivers of people with dementia / neurodegenerative disease, delivered through the fixed four-step rhythm described above.

Target devices

  • Android phones with 8 GB+ RAM; iPhone 15 Pro / 16 Pro (8 GB RAM) and newer
  • ~4 GB free storage for the model and working files
  • The GGUF exceeds App Store / Play bundle limits โ€” distribute via CDN / cloud storage and download on first launch.

Limitations

  • Q4_K_M quantization trades some quality for size; expect minor degradation compared to the full-precision model.
  • Fine-tune is domain-specific (caregiver emotional support, fixed four-turn protocol); out-of-domain or free-form-chat performance is not guaranteed.
  • Inherits biases and limitations of the base Gemma 4 model.
  • Not a substitute for professional medical or mental-health advice. Outputs are AI-generated and may contain errors. This model is not a crisis service. For any medical decisions, or in an emergency, consult a qualified healthcare professional or local emergency services.

Usage

Load the GGUF with any llama.cpp-compatible runtime โ€” the llama.cpp CLI/server, or any binding/wrapper on top of it. Pick whatever fits your stack; the model imposes no runtime-specific requirements.

Use the system prompt from system_prompt.txt verbatim before user messages โ€” the model was trained on this exact prompt, and any change degrades behavior. Apply the sampling parameters from inference_config.json (temperature 1.0, top-p 0.95, top-k 64, repeat-penalty 1.0, max new tokens 300, context size 2048), and follow the four-turn flow described in Conversation protocol.

License

Released under the Apache 2.0 License โ€” same terms as the base Gemma 4 model. See LICENSE for the full license text.

Downloads last month
56
GGUF
Model size
5B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for Serjio42/gemma4-e2b-finetuned-caregivers

Quantized
(204)
this model