Wednesday, September 3, 2025

Upscale Videos with AI

Great choice! The ShotCut + Upscayl combo gives you a GUI-based workflow to upscale videos on Ubuntu 22.04 — with no command line required for basic use.


🎯 What You'll Achieve

  1. Split your video into image frames using ShotCut (or FFmpeg, optionally).

  2. Upscale the frames using Upscayl (AI image upscaler).

  3. Reassemble the upscaled frames back into a video using ShotCut.


🧰 What You Need

  • ShotCut (free video editor, for exporting frames and reassembling the video)
  • Upscayl (free AI image upscaler with a GUI)
  • FFmpeg (optional, for faster frame extraction and reassembly from the terminal)


🧱 Step-by-Step Guide


🔧 Step 1: Install ShotCut

📦 Install via Flatpak (Recommended)

sudo apt install flatpak -y
flatpak remote-add --if-not-exists flathub https://dl.flathub.org/repo/flathub.flatpakrepo
flatpak install flathub org.shotcut.Shotcut

✅ Launch with:

flatpak run org.shotcut.Shotcut

🤖 Step 2: Install Upscayl (GUI AI Image Upscaler)

📄 Download the .AppImage (Easiest)

  1. Go to: https://github.com/upscayl/upscayl/releases

  2. Download the latest Upscayl-x.x.x.AppImage

  3. Make it executable:

chmod +x Upscayl-*.AppImage
./Upscayl-*.AppImage
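
Note: on Ubuntu 22.04, AppImages need FUSE 2 to run. If the AppImage refuses to start, installing libfuse2 usually fixes it:

sudo apt install libfuse2 -y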

🎞️ Step 3: Extract Frames from Video using ShotCut

  1. Open your video in ShotCut.

  2. Drag the video to the timeline.

  3. Go to File > Export Frame (or use the “Export” tab).

  4. Choose an image format (e.g., PNG or JPG).

  5. Export all frames as images manually — or use ShotCut’s “Export > Advanced > Format = image sequence” option to do it automatically.

Alternatively, extract frames via FFmpeg:

mkdir frames
ffmpeg -i yourvideo.mp4 frames/frame%04d.png

🧠 Step 4: Upscale Frames with Upscayl

  1. Launch Upscayl.

  2. Select the folder with the exported frames.

  3. Choose an AI model (Real-ESRGAN-based models work best).

  4. Set output folder.

  5. Click "Upscayl" and wait for processing.

Upscayl supports batch processing, so it will upscale all frames in one go.


🎬 Step 5: Reassemble Upscaled Frames into a Video

Option 1: Use ShotCut

  1. Open ShotCut and create a new project.

  2. Drag the upscaled images (sorted in order) into the timeline.

  3. Set the frame rate to match the original video (see the ffprobe check after this list).

  4. Export as video.
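
To find the original frame rate, ffprobe (bundled with FFmpeg) can report it. A minimal check, assuming the source file is yourvideo.mp4:

ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of default=noprint_wrappers=1:nokey=1 yourvideo.mp4

It prints the rate as a fraction, e.g. 30/1 or 30000/1001 (29.97 fps).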

Option 2: Use FFmpeg (faster)

ffmpeg -framerate 30 -i upscaled/frame%04d.png -i original_audio.aac -c:v libx264 -pix_fmt yuv420p output.mp4
  • Adjust -framerate to match original.

  • If you want to keep original audio, extract it first:

ffmpeg -i yourvideo.mp4 -q:a 0 -map a original_audio.aac

🎉 Done!

You now have a fully upscaled version of your video using AI — all on Ubuntu, no complex CLI tools or Docker needed.


šŸ“ Bonus: Automate the Process?

Want a simple shell script to automate this (frame extraction + reassembly)? A sketch is included below; adapt it to your workflow.
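
A minimal sketch of such a script, assuming the source file is yourvideo.mp4, a 30 fps source, and that Upscayl writes its results into a folder named upscaled/ (all of these are assumptions; adjust to your material):

#!/bin/bash
# upscale_video.sh - hypothetical helper: extract frames, pause for the Upscayl GUI step, reassemble.
set -e

INPUT="yourvideo.mp4"   # source video (assumption: change to your file)
FPS=30                  # source frame rate (check it with ffprobe first)

# 1. Extract frames and the original audio track
mkdir -p frames
ffmpeg -i "$INPUT" frames/frame%04d.png
ffmpeg -i "$INPUT" -q:a 0 -map a original_audio.aac

# 2. Upscale the frames/ folder with Upscayl (manual GUI step), saving into upscaled/
echo "Upscale the 'frames' folder with Upscayl into a folder named 'upscaled', then press Enter."
read -r

# 3. Reassemble the upscaled frames and re-attach the audio
ffmpeg -framerate "$FPS" -i upscaled/frame%04d.png -i original_audio.aac \
  -c:v libx264 -pix_fmt yuv420p -shortest output_upscaled.mp4

echo "Done: output_upscaled.mp4"

If Upscayl appends a suffix to the output file names, rename them so they still match the frame%04d.png pattern before pressing Enter.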

Let me know if you’d like help optimizing for speed, quality, or GPU usage.

Thursday, February 6, 2025

How to Merge Video Files Using FFmpeg

join_video.txt

file /Users/Video/input1.mp4
file /Users/Video/input2.mp4 

Note: You can add more than two video files.

Then, run the FFmpeg command.

ffmpeg -f concat -safe 0 -i join_video.txt -c copy output_demuxer.mp4

-safe 0 is added so that any file name will be accepted. The default value 1 will reject absolute paths in the text file.
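
If there are many clips, the list file can be generated instead of typed by hand. A small sketch, assuming the clips live in /Users/Video and should be joined in alphabetical order:

for f in /Users/Video/*.mp4; do echo "file '$f'"; done > join_video.txt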



Extract frames:

ffmpeg -i kickflip.mp4 '%04d.png'


Join frames:

ffmpeg -framerate 60 -pattern_type glob -i '*.png'   -c:v libx264 -pix_fmt yuv420p result4.mp4
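
The joined result has no audio, since the input is an image sequence. If kickflip.mp4 has an audio track in an MP4-compatible codec (e.g. AAC), it can be copied onto the result without re-encoding; a sketch (result4_audio.mp4 is just an example name, and this assumes result4.mp4 ends up with roughly the same duration as the original, otherwise the audio will drift):

ffmpeg -i result4.mp4 -i kickflip.mp4 -map 0:v:0 -map 1:a:0 -c copy result4_audio.mp4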


Wednesday, January 8, 2025

Python code to generate music with facebook/musicgen (facebook/musicgen-melody)

import os
import psutil
import torch
import gc
from transformers import AutoProcessor, MusicgenMelodyForConditionalGeneration, MusicgenMelodyConfig
import scipy.io.wavfile
# https://huggingface.co/docs/transformers/main/model_doc/musicgen_melody

# Function to log memory usage
def log_memory(stage=""):
    process = psutil.Process(os.getpid())
    print(f"Memory Usage after {stage}: {process.memory_info().rss / 1024 ** 2} MB")

log_memory("initial load")

# Hugging Face token for authentication
token = ""  # Get your own access token at https://huggingface.co/

# Load model configuration and manually add missing config attributes
#model_name = "facebook/musicgen-small" # Use smaller variants if available
model_name = "facebook/musicgen-melody" # For better output
config = MusicgenMelodyConfig.from_pretrained(model_name, token=token)

# Manually add the missing 'use_cache' attribute
config.use_cache = False # Avoids an AttributeError when 'use_cache' is missing from the config

# Manually add the missing initializer_factor if it's required
config.initializer_factor = 1.0 # Default value for initialization

# Modify configuration parameters for debugging
config.dropout = 0.1
config.layerdrop = 0.1
config.max_position_embeddings = 512 # Reduced
config.hidden_size = 128 # Smaller hidden size
config.num_codebooks = 128 # Adjusted to a smaller number for compatibility
config.scale_embedding = True
config.vocab_size = 50257
config.num_hidden_layers = 2 # Fewer layers
config.num_attention_heads = 4 # Fewer attention heads
config.attention_dropout = 0.1
config.activation_function = "gelu"
config.activation_dropout = 0.1
config.ffn_dim = 1024

log_memory("after config")

# Load the model
model = MusicgenMelodyForConditionalGeneration.from_pretrained(model_name, config=config, token=token).eval()

log_memory("after model loaded")

# Processor for the model
processor = AutoProcessor.from_pretrained(model_name)

# Ensure proper input shape by padding to the required size
prompt = "A relaxing jazz track with piano and bass."

input_ids = processor(
    text=[prompt],
    padding=True,
    return_tensors="pt",
).to(model.device)

# Check the tokenized input shape
print(f"Input tensor shape: {input_ids['input_ids'].shape}")

# Generate audio based on input prompt with no_grad to save memory
with torch.no_grad():
    generated_audio = model.generate(**input_ids, max_new_tokens=1024)
print(generated_audio)

log_memory("after generation")

# Check type of the audio data
print(f"Type of generated audio: {type(generated_audio)}")

# Save the generated audio to a file
if isinstance(generated_audio, torch.Tensor):
    sampling_rate = model.config.audio_encoder.sampling_rate
    scipy.io.wavfile.write("generated_music.wav", rate=sampling_rate, data=generated_audio.to("cpu")[0, 0].numpy())
else:
print("Unexpected audio format, unable to save.")

# Cleanup
del generated_audio # Explicitly delete the variable
gc.collect() # Garbage collection
log_memory("after cleanup")

Photorealistic filter for video game footage

Here is a step-by-step guide for creating a photorealistic filter for video game footage using the workflow and free tools mentioned:


Step 1: Extract Video Frames

First, convert the video game footage into individual image frames.

  1. Install FFmpeg:

    • Download FFmpeg from the official FFmpeg website (https://ffmpeg.org).
    • Add FFmpeg to your system’s PATH for easy access via the terminal/command prompt.
  2. Extract Frames:

    • Open a terminal and navigate to the folder containing your video (input.mp4).
    • Create a frames/ folder (mkdir frames), then run the following command:
      ffmpeg -i input.mp4 -vf fps=30 frames/frame_%04d.png
      
      • input.mp4: Replace this with your video file name.
      • fps=30: Set the output frame rate (30 frames per second).
      • frames/frame_%04d.png: Saves frames in the frames/ folder as frame_0001.png, frame_0002.png, etc.
  3. Verify Output:

    • Check the frames/ folder to ensure the extracted frames are saved as images.

Step 2: Process Frames

Option A: Apply Style Transfer with CycleGAN

  1. Download CycleGAN:

    • Clone the repository:
      git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git
      cd pytorch-CycleGAN-and-pix2pix
      
  2. Install Dependencies:

    • Install required Python libraries:
      pip install -r requirements.txt
      
  3. Download Pretrained Models:

    • Download a pretrained model (e.g., horse2zebra for stylistic changes or fine-tune for photorealism):
      bash ./scripts/download_cyclegan_model.sh horse2zebra
      
  4. Apply Style Transfer:

    • Use the test.py script to process frames:
      python test.py --dataroot ./frames --name horse2zebra_pretrained --model test --no_dropout
      
      • Replace horse2zebra with your pretrained model.
      • Processed frames will be saved in the results/ folder.
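
To feed these results into Step 3, the generated images usually need to be collected back into a flat, sequentially named folder. A sketch, assuming the test script wrote its translated images as *_fake.png under results/horse2zebra_pretrained/test_latest/images/ (check the actual layout of your CycleGAN checkout and adjust the paths):

mkdir -p processed_frames
cp results/horse2zebra_pretrained/test_latest/images/*_fake.png processed_frames/
# strip the _fake suffix so the names match the frame_%04d.png pattern used in Step 3
for f in processed_frames/*_fake.png; do mv "$f" "${f%_fake.png}.png"; done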

Option B: Apply Super-Resolution with ESRGAN

  1. Download ESRGAN:

    • Clone the repository:
      git clone https://github.com/xinntao/ESRGAN.git
      cd ESRGAN
      
  2. Install Dependencies:

    • Install required Python libraries:
      pip install -r requirements.txt
      
  3. Download Pretrained Models:

    • Download the pretrained ESRGAN model (download links are listed in the repository README).
  4. Run Super-Resolution:

    • Process frames using the ESRGAN script:
      python test.py --input_folder ./frames --output_folder ./processed_frames
      
      • Input: ./frames/
      • Output: ./processed_frames/

Option C: Generate Depth Maps with MiDaS

  1. Download MiDaS:

    • Clone the MiDaS repository:
      git clone https://github.com/isl-org/MiDaS.git
      cd MiDaS
      
  2. Install Dependencies:

    • Install PyTorch and MiDaS dependencies:
      pip install torch torchvision
      pip install -r requirements.txt
      
  3. Run Depth Estimation:

    • Generate depth maps for each frame:
      python run.py --input_path ./frames --output_path ./depth_maps
      
      • Input: ./frames/
      • Output: ./depth_maps/

Step 3: Reassemble Processed Frames into a Video

  1. Ensure Processed Frames are Ordered:

    • Processed frames should be named sequentially (frame_0001.png, frame_0002.png, etc.).
  2. Merge Frames into Video:

    • Run the following FFmpeg command:
      ffmpeg -framerate 30 -i processed_frames/frame_%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4
      
      • -framerate 30: Match the original frame rate (30 FPS).
      • processed_frames/frame_%04d.png: Processed frames directory.
      • output.mp4: Final video file.
  3. Verify Output:

    • Check the output.mp4 file to ensure it combines the processed frames into a smooth video.
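
A quick sanity check before reviewing the result: the processed frame count should match the extracted frame count, otherwise the reassembled video will be shorter than the original.

ls frames | wc -l
ls processed_frames | wc -l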

Optimizing for GTX 1650 Ti

  • Resolution: Limit frame resolution to 720p (1280x720) to avoid GPU memory issues.
  • Batch Processing: Process frames in batches to reduce VRAM usage.
    • Modify scripts to load and process fewer images at a time.
  • Mixed Precision: Use PyTorch's torch.cuda.amp for mixed-precision training or inference to save VRAM.

Potential Enhancements

  1. Dataset Fine-Tuning:

    • Train CycleGAN or ESRGAN on a custom dataset with real-world images for better photorealism.
    • Use datasets like COCO or Flickr for training.
  2. Post-Processing:

    • Add cinematic effects using DaVinci Resolve (free) for professional video editing.

Let me know if you’d like help setting up any of these tools or troubleshooting specific issues!