Wednesday, January 8, 2025

Python code to generate music with MusicGen (facebook/musicgen-melody)

import os
import psutil
import torch
import gc
from transformers import AutoProcessor, MusicgenMelodyForConditionalGeneration, MusicgenMelodyConfig
import scipy.io.wavfile
# https://huggingface.co/docs/transformers/main/model_doc/musicgen_melody

# Function to log memory usage
def log_memory(stage=""):
    process = psutil.Process(os.getpid())
    print(f"Memory Usage after {stage}: {process.memory_info().rss / 1024 ** 2} MB")

log_memory("initial load")

# Hugging Face token for authentication (read it from the environment rather than hard-coding a secret)
token = os.environ.get("HF_TOKEN")

# Load model configuration and manually add missing config attributes
#model_name = "facebook/musicgen-small" # Use smaller variants if available
model_name = "facebook/musicgen-melody" # For better output
config = MusicgenMelodyConfig.from_pretrained(model_name, token=token)

# Manually add the missing 'use_cache' attribute
config.use_cache = False # Works around an AttributeError raised when 'use_cache' is missing from the config

# Manually add the missing initializer_factor if it's required
config.initializer_factor = 1.0 # Default value for initialization

# Experimental configuration overrides for debugging. Note: the pretrained submodules are built from the
# nested text_encoder/audio_encoder/decoder configs, so these top-level values do not resize the loaded weights.
config.dropout = 0.1
config.layerdrop = 0.1
config.max_position_embeddings = 512 # Reduced
config.hidden_size = 128 # Smaller hidden size
config.num_codebooks = 128 # Experimental override (the pretrained melody decoder itself uses 4 codebooks)
config.scale_embedding = True
config.vocab_size = 50257
config.num_hidden_layers = 2 # Fewer layers
config.num_attention_heads = 4 # Fewer attention heads
config.attention_dropout = 0.1
config.activation_function = "gelu"
config.activation_dropout = 0.1
config.ffn_dim = 1024

log_memory("after config")

# Load the model
model = MusicgenMelodyForConditionalGeneration.from_pretrained(model_name, config=config, token=token).eval()

log_memory("after model loaded")

# Processor for the model
processor = AutoProcessor.from_pretrained(model_name, token=token)

# Text prompt describing the music to generate
prompt = "A relaxing jazz track with piano and bass."

inputs = processor(
    text=[prompt],
    padding=True,
    return_tensors="pt",
).to(model.device)

# Inspect the tokenized input shape
print(f"Input tensor shape: {inputs['input_ids'].shape}")

# Generate audio based on input prompt with no_grad to save memory
with torch.no_grad():
    generated_audio = model.generate(**inputs, max_new_tokens=1024)
print(f"Generated audio tensor shape: {generated_audio.shape}")

log_memory("after generation")

# Check type of the audio data
print(f"Type of generated audio: {type(generated_audio)}")

# Save the generated audio to a file
if isinstance(generated_audio, torch.Tensor):
    sampling_rate = model.config.audio_encoder.sampling_rate
    scipy.io.wavfile.write("generated_music.wav", rate=sampling_rate, data=generated_audio.to("cpu")[0, 0].numpy())
else:
    print("Unexpected audio format, unable to save.")

# Cleanup
del generated_audio # Explicitly delete the variable
gc.collect() # Garbage collection
log_memory("after cleanup")

Photorealistic filter for video game footage

Here is a step-by-step guide for creating a photorealistic filter for video game footage using the workflow and free tools mentioned:


Step 1: Extract Video Frames

First, convert the video game footage into individual image frames.

  1. Install FFmpeg:

    • Download FFmpeg from the official website (https://ffmpeg.org).
    • Add FFmpeg to your system’s PATH for easy access via the terminal/command prompt.
  2. Extract Frames:

    • Open a terminal and navigate to the folder containing your video (input.mp4).
    • Run the following command:
      ffmpeg -i input.mp4 -vf fps=30 frames/frame_%04d.png
      
      • input.mp4: Replace this with your video file name.
      • fps=30: Set the output frame rate (30 frames per second).
      • frames/frame_%04d.png: Saves frames in the frames/ folder as frame_0001.png, frame_0002.png, etc.
  3. Verify Output:

    • Check the frames/ folder to ensure the extracted frames are saved as images (a small Python wrapper for this step is sketched below).
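
If you prefer to drive this step from Python (for example, to batch several videos), here is a minimal wrapper around the same FFmpeg command; the file name and 30 fps rate are just the values used above:

import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str = "frames", fps: int = 30) -> None:
    """Extract video frames as PNGs using the FFmpeg command shown above."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", f"{out_dir}/frame_%04d.png"],
        check=True,  # raise an error if FFmpeg fails
    )

extract_frames("input.mp4")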

Step 2: Process Frames

Option A: Apply Style Transfer with CycleGAN

  1. Download CycleGAN:

    • Clone the repository:
      git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git
      cd pytorch-CycleGAN-and-pix2pix
      
  2. Install Dependencies:

    • Install required Python libraries:
      pip install -r requirements.txt
      
  3. Download Pretrained Models:

    • Download a pretrained model (e.g., horse2zebra for stylistic changes, or fine-tune your own model for photorealism):
      bash ./scripts/download_cyclegan_model.sh horse2zebra
      
  4. Apply Style Transfer:

    • Use the test.py script to process frames:
      python test.py --dataroot ./frames --name horse2zebra_pretrained --model test --no_dropout
      
      • Replace horse2zebra_pretrained with the name of your pretrained model.
      • Processed frames will be saved under the results/ folder; see the sketch after this list for collecting them into processed_frames/.
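
The test script normally writes its outputs under results/<experiment_name>/test_latest/images/ with a _fake suffix (verify this layout on your run). A small sketch that copies those outputs back into a sequentially named processed_frames/ folder for Step 3:

import shutil
from pathlib import Path

src = Path("results/horse2zebra_pretrained/test_latest/images")  # assumed default CycleGAN output layout
dst = Path("processed_frames")
dst.mkdir(exist_ok=True)

# Keep only the translated ("fake") images and restore the frame_%04d.png naming for FFmpeg
for i, img in enumerate(sorted(src.glob("*_fake.png")), start=1):
    shutil.copy(img, dst / f"frame_{i:04d}.png")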

Option B: Apply Super-Resolution with ESRGAN

  1. Download ESRGAN:

    • Clone the repository:
      git clone https://github.com/xinntao/ESRGAN.git
      cd ESRGAN
      
  2. Install Dependencies:

    • Install required Python libraries:
      pip install -r requirements.txt
      
  3. Download Pretrained Models:

    • Download the pretrained ESRGAN weights linked in the repository README and place them in the models/ folder.
  4. Run Super-Resolution:

    • Process frames with the repository's test script. By default it reads low-resolution images from the LR/ folder and writes upscaled results to results/ (check the README for the exact invocation in your version):
      python test.py
      
      • Input: ./LR/ (copy or downscale your frames into this folder; a sketch follows this list)
      • Output: ./results/ (copy or rename the upscaled images into processed_frames/ for Step 3)
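
ESRGAN upscales by 4x, so full-resolution frames can easily exhaust 4 GB of VRAM and produce enormous outputs. A small Pillow sketch that shrinks each frame before super-resolution and drops it into the LR/ folder; the 480 px cap is an arbitrary starting point (480 x 4 ≈ 1920 px output), not a requirement:

from pathlib import Path
from PIL import Image

max_side = 480  # tune for your GPU; 480 px in gives roughly 1920 px out after 4x upscaling
src, dst = Path("frames"), Path("LR")  # LR/ is where the default test script looks for inputs
dst.mkdir(exist_ok=True)

for img_path in sorted(src.glob("*.png")):
    img = Image.open(img_path)
    scale = max_side / max(img.size)
    if scale < 1:  # only shrink, never enlarge
        img = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
    img.save(dst / img_path.name)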

Option C: Generate Depth Maps with MiDaS

  1. Download MiDaS:

    • Clone the MiDaS repository:
      git clone https://github.com/isl-org/MiDaS.git
      cd MiDaS
      
  2. Install Dependencies:

    • Install PyTorch and MiDaS dependencies:
      pip install torch torchvision
      pip install -r requirements.txt
      
  3. Run Depth Estimation:

    • Generate depth maps for each frame (a torch.hub-based alternative is sketched after this list):
      python run.py --input_path ./frames --output_path ./depth_maps
      
      • Input: ./frames/
      • Output: ./depth_maps/
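
As an alternative to the cloned repository, MiDaS is also published on PyTorch Hub. A minimal single-frame sketch; the model and transform names follow the MiDaS documentation, but treat them as assumptions and check the repository if they have changed (OpenCV is used here for image I/O):

import os
import cv2
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# The small model keeps VRAM usage low; larger variants (e.g. "DPT_Large") trade speed for quality
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform

img = cv2.cvtColor(cv2.imread("frames/frame_0001.png"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img).to(device))
    # Resize the prediction back to the original frame size
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
    ).squeeze().cpu().numpy()

os.makedirs("depth_maps", exist_ok=True)
cv2.imwrite("depth_maps/frame_0001.png", cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8"))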

Step 3: Reassemble Processed Frames into a Video

  1. Ensure Processed Frames are Ordered:

    • Processed frames should be named sequentially (frame_0001.png, frame_0002.png, etc.); a quick check for gaps is sketched after this list.
  2. Merge Frames into Video:

    • Run the following FFmpeg command:
      ffmpeg -framerate 30 -i processed_frames/frame_%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4
      
      • -framerate 30: Match the original frame rate (30 FPS).
      • processed_frames/frame_%04d.png: Processed frames directory.
      • output.mp4: Final video file.
  3. Verify Output:

    • Check the output.mp4 file to ensure it combines the processed frames into a smooth video.
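
FFmpeg's image-sequence input stops at the first gap in the numbering, so it is worth confirming that no frames are missing before reassembly. A quick sketch:

import re
from pathlib import Path

frames = sorted(Path("processed_frames").glob("frame_*.png"))
if not frames:
    raise SystemExit("No frames found in processed_frames/")

numbers = [int(re.search(r"(\d+)", p.stem).group(1)) for p in frames]

# The numbering must be contiguous for the frame_%04d.png pattern to pick up every file
missing = sorted(set(range(numbers[0], numbers[-1] + 1)) - set(numbers))
print(f"{len(frames)} frames found", "with no gaps" if not missing else f"- missing: {missing[:10]}")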

Optimizing for GTX 1650 Ti

  • Resolution: Limit frame resolution to 720p (1280x720) to avoid GPU memory issues.
  • Batch Processing: Process frames in batches to reduce VRAM usage.
    • Modify scripts to load and process fewer images at a time.
  • Mixed Precision: Use PyTorch's torch.cuda.amp (exposed as torch.autocast in current releases) for mixed-precision inference to save VRAM; see the sketch below.
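
A generic pattern for the last two points. The tiny convolution layer below is only a stand-in for whichever network you are actually running (CycleGAN, ESRGAN or MiDaS), so adapt the model, preprocessing and output handling to the real scripts:

import torch
from pathlib import Path
from torchvision.io import ImageReadMode, read_image

device = "cuda"   # this section assumes the GTX 1650 Ti is available
batch_size = 4    # lower this if VRAM still runs out
model = torch.nn.Conv2d(3, 3, 3, padding=1).to(device).eval()  # stand-in for the real network

frame_paths = sorted(Path("frames").glob("*.png"))

for i in range(0, len(frame_paths), batch_size):
    chunk = frame_paths[i:i + batch_size]
    batch = torch.stack([read_image(str(p), mode=ImageReadMode.RGB).float() / 255.0 for p in chunk]).to(device)

    # autocast runs eligible ops in float16, roughly halving activation memory
    with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
        outputs = model(batch)

    # ...write each tensor in `outputs` to processed_frames/ using the original frame names...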

Potential Enhancements

  1. Dataset Fine-Tuning:

    • Train CycleGAN or ESRGAN on a custom dataset with real-world images for better photorealism.
    • Use datasets like COCO or Flickr for training.
  2. Post-Processing:

    • Add cinematic effects using DaVinci Resolve (free) for professional video editing.

Let me know if you’d like help setting up any of these tools or troubleshooting specific issues!