Photo Background - 2d Compositing|写真背景・二次元合成
v1.0 [hunyuan]
Trained on 2D illustrations composited onto a photo background.
This is a small LoRA I made because I thought it would be interesting to see how models trained on illustrations or real-world images/video can produce this composite, mixed-reality effect.
It has now been extended to a test on Hunyuan Video. Please check the versions, as the Hunyuan LoRA will not work with SDXL models like Illustrious/NoobAI.
Metadata is included in all uploaded files; you can drag the Hunyuan-generated videos into ComfyUI to load the workflow, which is also described in this article: https://civitai.com/models/1092466/hunyuan-2step-t2v-and-upscale
Recommended prompt structure:
Positive prompt (trigger at the end of the prompt, before the quality tags for non-Hunyuan versions):
{{tags}}
real world location, photo background,
masterpiece, best quality, very awa, absurdres
Negative prompt:
(worst quality, low quality, sketch:1.1), error, bad anatomy, bad hands, watermark, ugly, distorted, censored, lowres
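For example, a full positive prompt following this structure (Illustrious/NoobAI-style version) might look like the following; the character and scene tags before the trigger are placeholders, not taken from the training data:
1girl, solo, long hair, school uniform, sitting, park bench, looking at viewer,
real world location, photo background,
masterpiece, best quality, very awa, absurdres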
Trained with https://github.com/tdrussell/diffusion-pipe
Training data consists of:
- 37 images, a combination of:
  - images used for other versions on this model card
  - images extracted as keyframes from several videos (a rough extraction sketch follows this list)
- 23 video clips of ~70 frames each
  - 70 frames was too long at the 368 video resolution (it exceeded 24 GB of VRAM)
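For reference, keyframe extraction like this can be done with ffmpeg. The sketch below is an assumption about that step, not the exact command used for this dataset; the folder paths match the dataset.toml further down and the output filename pattern is arbitrary:
import subprocess
from pathlib import Path

video_dir = Path('/mnt/d/huanvideo/training_data/videos')  # same folders as in dataset.toml
image_dir = Path('/mnt/d/huanvideo/training_data/images')
image_dir.mkdir(parents=True, exist_ok=True)

for clip in sorted(video_dir.glob('*.mp4')):
    # Keep only I-frames (keyframes) and write one PNG per selected frame.
    subprocess.run(
        ['ffmpeg', '-i', str(clip),
         '-vf', r'select=eq(pict_type\,I)',
         '-vsync', 'vfr',
         str(image_dir / f'{clip.stem}_key_%03d.png')],
        check=True,
    )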
Training configs:
dataset.toml
# Aspect ratio bucketing settings
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 7
# Frame buckets (1 is for images)
frame_buckets = [1]
[[directory]] # IMAGES
# Path to the directory containing images and their corresponding caption files.
# Set this to where your dataset is.
path = '/mnt/d/huanvideo/training_data/images'
# Reduce as necessary.
num_repeats = 5
resolutions = [1024]
frame_buckets = [1] # Use 1 frame for images.
[[directory]] # VIDEOS
# Path to the directory containing videos and their corresponding caption files.
path = '/mnt/d/huanvideo/training_data/videos'
num_repeats = 5
resolutions = [368]
frame_buckets = [33, 49, 81] # Define frame buckets for videos.
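For context, diffusion-pipe expects each image or video to have its caption in a .txt file with the same base name in the same folder, so the two directories above would look roughly like this (the filenames here are placeholders):
/mnt/d/huanvideo/training_data/
    images/
        comp_001.png
        comp_001.txt
        ...
    videos/
        clip_001.mp4
        clip_001.txt
        ...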
config.toml
output_dir = '/mnt/d/huanvideo/training_output'
# Dataset config file.
dataset = 'dataset.toml'
# Training settings
epochs = 50
micro_batch_size_per_gpu = 1
pipeline_stages = 1
gradient_accumulation_steps = 4
gradient_clipping = 1.0
warmup_steps = 100
# eval settings
eval_every_n_epochs = 5
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1
# misc settings
save_every_n_epochs = 15
checkpoint_every_n_minutes = 30
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
steps_per_print = 1
video_clip_mode = 'single_middle'
[model]
type = 'hunyuan-video'
transformer_path = '/mnt/d/huanvideo/models/diffusion_models/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors'
vae_path = '/mnt/d/huanvideo/models/vae/hunyuan_video_vae_bf16.safetensors'
llm_path = '/mnt/d/huanvideo/models/llm'
clip_path = '/mnt/d/huanvideo/models/clip'
dtype = 'bfloat16'
transformer_dtype = 'float8'
timestep_sample_method = 'logit_normal'
[adapter]
type = 'lora'
rank = 32
dtype = 'bfloat16'
[optimizer]
type = 'adamw_optimi'
lr = 5e-5
betas = [0.9, 0.99]
weight_decay = 0.02
eps = 1e-8
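With both files in place, training is launched through DeepSpeed with diffusion-pipe's train.py, roughly: deepspeed --num_gpus=1 train.py --deepspeed --config config.toml. Check the repository README for the exact, current invocation.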