Avinash Kumar | Computer Vision & Generative AI Researcher

About Me

Computer Vision researcher and AI developer with deep expertise in Generative AI, image synthesis, and automated image understanding. I specialize in designing, training, and deploying deep learning architectures for image generation, segmentation, object detection, and image-to-image translation. I work in PyTorch and TensorFlow and have hands-on experience with GANs, diffusion models, and vision transformers. My research connects technical innovation with real-world applications.

Education

M.S. Computer Science and Engineering

Soongsil University (2022–2024)
GPA: 4.34/4.50

Thesis: Korean Font Generation using Position-based Components (YOLOv8, GANs)

B.E. Software Engineering

MUET SZAB Campus Khairpur Mir's (2016–2021)
GPA: 3.86/4.00

Thesis: FIS Hostel (Food Internet Security)

Technical Skills

Programming & Development

Python (Proficient, production-ready)
JavaScript, Java, C++, C (Knowledgeable)
HTML, CSS – Frontend web development

Frameworks & Libraries

PyTorch, TensorFlow, Keras – Deep Learning
OpenCV, Scikit-learn, NumPy – Image Processing
Pandas, SQL – Data preparation and wrangling
Matplotlib, Seaborn, Plotly, Tableau – Data Visualization

Databases

MySQL, MongoDB, Microsoft Access

Computer Vision & AI

Image Classification, Detection, Segmentation, Generation, Tracking
GANs: GAN, CGAN, StyleGAN, DCGAN, StarGAN, FUNIT, TUNIT
Transformers: Hugging Face, Vision Transformers
ML Algorithms: SVM, Naive Bayes, Decision Trees, Random Forest
Deep Learning: CNN, RNN, LSTM, GRU, Attention Mechanisms
Computer Vision: OpenCV, Image Processing, Feature Extraction
Object Detection: YOLO, Faster R-CNN, SSD, RetinaNet
Image Segmentation: U-Net, Mask R-CNN, DeepLab
Neural Architecture Search (NAS)
Model Optimization: Quantization, Pruning, Knowledge Distillation

LLMs & Generative AI

Prompt Engineering, LLM Fine-Tuning
Diffusion Models: Stable Diffusion, DALL-E, Midjourney
Text-to-Image Generation
Image-to-Image Translation
Neural Style Transfer
Autoencoders and Variational Autoencoders (VAEs)

Tools & Operating Systems

Linux, Windows, macOS
Git, Jupyter, VS Code, Google Colab

Soft Skills

Advanced Research & Analytical Skills
Technical Writing & Documentation
LaTeX Typesetting & Academic Writing
Problem Modeling and Solution Design for AI Problems
Strong Verbal and Written Communication
Mathematical and Statistical Analysis
Team Collaboration with Diverse/Cross-disciplinary Groups
Project Management & Time Management
Critical Thinking & Problem Solving
Research Paper Writing & Publication
Presentation & Public Speaking
Data Analysis & Interpretation

Experience

Research Associate

System Software Lab, Soongsil University — Seoul, South Korea

Sep 2022 – Present

Developed generative adversarial networks (GANs) for image generation and trained YOLOv8 and other deep learning models for segmentation, classification, and object detection.
Built deep learning services that apply generative models and computer vision tasks such as segmentation.
Published research in journals and international conferences focused on Deep Learning and Computer Vision.
Collaborated with lab team members on image processing and generation techniques using OpenCV, GANs, and Diffusion Models.

Assistant Software Engineer

Cubix — Karachi, Pakistan

Feb 2022 – Aug 2022

Developed smart contracts for gaming-related tokens and NFTs in Solidity, using the Remix IDE and the Truffle framework.
Integrated blockchain solutions with JavaScript, building multiple decentralized web projects.
Wrote test cases and scenarios using Truffle to ensure quality and security of smart contracts.
Worked on decentralized exchanges, liquidity pools, and full-stack blockchain application development.

Featured Projects

Generative Handwriting Font

Developed a system that generates 2,780 Korean characters from only 43 handwritten samples. It uses YOLOv8 for character detection and segmentation and PACGAN for font style synthesis.

Tech: YOLOv8, GANs (PACGAN), PyTorch, Korean Font Generation

View Project (mywriting.kr)

FontFusionGAN: Handwriting Enhancement

Developed a GAN-based model for enhancing handwriting quality by blending styles. Published in MDPI Electronics. Implemented novel fusion techniques for style transfer.

Tech: GANs, Style Transfer, Computer Vision, Research

Real-Time Object Detection for Smart Surveillance

I contributed to and ran this open-source project to deepen my understanding of real-time object detection. The system uses YOLOv8 and OpenCV to detect and track multiple objects in live video streams, with alerting and event-logging features.

Tech: YOLOv8, OpenCV, Python, Deep Learning

View Open Source (Ultralytics YOLO)

Medical Image Segmentation for Tumor Detection

I worked on this open-source project to gain hands-on experience in medical image segmentation. Using a U-Net-based model, I segmented tumors in MRI scans, supporting diagnostic workflows.

Tech: U-Net, Medical Imaging, PyTorch

View Open Source (Brain Segmentation)

Fine-Grained Image Classification with Vision Transformers

I ran this open-source project to explore fine-grained classification with Vision Transformers (ViT). The pipeline distinguishes between visually similar bird species using transfer learning and data augmentation.

Tech: Vision Transformers, Image Classification, PyTorch

View Open Source (ViT)

Semantic Segmentation for Autonomous Driving

I contributed to this open-source project to enhance my skills in semantic segmentation for autonomous driving. The DeepLabV3+ model identifies road lanes, vehicles, and pedestrians in urban scenes.

Tech: DeepLabV3+, Semantic Segmentation, TensorFlow

View Open Source (DeepLab)

Multi-Class Object Detection in Aerial Imagery

I worked on this open-source project to develop a multi-class object detection system for aerial drone imagery using Faster R-CNN. The model detects buildings, vehicles, and infrastructure in high-resolution images.

Tech: Faster R-CNN, Aerial Imagery, Deep Learning

View Open Source (Faster R-CNN)

WAGMI Game

Blockchain-powered gaming platform developed at Cubix. Integrated smart contracts using Solidity to enable secure and transparent in-game transactions. Developed backend APIs with Node.js.

Tech: Blockchain, Smart Contracts, Solidity, Node.js

Web Automation & Visualization

Developed Python tools for automated web scraping, data processing, and journal formatting. Created interactive data visualizations using D3.js, Plotly, and Matplotlib.

Tech: Python, Web Scraping, Data Visualization

FIS Hostel Finder

Android app to help students locate hostels in other cities. Developed in Java and XML with funding support from Ignite Pakistan.

Tech: Android App, Java, XML

Publications

Positional Component-Guided Hangul Font Image Generation via Deep Semantic Segmentation and Adversarial Style Transfer

Avinash Kumar, Irfanullah Memon, Abdul Sami, Youngwon Jo, Jaeyoung Choi

Electronics, 14(13), 2699, 2025

Text-Conditioned Diffusion Model for High-Fidelity Korean Font Generation

Abdul Sami, Avinash Kumar, Youngwon Jo, Irfanullah Memon, Muhammad Rizwan, Jaeyoung Choi

ICOIN 2025, Chiang Mai, Thailand, 2025

Korean Handwriting Font Generation Service using Image Generation Model

Youngwon Jo, Avinash Kumar, Uijong Yang, Daeun Kim, Jaeyoung Choi

Annual Conference on Human and Language Technology, 2024, pp. 50-55

CKFONT3: Component-Based Korean Font Generation Using Positional Aware Component Decomposition

Avinash Kumar, Irfanullah Memon, Abdul Sami, Youngwon Jo, Jaeyoung Choi

SSRN, 2024

FontFusionGAN: Refinement of Handwritten Fonts by Font Fusion

Avinash Kumar, Kyeolhee Kang, Ammar ul Hassan, Jaeyoung Choi

MDPI Electronics, 2023

Deep Adaptive Feature Selection in Deep Recommender Systems

Hyston Kayange, Avinash Kumar, Yejung Lee, Hoonseo Jung, Jongsun Choi

Journal of the Korean Information Science Society, 2023

A Study on the Refining Handwritten Font by Mixing Font Styles

Avinash Kumar, Kyeolhee Kang, Ammar ul Hassan, Jaeyoung Choi

MITA 2023, Technical University of Ostrava, Czech Republic, 2023

Patents

Font verification methods that can verify style consistency and shape accuracy and computing devices to perform them

2025.04.11 | Patent Application (10-2025-0047215, submitted)

Authors: Jaeyoung Choi, Irfanullah Memon, Avinash Kumar

A font image generation method that can improve the quality of handwriting and a computing device to perform it

2025.04.08 | Patent Application (10-2025-0045652, submitted)

Authors: Jaeyoung Choi, Avinash Kumar

Korean font image generating device and method based on Korean components

2025.03.18 | Patent Application (10-2025-0034957, submitted)

Authors: Jaeyoung Choi, Irfanullah Memon, Avinash Kumar, Youngwon Jo

Korean font image generation device and method using the position of Korean components

2025.03.17 | Patent Application (10-2025-0034154, submitted)

Authors: Jaeyoung Choi, Avinash Kumar, Youngwon Jo

Honors & Awards

Best Research Paper Award

MITA Conference 2023

Recognized for innovative work on font refinement using GANs.

Fully Funded Master's Scholarship

Soongsil University (2022–2024)

Awarded Professor's Scholarship for Master's Degree.

Sakura Exchange Program

University of Tokyo, Japan (2018)

Selected as a top student to participate in a cybersecurity hackathon.

HEC Scholarship

Mehran University of Engineering and Technology (2016–2021)

Higher Education Commission Scholarship for Bachelor's Degree.

Licenses & Certifications

Build Better Generative Adversarial Networks (GANs)

Coursera • Issued Dec 2022

Credential ID: UNR55BUM63YP

Build Basic Generative Adversarial Networks (GANs)

Coursera • Issued Nov 2022

Credential ID: JTFMNC28NUSP

Python (Basic)

HackerRank • Issued Jun 2020

Credential ID: DE3D247D852E

Python Data Structures

Coursera • Issued May 2020

Credential ID: 69LSN2ZGQ6NA

Programming for Everybody (Getting Started with Python)

Coursera • Issued May 2020

Credential ID: YKUGZD7B53NC

AI For Everyone

Coursera • Issued Apr 2020

Credential ID: NXUEEHK5NEKM

Problem Solving

HackerRank • Issued May 2025

Credential ID: 17DF277392C2

Python

HackerRank • Issued Feb 2022

Credential ID: C1BCB97E3898

Research Focus

Generative AI

GANs, Diffusion Models, Neural Style Transfer, Text-to-Image Synthesis, Image Generation, Font Generation.

Computer Vision

Image-to-Image Translation, Document Analysis, Object Detection, Image Classification, Semantic Segmentation.

Multimodal AI

Text-Conditioned Image Generation, Cross-modal Retrieval, Vision-Language Models.

I am particularly interested in developing novel deep learning techniques for solving challenging problems in Generative AI and Computer Vision. My current research focuses on improving the quality and diversity of generated images, enhancing the performance of generation models, and exploring the intersection of vision and language.
