Avinash Kumar

Computer Vision & Generative AI Specialist

About Me

Computer vision researcher and AI developer with deep expertise in generative AI, image synthesis, and automated image understanding. I specialize in designing, training, and deploying deep learning architectures for image generation, segmentation, object detection, and image-to-image translation. I work in PyTorch and TensorFlow and have hands-on experience with neural networks, GANs, diffusion models, and vision transformers. My research aims to connect technical innovation with real-world applications.

Education

M.S. Computer Science and Engineering

Soongsil University (2022–2024)
GPA: 4.34/4.50

Thesis: Korean Font Generation using Position-based Components (YOLOv8, GANs)

B.E. Software Engineering

MUET SZAB Campus Khairpur Mir's (2016–2021)
GPA: 3.86/4.00

Thesis: FIS Hostel (Food Internet Security)

Technical Skills

Programming & Development
  • Python (Proficient, production-ready)
  • JavaScript, Java, C++, C (Knowledgeable)
  • HTML, CSS – Frontend web development
Frameworks & Libraries
  • PyTorch, TensorFlow, Keras – Deep Learning
  • OpenCV, Scikit-learn, NumPy – Image Processing and Machine Learning
  • Pandas, SQL – Data preparation and wrangling
  • Matplotlib, Seaborn, Plotly, Tableau – Data Visualization
Databases
  • MySQL, MongoDB, Microsoft Access
Computer Vision & AI
  • Image Classification, Detection, Segmentation, Generation, Tracking
  • GANs: GAN, CGAN, StyleGAN, DCGAN, StarGAN, FUNIT, TUNIT
  • Transformers: Hugging Face, Vision Transformers
  • ML Algorithms: SVM, Naive Bayes, Decision Trees, Random Forest
  • Deep Learning: CNN, RNN, LSTM, GRU, Attention Mechanisms
  • Computer Vision: OpenCV, Image Processing, Feature Extraction
  • Object Detection: YOLO, Faster R-CNN, SSD, RetinaNet
  • Image Segmentation: U-Net, Mask R-CNN, DeepLab
  • Neural Architecture Search (NAS)
  • Model Optimization: Quantization, Pruning, Knowledge Distillation
LLMs & Generative AI
  • Prompt Engineering, LLM Fine-Tuning
  • Diffusion Models: Stable Diffusion, DALL-E, Midjourney
  • Text-to-Image Generation
  • Image-to-Image Translation
  • Neural Style Transfer
  • Autoencoders and Variational Autoencoders (VAEs)
Tools & Operating Systems
  • Linux, Windows, macOS
  • Git, Jupyter, VS Code, Google Colab
Soft Skills
  • Advanced Research & Analytical Skills
  • Technical Writing & Documentation
  • LaTeX Typesetting & Academic Writing
  • Problem Modeling and Solution Design for AI Problems
  • Strong Verbal and Written Communication
  • Mathematical and Statistical Analysis
  • Team Collaboration with Diverse/Cross-disciplinary Groups
  • Project Management & Time Management
  • Critical Thinking & Problem Solving
  • Research Paper Writing & Publication
  • Presentation & Public Speaking
  • Data Analysis & Interpretation

Experience

Research Associate

System Software Lab, Soongsil University — Seoul, South Korea

Sep 2022 – Present
  • Developed generative adversarial networks (GANs) for image generation and trained YOLOv8 and other deep learning models for segmentation, classification, and object detection.
  • Built deep learning services that apply generative models and computer vision tasks such as segmentation.
  • Published research on deep learning and computer vision in journals and at international conferences.
  • Collaborated with lab members on image processing and generation techniques using OpenCV, GANs, and diffusion models.

Assistant Software Engineer

Cubix — Karachi, Pakistan

Feb 2022 – Aug 2022
  • Developed smart contracts for gaming-related tokens and NFTs in Solidity, using the Remix IDE and the Truffle framework.
  • Integrated blockchain solutions with JavaScript, building multiple decentralized web projects.
  • Wrote test cases and scenarios using Truffle to ensure quality and security of smart contracts.
  • Worked on decentralized exchanges, liquidity pools, and full-stack blockchain application development.

Featured Projects

Generative Handwriting Font

Developed a system to generate 2,780 Korean characters using only 43 handwritten samples. Utilized YOLOv8 for efficient character detection and segmentation, and PACGAN for high-quality font style synthesis.

Tech: YOLOv8, GANs (PACGAN), PyTorch, Korean Font Generation

Project: mywriting.kr

FontFusionGAN: Handwriting Enhancement

Developed a GAN-based model for enhancing handwriting quality by blending styles. Published in MDPI Electronics. Implemented novel fusion techniques for style transfer.

Tech: GANs, Style Transfer, Computer Vision, Research

Real-Time Object Detection for Smart Surveillance

I contributed to and ran this open-source project to deepen my understanding of real-time object detection. The system uses YOLOv8 and OpenCV to detect and track multiple objects in live video streams, with alerting and event logging features.

Tech: YOLOv8, OpenCV, Python, Deep Learning

Open source: Ultralytics YOLO

Medical Image Segmentation for Tumor Detection

I worked on this open-source project to gain hands-on experience in medical image segmentation. Using a U-Net-based model, I segmented tumors in MRI scans, supporting diagnostic workflows.

Tech: U-Net, Medical Imaging, PyTorch

Open source: Brain Segmentation

Fine-Grained Image Classification with Vision Transformers

I ran this open-source project to explore fine-grained classification using Vision Transformers (ViT). The pipeline distinguishes between visually similar bird species using transfer learning and data augmentation.

Tech: Vision Transformers, Image Classification, PyTorch

Open source: ViT

Semantic Segmentation for Autonomous Driving

I contributed to this open-source project to enhance my skills in semantic segmentation for autonomous driving. The DeepLabV3+ model identifies road lanes, vehicles, and pedestrians in urban scenes.

Tech: DeepLabV3+, Semantic Segmentation, TensorFlow

Open source: DeepLab

Multi-Class Object Detection in Aerial Imagery

I worked on this open-source project to develop a multi-class object detection system for aerial drone imagery using Faster R-CNN. The model detects buildings, vehicles, and infrastructure in high-resolution images.

Tech: Faster R-CNN, Aerial Imagery, Deep Learning

Open source: Faster R-CNN

WAGMI Game

Blockchain-powered gaming platform developed at Cubix. Integrated smart contracts using Solidity to enable secure and transparent in-game transactions. Developed backend APIs with Node.js.

Tech: Blockchain, Smart Contracts, Solidity, Node.js

Web Automation & Visualization

Developed Python tools for automated web scraping, data processing, and journal formatting. Created interactive data visualizations using D3.js, Plotly, and Matplotlib.

Tech: Python, Web Scraping, Data Visualization

FIS Hostel Finder

Android app to help students locate hostels in other cities. Developed in Java and XML with funding support from Ignite Pakistan.

Tech: Android App, Java, XML

Publications

Positional Component-Guided Hangul Font Image Generation via Deep Semantic Segmentation and Adversarial Style Transfer

Avinash Kumar, Irfanullah Memon, Abdul Sami, Youngwon Jo, Jaeyoung Choi

Electronics, 14(13), 2699, 2025

Text-Conditioned Diffusion Model for High-Fidelity Korean Font Generation

Abdul Sami, Avinash Kumar, Youngwon Jo, Irfanullah Memon, Muhammad Rizwan, Jaeyoung Choi

ICOIN 2025, Chiang Mai, Thailand, 2025

Korean Handwriting Font Generation Service using Image Generation Model

Youngwon Jo, Avinash Kumar, Uijong Yang, Daeun Kim, Jaeyoung Choi

Annual Conference on Human and Language Technology, 2024, pp. 50–55

CKFONT3: Component-Based Korean Font Generation Using Positional Aware Component Decomposition

Avinash Kumar, Irfanullah Memon, Abdul Sami, Youngwon Jo, Jaeyoung Choi

SSRN, 2024

FontFusionGAN: Refinement of Handwritten Fonts by Font Fusion

Avinash Kumar, Kyeolhee Kang, Ammar ul Hassan, Jaeyoung Choi

MDPI Electronics, 2023

Deep Adaptive Feature Selection in Deep Recommender Systems

Hyston Kayange, Avinash Kumar, Yejung Lee, Hoonseo Jung, Jongsun Choi

Journal of the Korean Information Science Society, 2023

A Study on the Refining Handwritten Font by Mixing Font Styles

Avinash Kumar, Kyeolhee Kang, Ammar ul Hassan, Jaeyoung Choi

MITA 2023, Technical University of Ostrava, Czech Republic, 2023

Patents

Font verification methods that can verify style consistency and shape accuracy and computing devices to perform them

2025.04.11  |  Patent Application (10-2025-0047215, submitted)

Authors: Jaeyoung Choi, Irfanullah Memon, Avinash Kumar

A font image generation method that can improve the quality of handwriting and a computing device to perform it

2025.04.08  |  Patent Application (10-2025-0045652, submitted)

Authors: Jaeyoung Choi, Avinash Kumar

Korean font image generating device and method based on Korean components

2025.03.18  |  Patent Application (10-2025-0034957, submitted)

Authors: Jaeyoung Choi, Irfanullah Memon, Avinash Kumar, Youngwon Jo

Korean font image generation device and method using the position of Korean components

2025.03.17  |  Patent Application (10-2025-0034154, submitted)

Authors: Jaeyoung Choi, Avinash Kumar, Youngwon Jo

Honors & Awards

Best Research Paper Award

MITA Conference 2023

Recognized for innovative work on font refinement using GANs.

Fully Funded Master's Scholarship

Soongsil University (2022-2024)

Awarded Professor's Scholarship for Master's Degree.

Sakura Exchange Program

University of Tokyo, Japan (2018)

Selected as a top student to participate in a cybersecurity hackathon.

HEC Scholarship

Mehran University of Engineering and Technology (2016-2021)

Higher Education Commission Scholarship for Bachelor's Degree.

Licenses & Certifications

Build Better Generative Adversarial Networks (GANs)

Coursera • Issued Dec 2022

Credential ID: UNR55BUM63YP

Build Basic Generative Adversarial Networks (GANs)

Coursera • Issued Nov 2022

Credential ID: JTFMNC28NUSP

Python (Basic)

HackerRank • Issued Jun 2020

Credential ID: DE3D247D852E

Python Data Structures

Coursera • Issued May 2020

Credential ID: 69LSN2ZGQ6NA

Programming for Everybody (Getting Started with Python)

Coursera • Issued May 2020

Credential ID: YKUGZD7B53NC

AI For Everyone

Coursera • Issued Apr 2020

Credential ID: NXUEEHK5NEKM

Problem Solving

HackerRank • Issued May 2025

Credential ID: 17DF277392C2

Python

HackerRank • Issued Feb 2022

Credential ID: C1BCB97E3898

Research Focus

Generative AI

GANs, Diffusion Models, Neural Style Transfer, Text-to-Image Synthesis, Image Generation, Font Generation.

Computer Vision

Image-to-Image Translation, Document Analysis, Object Detection, Image Classification, Semantic Segmentation.

Multimodal AI

Text-Conditioned Image Generation, Cross-modal Retrieval, Vision-Language Models.

I am particularly interested in developing novel deep learning techniques for solving challenging problems in Generative AI and Computer Vision. My current research focuses on improving the quality and diversity of generated images, enhancing the performance of generation models, and exploring the intersection of vision and language.
