B.Sc DATA SCIENCE SUBJECTS
MEMBERS OF THE BOARD SIGNATURES
External Member: Prof Ch. Haritha
HOD, Dept of CSE
JNTUK Kakinada
1. Dr.M.KamalaKumari - Chairman
Dept of CSE, AKNU, RJY
2. Dr.P.Venkateswara Rao – Member
Dept of CSE, AKNU, RJY
3. Mr.M. Simhadri – Member
Lecturer, Aditya Degree College, Kakinada
4. Mr.B N S Gupta – Member
Lecturer, SVKP & Dr. K.S Raju Arts & Science College Penugonda
PAPER 1: INTRODUCTION TO DATA SCIENCE AND R PROGRAMMING
Objective
Data Science is a fast-growing interdisciplinary field, focusing on the analysis of data to extract
knowledge and insight. This course will introduce students to the collection, preparation, analysis,
modelling and visualization of data, covering both conceptual and practical issues. Examples and
case studies from diverse fields will be presented, and hands-on use of statistical and data
manipulation software will be included.
Outcomes
i. Recognize the various disciplines that contribute to a successful data science effort.
ii. Understand the processes of data science: identifying the problem to be solved, data collection,
preparation, modelling, evaluation and visualization.
iii. Be aware of the challenges that arise in data science.
iv. Be able to identify the appropriate type of algorithm based on the type of problem.
v. Be comfortable using commercial and open source tools such as the R and Python languages and
their associated libraries for data analytics and visualization.
Unit-I
Defining Data Science and Big Data, Benefits and Uses, Facets of Data, Data Science Process.
History and Overview of R, Getting Started with R, R Nuts and Bolts
Unit-II
The Data Science Process: Overview of the Data Science Process: Setting the research goal,
Retrieving Data, Data Preparation, Exploration, Modeling, Data Presentation and Automation.
Getting Data in and out of R, Using the readr package, Interfaces to the outside world.
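As an illustration of getting data in and out of R, the following minimal sketch writes a small data frame to a CSV file and reads it back. It uses base R functions so that it runs without extra packages; the data frame `scores` and its contents are hypothetical sample data, and the readr calls are shown only as comments on the assumption that the package is installed.

```r
# Hypothetical sample data used only for this illustration.
scores <- data.frame(
  student = c("Asha", "Ravi", "Meena"),
  marks   = c(72, 85, 64),
  stringsAsFactors = FALSE
)

# Data out of R: write the data frame to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Data into R: read the same file back into a new data frame.
scores_back <- read.csv(csv_path, stringsAsFactors = FALSE)
str(scores_back)

# readr equivalents, if the package is installed:
#   readr::write_csv(scores, csv_path)
#   scores_back <- readr::read_csv(csv_path)
```

Using `tempfile()` keeps the example from overwriting any existing file; in practice the path would point to the data set being studied.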
Unit-III
Machine Learning: Understanding why data scientists use machine learning: What machine
learning is and why we should care about it, Applications of machine learning in data science, Where it is
used in data science, The modeling process, Types of Machine Learning: Supervised and
Unsupervised.
Unit-IV
Handling Large Data on a Single Computer: The problems we face when handling large data, General
techniques for handling large volumes of data, General programming tips for dealing with large
datasets. Case study: Predicting malicious URLs (this can be implemented in R).
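One way to handle a file that is too large to load at once is to process it in chunks and keep only running totals in memory. The syllabus does not list the specific techniques, so the sketch below is an assumed example of that idea in base R. The file, its columns `id` and `value`, and the chunk size are all hypothetical.

```r
# Build a hypothetical "large" CSV so the sketch is self-contained.
big_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:1000, value = rep(c(1, 2, 3, 4), times = 250)),
  big_path,
  row.names = FALSE
)

chunk_size <- 200                 # rows held in memory at one time (assumed value)
con <- file(big_path, open = "r")
header <- readLines(con, n = 1)   # consume the header line once

total  <- 0                       # running sum of the value column
n_rows <- 0                       # running count of rows seen

repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break   # end of file reached
  chunk <- read.csv(text = lines, header = FALSE,
                    col.names = c("id", "value"))
  total  <- total + sum(chunk$value)
  n_rows <- n_rows + nrow(chunk)
}
close(con)

# The mean is computed from the running totals, never from the full data.
mean_value <- total / n_rows
```

Only `chunk_size` rows are parsed at any moment, so memory use stays roughly constant however long the file is.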
Unit-V
Subsetting R objects, Vectorised Operations, Managing Data Frames with the dplyr package, Control
Structures, Functions, Scoping rules of R, Coding Standards in R, Loop Functions, Debugging,
Simulation
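Three of the Unit-V topics (vectorised operations, subsetting and scoping rules) can be illustrated with a short base R sketch. All variable and function names here, such as `make_power`, are hypothetical and chosen only for the example.

```r
x <- c(4, 9, 16, 25)

# Vectorised operation: sqrt() and + act on every element, no explicit loop.
roots <- sqrt(x) + 1

# Subsetting: by logical condition, by position, and by name.
big       <- x[x > 10]
first_two <- x[1:2]
named     <- c(a = 1, b = 2, c = 3)
b_value   <- named[["b"]]

# Lexical scoping: the inner function looks up 'power' in the environment
# where it was defined, so each generated function keeps its own exponent.
make_power <- function(power) {
  function(v) v^power
}
cube <- make_power(3)
cube_of_two <- cube(2)
```

The `make_power` pattern shows why scoping matters: `cube` still finds `power = 3` after `make_power` has returned.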
References
1. Davy Cielen, Arno D. B. Meysman, Mohamed Ali, “Introducing Data Science”, Manning
Publications, 2016.
2. Roger D. Peng, “R Programming for Data Science”, Leanpub, 2015.
3. Nina Zumel, John Mount, “Practical Data Science with R”, Manning Publications, 2014.
4. Mark Gardener, “Beginning R - The Statistical Programming Language”, John Wiley &
Sons, Inc., 2012.
5. W. N. Venables, D. M. Smith and the R Core Team, “An Introduction to R”, 2013.
6. Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta, “Practical Data
Science Cookbook”, Packt Publishing Ltd., 2014.
Student Activity
Students should be able to create a database and read from and write to it, and transfer data to and
from CSV and other types of files.
Students should clean data and make it consistent for any sort of analysis in R.
Perform statistical analysis on a variety of data.
Perform appropriate statistical tests using R and visualize the outcome.
Continuous assessment:
Let the students be tested on the following questions from each unit:
1. Define Data Science. Discuss any application as an example.
2. What are the main components of R? Explain basic R commands.
3. Explain the phases in the Data Science Process.
4. What is machine learning? What are the differences between machine learning, artificial
intelligence and data science?
5. What are the general techniques to handle large volumes of data?
6. Develop any data visualisation application by creating data frames, applying operations on
them and using relevant packages.
BASICS OF R LAB
1) Installing R and RStudio
2) Basic operations in R
3) Getting data into R, Basic data manipulation, Loading Data into R
4) Basic plotting
5) Loops and functions
6) Create Vectors, Lists, Arrays, Matrices, Data frames and operations on them.
7) Demonstrate the visualization and graphics using visualization packages.
8) Implement Loop functions with lapply(), sapply(), tapply(), apply(), mapply().
9) Explore data using Single Variables: Unimodal, Bimodal, Histograms, Density Plots, Bar charts
10) Explore data using two Variables: Line plots, Scatter Plots, smoothing curves, Bar charts
11) Explore and implement commands using the dplyr package
12) Generate random numbers and set a seed
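Lab exercises 8 and 12 can be tried out with a short base R sketch such as the one below. The seed value, the `scores` list and the other sample data are hypothetical and chosen only for illustration.

```r
# Exercise 12: the same seed reproduces the same random numbers.
set.seed(42)
draws_a <- rnorm(5)
set.seed(42)
draws_b <- rnorm(5)
same_draws <- identical(draws_a, draws_b)

# Exercise 8: the loop functions.
scores <- list(maths = c(60, 70, 80), physics = c(55, 65))

means_list <- lapply(scores, mean)   # always returns a list
means_vec  <- sapply(scores, mean)   # simplifies to a named numeric vector

m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)         # apply over rows (margin 1)
col_sums <- apply(m, 2, sum)         # apply over columns (margin 2)

marks   <- c(60, 70, 80, 55, 65)
subject <- c("maths", "maths", "maths", "physics", "physics")
group_means <- tapply(marks, subject, mean)   # mean of marks within each subject

pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)  # element-wise over two vectors
```

Comparing `means_list` with `means_vec` shows the main difference between `lapply()` and `sapply()`: the first keeps a list, the second simplifies the result when it can.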
PAPER 1: INTRODUCTION TO DATA SCIENCE AND R PROGRAMMING
MODEL QUESTION PAPER
Part - A
Answer Any FIVE Questions 5*5=25M
1. What is data science and what are its benefits?
2. Explain the role and stages of data science.
3. What are the goals of data science?
4. How is data retrieved in data science?
5. Explain supervised and unsupervised machine learning.
6. Why do we need machine learning in data science?
7. What is cluster analysis?
8. Explain case studies in the R language.
9. How are functions declared in the R language?
10. Explain vectorized operations in the R language.
Part - B
Answer Any FIVE Questions 5*10=50M
11. How do you install RStudio?
12. What are input and output in the R language?
13. Explain the different stages of data science.
14. How do you get data in and out of R?
15. What is machine learning? What is its role in data science?
16. What are the applications of machine learning in data science?
17. Explain the general techniques for handling large volumes of data.
18. What problems do we face when handling large data?
19. What are data frames? Describe their significance in the R language.
20. Explain R objects.